# Class Note for STAT 528 at OSU 17

This 20-page set of class notes was uploaded by an elite notetaker on Friday, February 6, 2015. The notes belong to a course at Ohio State University taught by a professor in Fall. Since upload, they have received 19 views.

Date Created: 02/06/15

## Statistics 528: Data Analysis

Lecture 3, June 27, 2006

Christopher Holloman, The Ohio State University, Summer 2006

### Overview of Today's Lecture

- IPS Sections 2.2 to 2.5
  - Correlation
  - Regression
  - Cautions about Correlation and Regression
  - Causation

### Correlation

- Correlation is a numerical measure of the strength of association between two variables.
- Correlation supplements a scatterplot.
- To calculate a correlation, you need data on variables $x$ and $y$ for $n$ individuals.
- Correlation $r$ measures the direction and strength of linear association between two variables:

  $$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

- In this equation, $\bar{x}$ and $\bar{y}$ represent the sample means, and $s_x$ and $s_y$ represent the sample standard deviations.

### Interpreting a correlation

- $r$ is always between $-1$ and $1$.
- The sign of $r$ gives the direction of the relationship.
  - $r > 0$: positive association
  - $r < 0$: negative association
- The strength of the relationship is given by the absolute value of $r$.
- $r$ has no units.

[Figure: six scatterplots illustrating correlations of magnitude 0, 0.3, 0.5, 0.7, 0.9, and 0.99]

### Properties of Correlation

1. Correlation does not distinguish explanatory and response variables: $\mathrm{corr}(x, y) = \mathrm{corr}(y, x)$.
2. Variables must be quantitative to calculate a correlation.
3. Correlation is not affected by taking a linear transformation (with positive slope, such as a change of units) of one or both of the variables.
4. Correlation only measures the strength of linear association between variables.
5. Just like the mean and standard deviation, correlation is not resistant to outliers. One point can induce or remove correlation.

### Least-squares Regression

- Regression is another way to summarize information in a scatterplot, by drawing a line that represents the linear pattern of the data.
- The regression line describes how $y$ changes as a function of $x$.
- Regression requires specification of an explanatory variable and a response variable.
- Simple Linear Regression
  - Mathematical model for the linear relationship between two variables
  - Similar concept to how a density is a mathematical model for a data distribution

  $$y = a + bx$$

  where $a$ is the intercept and $b$ is the slope.
- Example: The police want to put out a description of a robbery suspect, and they know the shoe size of the suspect from footprints left at the scene of the crime. Their guess at the robber's height may be improved by using his shoe size.

[Figure: scatterplot with shoe size (ShoeSize, 4 to 15) on the horizontal axis]

### Goals

- There are two general goals when using a regression:
  - Prediction: Use the regression line to predict the response $y$ for a specific value of the explanatory variable $x$.
  - Explanation: Use the regression line to explain the association between the $x$ and $y$ variables.

### Extrapolation

- Extrapolation means predicting outside the range of $x$ values.
- Example: Using the shoe size/height data to predict the height of a person with shoe size 15.
- Extrapolations are often inaccurate.
- Related idea: extrapolating out of the population the model was built on.

[Figure: Example, times for the 1-mile run plotted against Year − 1900]

### Least-squares Regression

- We've established that a line can be useful, but what line do we pick?
- No line will pass through all the points unless the correlation is $1$ or $-1$, so we must pick one that is somehow close to as many points as possible.
- Least-squares regression: Make prediction errors (in the squared sense) as small as possible by choosing a line that is as close as possible to the points in the vertical direction.
- Residuals (errors) are the vertical distances between the $y$ values we observe and the $y$ values predicted by the regression line:

  $$\text{residual}_i = \text{observed } y_i - \text{predicted } y_i = y_i - \hat{y}_i$$

[Figure: Terminology, the regression of y on x. For the observation at x = 24 (observed y = 79.9), the plot marks the predicted value ŷ = a + 24b and the error y − ŷ. Horizontal axis: age in months.]

- The least-squares regression line is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
- Choose $a$ and $b$ to minimize

  $$\sum_{i=1}^{n} (\text{residual}_i)^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$

- How do you find $a$ and $b$ to satisfy this property? Using some summary statistics of the data:
  - Slope: $b = r \times \dfrac{s_y}{s_x}$
  - Intercept: $a = \bar{y} - b\bar{x}$
- Minitab and all other statistical software will calculate this for you automatically.
- The least-squares line passes through $(\bar{x}, \bar{y})$.
- Interpreting the slope of the least-squares regression line:
  - A change of one standard deviation in $x$ corresponds to a change of $r$ standard deviations in $y$.
  - When the variables are perfectly correlated, the change in predicted response is the same (in standard units) as the change in $x$.
  - As the correlation between the variables becomes smaller, the prediction does not respond as much to changes in $x$.
- For the running example, the statistics are:

| | x (year) | y (time) |
|---|---|---|
| Mean | 55.52 | 3.97 |
| Standard deviation | 21.57 | 0.15 |

- Correlation: $r = -0.989$

[Figure: Scatterplot of newtime vs year, with year running from 10 to 100]
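The slope and intercept formulas above need only five summary numbers, so the running example can be worked without the raw data. The sketch below assumes the running-example statistics read as mean year 55.52, mean time 3.97, standard deviations 21.57 and 0.15, and r = -0.989 (taken as negative because record times fall as the years advance). The function name `line_from_summary` and the helper `predict` are illustrative, not part of the lecture.

```python
def line_from_summary(x_mean, y_mean, s_x, s_y, r):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    b = r * s_y / s_x          # slope: b = r * s_y / s_x
    a = y_mean - b * x_mean    # intercept: a = y_bar - b * x_bar
    return a, b


# Assumed reading of the running-example statistics (x = year, y = time).
X_MEAN, Y_MEAN = 55.52, 3.97
S_X, S_Y = 21.57, 0.15
R = -0.989  # assumption: negative, since times decrease over the years

a, b = line_from_summary(X_MEAN, Y_MEAN, S_X, S_Y, R)


def predict(x):
    """Predicted response on the fitted line."""
    return a + b * x


print(f"slope b     = {b:.5f}")
print(f"intercept a = {a:.4f}")

# The line passes through the point of means (x_bar, y_bar).
print(f"prediction at x_bar = {predict(X_MEAN):.2f}")

# Moving x by one standard deviation moves the prediction by r standard
# deviations of y.
step = (predict(X_MEAN + S_X) - predict(X_MEAN)) / S_Y
print(f"change per s_x, in units of s_y = {step:.3f}")
```

Calling `predict` for years well beyond the observed range would be an extrapolation, which the lecture warns is often inaccurate.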
### Linear Regression Facts

- The distinction between the explanatory and response variables is important. Swapping the roles of the two variables gives a different slope:

  $$b_{xy} = r\,\frac{s_y}{s_x} \neq r\,\frac{s_x}{s_y} = b_{yx}$$

[Figure: velocity in kilometers per second (0 to 1000) plotted against distance in millions of parsecs (0 to 2.0)]

- There is a close relationship between the correlation and the slope, since $b = r\, s_y / s_x$.
- Since the standard deviations are always positive, the sign of the correlation and the sign of the slope are always the same.
- The residuals always add up to zero, so the average of the residuals is also zero.

### Correlation and R²

- The square of the correlation, $R^2$, is a measure of the amount of variation in $y$ that is explained by the regression.
- $R^2$ is always between 0 and 1.
- Total variation: $s_y^2$
- Left over variation: variance of the residuals
- Sum of Squares Total: $\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2$
- Sum of Squares Error: $\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
- Sum of Squares Regression: $\mathrm{SSR} = \mathrm{SST} - \mathrm{SSE}$. This also happens to be the sum of squared deviations of the predicted values from their mean.

  $$R^2 = \frac{\mathrm{SST} - \mathrm{SSE}}{\mathrm{SST}} = \frac{\mathrm{SSR}}{\mathrm{SST}}$$

[Figure: two panels, (a) and (b), plotting height in centimeters against age in months]

- Example: Suppose we have a regression with these values: $\hat{y}$ built from the terms $0.3$ and $1.2x$ (the sign joining the two terms is missing), and $r^2 = 0.25$.
- Solve for $r$.

### Regression is a dangerous thing

- There are many things that can go wrong with regression that make it invalid.
- Section 2.4 is a mix of general warnings and methods for finding out if something is wrong.

### Residuals

- Residuals are good to check to find problems with a regression.
- Residual plot: scatterplot of the residuals against the explanatory variable.
  - Since the mean of the residuals is zero, the points should be evenly distributed on both sides of the zero line.
  - Always add a zero line to a residual plot.
  - If the regression line catches the overall pattern of the data, the residual plot will show no pattern.

[Figure: Scatterplot of RESI1 vs year, with a zero line]

- No pattern: good.
- Curved pattern: evidence that the relationship between $x$ and $y$ is something other than linear.
- Fan shape: variation in $y$ increases as $x$ increases (heteroskedasticity).

[Figures: example residual plots for each of the three patterns]

### Lurking variables

- While you have those residuals calculated, you should plot them against other available variables and look for patterns.
  - Residuals by time collected: a pattern indicates problems with the measurement system.
  - Residuals by other variables: a pattern may indicate an important variable to include in the analysis.

### Outliers and Influential Points

- Outliers: An outlier is an observation which lies outside the pattern of the rest of the data. Residual plots are good for finding outliers.
- Influential Point: An influential point is an observation whose removal from the dataset noticeably changes the regression results. Not all influential observations are outliers.
- How can you test if an observation is influential?
  - Fit the model with and without the observation.
  - Look for changes in the regression line and the $R^2$ value.

[Figure: two panels labeled "Noninfluential" and "Influential"]

### Correlation and Causation

- An association between an explanatory variable and a response variable, even if it is very strong, is not good evidence that changes in $x$ actually cause changes in $y$.
- Lurking variables can create nonsense correlations.
- Example: drownings and ice-cream sales.
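A small simulation can illustrate how a lurking variable produces a nonsense correlation. In this sketch, which uses entirely synthetic data, a hypothetical temperature variable drives both ice-cream sales and drownings; the coefficients, noise levels, seed, and helper names (`corr`, `residuals`) are assumptions chosen for illustration. Neither response depends on the other, yet they correlate strongly, and the association largely disappears once temperature is regressed out of each.

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def sd(v):
    """Sample standard deviation (divisor n - 1)."""
    m = mean(v)
    return math.sqrt(sum((t - m) ** 2 for t in v) / (len(v) - 1))


def corr(x, y):
    """Sample correlation: summed products of standardized values over n - 1."""
    mx, my, sx, sy = mean(x), mean(y), sd(x), sd(y)
    total = sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y))
    return total / (len(x) - 1)


def residuals(x, y):
    """Residuals from the least-squares regression of y on x."""
    b = corr(x, y) * sd(y) / sd(x)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


rng = random.Random(528)  # fixed seed so the run is repeatable
n = 500

# Synthetic data: temperature is the lurking variable behind both responses.
temperature = [rng.uniform(0, 35) for _ in range(n)]
ice_cream = [10 * t + rng.gauss(0, 30) for t in temperature]
drownings = [0.2 * t + rng.gauss(0, 1) for t in temperature]

r_raw = corr(ice_cream, drownings)
r_adjusted = corr(residuals(temperature, ice_cream),
                  residuals(temperature, drownings))

print(f"correlation of ice-cream sales with drownings: {r_raw:.2f}")
print(f"same correlation after removing temperature:   {r_adjusted:.2f}")
```

Correlating the two sets of residuals is the numeric counterpart of plotting residuals against another available variable and looking for a pattern.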
### Causation and Correlation

- There are three ways you can get association:
  - (a) Causation
  - (b) Common response
  - (c) Confounding

### Establishing Causation

- Best way: Design an experiment that controls for lurking variables.
- In observational studies, we assume causation exists when:
  - Association is strong
  - Association is consistent
  - Higher doses are associated with stronger responses
  - The alleged cause precedes the effect in time
  - The alleged cause is plausible

### Regression Effect/Fallacy

- Regression Effect: In virtually all test-retest situations, the bottom group on the first test will on average show some improvement on the second test, and the top group will on average fall back. This effect is known as the regression effect.
- Regression Fallacy: The regression fallacy is thinking that the regression effect must be due to something important, not just due to spread about the regression line.

### Ecological Fallacy/Simpson's Paradox

- A correlation based on averages over many individuals is usually higher than the correlation between the same variables based on data for individuals.
  - Artificially inflated $R^2$ value
  - Doesn't apply to individuals

### Restricted Range Problem

- When data are only observed over a restricted range, their $R^2$ and $r$ are lower than they would be if the whole range were observed.
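The restricted range problem can also be seen in a short simulation. The sketch below uses synthetic data with an assumed linear relationship (y equals x plus noise) and an arbitrary observation window of 4 to 6; the seed, sample size, and the hand-written helper `corr` are illustrative choices. The same underlying relationship yields a much weaker correlation when only the narrow window of x is observed.

```python
import math
import random


def corr(x, y):
    """Sample correlation computed from standardized values (divisor n - 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))
    total = sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y))
    return total / (n - 1)


rng = random.Random(2006)  # fixed seed so the run is repeatable

# Synthetic data over the whole range of x: y = x + noise.
x_full = [rng.uniform(0, 10) for _ in range(2000)]
y_full = [xi + rng.gauss(0, 2) for xi in x_full]

# Keep only the observations whose x falls in a narrow window.
window = [(xi, yi) for xi, yi in zip(x_full, y_full) if 4 <= xi <= 6]
x_restricted = [xi for xi, _ in window]
y_restricted = [yi for _, yi in window]

r_full = corr(x_full, y_full)
r_restricted = corr(x_restricted, y_restricted)

print(f"whole range:      n = {len(x_full)}, r = {r_full:.2f}, R^2 = {r_full ** 2:.2f}")
print(f"restricted range: n = {len(x_restricted)}, r = {r_restricted:.2f}, R^2 = {r_restricted ** 2:.2f}")
```

The noise level is identical in both samples; only the spread of x changes, which is why the correlation drops.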
