This 8-page set of class notes was uploaded by Alyssa Leathers on Friday, March 6, 2015. The notes belong to Stats 1350 at Ohio State University, taught by Ali Miller in Winter 2015. Since upload, they have received 133 views.
Date Created: 03/06/15
STATS 1350, 3/6/15 11:05 AM
Week Eight, Chapters 14/15
Describing Relationships: scatterplots, correlation, regression, and prediction

Scatterplots
- We are displaying the relationship between two QUANTITATIVE VARIABLES.
- We might simply want to see how the variables relate to each other.
- We might also want to use one variable to help us explain or make a prediction about the other variable.

Describing your scatterplot
- Form: linear, nonlinear, or no obvious pattern (shape).
- Direction: positive (up, left to right), negative (down, left to right), or no association. Slope: how steep.
- Strength: strong, moderate (pretty close), or weak (diffused from the line pattern). Dots falling close to a line mean a strong correlation.

Association direction
- Positive association: as x increases, so does y. The scatterplot moves uphill from left to right.
- Negative association: as x increases, y decreases. The scatterplot moves downhill from left to right.

Example: obesity rates and time spent eating
- Relationship between the time the average person in a given country spends eating and that country's obesity rate, as measured by the % of the national population with a body mass index > 30.
- Shape: somewhat linear
- Direction: negative
- Strength: weak to moderate

Association
- X: explanatory or independent variable
- Y: response or dependent variable
- Association: relationship between x and y (correlation).
- Association is NOT causation (correlation does NOT equal causation).

Correlation
Correlation (r): numerical description of the strength and direction of the linear relationship between x and y.
- X and Y are both quantitative.
- The sign on r matches the direction of the scatterplot data.
- -1 ≤ r ≤ 1: r is always between -1 and 1.
- If r = 1, that means perfect positive correlation: all data lie perfectly on the same line.
- If r = -1, that means perfect negative correlation: all data lie perfectly on the same line.
- Outliers will have an impact on the correlation: r is not resistant to outliers (it is not robust).
- Correlation does not change when we change the units of measurement.
- Correlation ignores the distinction between x and y variables: correlation between (x, y) = correlation between (y, x). Order doesn't matter.
- Only for straight-line relationships.
- Has no units of its own; it is just a number.

Category of strength, based on the absolute value of the correlation
- 0 to .2: very weak to negligible correlation
- .2 to .4: weak/low correlation
- .4 to .7: moderate correlation
- .7 to .9: strong/high correlation
- .9 to 1.0: very strong correlation

Example correlations: corn and rain example data

Calculating correlation
- The sigma (Σ) means that you add up these terms for every x and y value. If you have 10 data points, you will have 10 terms to add up.
- Use z-scores (standard scores).
- r = [1 / (n - 1)] × Σ (standard score for x)(standard score for y)
- DO NOT NEED TO DO THIS BY HAND: use JMP to find the correlation.

When do we use least-squares regression?
- When the relationship is fairly linear. Draw in the line where you see it fits.

General procedure for least-squares regression
- Create a scatterplot and describe the three things (form, direction, strength). We want a fairly linear form to use correlation and regression.
- Compute the correlation coefficient r.
- Obtain the equation of the regression line from the data points (x1, y1), (x2, y2), ...
- Goal in the example: predict score based on attendance.

Regression line
- Y = a + bx (y = intercept + slope times x)
- Intercept (a): the y value when x is 0, where the line crosses the y (vertical) axis. It is the predicted value when x = 0.
  - Warning: sometimes nonsensical (extrapolation).
- Slope (b): the amount y changes when x increases by 1.
- The bigger the absolute value of the slope, the steeper the line.
- The slope and the correlation always have the same sign.

Example
- The slope of 1.3257 tells us that:
  - Slope is NOT the correlation.
  - On average, total semester score increases by 1.3257 for each additional day that the student attends class.
- The intercept of 42.057 tells us that:
  - A student who attends no classes (x = 0) is predicted to have a total semester score of 42.057.

R squared
- The percent of variation in the y variable that is explained by the regression line.
- We find the correlation r, square this value,
and then multiply that value by 100.
- R squared is always between 0 and 100. The closer it is to 100, the stronger the linear relationship between x and y (good).

Example
- If the correlation is .6525, then what percent of the variation in total semester score is explained by the linear relationship with attendance?
- r = .6525; squared = .4258
- Answer: 42.58%

[Figure: scatterplot with least-squares regression line]

Prediction
- Corn and rain example
- Predicting at x = 3
- Y = 50.83 + 9.63(3)

Prediction works best when
- The linear model fits the data well. Strong linear patterns are best (high r squared, closer to 100).
- You use new x values that are within the range of the original set of x-values.

Extrapolation: predicting using x-values outside of the range of the original data.

Prediction vs. extrapolation
- Use your least-squares regression line to find y for other x values.
- Prediction within the range of the original data is great.
- Extrapolation is BAD: not useful, and it can lead to false conclusions.

Outliers
Outliers can have a big effect on both correlation and the regression line.
- Watch for outliers.
- Is there a valid reason for removing an outlier?
  - Obvious response bias/error
  - Error in how we entered our data
- Try running the regression with and without the outlier to see how big a difference it makes to your results, but you can't just drop it.

More on outliers
- Strength of the line changes: higher/lower r.

Causation
When can we conclude that one variable (x) causes another (y)?
- Only a well-designed, controlled experiment can show you causation. Still difficult to show.
- Common response
- Lurking variable

How can you show causation?
- Association is strong.
- Association is consistent (happens in all samples).
- Higher doses are associated with stronger responses.
- Alleged cause precedes the effect.
- The alleged cause is plausible.
Need ALL of these things to make a case for causation.

Breast cancer example: antibiotics causing cancer. Plausible?

Confounding example: a gene that made people simultaneously more susceptible to lung cancer and smoking.
- Common response:
both things caused by the same thing (genetics).
- Sometimes it's just coincidence.

Normal Distribution: forward problem
- "Above": (32 - 30) / 5 = .4 z-score. Look at chart: 65.54%. 100 - 65.54 = 34.46%.
- "Less than": (18 - 30) / 5 = -2.4 z-score. Look at chart: .82%.

Normal Distribution: backward problem
- Look at the chart: top 15% = bottom 85%.
- Z-score = 1.0
- 30 + 1.0(5) = 35 minutes of yoga
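
The forward and backward yoga problems can also be checked without the chart. The sketch below is only an illustration, not part of the lecture: it assumes the mean of 30 and standard deviation of 5 that the worked z-scores imply, and it uses Python's standard `statistics.NormalDist` in place of the printed table. The names `yoga` and `z_score` are made up for this example.

```python
from statistics import NormalDist

# Assumption read off the worked z-scores in the notes: mean 30, standard deviation 5.
yoga = NormalDist(mu=30, sigma=5)


def z_score(x, dist):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - dist.mean) / dist.stdev


# Forward problems: start from a value, end with a percent.
pct_above_32 = 100 * (1 - yoga.cdf(32))  # percent of values above 32
pct_below_18 = 100 * yoga.cdf(18)        # percent of values below 18

# Backward problem: start from a percent, end with a value.
# Top 15% is the same cutoff as the bottom 85%.
cutoff_top_15 = yoga.inv_cdf(0.85)

print(round(z_score(32, yoga), 2), round(pct_above_32, 2))
print(round(z_score(18, yoga), 2), round(pct_below_18, 2))
print(round(cutoff_top_15, 1))
```

The exact cutoff comes out a little above 35 minutes because the chart only lists z-scores in steps of .1, so the notes round the z-score for the 85th percentile to 1.0.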
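
The correlation formula from the Calculating Correlation notes (add up the products of the standard scores, then divide by n - 1) can be sketched in a few lines. This is a minimal illustration, not a replacement for JMP. The `hours` and `score` lists are hypothetical data invented for the example, since the corn and rain data are not reproduced in these notes.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """r = 1/(n - 1) * sum of (standard score for x) * (standard score for y)."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    sd_x, sd_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    products = [((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


# Hypothetical data, for illustration only.
hours = [1, 2, 3, 4, 5]
score = [52, 60, 57, 70, 74]

r = correlation(hours, score)
r_swapped = correlation(score, hours)                    # x and y swapped
r_minutes = correlation([h * 60 for h in hours], score)  # hours converted to minutes

print(round(r, 4), round(r_swapped, 4), round(r_minutes, 4))
```

All three printed values agree, which matches two properties from the notes: order doesn't matter, and changing the units of measurement does not change r.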
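
The regression-line, R squared, and extrapolation ideas from the attendance example can be tied together in one small sketch. It reads the example's intercept, slope, and correlation as 42.057, 1.3257, and .6525, which is an assumption about where the decimal points fall. The attendance range of 0 to 30 days is hypothetical, because the notes do not give the range of the original data.

```python
def predict(x, intercept, slope, x_min, x_max):
    """Return (predicted y, extrapolation flag) for the line y = intercept + slope * x."""
    y_hat = intercept + slope * x
    is_extrapolation = not (x_min <= x <= x_max)  # outside the original x range
    return y_hat, is_extrapolation


def r_squared_percent(r):
    """Percent of variation in y explained by the line: square r, then multiply by 100."""
    return 100 * r ** 2


# Values read from the attendance example (decimal placement is an assumption).
INTERCEPT, SLOPE, R = 42.057, 1.3257, 0.6525

# Hypothetical range of days attended in the original data.
X_MIN, X_MAX = 0, 30

y_hat, flagged = predict(10, INTERCEPT, SLOPE, X_MIN, X_MAX)
_, flagged_far = predict(40, INTERCEPT, SLOPE, X_MIN, X_MAX)

print(round(y_hat, 3), flagged, flagged_far)
print(round(r_squared_percent(R), 2))
```

Returning the flag alongside the prediction keeps the warning from the notes visible: the line will happily produce a number for any x, but only values inside the original range deserve trust.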