# 585 Class Note for STAT 30100 with Professor Zhao at Purdue


This 16 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at Purdue University taught by a professor in Fall. Since its upload, it has received 15 views.


Date Created: 02/06/15

## Chapters 2 and 10: Least Squares Regression

When you have 2 quantitative variables and you want to look at the relationship between them, you look at a scatterplot. If the scatterplot looks linear, then you can do least squares regression to get an equation of a line that uses x to explain what happens with y.

### The general procedure

1. Make a scatterplot of the data from the x and y variables. Describe the form, direction, and strength. Look for outliers.
2. Look at the correlation to get a numerical value for the direction and strength.
3. If the data is reasonably linear, get an equation of the line using least squares regression.
4. Look at the residual plot to see if there are any outliers or the possibility of lurking variables. (Patterns bad, randomness good.)
5. Look at the normal probability plot to determine whether the residuals are normally distributed. (The dots sticking close to the 45-degree line is good.)
6. Look at hypothesis tests for the correlation, slope, and intercept. Look at confidence intervals for the slope, intercept, and mean response, and at the prediction intervals.
7. If you had an outlier, you should rework the data without the outlier and comment on the differences in your results.

### Association

- Positive, negative, or no association.
- Remember: ASSOCIATION or CORRELATION is NOT the same thing as CAUSATION. See chapter 325 notes.

### Response variable

- Y
- Dependent variable
- Measures an outcome of a study

### Explanatory variable

- X
- Independent variable
- Explains or is related to changes in the response variable (p. 105)

### Scatterplots

- Show the relationship between 2 quantitative variables measured on the same individuals.
- Dots only; don't connect them with a line or a curve.
- Form: linear, nonlinear, or no obvious pattern.
- Direction: positive association, negative association, or no association.
- Strength: how closely do the points follow a clear form? Strong, moderate, or weak.
- Look for OUTLIERS.

### Correlation

Correlation measures the direction and strength of the linear relationship between 2 quantitative
variables. It is written $r$ and is built from the standardized value of each observation with respect to the mean and standard deviation:

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

where we have data on variables x and y for n individuals.

#### Using SPSS to get correlation

Use the Pearson Correlation output (Analyze > Correlate > Bivariate; see page 55 in the SPSS manual). The SPSS manual tells you where to find r using the least squares regression output, but this r is actually the ABSOLUTE VALUE OF r, so you need to pay attention to the direction yourself. The Pearson Correlation gives you the actual r with the correct sign.

#### Properties of correlation

- X and Y both have to be quantitative.
- It makes no difference which you call X and which you call Y.
- r does not change when you change the units of measurement.
- If r is positive, there is a positive association between X and Y: as X increases, Y increases.
- If r is negative, there is a negative association between X and Y: as X increases, Y decreases.
- $-1 \le r \le 1$
- The closer r is to -1 or to 1, the stronger the linear relationship.
- The closer r is to 0, the weaker the linear relationship.
- Outliers strongly affect r. Use r with caution if outliers are present.

[Figure from Introduction to the Practice of Statistics (W. H. Freeman and Company): example scatterplots labeled with their correlation r.]

#### Example

We want to examine whether the amount of rainfall per year increases or decreases corn bushel output. A sample of 10 observations was taken, and the amount of rainfall (in inches) was measured, as was the subsequent growth of corn.

| Amount of Rain (in) | Bushels of Corn |
|---|---|
| 3.03 | 80 |
| 3.47 | 84 |
| 4.21 | 90 |
| 4.44 | 95 |
| 4.95 | 97 |
| 5.11 | 102 |
| 5.63 | 105 |
| 6.34 | 112 |
| 6.56 | 115 |
| 6.82 | 115 |

The scatterplot:

[Scatterplot: corn yield (bushels) against amount of rain (in).]

a) What does the scatterplot tell us? What is the form? Direction? Strength? What do we expect the correlation to be?

Correlations

| | | amount of rain (in) | corn yield (bushels) |
|---|---|---|---|
| amount of rain (in) | Pearson Correlation | 1 | |
| | Sig. (2-tailed) | | |
| | N | 10 | 10 |
| corn yield (bushels) | Pearson Correlation | | 1 |
| | Sig. (2-tailed) | | |
| | N | 10 | 10 |

Correlation
is significant at the 0.01 level (2-tailed).

### Inference for Correlation

- R = correlation
- R² = % of variation in Y explained by the regression line (the closer to 100%, the better)
- $\rho$ (Greek letter rho) = correlation for the population

When $\rho = 0$, there is no linear association in the population, so X and Y are independent (if X and Y are both normally distributed).

#### Hypothesis test for correlation

To test the null hypothesis $H_0: \rho = 0$, compute the t statistic

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

with degrees of freedom = n - 2 for simple linear regression.

b) Are corn yield and rain independent? Perform a test of significance to determine this.

c) Do corn yield and rain have a positive correlation? Perform a test of significance to determine this.

Another SPSS trick (if you're interested in the test statistic but not the actual correlation): this test statistic is numerically identical to the t statistic used to test $H_0: \beta_1 = 0$.

Can we do better than just a scatterplot and the correlation in describing how x and y are related? What if we want to predict y for other values of x?

### Least Squares Regression

Least squares regression fits a straight line through the data points that minimizes the sum of the squared vertical distances of the data points from the line. It minimizes

$$\sum_{i=1}^{n} e_i^2$$

- Equation of the line: $\hat{y} = b_0 + b_1 x$, with $\hat{y}$ the predicted value of y.
- Slope of the line: $b_1 = r \dfrac{s_y}{s_x}$. The slope measures the amount of change in the predicted response variable when the explanatory variable is increased by one unit.
- Intercept of the line: $b_0 = \bar{y} - b_1 \bar{x}$. The intercept is the value of the predicted response variable when the explanatory variable = 0.

| Type of line | Least squares regression equation of line | Slope | y-intercept |
|---|---|---|---|
| Ch. 2: General | $\hat{y} = a + bx$ | $b$ | $a$ |
| Ch. 10: Sample | $\hat{y} = b_0 + b_1 x$ | $b_1$ | $b_0$ |
| Ch. 10: Population model | $\mu_y = \beta_0 + \beta_1 x$ | $\beta_1$ | $\beta_0$ |

Using the corn example, find the least squares regression line. Tell SPSS to do Analyze > Regression > Linear. Put rain into the independent box and corn into the dependent box. Click OK. Your results will look like:

Model Summary (b)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .995 (a) | .991 | .989 | 1.290 |

a. Predictors: (Constant), amount of rain (in)
b. Dependent Variable: corn yield (bushels)

ANOVA (b)

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 1397.195 | 1 | 1397.195 | 840.070 | .000 (a) |
| | Residual | 13.305 | 8 | 1.663 | | |
| | Total | 1410.500 | 9 | | | |

a. Predictors: (Constant), amount of rain (in)
b. Dependent Variable: corn yield (bushels)

Coefficients (a)

| Model | | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. | 95% CI for B: Lower Bound | 95% CI for B: Upper Bound |
|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 50.835 | 1.728 | | 29.421 | .000 | 46.851 | 54.819 |
| | amount of rain (in) | 9.625 | .332 | .995 | 28.984 | .000 | 8.859 | 10.391 |

a. Dependent Variable: corn yield (bushels)

d) What is the least-squares regression line equation?

The scatterplot with the least squares regression line looks like:

[Scatterplot with fitted line: corn yield (bushels) against amount of rain (in); $R^2 = 0.9906$.]

R² is the percent of variation in corn yield explained by the regression line with rain: 99.06%.

### Confidence Intervals and Significance Tests for Regression Slope and Intercept

- Level C confidence interval for the intercept $\beta_0$: $b_0 \pm t^* SE_{b_0}$
- Level C confidence interval for the slope $\beta_1$: $b_1 \pm t^* SE_{b_1}$

SPSS will give you these confidence intervals for 95%, but you may have to use the estimates for the coefficients and their standard errors to find other confidence intervals (use the t-table and n - 2 degrees of freedom to get $t^*$).

Hypothesis testing for $H_0: \beta_1 = 0$ uses the test statistic

$$t = \frac{b_1}{SE_{b_1}}$$

with df = n - 2. SPSS will give you the test statistic under "t" and the 2-sided P-value under "Sig."

e) Give a 95% confidence interval for the slope.

f) Give a 90% confidence interval for the slope.

g) Is the slope positive? Perform a test of significance.

h) What % of the variability in corn yield is explained by the least squares regression line?

i) What is the estimate of the standard error of the model?

### What do we mean by prediction or extrapolation?

Use your least-squares regression line to find y for other x-values.

Prediction: using the line to find y-values corresponding to x-values that are within the range of your data x-values.

Extrapolation: using the line to find
y-values corresponding to x-values that are outside the range of your data x-values.

Be careful about extrapolating y-values for x-values that are far away from the x data you currently have. The line may not be valid for wide ranges of x.

Example: On the rain/corn data above, predict the corn yield for

a) 5 inches of rain

b) 7.2 inches of rain

c) 0 inches of rain

d) 100 inches of rain

e) For which amounts of rainfall above do you think the line does a good job of predicting actual corn yield? Why?

[Cartoon: "Deadly Sins" by J. B. Landers on www.causeweb.org, used with permission.]

### Assumptions for Regression

- Repeated responses y are independent of each other.
- For any fixed value of x, the response y varies according to a normal distribution.
- The mean response $\mu_y$ has a straight-line relationship with x.
- The standard deviation of y ($\sigma$) is the same for all values of x. The value of $\sigma$ is unknown.

### How do you check these assumptions?

- Scatterplot and R²: Do you have a straight-line relationship between X and Y? How strong is it? How close to 100% is R²? Hopefully no outliers.
- Normal probability plot: Are the residuals approximately normally distributed? Do the dots fall fairly close to the diagonal line (which is always there in the same spot)?

  [Figure: Normal P-P Plot of Regression Standardized Residual, dependent variable corn yield (bushels); Expected Cum Prob against Observed Cum Prob.]

- Residual plot: Do you have constant variability? Do the dots on your residual plot look random and fairly evenly distributed above and below the 0 line? Hopefully no outliers.

A residual is the vertical difference between the observed y-value and the regression line y-value:

$$e_i = y_i - \hat{y}_i = y_i - (b_0 + b_1 x_i)$$

A residual plot is a scatterplot of the regression residuals against the explanatory variable (e vs. x). The e-axis has both negative and positive values but is centered about e = 0, because the mean of the least-squares residuals is always zero ($\bar{e} = 0$).

Good: total randomness, no pattern, approximately the same number of points above and
below the e = 0 line.

Bad: an obvious pattern (funnel shape, parabola), or more points above 0 than below (or vice versa). If you have a pattern, your data does not necessarily fit the model (line) well.

[Figures: example residual plots.]

### Outliers

- Outliers are observations that lie outside the overall pattern of the other observations.
- Outliers in the y direction of a scatterplot have large regression residuals ($e_i$).
- Outliers in the x direction of a scatterplot are often influential for the regression line.
- An observation is influential if removing it would markedly change the result of the calculation.
- Outliers can drastically affect the regression line, correlation, means, and standard deviations.
- You can draw a second regression line that doesn't include the outliers. If the second line moves more than a small amount when the point is deleted, or if R² changes much, the point is influential.

### Which hypothesis test do you use when?

If you're not sure whether to use $\beta_1$ or $\rho$, here are some guidelines. The test statistics and P-values are identical for either symbol.

| Use | If the words are |
|---|---|
| $\beta_1$ | slope, regression coefficient |
| $\rho$ | correlation, independence |
| Either $\beta_1$ or $\rho$ | linear relationship |

### Review of SPSS instructions for Regression

When you set up your regression, you click on Analyze > Regression > Linear. Put in your y variable for "dependent" and your x variable for "independent" on the gray screen. Don't hit "ok" yet, though. At the bottom of that gray screen, click on "Statistics", and then click on "confidence intervals" if you will want the confidence intervals for any part of the problem. You can also click on "descriptives" if you want information like the mean and standard deviation for each variable. Click "continue" to the Statistics gray screen. Back on the regression gray screen, click on "Plots", and
then click on "normal probability plot". Click "continue" on the Plots gray screen. Back on the regression gray screen, click on "Save", and then click on "unstandardized residuals". Click "continue" on the Save gray screen and then "ok" to the big Regression gray screen.

You still won't have a residual plot yet. If you click back to your data input screen, you now have a new column called "RES_1". To make the residual plot, you follow the same steps as for making a scatterplot: go to Graphs > Scatter > Simple, then put "RES_1" in for y and your x variable in for x. Click "ok". Once you see your residual plot, you'll need to double click on it to go to Chart Editor. On the Chart Editor tool bar, you can see a button that shows a graph with a horizontal line. Click on that button. Make sure that the y axis is set to 0.

### How will I ever use this stuff again in my future career?

Testimonial from a former student (e-mails received June 9, 2005):

Ellen,

I hope that all is well. It is your favorite student here, Eric, from your Fall 04 stat 301 class. I need some help. Believe it or not, you were right, and I am using stat everyday all day long, but I am drawing a blank. I am trying to determine a linear regression line, and I can't remember the equation. Y=mX+b of course, but on the regression analysis output, what do I use as m & b? I am very disappointed in myself because I can't remember, but alas, I am asking for help. I tried looking for the notes on your home page, but I couldn't find it any longer, and I think it was taken down for the summer. If you could help me, that would be great. Hope your summer months are spent by the pool.

Regards,
Eric

Ellen,

As for the project: It essentially was a regression to determine the amount of out of state cotton seed a crushing plant would need when their own state's current production increased or decreased. It is not very statistically sound since we are only using 5 years worth of data, but it is just a tool in price analysis that
we are using to determine a spread between plants that we can buy the cotton seed at. As for using me, that would be great. Let those impressionable young students see that we are using what they learn on an everyday basis, some more than others, especially me since I deal with prices and trading.

Eric

### Example: Problem 2.60 (p. 169)

Table 1.9 (p. 59) gives the calories and sodium content for each of 17 brands of meat hot dogs.

a) Make a scatterplot of sodium content y against calories x. Describe the main features of the relationship.

[Scatterplot: Sodium content against Calories.]

b) What is the correlation between calories and sodium?

Correlations

| | | Calories | Sodium content |
|---|---|---|---|
| Calories | Pearson Correlation | 1 | .863 |
| | Sig. (2-tailed) | | .000 |
| | N | 17 | 17 |
| Sodium content | Pearson Correlation | .863 | 1 |
| | Sig. (2-tailed) | .000 | |
| | N | 17 | 17 |

Correlation is significant at the 0.01 level (2-tailed).

c) Report the least-squares regression line.

Model Summary (b)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .863 (a) | .745 | .728 | 48.913 |

a. Predictors: (Constant), Calories
b. Dependent Variable: Sodium content

Coefficients (a)

| Model | | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. | 95% CI for B: Lower Bound | 95% CI for B: Upper Bound |
|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | -91.185 | 77.812 | | -1.172 | .260 | -257.038 | 74.668 |
| | Calories | 3.212 | .485 | .863 | 6.628 | .000 | 2.179 | 4.245 |

a. Dependent Variable: Sodium content

d) Show a residual plot and comment on its features.

[Residual plot: Unstandardized Residual against Calories.]

e) Is there an outlier? If so, where is it?

f) Show a normal probability plot and comment on its features.

[Figure: Normal P-P Plot of Regression Standardized Residual, dependent variable Sodium content; Expected Cum Prob against Observed Cum Prob.]

g) Leave off the outlier and recalculate the correlation and another least squares regression line. Is your outlier influential? Explain your answer.

Correlations

| | | cal2 | sod2 |
|---|---|---|---|
| cal2 | Pearson Correlation | 1 | .834 |
| | Sig. (2-tailed) | | .000 |
| | N | 16 | 16 |
| sod2 | Pearson Correlation | .834 | 1 |
| | Sig. (2-tailed) | .000 | |
| | N | 16 | 16 |

Correlation is significant at the 0.01 level.

Model Summary (b)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .834 (a) | .695 | .674 | 36.406 |

a. Predictors: (Constant), cal2
b. Dependent Variable: sod2

Coefficients (a)

| Model | | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. | 95% CI for B: Lower Bound | 95% CI for B: Upper Bound |
|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 46.900 | 69.371 | | .676 | .510 | -101.886 | 195.686 |
| | cal2 | 2.401 | .425 | .834 | 5.653 | .000 | 1.490 | 3.312 |

a. Dependent Variable: sod2

h) If there is a new brand of meat hot dog with 150 calories per frank, how many milligrams of sodium do you estimate that one of these hot dogs contains?
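Looking back at the rain and corn example, the SPSS numbers can be cross-checked by hand from the formulas in these notes. The Python sketch below is not part of the original notes and is only one way to do the arithmetic. It assumes the rainfall values carry two decimal places (3.03, 3.47, and so on), which matches the 2 to 7 inch axis of the scatterplot. It takes $t^* = 2.306$ from a t-table with 8 degrees of freedom, and it uses the standard formula $SE_{b_1} = s / \sqrt{\sum (x_i - \bar{x})^2}$, which the notes do not state. All variable and function names are illustrative.

```python
import math

# Rain (inches) and corn yield (bushels) for the 10 observations.
# Assumption: decimal points restored as 3.03, 3.47, ... (see lead-in).
rain = [3.03, 3.47, 4.21, 4.44, 4.95, 5.11, 5.63, 6.34, 6.56, 6.82]
corn = [80, 84, 90, 95, 97, 102, 105, 112, 115, 115]

n = len(rain)
x_bar = sum(rain) / n
y_bar = sum(corn) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - x_bar) ** 2 for x in rain)
syy = sum((y - y_bar) ** 2 for y in corn)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(rain, corn))

# Correlation: algebraically equal to (1/(n-1)) * sum of products of z-scores.
r = sxy / math.sqrt(sxx * syy)

# Least-squares slope and intercept; b1 = r * s_y / s_x simplifies to sxy / sxx.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residuals e_i = y_i - (b0 + b1 * x_i) and the standard error of the estimate.
residuals = [y - (b0 + b1 * x) for x, y in zip(rain, corn)]
sse = sum(e ** 2 for e in residuals)
s = math.sqrt(sse / (n - 2))

# Slope test: t = b1 / SE_b1 with n - 2 degrees of freedom.
se_b1 = s / math.sqrt(sxx)
t_slope = b1 / se_b1

# Correlation test: t = r * sqrt(n - 2) / sqrt(1 - r^2); same value as t_slope.
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# 95% confidence interval for the slope; t* looked up in a t-table (df = 8).
t_star = 2.306
ci_low = b1 - t_star * se_b1
ci_high = b1 + t_star * se_b1


def predict(x):
    """Predicted corn yield (bushels) for x inches of rain."""
    return b0 + b1 * x


print(f"r = {r:.4f}, R^2 = {r ** 2:.4f}")
print(f"y-hat = {b0:.3f} + {b1:.3f} x, s = {s:.3f}")
print(f"t = {t_slope:.3f}, 95% CI for slope = ({ci_low:.3f}, {ci_high:.3f})")
print(f"predicted yield at 5 inches of rain = {predict(5):.2f}")
```

The printed slope, intercept, standard error of the estimate, and t statistic should agree with the SPSS Model Summary and Coefficients tables to the rounding shown there. Calling `predict` with a rainfall far outside 3.03 to 6.82 inches still returns a number, but that is extrapolation, and the line may not be valid there.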
