# 432 Class Note for STAT 30100 with Professor Gundlach at Purdue

These 16-page class notes were uploaded by an elite notetaker on Friday, February 6, 2015. They belong to STAT 30100 at Purdue University, taught by Professor Gundlach in the fall. Since their upload, they have received 16 views.

Date Created: 02/06/15

## Chapters 2 and 10: Least-Squares Regression

### Learning goals for this chapter

- Describe the form, direction, and strength of a scatterplot.
- Use SPSS output to find the following: the least-squares regression line, the correlation, $r^2$, and the estimate for $\sigma$.
- Interpret a scatterplot, a residual plot, and a Normal probability plot.
- Calculate the predicted response and the residual for a particular x-value.
- Understand that least-squares regression is only appropriate if there is a linear relationship between x and y.
- Determine the explanatory and response variables from a story.
- Use SPSS and the t table to find the confidence interval for the regression slope and intercept.
- Perform a hypothesis test for the regression slope and for zero population correlation (independence), including stating the null and alternative hypotheses, obtaining the test statistic and P-value from SPSS, and stating the conclusions in terms of the story.
- Understand that correlation and causation are not the same thing.
- Estimate the correlation from a scatterplot display of the data.
- Distinguish between prediction and extrapolation.
- Check for differences between outliers and influential outliers by rerunning the regression.
- Know that scatterplots and regression lines are based on sample data, but hypothesis tests and confidence intervals give you information about the population parameters.

When you have 2 quantitative variables and you want to look at the relationship between them, use a scatterplot. If the scatterplot looks linear, you can use least-squares regression to get the equation of a line that uses x to explain what happens with y.

### The general procedure

1. Make a scatterplot of the x and y data. Describe the form, direction, and strength, and look for outliers.
2. Look at the correlation to get a numerical value for the direction and strength.
3. If the data are reasonably linear, get the equation of the line using least-squares regression.
4. Look at the residual plot to see whether there are any outliers or signs of lurking variables. Patterns are bad; randomness is good.
5. Look at the Normal probability plot to determine whether the residuals are Normally distributed. Dots that stick close to the 45-degree line are good.
6. Look at the hypothesis tests for the correlation, slope, and intercept. Look at the confidence intervals for the slope, intercept, and mean response, and at the prediction intervals.
7. If you had an outlier, rework the analysis without the outlier and comment on the differences in your results.

### Association

- Positive, negative, or no association.
- Remember: ASSOCIATION or CORRELATION is NOT the same thing as CAUSATION. (See chapter 325 notes.)

### Response and explanatory variables

Response variable:

- Y
- Dependent variable
- Measures an outcome of a study

Explanatory variable:

- X
- Independent variable
- Explains or is related to changes in the response variable (p. 105)

### Scatterplots

- Show the relationship between 2 quantitative variables measured on the same individuals.
- Dots only: don't connect them with a line or a curve.
- Form: linear, nonlinear, or no obvious pattern.
- Direction: positive association, negative association, or no association.
- Strength: how closely do the points follow a clear form? Strong, moderate, or weak.
- Look for OUTLIERS.

### Correlation

The correlation $r$ measures the direction and strength of the linear relationship between 2 quantitative variables. It is built from the standardized value of each observation with respect to its variable's mean and standard deviation. If we have data on variables x and y for n individuals, with means $\bar{x}$ and $\bar{y}$ and standard deviations $s_x$ and $s_y$:

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$

You won't need to use this formula yourself, but SPSS will.

#### Using SPSS to get the correlation

Use the Pearson Correlation output: Analyze > Correlate > Bivariate (see page 55 in the SPSS manual). The SPSS manual tells you where to find r in the least-squares regression output, but that value is actually the ABSOLUTE VALUE OF r, so you need to work out the direction yourself. The Pearson Correlation output gives you the actual r with the correct sign.

#### Properties of correlation

- X and Y both have to be quantitative.
- It makes no difference which variable you call X and which you call Y.
- r does not change when you change the units of measurement.
- If r is positive, there is a positive association between X and Y: as X increases, Y increases.
- If r is negative, there is a negative association between X and Y: as X increases, Y decreases.
- $-1 \le r \le 1$. The closer r is to $-1$ or to $1$, the stronger the linear relationship. The closer r is to 0, the weaker the linear relationship.
- Outliers strongly affect r. Use r with caution if outliers are present.

*Figure 2.10 (not reproduced), from Introduction to the Practice of Statistics, Fifth Edition, © 2005 W. H. Freeman and Company: scatterplots of a response in kilograms against nonexercise activity (calories), with panels labeled with correlations of magnitude 0.9 and 0.99.*

### Example: rainfall and corn yield

We want to examine whether the amount of rainfall per year increases or decreases corn output in bushels. A sample of 10 observations was taken: the amount of rainfall (in inches) was measured, as was the subsequent corn yield.

| Amount of rain (in) | Bushels of corn |
|---|---|
| 3.03 | 80 |
| 3.47 | 84 |
| 4.21 | 90 |
| 4.44 | 95 |
| 4.95 | 97 |
| 5.11 | 102 |
| 5.63 | 105 |
| 6.34 | 112 |
| 6.56 | 115 |
| 6.82 | 115 |

*Scatterplot (not reproduced): corn yield (bushels, 70 to 120) against amount of rain (inches, 2 to 7).*

a. What does the scatterplot tell us? What is the form? The direction? The strength? What do we expect the correlation to be?

*SPSS Correlations table for amount of rain (in) and corn yield (bushels), reporting Pearson Correlation, Sig. (2-tailed), and N for each pair (values not reproduced). Footnote: Correlation is significant at the 0.01 level (2-tailed).*

### Inference for correlation

- $R$ = correlation (in the regression output, the absolute value of r).
- $R^2$ = % of the variation in Y explained by the regression line; the closer to 100%, the better.
- $\rho$ (Greek letter rho) = correlation for the population.
- When $\rho = 0$, there is no linear association in the population, so X and Y are independent if X and Y are jointly Normally distributed.

#### Hypothesis test for correlation

To test the null hypothesis $H_0: \rho = 0$, SPSS computes the t statistic

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

with $n - 2$ degrees of freedom for simple linear regression.

b. Are corn yield and rain independent in the population? Perform a test of significance to determine this.

c. Do corn yield and rain have a positive correlation in the population? Perform a test of significance to determine this.

This test statistic for the correlation is numerically identical to the t statistic used to test $H_0: \beta_1 = 0$.

Can we do better than just a scatterplot and the correlation in describing how x and y are related? What if we want to predict y for other values of x?

### Least-squares regression

Least-squares regression fits a straight line through the data points that minimizes the sum of the squared vertical distances of the data points from the line; that is, it minimizes $\sum_{i=1}^{n} e_i^2$.

- The equation of the line is $\hat{y} = b_0 + b_1 x$, where $\hat{y}$ is the predicted y-value.
- The slope of the line is $b_1 = r\,\dfrac{s_y}{s_x}$. The slope is the amount of change in the predicted response variable when the explanatory variable increases by one unit.
- The intercept of the line is $b_0 = \bar{y} - b_1\bar{x}$. The intercept is the value of the predicted response variable when the explanatory variable equals 0.

| Type of line | Equation of line | Slope | y-intercept |
|---|---|---|---|
| Ch 10 sample (least-squares regression line) | $\hat{y} = b_0 + b_1 x$ | $b_1$ | $b_0$ |
| Ch 10 population (model) | $\mu_y = \beta_0 + \beta_1 x$ | $\beta_1$ | $\beta_0$ |

Using the corn example, find the least-squares regression line. In SPSS, choose Analyze > Regression > Linear. Put rain into the Independent box and corn into the Dependent box, then click OK.

*Model Summary*

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .995^a | .991 | .989 | 1.290 |

a. Predictors: (Constant), amount of rain (in). b. Dependent Variable: corn yield (bushels).

*ANOVA*

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 1397.195 | 1 | 1397.195 | 840.070 | .000^a |
| | Residual | 13.305 | 8 | 1.663 | | |
| | Total | 1410.500 | 9 | | | |

a. Predictors: (Constant), amount of rain (in). b. Dependent Variable: corn yield (bushels).

*Coefficients* (B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient)

| Model | | B | Std. Error | Beta | t | Sig. | 95% CI for B: Lower Bound | 95% CI for B: Upper Bound |
|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 50.835 | 1.728 | | 29.421 | .000 | 46.851 | 54.819 |
| | amount of rain (in) | 9.625 | .332 | .995 | 28.984 | .000 | 8.859 | 10.391 |

a. Dependent Variable: corn yield (bushels).

d. What is the least-squares regression line equation?

*Scatterplot with the least-squares regression line (not reproduced): corn yield (bushels) against amount of rain (in), annotated "R Sq Linear = 0.9906".* $R^2$ is the percent of variation in corn yield explained by the regression line with rain: 99.06%.

### Confidence intervals and significance tests for regression slope and intercept

- A level C confidence interval for the intercept $\beta_0$ is $b_0 \pm t^* SE_{b_0}$.
- A level C confidence interval for the slope $\beta_1$ is $b_1 \pm t^* SE_{b_1}$.

SPSS will also give you these confidence intervals at 95%, but you may have to use the coefficient estimates and their standard errors to find other confidence intervals. Use the t table with $n - 2$ degrees of freedom to get $t^*$.

Hypothesis test for $H_0: \beta_1 = 0$: the test statistic is

$$t = \frac{b_1}{SE_{b_1}}$$

with $df = n - 2$. SPSS gives you the test statistic under "t" and the two-sided P-value under "Sig."

e. Give a 95% confidence interval for the slope.

f. Give a 90% confidence interval for the slope.

g. Is the slope positive? Perform a test of significance.

h. What % of the variability in corn yield is explained by the least-squares regression line?

i. What is the estimate of the standard error of the model?

### Prediction and extrapolation

What do we mean by prediction or extrapolation? In both cases you use your least-squares regression line to find y for other x-values.

- Prediction: using the line to find y-values corresponding to x-values that are within the range of your data's x-values.
- Extrapolation: using the line to find y-values corresponding to x-values that are outside the range of your data's x-values.

Be careful about extrapolating y-values for x-values that are far away from the x data you currently have. The line may not be valid for wide ranges of x.

Example: Using the rain and corn data above, predict the corn yield for:

a. 5 inches of rain

b. 72 inches of rain

c. 0 inches of rain

d. 100 inches of rain

e. For which amounts of rainfall above do you think the line does a good job of predicting actual corn yield? Why?

*Cartoon (not reproduced): "Deadly Sins" by J. B. Landers, from www.causeweb.org, used with permission.*

### Assumptions for regression

1. Repeated responses y are independent of each other.
2. For any fixed value of x, the response y varies according to a Normal distribution.
3. The mean response $\mu_y$ has a straight-line relationship with x.
4. The standard deviation of y ($\sigma$) is the same for all values of x. The value of $\sigma$ is unknown.

How do you check these assumptions?

- Scatterplot and $R^2$ (assumption 3): Do you have a straight-line relationship between X and Y? How strong is it? How close to 100% is $R^2$? Hopefully there are no outliers.
- Normal probability plot (assumption 2): Are the residuals approximately Normally distributed? Do the dots fall fairly close to the diagonal line (which is always in the same spot)?
- Residual plot (assumptions 1 and 4): Do you have constant variability? Do the dots on your residual plot look random and fairly evenly distributed above and below the 0 line? Hopefully there are no outliers.

*Normal P-P Plot of Regression Standardized Residual (not reproduced). Dependent Variable: corn yield (bushels); axes: Observed Cum Prob and Expected Cum Prob.*

#### Residuals

A residual is the vertical difference between the observed y-value and the y-value on the regression line:

$$\text{residual} = e = y - \hat{y} = \text{observed } y - \text{predicted } y$$

A residual plot is a scatterplot of the regression residuals against the explanatory variable (e vs. x).

- The e-axis has both negative and positive values but is centered about $e = 0$: the mean of the least-squares residuals is always zero ($\bar{e} = 0$).
- Good: total randomness, no pattern, and approximately the same number of points above and below the $e = 0$ line.
- Bad: an obvious pattern (such as a funnel shape or a parabola), or more points above 0 than below, or vice versa.
- If you have a pattern, your data may not fit the model line well.

*Example residual plots (not reproduced): residuals plotted by subject, against number of years, field-measurement residuals against laboratory measurement, residuals against nonexercise activity (calories), and residuals against amount of rain (in) for the corn data.*

### Outliers

- Outliers are observations that lie outside the overall pattern of the other observations.
- Outliers in the y direction of a scatterplot have large regression residuals.
- Outliers in the x direction of a scatterplot are often influential for the regression line.
- An observation is influential if removing it would markedly change the result of the calculation.
- Outliers can drastically affect the regression line, the correlation, means, and standard deviations.
- You can draw a second regression line that doesn't include the outliers. If the second line moves more than a small amount when the point is deleted, or if $R^2$ changes much, the point is influential.

### Which hypothesis test do you use when?

If you're not sure whether to use $\beta_1$ or $\rho$, here are some guidelines. The test statistics and P-values are identical for either symbol.

| Use | If the words are |
|---|---|
| $\beta_1$ | slope, regression coefficient |
| $\rho$ | correlation, independence |
| Either $\beta_1$ or $\rho$ | linear relationship |

### Review of SPSS instructions for regression

1. When you set up your regression, click on Analyze > Regression > Linear. Put your y variable in "Dependent" and your x variable in "Independent" on the gray screen. Don't hit "OK" yet, though.
2. At the bottom of that gray screen, click on "Statistics" and then on "Confidence intervals" if you will want the confidence intervals for any part of the problem. You can also click on "Descriptives" if you want information like the mean and standard deviation of each variable. Click "Continue" on the Statistics gray screen.
3. Back on the Regression gray screen, click on "Plots" and then on "Normal probability plot". Click "Continue" on the Plots gray screen.
4. Back on the Regression gray screen, click on "Save" and then on "Unstandardized residuals". Click "Continue" on the Save gray screen, and then "OK" on the big Regression gray screen.

You still won't have a residual plot yet. If you click back to your data input screen, you now have a new column called "RES_1". To make the residual plot, follow the same steps as for making a scatterplot: go to Graphs > Scatter > Simple, then put "RES_1" in for y and your x variable in for x. Click "OK". Once you see your residual plot, double-click on it to open the Chart Editor. On the Chart Editor toolbar there is a button that shows a graph with a horizontal line. Click on that button, and make sure that the y-axis value is set to 0.

### How will I ever use this stuff again in my future career?

Testimonial from a former student (emails received June 9, 2005):

> Ellen,
>
> I hope that all is well. It is your favorite student here, Eric, from your Fall '04 Stat 301 class. I need some help. Believe it or not, you were right, and I am using stat every day, all day long, but I am drawing a blank. I am trying to determine a linear regression line and I can't remember the equation. Y = mX + b, of course, but on the regression analysis output, what do I use as m and b? I am very disappointed in myself because I can't remember, but alas, I am asking for help. I tried looking for the notes on your home page, but I couldn't find it any longer, and I think it was taken down for the summer. If you could help me, that would be great. Hope your summer months are spent by the pool.
>
> Regards,
> Eric

> Ellen,
>
> As for the project: it essentially was a regression to determine the amount of out-of-state cotton seed a crushing plant would need when their own state's current production increased or decreased. It is not very statistically sound, since we are only using 5 years' worth of data, but it is just a tool in price analysis that we are using to determine a spread between plants that we can buy the cotton seed at. As for using me, that would be great. Let those impressionable young students see that we are using what they learn on an everyday basis, some more than others, especially me, since I deal with prices and trading.
>
> Eric

### Example: calories and sodium in hot dogs

A scatterplot shows the calories and sodium content for each of 17 brands of meat hot dogs.

a. Describe the main features of the relationship.

*Scatterplot (not reproduced): sodium content (100 to 600) against calories (100 to 200).*

b. What is the correlation between calories and sodium?

*Correlations*

| | | Calories | Sodium content |
|---|---|---|---|
| Calories | Pearson Correlation | 1 | .863 |
| | Sig. (2-tailed) | | .000 |
| | N | 17 | 17 |
| Sodium content | Pearson Correlation | .863 | 1 |
| | Sig. (2-tailed) | .000 | |
| | N | 17 | 17 |

Correlation is significant at the 0.01 level (2-tailed).

c. Report the least-squares regression line.

*Model Summary*

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .863^a | .745 | .728 | 48.913 |

a. Predictors: (Constant), Calories. b. Dependent Variable: Sodium content.

*Coefficients* (B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient)

| Model | | B | Std. Error | Beta | t | Sig. | 95% CI for B: Lower Bound | 95% CI for B: Upper Bound |
|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | -91.185 | 77.812 | | -1.172 | .260 | -257.038 | 74.668 |
| | Calories | 3.212 | .485 | .863 | 6.628 | .000 | 2.179 | 4.245 |

a. Dependent Variable: Sodium content.

d. Show a residual plot and comment on its features.

*Residual plot (not reproduced): residuals (-200 to 100) against calories (100 to 200).*

e. Is there an outlier? If so, where is it?

f. Show a Normal probability plot and comment on its features.

*Normal P-P Plot of Regression Standardized Residual (not reproduced). Dependent Variable: Sodium content; axes: Observed Cum Prob and Expected Cum Prob, each from 0.0 to 1.0.*

g. Leave off the outlier and recalculate the correlation and another least-squares regression line. Is your outlier influential? Explain your answer.

*Correlations*

| | | cal2 | sod2 |
|---|---|---|---|
| cal2 | Pearson Correlation | 1 | .834 |
| | Sig. (2-tailed) | | .000 |
| | N | 16 | 16 |
| sod2 | Pearson Correlation | .834 | 1 |
| | Sig. (2-tailed) | .000 | |
| | N | 16 | 16 |

Correlation is significant at the 0.01 level.

*Model Summary*

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .834^a | .695 | .674 | 36.406 |

a. Predictors: (Constant), cal2. b. Dependent Variable: sod2.

*Coefficients* (B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient)

| Model | | B | Std. Error | Beta | t | Sig. | 95% CI for B: Lower Bound | 95% CI for B: Upper Bound |
|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 46.900 | 69.371 | | .676 | .510 | -101.886 | 195.686 |
| | cal2 | 2.401 | .425 | .834 | 5.653 | .000 | 1.490 | 3.312 |

a. Dependent Variable: sod2.

h. If there is a new brand of meat hot dog with 150 calories per frank, how many milligrams of sodium do you estimate that one of these hot dogs contains?
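
The correlation formula and the t statistic for $H_0: \rho = 0$ from the notes can be checked by hand on the corn data. Below is a minimal Python sketch using only the standard library. It assumes the rain values are the ones in the corn data table (inches, e.g. 3.03), and the helper names `correlation` and `t_for_zero_correlation` are hypothetical illustrations, not SPSS functions.

```python
from math import sqrt
from statistics import mean, stdev

# Corn example from the notes: rain (inches) and corn yield (bushels).
rain = [3.03, 3.47, 4.21, 4.44, 4.95, 5.11, 5.63, 6.34, 6.56, 6.82]
corn = [80, 84, 90, 95, 97, 102, 105, 112, 115, 115]


def correlation(x, y):
    """Hypothetical helper: Pearson r as the sum of products of standardized values over n - 1."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (divide by n - 1)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y))
    return total / (n - 1)


def t_for_zero_correlation(r, n):
    """Hypothetical helper: t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


r = correlation(rain, corn)
t = t_for_zero_correlation(r, len(rain))
print(f"r = {r:.4f}, t = {t:.2f}, df = {len(rain) - 2}")
```

Unlike the R in the regression output, this r carries its sign (positive here). Up to rounding, the t value agrees with the t = 28.984 that SPSS reports for the slope, which is the "numerically identical" property of the two tests.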
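
The slope and intercept formulas $b_1 = r\,s_y/s_x$ and $b_0 = \bar{y} - b_1\bar{x}$ can be applied directly to the corn data to reproduce the B column of the SPSS Coefficients table. This is a sketch under the same assumptions as before (standard-library Python, data transcribed from the corn table); `least_squares_line` is a hypothetical name.

```python
from statistics import mean, stdev

# Corn example from the notes: rain (inches) and corn yield (bushels).
rain = [3.03, 3.47, 4.21, 4.44, 4.95, 5.11, 5.63, 6.34, 6.56, 6.82]
corn = [80, 84, 90, 95, 97, 102, 105, 112, 115, 115]


def least_squares_line(x, y):
    """Hypothetical helper: return (b0, b1, r) using b1 = r * s_y / s_x, b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    # Correlation r, written with the standard deviations pulled out of the sum.
    r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1, r


b0, b1, r = least_squares_line(rain, corn)
print(f"y-hat = {b0:.3f} + {b1:.3f} x, R^2 = {r ** 2:.4f}")
```

The fitted coefficients should match SPSS's 50.835 and 9.625 to three decimals, and $r^2$ should match the 0.9906 annotation on the fitted-line plot.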
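
The residual definition ($e = y - \hat{y}$) and the claim that least-squares residuals always average to zero can both be checked numerically. In this sketch the slope is computed with the sums-of-squares form $b_1 = S_{xy}/S_{xx}$. That form is swapped in for convenience: it is algebraically the same line as $b_1 = r\,s_y/s_x$. The standard error of the slope uses the standard textbook formula $SE_{b_1} = s/\sqrt{S_{xx}}$. The variable names are illustrative.

```python
from math import sqrt
from statistics import mean

# Corn example from the notes: rain (inches) and corn yield (bushels).
rain = [3.03, 3.47, 4.21, 4.44, 4.95, 5.11, 5.63, 6.34, 6.56, 6.82]
corn = [80, 84, 90, 95, 97, 102, 105, 112, 115, 115]

# Least-squares coefficients via sums of squares (same line as b1 = r * s_y / s_x).
x_bar, y_bar = mean(rain), mean(corn)
s_xx = sum((x - x_bar) ** 2 for x in rain)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(rain, corn))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residual = observed y - predicted y, for every observation.
residuals = [y - (b0 + b1 * x) for x, y in zip(rain, corn)]

n = len(rain)
sse = sum(e ** 2 for e in residuals)  # the "Residual" sum of squares in the ANOVA table
s = sqrt(sse / (n - 2))               # estimate of sigma ("Std. Error of the Estimate")
se_b1 = s / sqrt(s_xx)                # standard error of the slope

print(f"mean residual = {mean(residuals):.2e}")
print(f"SSE = {sse:.3f}, s = {s:.3f}, SE_b1 = {se_b1:.3f}")
```

The mean residual is zero apart from floating-point noise, and SSE, s, and $SE_{b_1}$ line up with SPSS's 13.305, 1.290, and .332. Plotting `residuals` against `rain` gives the same residual plot that the saved RES_1 column produces in SPSS.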
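
Parts e and f ask for slope intervals at two levels, but SPSS only prints the 95% one. The interval $b_1 \pm t^* SE_{b_1}$ can be computed from the Coefficients table instead. This sketch assumes the t table values for $df = n - 2 = 8$: $t^* = 2.306$ for 95% and $t^* = 1.860$ for 90%. The dictionary and function names are hypothetical.

```python
# Slope estimate and standard error from the SPSS Coefficients table (corn example).
B1 = 9.625
SE_B1 = 0.332
DF = 10 - 2  # n - 2 for simple linear regression

# Hypothetical lookup: critical values t* for df = 8, read from a standard t table.
T_STAR_DF8 = {0.90: 1.860, 0.95: 2.306}


def confidence_interval(estimate, std_error, t_star):
    """Level C interval: estimate +/- t* * SE."""
    margin = t_star * std_error
    return estimate - margin, estimate + margin


for level, t_star in sorted(T_STAR_DF8.items()):
    low, high = confidence_interval(B1, SE_B1, t_star)
    print(f"{level:.0%} CI for slope: ({low:.3f}, {high:.3f})")

# Test statistic for H0: beta_1 = 0. SPSS's 28.984 comes from unrounded b1 and SE_b1.
t_stat = B1 / SE_B1
print(f"t = {t_stat:.2f} with df = {DF}")
```

The 95% interval reproduces SPSS's (8.859, 10.391). The 90% interval is narrower, as expected for a lower confidence level. The same `confidence_interval` call with $b_0$ and $SE_{b_0}$ gives intervals for the intercept.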
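
Parts h and i can be answered from the ANOVA table alone, because the Model Summary numbers are derived from it. The sketch below shows the relationships for the corn output: $R^2 = SS_{\text{Regression}}/SS_{\text{Total}}$, $s = \sqrt{MS_{\text{Residual}}}$, and, for simple linear regression, $F = t^2$. The last identity is a standard fact, not stated in the notes. The numbers are copied from the SPSS output. The variable names are illustrative.

```python
from math import sqrt

# Values copied from the SPSS ANOVA table for the corn example.
ss_regression = 1397.195
ss_residual = 13.305
ss_total = 1410.500
df_residual = 8

r_squared = ss_regression / ss_total  # proportion of variation in corn yield explained
mse = ss_residual / df_residual       # "Mean Square" for Residual
s = sqrt(mse)                         # estimate of sigma ("Std. Error of the Estimate")
f_stat = ss_regression / mse          # F with 1 and 8 degrees of freedom
t_from_f = sqrt(f_stat)               # in simple regression, F = t^2 for the slope test

print(f"R^2 = {r_squared:.4f}, s = {s:.3f}, F = {f_stat:.2f}, sqrt(F) = {t_from_f:.3f}")
```

Small differences from SPSS's F = 840.070 and t = 28.984 come from starting with sums of squares that are already rounded to three decimals.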
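
The prediction and extrapolation distinction depends only on whether the new x lies inside the range of the observed x-values. This sketch evaluates the corn line $\hat{y} = 50.835 + 9.625x$ at the rainfall amounts in the prediction example and labels each result. The function name `predict_corn` and the two labels are hypothetical conventions for illustration.

```python
# Least-squares line for the corn example, coefficients from the SPSS output.
B0, B1 = 50.835, 9.625

# Observed rainfall in the sample (inches); its range separates prediction from extrapolation.
rain = [3.03, 3.47, 4.21, 4.44, 4.95, 5.11, 5.63, 6.34, 6.56, 6.82]
X_MIN, X_MAX = min(rain), max(rain)


def predict_corn(rain_inches):
    """Hypothetical helper: return (predicted yield, label) for a rainfall amount."""
    y_hat = B0 + B1 * rain_inches
    label = "prediction" if X_MIN <= rain_inches <= X_MAX else "extrapolation"
    return y_hat, label


for x in (5, 72, 0, 100):  # rainfall amounts from the prediction example
    y_hat, label = predict_corn(x)
    print(f"{x:>5} in of rain -> {y_hat:8.2f} bushels ({label})")
```

The formula returns a number for every input. Only the 5-inch value falls inside the 3.03 to 6.82 inch range that the sample supports. The 72- and 100-inch results land far above any yield in the data, which shows why extrapolation needs caution.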
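
For parts g and h of the hot dog example, the two SPSS runs give two fitted lines. Plugging 150 calories into both shows how much the suspected outlier moves the prediction. This sketch copies the coefficients and $R^2$ values from the two Coefficients and Model Summary tables. The dictionary keys and `predict_sodium` are hypothetical names. Whether the change counts as "more than a small amount" is still the judgment the notes ask you to make; the code only quantifies it.

```python
# Coefficients and R^2 copied from the two SPSS runs in the hot dog example.
fits = {
    "all 17 brands": {"b0": -91.185, "b1": 3.212, "r_squared": 0.745},
    "outlier removed (16 brands)": {"b0": 46.900, "b1": 2.401, "r_squared": 0.695},
}


def predict_sodium(fit, calories):
    """Hypothetical helper: predicted sodium from y-hat = b0 + b1 * x."""
    return fit["b0"] + fit["b1"] * calories


for name, fit in fits.items():
    print(f"{name}: predicted sodium at 150 calories = {predict_sodium(fit, 150):.1f}")

# How far the line moves when the outlier is dropped.
full, reduced = fits["all 17 brands"], fits["outlier removed (16 brands)"]
slope_change = (reduced["b1"] - full["b1"]) / full["b1"]
r2_change = reduced["r_squared"] - full["r_squared"]
print(f"slope changes by {slope_change:.1%}; R^2 changes by {r2_change:+.3f}")
```

Dropping one brand changes the slope by roughly a quarter, from 3.212 to 2.401. That is the kind of shift the outlier guidelines describe as influential.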
