# 231 Class Note for STAT 30100 with Professor Wang at Purdue

This 13 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at Purdue University taught by a professor in Fall. Since its upload, it has received 18 views.

Date Created: 02/06/15

## Chapter 11: Multiple Regression

STAT 301, Spring 2009, Chapter 11

Multiple regression is what we use when there are 2 or more quantitative explanatory variables which will be used to predict a quantitative response variable. Simple linear regression (Chapters 2 and 10) is used when you have just 1 quantitative explanatory variable and 1 quantitative response variable.

For simple linear regression (Chapters 2 and 10), our statistical model had only 1 explanatory variable:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

In multiple regression (Chapter 11), our statistical model has $p$ explanatory variables:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$$

Recall that $b_0$ is the sample estimator for $\beta_0$, $b_1$ is the sample estimator for $\beta_1$, etc. The deviations (errors) $\varepsilon$ are independent and normally distributed with mean 0 and standard deviation $\sigma$. The parameters of the model are $\beta_0, \beta_1, \ldots, \beta_p$ and $\sigma$.

Just because we have data for several x-variables doesn't mean that all the x-variables are important enough to go in your model. We must do a multiple-step procedure to decide which x-variables are the most important when describing y.

### So what do we do when we have multiple x-variables?

### 1. Look at the variables individually

Means, standard deviations, minimums and maximums, outliers (if any), and stemplots or histograms are all good ways to show what is happening with your individual variables.

In SPSS: Analyze > Descriptive Statistics > Explore > Plots (check the desired plots).

### 2. Look at the relationships between the variables using correlations and scatterplots

In SPSS: Analyze > Correlate > Bivariate. Put all your variables (all the x's and y) into the variables box and hit OK.

The larger the Pearson correlation between 2 variables in absolute value, the better, and the lower the Sig. (2-tailed), the better. The P-value (Sig.) is the result of the test $H_0\colon \rho = 0$ vs. $H_a\colon \rho \neq 0$ that we did in Chapter 10.

Which are the stronger relationships between an x and the y? Which are the stronger x-to-x relationships? Look at scatterplots between each pair of variables too (you will look at a LOT of graphs). We are only interested in keeping the variables which had strong relationships.

### 3. Do a regression using the variables you decided were important from part 2

This will include an ANOVA table and coefficient output like in Chapter 10.

#### ANOVA Table for Multiple Regression

We had ANOVA results for simple linear regression in Chapter 10, but since we only had one $\beta$ we didn't need to use it.

| Source | SS | df | MS | F | Significance |
|---|---|---|---|---|---|
| Regression | SSM | DFM = p | MSM = SSM/DFM | MSM/MSE | P-value |
| Residual | SSE | DFE = n − p − 1 | MSE = SSE/DFE = s² | | |
| Total | SST | DFT = n − 1 | MST = SST/DFT | | |

The estimate for the standard deviation $\sigma$ is $s = \sqrt{MSE}$.

#### Analysis of Variance F-Test

In the multiple regression model, the hypotheses

$$H_0\colon \beta_1 = \beta_2 = \cdots = \beta_p = 0 \qquad H_a\colon \text{not all } \beta_j = 0$$

are tested with the F statistic. $H_a$ means at least one $\beta_j \neq 0$. We can't tell how many regression coefficients are not 0 at this point. We need to do t-tests to be more specific. (Think back: we did Bonferroni multiple-comparison t-tests if we could reject the null hypothesis in a one-way ANOVA F-test.) If we reject $H_0$, basically we have determined that this problem is worthy of further study.

Even if the P-value (Sig.) is small, you need to look at $R^2$. If $R^2$ is small, it means the model (the variables being used) does not do a very good job of explaining the variation in y.

You can get a fitted regression equation from the estimates $b_j$ in the SPSS output at this point. The SPSS output will also include confidence intervals, t-test statistics, and P-values for the individual $b_j$.

The degrees of freedom we will use for the t-tests will be $n - p - 1$, where $n$ is the number of subjects we study and $p$ is the number of explanatory variables.

Confidence interval for an individual $\beta_j$ (the 95% CI is given to you in SPSS):

$$b_j \pm t^{*}\, SE_{b_j}$$

To test $H_0\colon \beta_j = 0$ for an individual variable, use the t statistic (given to you in SPSS):

$$t = \frac{b_j}{SE_{b_j}}$$

Look at the t statistic and its associated P-value to determine if the particular coefficient differs from 0, is greater than 0, or is less than 0 (remember to divide the P-value by 2 for a one-sided test).

### 4. Interpretation of Results and Checking Residuals/Assumptions

Sometimes variables that are significant by themselves may not be significant when other variables are included too. The significance tests for individual regression coefficients assess the significance of each predictor variable assuming that all other predictors are included in the regression equation.

Use residuals to help determine whether the multiple regression model is appropriate for the data. Look for outliers, influential observations, evidence of a nonlinear relation, and anything else unusual.

#### Multiple Regression Assumptions

1. LINEARITY: The regression equation must be of the right form to describe the true underlying relationship among the variables. To check for linearity, make a scatterplot of y against each x.
2. CONSTANT VARIANCE: The variability of the residuals must be the same for all values of the x-variables. To check for constant variance, make scatterplots of residuals against predicted values.
3. INDEPENDENCE: The residual at one data value must be independent of the residuals at any other data values. To check this, plot residuals vs. each of the explanatory variables.
4. NORMALITY: The distribution of the residuals must be Normal for the t-tests on the coefficients to follow Student's t distribution exactly. To check the normality assumption, make a probability plot of the residuals.

### 5. Refining the Model

We are interested in keeping only the variables with the strongest relationships.

- Try deleting the variable with the largest P-value, then rerun the regression.
- Check to see if $R^2$, $s$, the P-value from the F-test, and the individual t statistics change much.
- $R^2$ should be as high as possible, or at least not drop drastically when you remove a variable.
- The standard deviation $s$ should be as small as possible.
- The F-test statistic from ANOVA should get bigger, and the P-value from the ANOVA F-test should get smaller.
- Any variables left in the equation should have a significant P-value from their t-test of the coefficient (their confidence intervals should not contain 0), unless taking out a slightly insignificant coefficient makes $R^2$ and $s$ move the wrong direction.
- Our goal is to keep only the variables which are the most useful to us. Get rid of any excess variables, but balance removing insignificant variables with the change that has on the whole model.

#### How do we know which variables should be included in our model and which should not?

Procedure 1: Start with a model that contains all your explanatory variables with strong correlations, run the regression, and then remove one at a time whichever variables aren't significant from the t-test, until you find that your $R^2$ starts to decrease too rapidly or your $s$ goes up too rapidly. You may end up leaving in one or more variables which are not significant on their own. You just have to see what removing them does to the whole model.

Procedure 2: Start with a model that contains only one explanatory variable and add one variable at a time until you find that your $R^2$ is no longer increasing rapidly.

Sometimes there may be more than one appropriate choice for your model. The most important thing is to be able to explain why you chose the model you did. Not every model is as easy to define as the one in the CHEESE example below.

### Example 1

As cheddar cheese matures, a variety of chemical processes take place. The taste of mature cheese is related to the concentration of several chemicals in the final product. In a study of cheddar cheese from the La Trobe Valley of Victoria, Australia, samples of cheese were analyzed for their chemical composition and were subjected to taste tests. Data for one type of cheese-manufacturing process appear below. The variable Case is used to number the observations from 1 to 30. Taste is the response variable of interest. The taste scores were obtained by combining the scores from several tasters. Three chemicals whose concentrations were measured were acetic acid, hydrogen sulfide, and lactic acid. For acetic acid and hydrogen sulfide, natural log transformations were taken. Thus the explanatory variables are the transformed concentrations of acetic acid (Acetic) and hydrogen sulfide (H2S) and the untransformed concentration of lactic acid (Lactic). These data are based on experiments performed by G. T. Lloyd and E. H. Ramshaw of the CSIRO Division of Food Research, Victoria, Australia.

| Case | Taste | Acetic | H2S | Lactic |
|---|---|---|---|---|
| 1 | 12.3 | 4.543 | 3.135 | 0.86 |
| 2 | 20.9 | 5.159 | 5.043 | 1.53 |
| 3 | 39.0 | 5.366 | 5.438 | 1.57 |
| 4 | 47.9 | 5.759 | 7.496 | 1.81 |
| 5 | 5.6 | 4.663 | 3.807 | 0.99 |
| 6 | 25.9 | 5.697 | 7.601 | 1.09 |
| 7 | 37.3 | 5.892 | 8.726 | 1.29 |
| 8 | 21.9 | 6.078 | 7.966 | 1.78 |
| 9 | 18.1 | 4.898 | 3.850 | 1.29 |
| 10 | 21.0 | 5.242 | 4.174 | 1.58 |
| 11 | 34.9 | 5.740 | 6.142 | 1.68 |
| 12 | 57.2 | 6.446 | 7.908 | 1.90 |
| 13 | 0.7 | 4.477 | 2.996 | 1.06 |
| 14 | 25.9 | 5.236 | 4.942 | 1.30 |
| 15 | 54.9 | 6.151 | 6.752 | 1.52 |
| 16 | 40.9 | 6.365 | 9.588 | 1.74 |
| 17 | 15.9 | 4.787 | 3.912 | 1.16 |
| 18 | 6.4 | 5.412 | 4.700 | 1.49 |
| 19 | 18.0 | 5.247 | 6.174 | 1.63 |
| 20 | 38.9 | 5.438 | 9.064 | 1.99 |
| 21 | 14.0 | 4.564 | 4.949 | 1.15 |
| 22 | 15.2 | 5.298 | 5.220 | 1.33 |
| 23 | 32.0 | 5.455 | 9.242 | 1.44 |
| 24 | 56.7 | 5.855 | 10.199 | 2.01 |
| 25 | 16.8 | 5.366 | 3.664 | 1.31 |
| 26 | 11.6 | 6.043 | 3.219 | 1.46 |
| 27 | 26.5 | 6.458 | 6.962 | 1.72 |
| 28 | 0.7 | 5.328 | 3.912 | 1.25 |
| 29 | 13.4 | 5.802 | 6.685 | 1.08 |
| 30 | 5.5 | 6.176 | 4.787 | 1.25 |

(a) Look at each variable individually using graphs and descriptive statistics. Any outliers?

Enter the data in SPSS, then select Analyze > Descriptive Statistics > Explore. Select the Plots options and Statistics options (choose either stemplot or histogram).

[Figures: histograms of Taste, Acetic, H2S, and Lactic]

Note: if more detail on the descriptive statistics for each variable is desired (e.g., min, max, median, IQR, variance, etc.), you need to use additional procedures in SPSS. Each histogram above gives $n$, $\bar{x}$, and $s$.

Descriptives

| Statistic | Taste | Acetic | H2S | Lactic |
|---|---|---|---|---|
| Mean | 24.533 | 5.49803 | 5.94177 | 1.4420 |
| Std. Error of Mean | 2.9678 | 0.104228 | 0.388313 | 0.05541 |
| 95% Confidence Interval for Mean, Lower Bound | 18.463 | 5.28486 | 5.14758 | 1.3287 |
| 95% Confidence Interval for Mean, Upper Bound | 30.603 | 5.71120 | 6.73596 | 1.5553 |
| 5% Trimmed Mean | 24.052 | 5.50043 | 5.87755 | 1.4407 |
| Median | 20.950 | 5.42500 | 5.32900 | 1.4500 |
| Variance | 264.237 | 0.326 | 4.524 | 0.092 |
| Std. Deviation | 16.2554 | 0.570878 | 2.126879 | 0.30349 |
| Minimum | 0.7 | 4.477 | 2.996 | 0.86 |
| Maximum | 57.2 | 6.458 | 10.199 | 2.01 |
| Range | 56.5 | 1.981 | 7.203 | 1.15 |
| Interquartile Range | 24.6 | 0.713 | 3.766 | 0.46 |

(b) Look at a scatterplot and a residual plot of taste versus acetic, taste versus H2S, and taste versus lactic. Do you see any problems?

[Figures: scatterplot and residual plot for each of Taste vs. Acetic, Taste vs. H2S, and Taste vs. Lactic]

Below we see a matrix showing each of the variables plotted against the others. This gives us an idea of the association between variables. We want to examine these potential associations further by looking at the correlations.

[Figure: scatterplot matrix of Taste, Acetic, H2S, and Lactic]

(c) Which explanatory variables (x's: Acetic, H2S, Lactic) are most strongly correlated to the response variable (y: Taste)? Calculate the correlation for each pair of variables and report the P-values for each.

Select Analyze > Correlate > Bivariate.

[SPSS output: Correlations table giving the Pearson correlation, Sig. (2-tailed), and N = 30 for each pair of Taste, Acetic, H2S, and Lactic; the individual values are not legible in the source]

(d) Which variables look important at this time?

(e) Perform a multiple regression using the explanatory variables which look important at this point. Give the fitted regression equation and answer the additional questions below the output.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .807 | .652 | .612 | 10.1307 |

Predictors: (Constant), Lactic, Acetic, H2S

ANOVA

| Model 1 | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 4994.476 | 3 | 1664.825 | 16.221 | .000 |
| Residual | 2668.411 | 26 | 102.631 | | |
| Total | 7662.887 | 29 | | | |

Predictors: (Constant), Lactic, Acetic, H2S. Dependent Variable: Taste

Coefficients

| Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. | 95% CI for B, Lower Bound | 95% CI for B, Upper Bound |
|---|---|---|---|---|---|---|---|
| (Constant) | -28.877 | 19.735 | | -1.463 | .155 | -69.444 | 11.690 |
| H2S | 3.912 | 1.248 | .512 | 3.133 | .004 | 1.345 | 6.479 |
| Lactic | 19.671 | 8.629 | .357 | 2.280 | .031 | 1.933 | 37.409 |
| Acetic | .328 | 4.460 | .012 | .073 | .942 | -8.839 | 9.495 |

Dependent Variable: Taste

Regression line:

(f) State your hypotheses for an ANOVA F-test, give the test statistic and its P-value, and state your conclusion.

(g) Report the t statistics and the P-values for the tests of the regression coefficients of your explanatory variables. What conclusions do you draw from these tests?

(h) Give the 95% confidence intervals for the regression coefficients of your explanatory variables. Do any of the intervals contain the point 0?

(i) What percent of the variation in taste is explained by the regression line?

(j) What is the value of $s$, the estimate of the model standard deviation (SPSS: "Std. Error of the Estimate")?

(k) One variable looks like a good candidate to be dropped. Which one is it? Try running the multiple regression again without this variable. Look at parts (e) through (j) again.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .807 | .652 | .626 | 9.9424 |

Predictors: (Constant), Lactic, H2S

ANOVA

| Model 1 | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 4993.921 | 2 | 2496.961 | 25.260 | .000 |
| Residual | 2668.965 | 27 | 98.851 | | |
| Total | 7662.887 | 29 | | | |

Predictors: (Constant), Lactic, H2S. Dependent Variable: Taste

Coefficients

| Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. | 95% CI for B, Lower Bound | 95% CI for B, Upper Bound |
|---|---|---|---|---|---|---|---|
| (Constant) | -27.592 | 8.982 | | -3.072 | .005 | -46.021 | -9.163 |
| H2S | 3.946 | 1.136 | .516 | 3.475 | .002 | 1.616 | 6.277 |
| Lactic | 19.887 | 7.959 | .371 | 2.499 | .019 | 3.557 | 36.215 |

Dependent Variable: Taste

(e) Regression line:

(f) ANOVA F-test:

(g) Tests on regression coefficients:

(h) 95% CI on regression coefficients:

(i) What percent of variation is explained by the new regression line?

(j) What is the value of $s$, the estimate of the model standard deviation?

What changed? What stayed the same or improved?

| | Original (all 3 explanatory variables) | New (only H2S and Lactic) | Change |
|---|---|---|---|
| $R^2$ | .652 | .652 | |
| $s$ | 10.1307 | 9.9424 | |
| F (P-value) | 16.221 (0) | 25.260 (0) | |
| Insignificant explanatory variables | | None | |

Now look at a residual plot for each of the variables you still have in the model. Do a normal probability plot too.

Which is the better model? Using the better model, predict the taste for H2S = 4 and Lactic = 1.
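As a cross-check on the SPSS output for Example 1, the sketch below refits both cheese models with NumPy's least-squares solver and computes $R^2$, $s$, the ANOVA F statistic, and the coefficient t statistics from the formulas in the ANOVA table. It assumes the data are the 30 cases listed in the example with decimal points restored (for instance, case 1 is Taste 12.3, Acetic 4.543, H2S 3.135, Lactic 0.86). The helper name `fit_ols` and its dictionary keys are illustrative choices, not part of the course materials or of SPSS.

```python
import numpy as np

# Cheese data from Example 1; columns: Taste, Acetic, H2S, Lactic.
# Assumption: decimal points restored from the flattened listing.
CHEESE = np.array([
    [12.3, 4.543, 3.135, 0.86], [20.9, 5.159, 5.043, 1.53],
    [39.0, 5.366, 5.438, 1.57], [47.9, 5.759, 7.496, 1.81],
    [5.6, 4.663, 3.807, 0.99], [25.9, 5.697, 7.601, 1.09],
    [37.3, 5.892, 8.726, 1.29], [21.9, 6.078, 7.966, 1.78],
    [18.1, 4.898, 3.850, 1.29], [21.0, 5.242, 4.174, 1.58],
    [34.9, 5.740, 6.142, 1.68], [57.2, 6.446, 7.908, 1.90],
    [0.7, 4.477, 2.996, 1.06], [25.9, 5.236, 4.942, 1.30],
    [54.9, 6.151, 6.752, 1.52], [40.9, 6.365, 9.588, 1.74],
    [15.9, 4.787, 3.912, 1.16], [6.4, 5.412, 4.700, 1.49],
    [18.0, 5.247, 6.174, 1.63], [38.9, 5.438, 9.064, 1.99],
    [14.0, 4.564, 4.949, 1.15], [15.2, 5.298, 5.220, 1.33],
    [32.0, 5.455, 9.242, 1.44], [56.7, 5.855, 10.199, 2.01],
    [16.8, 5.366, 3.664, 1.31], [11.6, 6.043, 3.219, 1.46],
    [26.5, 6.458, 6.962, 1.72], [0.7, 5.328, 3.912, 1.25],
    [13.4, 5.802, 6.685, 1.08], [5.5, 6.176, 4.787, 1.25],
])


def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bp*xp by least squares (hypothetical helper)."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])       # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sse = float(resid @ resid)                      # residual sum of squares
    sst = float(((y - y.mean()) ** 2).sum())        # total sum of squares
    dfe = n - p - 1                                 # residual degrees of freedom
    mse = sse / dfe                                 # MSE = s^2
    f_stat = ((sst - sse) / p) / mse                # F = MSM / MSE
    se = np.sqrt(mse * np.diag(np.linalg.inv(design.T @ design)))
    return {"coef": coef, "se": se, "t": coef / se,
            "r2": 1 - sse / sst, "s": mse ** 0.5, "F": f_stat, "dfe": dfe}


taste = CHEESE[:, 0]
full = fit_ols(CHEESE[:, 1:4], taste)      # Acetic, H2S, Lactic
reduced = fit_ols(CHEESE[:, 2:4], taste)   # H2S, Lactic only

for name, fit in (("all three", full), ("H2S + Lactic", reduced)):
    print(name, "R^2 =", round(fit["r2"], 3), "s =", round(fit["s"], 4),
          "F =", round(fit["F"], 3), "t =", np.round(fit["t"], 3))

# Predicted taste from the reduced model at H2S = 4, Lactic = 1.
pred = float(reduced["coef"] @ np.array([1.0, 4.0, 1.0]))
print("predicted taste:", round(pred, 2))
```

Comparing the two printed lines side by side mirrors the "What changed?" table: $R^2$ barely moves when Acetic is dropped, while $s$ falls and F rises. P-values and confidence intervals are left out here because they need t and F distribution quantiles, which NumPy alone does not provide.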
