# Elementary Statistical Methods STAT 30100

Purdue

These 13-page class notes were uploaded by Bailey Macejkovic on Saturday, September 19, 2015. They belong to STAT 30100 at Purdue University, taught by Yong Wang in Fall. Since the upload, they have received 9 views. For similar materials see /class/207933/stat-30100-purdue-university in Statistics at Purdue University.

Date Created: 09/19/15

## Chapter 11: Multiple Regression

STAT 301, Spring 2009, Chapter 11

Multiple regression is what we use when there are 2 or more quantitative explanatory variables which will be used to predict a quantitative response variable. Simple linear regression (Chapters 2 and 10) is used when you have just 1 quantitative explanatory variable and 1 quantitative response variable.

For simple linear regression (Chapters 2 and 10), our statistical model was

$$y = \beta_0 + \beta_1 x + \varepsilon \quad \text{(only 1 explanatory variable)}.$$

In multiple regression (Chapter 11), our statistical model is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon \quad \text{(we have } p \text{ explanatory variables)}.$$

Recall that $b_0$ is the sample estimator for $\beta_0$, $b_1$ is the sample estimator for $\beta_1$, etc. The deviations (errors) $\varepsilon$ are independent and normally distributed with mean 0 and standard deviation $\sigma$. The parameters of the model are $\beta_0, \beta_1, \ldots, \beta_p$ and $\sigma$.

Just because we have data for several x-variables doesn't mean that all the x-variables are important enough to go in your model. We must do a multiple-step procedure to decide which x-variables are the most important when describing $y$.

So what do we do when we have multiple x-variables?

1. Look at the variables individually. Means, standard deviations, minimums and maximums, outliers (if any), and stem plots or histograms are all good ways to show what is happening with your individual variables. In SPSS: Analyze > Descriptive Statistics > Explore > Plots (check the desired plots).

2. Look at the relationships between the variables using correlations and scatter plots. In SPSS: Analyze > Correlate > Bivariate. Put all your variables (all the x's and y) into the variables box and hit OK. The larger the Pearson correlation between 2 variables is in absolute value, the stronger the linear relationship, and the lower the Sig. (2-tailed), the better. The P-value (Sig.) is the result of the test $H_0: \rho = 0$ vs. $H_a: \rho \neq 0$ that we did in Chapter 10.

   Which are the stronger relationships between an x and the y? Which are the stronger x-to-x relationships? Look at scatter plots between each pair of variables too (you will look at a LOT of graphs). We are only interested in
keeping the variables which had strong relationships.

3. Do a regression using the variables you decided were important from part 2. This will include an ANOVA table and coefficient output like in Chapter 10. (We had ANOVA tables for simple linear regression in Chapter 10 too, but because we only had one $\beta$ we didn't need to use it.)

### ANOVA Table for Multiple Regression

| Source | SS | df | MS | F | Sig. |
|---|---|---|---|---|---|
| Regression | SSM | DFM = $p$ | MSM = SSM/DFM | MSM/MSE | P-value |
| Residual | SSE | DFE = $n - p - 1$ | MSE = SSE/DFE = $s^2$ | | |
| Total | SST | DFT = $n - 1$ | MST = SST/DFT | | |

$s$ = estimate for the standard deviation $\sigma$ = $\sqrt{\text{MSE}}$.

### Analysis of Variance F-Test

In the multiple regression model, the hypothesis test

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0 \quad \text{vs.} \quad H_a: \text{not all } \beta_j = 0$$

uses the F statistic. $H_a$ means at least one $\beta_j \neq 0$. We can't tell how many regression coefficients are not 0 at this point. We need to do t-tests to be more specific. (Think back: we did Bonferroni multiple comparisons t-tests if we could reject the null hypothesis in a one-way ANOVA F test.) If we reject $H_0$, basically we have determined that this problem is worthy of further study.

- Even if the P-value (Sig.) is small, you need to look at $R^2$. If $R^2$ is small, it means the model (the variables being used) does not do a very good job of explaining the variation in $y$.
- You can get a fitted regression equation from the estimates $b_j$ in the SPSS output at this point.
- The SPSS output will also include confidence intervals, t test statistics, and P-values for the individual $b_j$.
- The degrees of freedom we will use for the t-tests are $n - p - 1$, where $n$ is the number of subjects we study and $p$ is the number of explanatory variables.
- Confidence interval for an individual $\beta_j$: $b_j \pm t^* \, SE_{b_j}$ (the 95% CI is given to you in SPSS).
- To test $H_0: \beta_j = 0$ for an individual variable, use the t statistic $t = b_j / SE_{b_j}$ (given to you in SPSS). Look at the t statistic and its associated P-value to determine whether the particular coefficient differs from 0, is greater than 0, or is less than 0 (remember to divide the two-sided P-value by 2 for a one-sided test).

4. Interpretation of Results and Checking Residuals/Assumptions

Sometimes variables that are significant by themselves may not be significant when other variables are included too. The significance tests for individual regression coefficients assess the significance of each predictor variable assuming that all other predictors are included in the regression equation.

Use residuals to help determine whether the multiple regression model is appropriate for the data. Look for outliers, influential observations, evidence of a nonlinear relation, and anything else unusual.

### Multiple Regression Assumptions

1. LINEARITY: The regression equation must be of the right form to describe the true underlying relationship among the variables. To check for linearity, make a scatterplot of $y$ against each $x$.
2. CONSTANT VARIANCE: The variability of the residuals must be the same for all values of the x-variables. To check for constant variance, make scatterplots of residuals against predicted values.
3. INDEPENDENCE: The residual at one data value must be independent of the residuals at any other data values. To check this, plot residuals vs. each of the explanatory variables.
4. NORMALITY: The distribution of the residuals must be Normal for the t test on the coefficients to follow Student's t distribution exactly. To check the normality assumption, make a probability plot of residuals.

5. Refining the Model

We are interested in keeping only the variables with the strongest relationships.

- Try deleting the variable with the largest P-value, then rerun the regression.
- Check to see whether $R^2$, $s$, the P-value from the F-test, and the individual t statistics change much.
- $R^2$ should be as high as possible, or at least not drop drastically when you remove a variable.
- The standard deviation $s$ should be as small as possible.
- The F-test statistic from ANOVA should get bigger, and the P-value from the ANOVA F test should get smaller.
- Any variables left in the equation should have a significant P-value from the t-test of their coefficient; their confidence intervals
should not contain 0, unless taking out a slightly insignificant coefficient makes $R^2$ and $s$ move in the wrong direction.

Our goal is to keep only the variables which are the most useful to us. Get rid of any excess variables, but balance removing insignificant variables against the change that has on the whole model.

How do we know which variables should be included in our model and which should not?

- Procedure 1: Start with a model that contains all your explanatory variables with strong correlations, run the regression, and then remove, one at a time, whichever variables aren't significant from the t test, until you find that your $R^2$ starts to decrease too rapidly or your $s$ goes up too rapidly. You may end up leaving in one or more variables which are not significant on their own. You just have to see what removing them does to the whole model.
- Procedure 2: Start with a model that contains only one explanatory variable and add one variable at a time until you find that your $R^2$ is no longer increasing rapidly.

Sometimes there may be more than one appropriate choice for your model. The most important thing is to be able to explain why you chose the model you did. Not every model is as easy to define as the one in the CHEESE example below.

### Example 1

As cheddar cheese matures, a variety of chemical processes take place. The taste of mature cheese is related to the concentration of several chemicals in the final product. In a study of cheddar cheese from the La Trobe Valley of Victoria, Australia, samples of cheese were analyzed for their chemical composition and were subjected to taste tests. Data for one type of cheese-manufacturing process appear below. The variable Case is used to number the observations from 1 to 30. Taste is the response variable of interest. The taste scores were obtained by combining the scores from several tasters. Three chemicals whose concentrations were measured were acetic acid, hydrogen sulfide, and lactic acid. For acetic acid and
hydrogen sulfide, natural log transformations were taken. Thus the explanatory variables are the transformed concentrations of acetic acid (Acetic) and hydrogen sulfide (H2S) and the untransformed concentration of lactic acid (Lactic). These data are based on experiments performed by G. T. Lloyd and E. H. Ramshaw of the CSIRO Division of Food Research, Victoria, Australia.

| Case | Taste | Acetic | H2S | Lactic |
|---|---|---|---|---|
| 1 | 12.3 | 4.543 | 3.135 | 0.86 |
| 2 | 20.9 | 5.159 | 5.043 | 1.53 |
| 3 | 39.0 | 5.366 | 5.438 | 1.57 |
| 4 | 47.9 | 5.759 | 7.496 | 1.81 |
| 5 | 5.6 | 4.663 | 3.807 | 0.99 |
| 6 | 25.9 | 5.697 | 7.601 | 1.09 |
| 7 | 37.3 | 5.892 | 8.726 | 1.29 |
| 8 | 21.9 | 6.078 | 7.966 | 1.78 |
| 9 | 18.1 | 4.898 | 3.850 | 1.29 |
| 10 | 21.0 | 5.242 | 4.174 | 1.58 |
| 11 | 34.9 | 5.740 | 6.142 | 1.68 |
| 12 | 57.2 | 6.446 | 7.908 | 1.90 |
| 13 | 0.7 | 4.477 | 2.996 | 1.06 |
| 14 | 25.9 | 5.236 | 4.942 | 1.30 |
| 15 | 54.9 | 6.151 | 6.752 | 1.52 |
| 16 | 40.9 | 6.365 | 9.588 | 1.74 |
| 17 | 15.9 | 4.787 | 3.912 | 1.16 |
| 18 | 6.4 | 5.412 | 4.700 | 1.49 |
| 19 | 18.0 | 5.247 | 6.174 | 1.63 |
| 20 | 38.9 | 5.438 | 9.064 | 1.99 |
| 21 | 14.0 | 4.564 | 4.949 | 1.15 |
| 22 | 15.2 | 5.298 | 5.220 | 1.33 |
| 23 | 32.0 | 5.455 | 9.242 | 1.44 |
| 24 | 56.7 | 5.855 | 10.199 | 2.01 |
| 25 | 16.8 | 5.366 | 3.664 | 1.31 |
| 26 | 11.6 | 6.043 | 3.219 | 1.46 |
| 27 | 26.5 | 6.458 | 6.962 | 1.72 |
| 28 | 0.7 | 5.328 | 3.912 | 1.25 |
| 29 | 13.4 | 5.802 | 6.685 | 1.08 |
| 30 | 5.5 | 6.176 | 4.787 | 1.25 |

(a) Look at each variable individually using graphs and descriptive statistics. Any outliers?

Enter the data in SPSS, then select Analyze > Descriptive Statistics > Explore. Under the Plots and Statistics options, choose either stemplot or histogram.

[Figure: histograms of Taste, Acetic, H2S, and Lactic]

Note: if more detail on the descriptive statistics for each variable is desired (e.g. min, max, median, IQR, variance, etc.), you need to use additional procedures in SPSS. Each histogram above gives $n$, $\bar{x}$, and $s$.

Descriptives

| Statistic | Taste | Acetic | H2S | Lactic |
|---|---|---|---|---|
| Mean | 24.533 | 5.49803 | 5.94177 | 1.4420 |
| Std. Error of Mean | 2.9678 | 0.104228 | 0.388313 | 0.05541 |
| 95% Confidence Interval for Mean, Lower Bound | 18.463 | 5.28485 | 5.14758 | 1.3287 |
| 95% Confidence Interval for Mean, Upper Bound | 30.603 | 5.71120 | 6.73596 | 1.5553 |
| 5% Trimmed Mean | 24.052 | 5.50043 | 5.87755 | 1.4407 |
| Median | 20.950 | 5.42500 | 5.32900 | 1.4500 |
| Variance | 264.237 | 0.326 | 4.524 | 0.092 |
| Std. Deviation | 16.2554 | 0.570878 | 2.126879 | 0.30349 |
| Minimum | 0.7 | 4.477 | 2.996 | 0.86 |
| Maximum | 57.2 | 6.458 | 10.199 | 2.01 |
| Range | 56.5 | 1.981 | 7.203 | 1.15 |
| Interquartile Range | 24.6 | 0.713 | 3.766 | 0.46 |

(b) Look at a scatterplot and a residual plot of Taste versus Acetic, Taste versus H2S, and Taste versus Lactic. Do you see any problems?

[Figures: scatterplot of Taste vs. Acetic and residual plot for Acetic; scatterplot of Taste vs. H2S and residual plot for H2S; scatterplot of Taste vs. Lactic and residual plot for Lactic]

[Figure: scatterplot matrix of the variables]

This gives us an idea of the relationships; look further by looking at the correlations.

(c) Find the correlation between each pair of variables and the P-value for each. In SPSS: Analyze > Correlate > Bivariate.

[Table: SPSS Correlations output (Pearson Correlation, Sig. (2-tailed), N)]

(d) Which variables look important at this time?

(e) Perform a multiple regression using the explanatory variables which look important at this point. Give the fitted regression equation and answer the additional questions below the output.

Model Summary

| Model | R Square | Std. Error of the Estimate |
|---|---|---|
| 1 | .652 | 10.1307 |

a. Predictors: (Constant), Lactic, Acetic, H2S

ANOVA

| Model 1 | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 4994.476 | 3 | 1664.825 | 16.221 | .000 |
| Residual | 2668.411 | 26 | 102.631 | | |
| Total | 7662.887 | 29 | | | |

a. Predictors: (Constant), Lactic, Acetic, H2S. b. Dependent Variable: Taste

Coefficients

| Model 1 | B | Std. Error | Beta | t | Sig. | 95% CI for B, Lower Bound | 95% CI for B, Upper Bound |
|---|---|---|---|---|---|---|---|
| (Constant) | -28.877 | 19.735 | | -1.463 | .155 | -69.444 | 11.690 |
| H2S | 3.912 | 1.248 | .512 | 3.133 | .004 | 1.346 | 6.478 |
| Lactic | 19.671 | 8.629 | .367 | 2.280 | .031 | 1.933 | 37.408 |
| Acetic | .328 | 4.460 | .012 | .073 | .942 | -8.839 | 9.495 |

a. Dependent Variable: Taste

Regression line:

(f) State your hypotheses for an ANOVA F-test, give the test statistic and its P-value, and state your conclusion.

(g) Report the t statistics and the P-values for the tests of the regression coefficients of your explanatory variables. What conclusions do you draw from these tests?

(h) Give the 95% confidence intervals for the regression coefficients of your explanatory variables. Do any of the intervals contain the point 0?

(i) What percent of the variation in taste is explained by the regression line?

(j) What is the value of $s$, the estimator for the standard error of the model?

(k) One variable looks like a good candidate to be dropped. Which one is it? Try running the multiple regression again without this variable. Look at parts (e) through (j) again.

ANOVA

| Model 1 | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 4993.921 | 2 | 2496.961 | 25.260 | .000 |
| Residual | 2668.965 | 27 | 98.851 | | |
| Total | 7662.887 | 29 | | | |

a. Predictors: (Constant), Lactic, H2S. b. Dependent Variable: Taste

Coefficients

| Model 1 | B | Std. Error | Beta | t | Sig. | 95% CI for B, Lower Bound | 95% CI for B, Upper Bound |
|---|---|---|---|---|---|---|---|
| (Constant) | -27.592 | 8.982 | | -3.072 | .005 | -46.021 | -9.163 |
| H2S | 3.946 | 1.136 | .516 | 3.475 | .002 | 1.616 | 6.277 |
| Lactic | 19.887 | 7.959 | .371 | 2.499 | .019 | 3.557 | 36.218 |

a. Dependent Variable: Taste

(e) Regression line:

(f) ANOVA F-test:

(g) Tests on regression coefficients:

(h) 95% CI on regression coefficients:

(i) What percent of the variation is explained by the new regression line?

(j) What is the value of $s$, the estimator for the standard error of the model?

What changed? What stayed the same or improved?

Now look at a residual plot for each of the variables you still have in the model. Do a normal probability plot too.

(k) Which is the better model? Using the better model, predict the taste for H2S = 4 and Lactic = 1.
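The ANOVA table relations from earlier in the chapter (DFM = p, DFE = n − p − 1, MS = SS/df, F = MSM/MSE, s = √MSE) can be checked against the two cheese regressions with a short sketch. The sums of squares below are assumptions: they are read from the SPSS ANOVA output in the example with decimal points restored, and the helper name `anova_summary` is illustrative only. R² is computed as SSM/SST, which is how it relates to the table, not something the notes spell out.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class AnovaSummary:
    dfm: int          # regression (model) degrees of freedom, p
    dfe: int          # residual degrees of freedom, n - p - 1
    msm: float        # mean square for the model
    mse: float        # mean square error, equal to s**2
    f: float          # F statistic, MSM / MSE
    r_squared: float  # SSM / SST
    s: float          # estimate of sigma, sqrt(MSE)


def anova_summary(ssm: float, sse: float, n: int, p: int) -> AnovaSummary:
    """Fill in the multiple-regression ANOVA table from SSM and SSE."""
    dfm = p
    dfe = n - p - 1
    msm = ssm / dfm
    mse = sse / dfe
    return AnovaSummary(dfm, dfe, msm, mse, msm / mse, ssm / (ssm + sse), sqrt(mse))


# Assumed inputs: sums of squares as read from the SPSS output (n = 30 cheeses).
full = anova_summary(ssm=4994.476, sse=2668.411, n=30, p=3)     # Acetic, H2S, Lactic
reduced = anova_summary(ssm=4993.921, sse=2668.965, n=30, p=2)  # H2S, Lactic

for name, m in (("Acetic + H2S + Lactic", full), ("H2S + Lactic", reduced)):
    print(f"{name}: F({m.dfm}, {m.dfe}) = {m.f:.3f}, "
          f"R^2 = {m.r_squared:.3f}, s = {m.s:.3f}")
```

Under these inputs, dropping Acetic leaves R² essentially unchanged (about 0.652 in both models) while F rises from about 16.2 to about 25.3 and s falls from about 10.13 to about 9.94, which is the pattern the "Refining the Model" step says to look for.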
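As a hedged illustration of the coefficient formulas ($t = b_j / SE_{b_j}$ and $b_j \pm t^* SE_{b_j}$) and of the final prediction question, the sketch below works with the two-variable model. The coefficient estimates and standard errors are assumptions read from the SPSS Coefficients output with decimal points restored, and $t^* = 2.052$ is the table value for 95% confidence with $n - p - 1 = 27$ degrees of freedom. The function names are hypothetical.

```python
# Assumed values: (estimate b_j, standard error SE_bj) for the model
# Taste = b0 + b1*H2S + b2*Lactic, read from the SPSS Coefficients output.
COEF = {
    "Constant": (-27.592, 8.982),
    "H2S": (3.946, 1.136),
    "Lactic": (19.887, 7.959),
}
T_STAR = 2.052  # t critical value for 95% confidence, 27 df (from a t table)


def t_statistic(b: float, se: float) -> float:
    """t statistic for testing H0: beta_j = 0."""
    return b / se


def confidence_interval(b: float, se: float, t_star: float = T_STAR):
    """Confidence interval b_j +/- t* * SE_bj."""
    margin = t_star * se
    return b - margin, b + margin


def predict_taste(h2s: float, lactic: float) -> float:
    """Fitted value from the two-variable regression equation."""
    return COEF["Constant"][0] + COEF["H2S"][0] * h2s + COEF["Lactic"][0] * lactic


for name, (b, se) in COEF.items():
    low, high = confidence_interval(b, se)
    print(f"{name}: b = {b}, t = {t_statistic(b, se):.3f}, "
          f"95% CI = ({low:.3f}, {high:.3f})")

print(f"Predicted taste at H2S = 4, Lactic = 1: {predict_taste(4, 1):.3f}")
```

With these assumed coefficients the prediction works out to $-27.592 + 3.946(4) + 19.887(1) = 8.079$. The intervals may differ from the SPSS values in the third decimal place because $t^*$ is rounded.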
