# 214 Class Note for STAT 30100 with Professor Zhao at Purdue

These 11 pages of class notes were uploaded by an elite notetaker on Friday, February 6, 2015. The notes belong to a course at Purdue University taught by a professor in Fall. Since the upload, they have received 16 views.

Date Created: 02/06/15

## Chapter 11: Multiple Regression

Multiple regression is what you use when you have 2 or more quantitative explanatory variables that will be used to predict a quantitative response variable. Simple linear regression (Chapters 2 and 10) is used when you have just 1 quantitative explanatory variable and 1 quantitative response variable.

For simple linear regression (Chapters 2 and 10), our statistical model was

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

In multiple regression (Chapter 11), our statistical model is

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} + \varepsilon_i$$

where you have $p$ explanatory variables.

Just because you have data for several x variables doesn't mean that all the x variables are important enough to go in your model. We must follow a multiple-step procedure to decide which x variables are the most important when describing y.

### So what do we do when we have multiple x variables?

1. Look at the variables individually.
   - Means, standard deviations, minimums and maximums, outliers (if any), and stemplots or histograms are all good ways to show what is happening with your individual variables.
   - In SPSS: Analyze > Descriptive Statistics > Explore.
2. Look at the relationships between the variables using the correlations and scatterplots.
   - In SPSS: Analyze > Correlate > Bivariate. Put all your variables (all the x's and y) into the Variables box and hit OK.
   - The higher the Pearson correlation between 2 variables (in absolute value), the better, and the lower the Sig. (2-tailed), the better. The P-value (Sig.) is the result of the test $H_0: \rho = 0$ vs. $H_a: \rho \neq 0$ that we did in Chapter 10.
   - Which are the stronger relationships between an x and the y? Which are the stronger x-to-x relationships?
   - Look at scatterplots between each pair of variables too (you will look at a LOT of graphs).
   - We are only interested in keeping the variables that had strong correlations.
3. Do a regression using the variables you decided were important from part 2.
   - This will include an ANOVA table and coefficient output like what we saw in Chapter 10. We had ANOVA results for simple linear regression in Ch. 10 too, but since we only had one slope ($\beta_1$), we didn't need to use it.

### ANOVA Table for Multiple Regression

| Source | SS | df | MS | F | Significance |
|---|---|---|---|---|---|
| Regression | SSM | DFM = p | MSM = SSM/DFM | MSM/MSE | P-value |
| Residual | SSE | DFE = n - p - 1 | MSE = SSE/DFE = s² | | |
| Total | SST | DFT = n - 1 | MST = SST/DFT | | |

The estimate for the standard deviation is $s = \sqrt{\text{MSE}}$.

### Analysis of Variance F Test

In the multiple regression model, the hypotheses are

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0 \qquad H_a: \text{not all } \beta_j = 0$$

$H_a$ means at least one $\beta_j \neq 0$. We can't tell how many regression coefficients are not 0 at this point. We need to do t tests to be more specific. (Think back: we did Bonferroni multiple-comparison t tests if we could reject the null hypothesis in a one-way ANOVA F test.) If we reject $H_0$, basically we have determined that this problem is worthy of further study.

- Even if the P-value (Sig.) is small, you need to look at R². If R² is small, it means the model (the variables you are using) does not do a very good job of explaining the variation in y.
- You can get a fitted regression equation from the estimates $b_j$ in the SPSS output at this point.
- The SPSS output will also include confidence intervals, t test statistics, and P-values for the individual $b_j$.
- The degrees of freedom we will use for the t tests will be $n - p - 1$, where n is the number of subjects we study and p is the number of explanatory variables.
- Confidence intervals for the individual coefficients: $b_j \pm t^* \, SE_{b_j}$ (95% CI given to you in SPSS).
- To test $H_0: \beta_j = 0$ for an individual variable, use the t statistic $t = b_j / SE_{b_j}$ (given to you in SPSS).

### Interpretation of Results

Sometimes variables that are significant by themselves may not be significant when other variables are included too. The significance tests for individual regression coefficients assess the significance of each predictor variable assuming that all other predictors are included in the regression equation.

### Residuals

Use residuals to help determine whether the multiple regression model is appropriate for the data. Use several residual plots, since there are several explanatory variables.

- Plot residuals vs. each of the explanatory variables and vs. $\hat{y}$. Look for outliers, influential observations, evidence of a nonlinear relation, and anything else unusual.
- Use a normal probability plot to check whether the residuals are normally distributed. Look for your points to make an increasing line.

### Refining the Model

Try deleting the variable with the largest P-value, then rerun the regression. Check to see whether R², s, the P-value from the F test, and the individual t statistics change much.

- R² should be as high as possible, or at least not drop drastically when you remove a variable.
- The standard deviation s should be as small as possible.
- The F test statistic from ANOVA should get bigger, and the P-value from the ANOVA F test should get smaller.
- Any variables left in the equation should have a significant P-value from the t test of their coefficient (their confidence intervals should not contain 0), unless taking out a slightly insignificant coefficient makes R² and s move in the wrong direction.
- Our goal is to keep only the variables that are the most useful to us. Get rid of any excess variables, but balance removing insignificant variables against the change that has on the whole model.

### How do we know which variables should be included in our model and which should not?

**Procedure 1:** Start with a model that contains all your explanatory variables with strong correlations, run the regression, and then remove one at a time whichever variables aren't significant from the t test, until you find that your R² starts to decrease too rapidly or your s goes up too rapidly. You may end up leaving in one or more variables that are not significant on their own. You just have to see what removing them does to the whole model. This is the procedure that we will follow in the lecture notes and that you should use for this class.

**Procedure 2:** Start with a model that contains only one explanatory variable and add one variable at a time until you find that your R² is no longer increasing rapidly.

Sometimes there may be more than one appropriate choice for your model. The most important thing is to be able to explain why you chose the model you did. Not every model is as easy to define as the one in the CHEESE example below.

### Example (Exercises 11.43 to 11.51)

As cheddar cheese matures, a variety of chemical processes take place. The taste of mature cheese is related to the concentration of several chemicals in the final product. In a study of cheddar cheese from the La Trobe Valley of Victoria, Australia, samples of cheese were analyzed for their chemical composition and were subjected to taste tests. Data for one type of cheese-manufacturing process appear below. The variable Case is used to number the observations from 1 to 30. Taste is the response variable of interest. The taste scores were obtained by combining the scores from several tasters. Three chemicals whose concentrations were measured were acetic acid, hydrogen sulfide, and lactic acid. For acetic acid and hydrogen sulfide, natural log transformations were taken. Thus the explanatory variables are the transformed concentrations of acetic acid (Acetic) and hydrogen sulfide (H2S) and the untransformed concentration of lactic acid (Lactic).

| Case | Taste | Acetic | H2S | Lactic |
|---|---|---|---|---|
| 1 | 12.3 | 4.543 | 3.135 | 0.86 |
| 2 | 20.9 | 5.159 | 5.043 | 1.53 |
| 3 | 39.0 | 5.366 | 5.438 | 1.57 |
| 4 | 47.9 | 5.759 | 7.496 | 1.81 |
| 5 | 5.6 | 4.663 | 3.807 | 0.99 |
| 6 | 25.9 | 5.697 | 7.601 | 1.09 |
| 7 | 37.3 | 5.892 | 8.726 | 1.29 |
| 8 | 21.9 | 6.078 | 7.966 | 1.78 |
| 9 | 18.1 | 4.898 | 3.850 | 1.29 |
| 10 | 21.0 | 5.242 | 4.174 | 1.58 |
| 11 | 34.9 | 5.740 | 6.142 | 1.68 |
| 12 | 57.2 | 6.446 | 7.908 | 1.90 |
| 13 | 0.7 | 4.477 | 2.996 | 1.06 |
| 14 | 25.9 | 5.236 | 4.942 | 1.30 |
| 15 | 54.9 | 6.151 | 6.752 | 1.52 |
| 16 | 40.9 | 6.365 | 9.588 | 1.74 |
| 17 | 15.9 | 4.787 | 3.912 | 1.16 |
| 18 | 6.4 | 5.412 | 4.700 | 1.49 |
| 19 | 18.0 | 5.247 | 6.174 | 1.63 |
| 20 | 38.9 | 5.438 | 9.064 | 1.99 |
| 21 | 14.0 | 4.564 | 4.949 | 1.15 |
| 22 | 15.2 | 5.298 | 5.220 | 1.33 |
| 23 | 32.0 | 5.455 | 9.242 | 1.44 |
| 24 | 56.7 | 5.855 | 10.199 | 2.01 |
| 25 | 16.8 | 5.366 | 3.664 | 1.31 |
| 26 | 11.6 | 6.043 | 3.219 | 1.46 |
| 27 | 26.5 | 6.458 | 6.962 | 1.72 |
| 28 | 0.7 | 5.328 | 3.912 | 1.25 |
| 29 | 13.4 | 5.802 | 6.685 | 1.08 |
| 30 | 5.5 | 6.176 | 4.787 | 1.25 |

**(a)** For each of the 4 variables in the CHEESE data set, find the mean, median, standard deviation, and IQR. Display each distribution by means of a stemplot.

Descriptives

| Statistic | Taste | Acetic | H2S | Lactic |
|---|---|---|---|---|
| Mean | 24.533 | 5.49803 | 5.94177 | 1.4420 |
| Std. Error of Mean | 2.9678 | .104228 | .388313 | .05541 |
| 95% Confidence Interval for Mean, Lower Bound | 18.463 | 5.28486 | 5.14758 | 1.3287 |
| 95% Confidence Interval for Mean, Upper Bound | 30.603 | 5.71120 | 6.73596 | 1.5553 |
| 5% Trimmed Mean | 24.052 | 5.50043 | 5.87765 | 1.4407 |
| Median | 20.950 | 5.42500 | 5.32900 | 1.4500 |
| Variance | 264.237 | .326 | 4.524 | .092 |
| Std. Deviation | 16.2554 | .570878 | 2.126879 | .30349 |
| Minimum | .7 | 4.477 | 2.996 | .86 |
| Maximum | 57.2 | 6.458 | 10.199 | 2.01 |
| Range | 56.5 | 1.981 | 7.203 | 1.15 |
| Interquartile Range | 24.6 | .713 | 3.766 | .46 |

Taste Stem-and-Leaf Plot

| Frequency | Stem | Leaf |
|---|---|---|
| 5.00 | 0 | 00556 |
| 9.00 | 1 | 123455688 |
| 6.00 | 2 | 011556 |
| 5.00 | 3 | 24789 |
| 2.00 | 4 | 07 |
| 3.00 | 5 | 467 |

Acetic Stem-and-Leaf Plot

| Frequency | Stem | Leaf |
|---|---|---|
| 1.00 | 4 | 4 |
| 5.00 | 4 | 55678 |
| 11.00 | 5 | 12222333444 |
| 6.00 | 5 | 677888 |
| 7.00 | 6 | 0011344 |

H2S Stem-and-Leaf Plot

| Frequency | Stem | Leaf |
|---|---|---|
| 1.00 | 2 | 9 |
| 7.00 | 3 | 1268899 |
| 5.00 | 4 | 17799 |
| 3.00 | 5 | 024 |
| 5.00 | 6 | 11679 |
| 4.00 | 7 | 4699 |
| 1.00 | 8 | 7 |
| 3.00 | 9 | 025 |
| 1.00 | 10 | 1 |

Lactic Stem-and-Leaf Plot

| Frequency | Stem | Leaf |
|---|---|---|
| 2.00 | 0 | 89 |
| 15.00 | 1 | 000112222333444 |
| 12.00 | 1 | 555566777899 |
| 1.00 | 2 | 0 |

**(b)** Make a scatterplot for each pair of variables in the CHEESE data set (you will have 6 plots). Describe the relationships. Calculate the correlation for each pair of variables and report the P-value for the test of zero population correlation in each case.

Correlations (N = 30 for every pair)

| | Taste: Pearson Correlation | Taste: Sig. (2-tailed) | Acetic: Pearson Correlation | Acetic: Sig. (2-tailed) |
|---|---|---|---|---|
| Acetic | .550\*\* | .002 | | |
| H2S | .756\*\* | .000 | .618\*\* | .000 |
| Lactic | .704\*\* | .000 | .604\*\* | .000 |

\*\* Correlation is significant at the 0.01 level (2-tailed).

[Scatterplot matrix of Acetic, Taste, H2S, and Lactic]

**(c)** Perform a multiple regression using the explanatory variables that look important at this point. Give the fitted regression equation.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .807 | .652 | .612 | 10.1307 |

Predictors: (Constant), Lactic, Acetic, H2S

ANOVA

| Model 1 | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 4994.476 | 3 | 1664.825 | 16.221 | .000 |
| Residual | 2668.411 | 26 | 102.631 | | |
| Total | 7662.887 | 29 | | | |

Predictors: (Constant), Lactic, Acetic, H2S. Dependent Variable: Taste

Coefficients

| Model 1 | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig. | 95% CI for B, Lower Bound | 95% CI for B, Upper Bound |
|---|---|---|---|---|---|---|---|
| (Constant) | -28.877 | 19.735 | | -1.463 | .155 | -69.444 | 11.690 |
| H2S | 3.912 | 1.248 | .512 | 3.133 | .004 | 1.346 | 6.478 |
| Lactic | 19.671 | 8.629 | .367 | 2.280 | .031 | 1.933 | 37.408 |
| Acetic | .328 | 4.460 | .012 | .073 | .942 | -8.839 | 9.495 |

Dependent Variable: Taste

**(d)** State your hypotheses for an ANOVA F test, give the test statistic and its P-value, and state your conclusion.

**(e)** Report the t statistics and P-values for the tests of the regression coefficients of your explanatory variables. What conclusions do you draw from these tests?

**(f)** Give the 95% confidence intervals for the regression coefficients of your explanatory variables. Do any of the intervals contain the point 0? This should verify your answer to part (e).

**(g)** What is the value of s, the estimator for the standard error of the model?

**(h)** What percent of variation in taste is explained by these explanatory variables?

**(i)** One variable looks like a good candidate to be dropped. Which one is it? Try running the multiple regression again without this variable. Look at parts (c) through (h) again.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .807 | .652 | .626 | 9.9424 |

Predictors: (Constant), Lactic, H2S

ANOVA

| Model 1 | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 4993.921 | 2 | 2496.961 | 25.260 | .000 |
| Residual | 2668.965 | 27 | 98.851 | | |
| Total | 7662.887 | 29 | | | |

Predictors: (Constant), Lactic, H2S. Dependent Variable: Taste

Coefficients

| Model 1 | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig. | 95% CI for B, Lower Bound | 95% CI for B, Upper Bound |
|---|---|---|---|---|---|---|---|
| (Constant) | -27.592 | 8.982 | | -3.072 | .005 | -46.021 | -9.163 |
| H2S | 3.946 | 1.136 | .516 | 3.475 | .002 | 1.616 | 6.277 |
| Lactic | 19.887 | 7.959 | .371 | 2.499 | .019 | 3.557 | 36.218 |

Dependent Variable: Taste

What changed? What stayed the same or improved?

| | Original (all 3 explanatory variables) | New (only H2S and Lactic) | Change |
|---|---|---|---|
| R² | .652 | .652 | |
| s | 10.1307 | 9.9424 | |
| F (P-value) | 16.221 (0) | 25.260 (0) | |
| Insignificant explanatory variables | Acetic | None | |

**(j)** Now look at a residual plot for each of the variables you still have in the model. Do a normal probability plot too.

[Residual plots and normal probability plot of the unstandardized residuals]

**(k)** Using the better model, predict the taste for H2S = 4 and Lactic = 1.
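The refinement step in the CHEESE example (fit all three explanatory variables, drop Acetic, refit) can also be checked outside SPSS. The following is a minimal sketch, assuming Python with NumPy instead of SPSS; the helper `fit_ols` and the variable names are hypothetical and not part of the course materials. It computes the coefficients, standard errors, t statistics, R², s, and F for both models from the CHEESE data, then evaluates the reduced model at H2S = 4 and Lactic = 1. P-values are left out because they would need a t and F distribution from an additional library.

```python
import numpy as np

# CHEESE data from the example: columns are Taste, Acetic, H2S, Lactic (cases 1-30).
cheese = np.array([
    [12.3, 4.543, 3.135, 0.86], [20.9, 5.159, 5.043, 1.53],
    [39.0, 5.366, 5.438, 1.57], [47.9, 5.759, 7.496, 1.81],
    [5.6, 4.663, 3.807, 0.99], [25.9, 5.697, 7.601, 1.09],
    [37.3, 5.892, 8.726, 1.29], [21.9, 6.078, 7.966, 1.78],
    [18.1, 4.898, 3.850, 1.29], [21.0, 5.242, 4.174, 1.58],
    [34.9, 5.740, 6.142, 1.68], [57.2, 6.446, 7.908, 1.90],
    [0.7, 4.477, 2.996, 1.06], [25.9, 5.236, 4.942, 1.30],
    [54.9, 6.151, 6.752, 1.52], [40.9, 6.365, 9.588, 1.74],
    [15.9, 4.787, 3.912, 1.16], [6.4, 5.412, 4.700, 1.49],
    [18.0, 5.247, 6.174, 1.63], [38.9, 5.438, 9.064, 1.99],
    [14.0, 4.564, 4.949, 1.15], [15.2, 5.298, 5.220, 1.33],
    [32.0, 5.455, 9.242, 1.44], [56.7, 5.855, 10.199, 2.01],
    [16.8, 5.366, 3.664, 1.31], [11.6, 6.043, 3.219, 1.46],
    [26.5, 6.458, 6.962, 1.72], [0.7, 5.328, 3.912, 1.25],
    [13.4, 5.802, 6.685, 1.08], [5.5, 6.176, 4.787, 1.25],
])


def fit_ols(y, X):
    """Least-squares fit of y on the columns of X plus an intercept (hypothetical helper)."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])       # first column is the constant
    b, *_ = np.linalg.lstsq(design, y, rcond=None)  # estimates b_0, b_1, ..., b_p
    resid = y - design @ b
    sse = float(resid @ resid)                      # residual sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))        # total sum of squares
    dfe = n - p - 1                                 # degrees of freedom for error
    mse = sse / dfe                                 # s squared
    se = np.sqrt(mse * np.diag(np.linalg.inv(design.T @ design)))
    return {
        "b": b, "se": se, "t": b / se,
        "r2": 1 - sse / sst,
        "s": float(np.sqrt(mse)),
        "F": ((sst - sse) / p) / mse,               # MSM / MSE
        "dfe": dfe,
    }


taste = cheese[:, 0]
full = fit_ols(taste, cheese[:, 1:4])     # Acetic, H2S, Lactic
reduced = fit_ols(taste, cheese[:, 2:4])  # H2S, Lactic only

for name, m in [("all 3 variables", full), ("H2S and Lactic", reduced)]:
    print(name)
    print("  b  =", np.round(m["b"], 3))
    print("  t  =", np.round(m["t"], 3))
    print("  R2 =", round(m["r2"], 3), " s =", round(m["s"], 4), " F =", round(m["F"], 3))

# Part (k): predicted taste at H2S = 4 and Lactic = 1 using the reduced model.
b0, b_h2s, b_lactic = reduced["b"]
pred = b0 + b_h2s * 4 + b_lactic * 1
print("predicted taste:", round(float(pred), 2))
```

With the reduced model's SPSS coefficients, the same prediction by hand is $\hat{y} = -27.592 + 3.946(4) + 19.887(1) \approx 8.08$. Comparing R², s, and F between the two fits mirrors the "What changed?" table in the example.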
