# 521 Outline for STAT 30100 with Professor Sorola at Purdue

These 15-page class notes were uploaded on Friday, February 6, 2015, for a course at Purdue University taught in the fall.

Date Created: 02/06/15

## CHAPTER 11: Multiple Regression

With multiple linear regression, more than one explanatory variable is used to explain or predict a single response variable. Introducing several explanatory variables leads to additional considerations. We will not be able to address all of these issues, but we will outline some basic facts about multiple regression.

### Equation of the Multiple Regression Model

We have data on several explanatory variables $x_1, x_2, x_3, \ldots, x_p$ (where $p$ is the number of explanatory variables in the model) and a response variable $y$.

- The regression model for the population is $y_i = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon_i$.
- The sample prediction equation is $\hat{y}_i = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$.
- The $i$th residual is $e_i = y_i - \hat{y}_i$ (observed response minus predicted response).
- The estimate for the variability of the response about the regression equation is $s$. The degrees of freedom associated with $s^2$ are $n - p - 1$.

As with all models, there are some assumptions that need to be met with multiple regression.

### Multiple Regression Assumptions

1. LINEARITY: The regression equation must be of the right form to describe the true underlying relationship among the variables. To check for linearity, make a scatterplot of $y$ against each predictor variable.
2. CONSTANT VARIANCE: The variability of the residuals must be the same for all values of the $x$ variables. To check for constant variance, make scatterplots of the residuals against the predicted values.
3. INDEPENDENCE: The residual at one data value must be independent of the residuals at any other data values.
4. NORMALITY: The distribution of the residuals must be Normal for the $t$ test on the coefficients to follow Student's $t$ distribution exactly. To check the normality assumption, make a probability plot of the residuals.

### Collinearity

A multiple regression has a collinearity problem when any of the predictors has a strong linear relationship with any of the other predictors. The standard error of the coefficient of any predictor that is collinear with the others is inflated, leading to a smaller $t$ statistic and a correspondingly larger (less significant) P-value. One clue that collinearity might be a problem is a regression with a large overall R-square but small $t$ ratios for the coefficients.

Detecting collinearity: Regress one predictor on the others. If $R^2$ is high for any of these regressions, that predictor is collinear with the others.

### THE WHOLE POINT

You may have several explanatory ($x$) variables; however, not all of them are important enough to make it into your model. Your goal is to find a model which has low bias and low variability. There is, however, a trade-off between bias and variability in your model. Adding variables to the model will decrease the bias if the variables are helpful in predicting the response. However, adding variables could also increase the variability of your model. Consequently, you only want to add variables that will be helpful.

How do we know which variables should be included in our model and which should not? There are two procedures that you can use:

- Procedure 1: Start with a model that contains all your explanatory variables and remove them one at a time until you find that your bias starts to increase too rapidly.
- Procedure 2: Start with a model that contains only one explanatory variable and add one variable at a time until you find that your bias is no longer decreasing.

### How do you check the bias in your model?

Look at the overall $R^2$ (the squared multiple correlation) for the model. $R^2$ is the proportion of the variation of the response variable $y$ that is explained by the explanatory variables $x_1, x_2, \ldots, x_p$ in a multiple linear regression. Basically, $R^2$ should be as high as possible, or at least not drop drastically when you remove a variable.

Any variables left in the equation ideally should have a significant P-value from the individual $t$ tests of the coefficients. Furthermore, the confidence intervals for these coefficients should not contain 0.

Confidence interval for an individual $\beta_j$: $b_j \pm t^* \, SE_{b_j}$ (SPSS gives you a 95% CI).

Significance test for $\beta_j$ (format):

- State the null and alternative hypotheses: $H_0: \beta_j = 0$ against $H_a: \beta_j \neq 0$, $H_a: \beta_j < 0$, or $H_a: \beta_j > 0$.
- Find the test statistic on the printout or by using the formula $t = b_j / SE_{b_j}$.
- Find the P-value from the printout.
- Compare the P-value to the $\alpha$ level. If P-value $\le \alpha$, then reject $H_0$. If P-value $> \alpha$, then fail to reject $H_0$.
- State your conclusions in terms of the problem.

### How do you check the variability in your model?

1. Look at the standard deviation $s$ for the model. Recall that $s^2 = \dfrac{\sum e_i^2}{n - p - 1}$; it can be found in the regression output. The smaller $s$ is, the better.
2. Look at the widths of the confidence intervals for the $\beta_j$'s.

Note: Individual regression coefficients, their standard errors, and significance tests are meaningful only when interpreted in the context of the other explanatory variables in the model.

Another test that is useful is the $F$ test. It is an overall test that will tell you whether you want to proceed. If you fail to reject the null hypothesis in the $F$ test, then you have no evidence that any of the explanatory variables in your model help explain the changes in the response, so there is no point continuing. This test is helpful if you start with the overall model.

Analysis of Variance $F$ test (format):

- State the null and alternative hypotheses: $H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$ against $H_a$: not all $\beta_j$ are equal to 0.
- Find the test statistic on the printout.
- Find the P-value from the printout.
- Compare the P-value to the $\alpha$ level. If P-value $\le \alpha$, then reject $H_0$. If P-value $> \alpha$, then fail to reject $H_0$.
- State your conclusions in terms of the problem.

### What else is helpful when trying to find the overall best model?

1. Look at the variables individually. Find their means, standard deviations, minimums, maximums, and outliers. Look at a histogram or stemplot of each variable.
2. Look at the relationships between variables using correlations and scatterplots.

Note: We want the explanatory variables to have a high correlation with the response variable, but we do not want two explanatory variables to have a high correlation with each other, because this could cause collinearity problems.

### Example on SPSS

Our goal today will be to find the best model to predict a STAT 301 student's Test 2 score based on their scores for Test 1, lab grades, homework grades, attendance, and whether or not they handed in the review for Exam 2. The grades are taken from three of Joan Brenneman's STAT 301 classes last semester.

**1.** For each of the variables in the data set, find the mean, median, standard deviation, and IQR. Display each distribution with a histogram.

SPSS: To get the descriptive statistics, choose Analyze > Descriptive Statistics > Explore. Pull all variables into the Dependent List box and click OK. Do the histograms individually for each variable.

[SPSS Descriptives output for attendance, Labt, hwt, review, Test 1, and Test 2 (mean, 95% confidence interval for the mean, 5% trimmed mean, median, variance, standard deviation, minimum, maximum, range, interquartile range, skewness, kurtosis); the values are not legible in this copy.]

[Figures: histograms of each variable.]

**2.** Make a scatterplot for each pair of variables in the data set. Describe the relationship for each. Calculate the correlation for each pair of variables and report the P-value for the test of zero population correlation in each case.

SPSS: A quick way to get all the scatterplots in a matrix is Graphs > Scatterplots. Select Matrix and click Define. Pull all variables into the Matrix Variables box. To get all the correlations, choose Analyze > Correlate > Bivariate. Move all variables into the Variables box and click OK.

[Figure: matrix of all possible scatterplots for attendance, Labt, hwt, review, Test 1, and Test 2.]

Correlations: each cell gives the Pearson correlation with its two-tailed Sig. in parentheses; N = 90 for every pair.

| | Labt | hwt | attendance | review | Test 1 | Test 2 |
|---|---|---|---|---|---|---|
| Labt | 1 | .711 (.000) | .686 (.000) | .405 (.000) | .615 (.000) | .719 (.000) |
| hwt | .711 (.000) | 1 | .685 (.000) | .406 (.000) | .493 (.000) | .609 (.000) |
| attendance | .686 (.000) | .685 (.000) | 1 | .558 (.000) | .351 (.001) | .490 (.000) |
| review | .405 (.000) | .406 (.000) | .558 (.000) | 1 | .173 (.103) | .369 (.000) |
| Test 1 | .615 (.000) | .493 (.000) | .351 (.001) | .173 (.103) | 1 | .706 (.000) |
| Test 2 | .719 (.000) | .609 (.000) | .490 (.000) | .369 (.000) | .706 (.000) | 1 |

The correlation is significant at the 0.01 level (2-tailed) for every pair except review and Test 1.

**3.** Perform a multiple regression using all the explanatory variables and answer the questions below based on the output.

SPSS: To get the multiple regression output, choose Analyze > Regression > Linear. Move Test 2 to the Dependent box. Move the remaining variables to the Independent(s) box. Select Statistics and click on Confidence Intervals. Then click Continue, followed by OK.

Model Summary (predictors: Constant, Test 1, review, hwt, attendance, Labt)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .808 | .653 | .632 | 9.681 |

ANOVA (dependent variable: Test 2)

| | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 14803.030 | 5 | 2960.606 | 31.591 | .000 |
| Residual | 7872.092 | 84 | 93.715 | | |
| Total | 22675.122 | 89 | | | |

Coefficients (dependent variable: Test 2). The minus signs were lost in this copy and have been restored so that each B is the midpoint of its confidence interval.

| | B | Std. Error | Beta | t | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
|---|---|---|---|---|---|---|---|
| (Constant) | 6.355 | 6.203 | | 1.024 | .309 | -5.981 | 18.692 |
| attendance | -.287 | .393 | -.076 | -.730 | .468 | -1.069 | .495 |
| Labt | 1.908 | .606 | .352 | 3.151 | .002 | .704 | 3.112 |
| hwt | .565 | .381 | .149 | 1.484 | .142 | -.192 | 1.322 |
| review | 4.623 | 2.639 | .136 | 1.752 | .083 | -.625 | 9.872 |
| Test 1 | .393 | .078 | .419 | 5.028 | .000 | .237 | .548 |

a. State your hypotheses for an ANOVA $F$ test, give the test statistic and its P-value, and state your conclusions.

b. Report the $t$ statistics and P-values for the tests of the regression coefficients of your explanatory variables. Which regression coefficients give you non-significant results? What conclusions do you draw from these tests?

c. Give 95% confidence intervals for the regression coefficients of your explanatory variables. Do any of the intervals contain the point 0?

d. What is the value of $s$, the estimator for the standard deviation?

e. What percent of the variability in Test 2 is explained by this regression line?

**4.** One variable looks like a good candidate to be dropped. Which is it? Try running the regression again without this variable.

Model Summary (predictors: Constant, review, Test 1, hwt, Labt)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .807 | .651 | .634 | 9.654 |

ANOVA (dependent variable: Test 2)

| | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 14753.108 | 4 | 3688.277 | 39.574 | .000 |
| Residual | 7922.014 | 85 | 93.200 | | |
| Total | 22675.122 | 89 | | | |

Coefficients (dependent variable: Test 2)

| | B | Std. Error | Beta | t | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
|---|---|---|---|---|---|---|---|
| (Constant) | 5.463 | 6.065 | | .901 | .370 | -6.596 | 17.522 |
| Labt | 1.745 | .561 | .322 | 3.110 | .003 | .629 | 2.860 |
| Test 1 | .400 | .077 | .428 | 5.193 | .000 | .247 | .554 |
| hwt | .465 | .354 | .123 | 1.312 | .193 | -.239 | 1.168 |
| review | 3.905 | 2.442 | .115 | 1.599 | .114 | -.950 | 8.761 |

a. Give the fitted regression equation.

b. Report the $t$ statistics and P-values for the tests of the regression coefficients of your explanatory variables. Which regression coefficients give you non-significant results? What conclusions do you draw from these tests?

c. What is the value of $s$, the estimator for the standard deviation?

d. What percent of the variability in Test 2 is explained by this regression line?

Now let's see what happens when we remove hwt.

Model Summary (predictors: Constant, Labt, review, Test 1)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .802 | .644 | .631 | 9.694 |

ANOVA (dependent variable: Test 2)

| | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 14592.579 | 3 | 4864.193 | 51.756 | .000 |
| Residual | 8082.543 | 86 | 93.983 | | |
| Total | 22675.122 | 89 | | | |

Coefficients (dependent variable: Test 2)

| | B | Std. Error | Beta | t | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
|---|---|---|---|---|---|---|---|
| (Constant) | 4.395 | 6.035 | | .728 | .468 | -7.603 | 16.393 |
| Test 1 | .413 | .077 | .441 | 5.374 | .000 | .260 | .566 |
| review | 4.533 | 2.405 | .133 | 1.885 | .063 | -.248 | 9.314 |
| Labt | 2.132 | .479 | .394 | 4.451 | .000 | 1.180 | 3.084 |

a. Give the fitted regression equation.

b. What is the value of $s$, the estimator of the standard deviation?

c. Has $R^2$ changed drastically with hwt removed?

d. Are any of the explanatory variables in the model still not significant at the 5% significance level? How about the 10% significance level?

**5.** Now let's look at the model with review removed.

Model Summary (predictors: Constant, Test 1, Labt)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .793 | .629 | .620 | 9.836 |

ANOVA (dependent variable: Test 2)

| | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 14258.661 | 2 | 7129.330 | 73.695 | .000 |
| Residual | 8416.461 | 87 | 96.741 | | |
| Total | 22675.122 | 89 | | | |

Coefficients (dependent variable: Test 2)

| | B | Std. Error | Beta | t | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
|---|---|---|---|---|---|---|---|
| (Constant) | 2.947 | 6.073 | | .485 | .629 | -9.125 | 15.018 |
| Labt | 2.479 | .449 | .458 | 5.526 | .000 | 1.587 | 3.371 |
| Test 1 | .398 | .078 | .425 | 5.130 | .000 | .244 | .552 |

a. Has $R^2$ changed drastically with the review variable removed?

b. What is the value of $s$, the estimator of the standard deviation?

c. Which model do you think is best, and why?

d. What are some additional things we can look at when deciding which model would be best?
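As a cross-check on the four SPSS model summaries in the example, the sketch below recomputes $R^2$, adjusted $R^2$, $s$, and $F$ from the regression and residual sums of squares alone. The sums of squares, $n = 90$, and the numbers of predictors are read from the ANOVA tables in the notes; the function name `summarize` and the dictionary labels are illustrative choices, not part of the course material.

```python
import math

# (regression SS, residual SS, number of predictors p), as read from the
# four SPSS ANOVA tables in the example; n = 90 students in every model.
N = 90
MODELS = {
    "all five predictors": (14803.030, 7872.092, 5),
    "attendance dropped": (14753.108, 7922.014, 4),
    "attendance and hwt dropped": (14592.579, 8082.543, 3),
    "Labt and Test 1 only": (14258.661, 8416.461, 2),
}


def summarize(ss_reg, ss_res, p, n=N):
    """Return R^2, adjusted R^2, s, and F from ANOVA sums of squares."""
    ss_total = ss_reg + ss_res
    df_res = n - p - 1                       # degrees of freedom for s^2
    mse = ss_res / df_res                    # s^2 = sum(e_i^2) / (n - p - 1)
    r2 = ss_reg / ss_total                   # proportion of variation explained
    adj_r2 = 1 - mse / (ss_total / (n - 1))  # penalizes extra predictors
    f_stat = (ss_reg / p) / mse              # overall ANOVA F statistic
    return {"r2": r2, "adj_r2": adj_r2, "s": math.sqrt(mse), "F": f_stat}


SUMMARY = {name: summarize(*values) for name, values in MODELS.items()}

for name, stats in SUMMARY.items():
    print(f"{name:27s} R2={stats['r2']:.3f} adjR2={stats['adj_r2']:.3f} "
          f"s={stats['s']:.3f} F={stats['F']:.2f}")
```

Laying the four rows side by side makes the trade-off in the notes concrete: dropping attendance and hwt barely moves $R^2$ or $s$, while dropping review as well costs a little more on both.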
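The individual $t$ tests and confidence intervals for the full five-predictor model can be reproduced by hand from the B and Std. Error columns. The sketch below assumes a critical value of about 1.989 (the 97.5th percentile of the $t$ distribution with $n - p - 1 = 84$ degrees of freedom, taken from a table rather than from the notes), and it assumes the attendance coefficient is negative, which is inferred from its reported confidence interval. The helper names are hypothetical.

```python
# Assumed table value: 97.5th percentile of t with 84 degrees of freedom.
T_STAR = 1.989

# name: (b_j, SE of b_j) from the full-model Coefficients table.
# The minus sign on attendance is inferred from its reported interval.
COEFS = {
    "attendance": (-0.287, 0.393),
    "Labt": (1.908, 0.606),
    "hwt": (0.565, 0.381),
    "review": (4.623, 2.639),
    "Test 1": (0.393, 0.078),
}


def t_statistic(b, se):
    """Test statistic for H0: beta_j = 0."""
    return b / se


def confidence_interval(b, se, t_star=T_STAR):
    """Interval b_j +/- t* SE_{b_j}."""
    margin = t_star * se
    return (b - margin, b + margin)


def contains_zero(interval):
    lower, upper = interval
    return lower <= 0 <= upper


RESULTS = {
    name: {"t": t_statistic(b, se), "ci": confidence_interval(b, se)}
    for name, (b, se) in COEFS.items()
}

# Predictors whose 95% interval excludes 0.
KEEPERS = [name for name, r in RESULTS.items() if not contains_zero(r["ci"])]

for name, r in RESULTS.items():
    low, high = r["ci"]
    print(f"{name:10s} t={r['t']:6.3f}  95% CI=({low:.3f}, {high:.3f})")
print("intervals excluding 0:", KEEPERS)
```

Small differences in the third decimal place from the SPSS printout are expected, because the printed B and Std. Error values are themselves rounded.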
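The collinearity check described in the notes (regress one predictor on the others and look at $R^2$) can be sketched from the correlation matrix alone, without the raw grades. This uses two standard identities that are not stated in the notes: the standardized coefficients solve $R_{xx}\,\beta = r_{xy}$, and the $R^2$ from regressing predictor $j$ on the others equals $1 - 1/(R_{xx}^{-1})_{jj}$. The correlations are the rounded values from the SPSS matrix; the small linear solver is a plain-Python stand-in for a linear algebra library.

```python
PREDICTORS = ["Labt", "hwt", "attendance", "review", "Test 1"]

# Pearson correlations among the predictors (rounded SPSS values).
R_XX = [
    [1.000, 0.711, 0.686, 0.405, 0.615],
    [0.711, 1.000, 0.685, 0.406, 0.493],
    [0.686, 0.685, 1.000, 0.558, 0.351],
    [0.405, 0.406, 0.558, 1.000, 0.173],
    [0.615, 0.493, 0.351, 0.173, 1.000],
]
# Correlation of each predictor with the response, Test 2.
R_XY = [0.719, 0.609, 0.490, 0.369, 0.706]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Standardized coefficients (the "Beta" column): solve R_XX * beta = R_XY.
BETAS = dict(zip(PREDICTORS, solve(R_XX, R_XY)))

# Overall R^2 equals the sum of beta_j * r_jy.
R2 = sum(beta * r for beta, r in zip(BETAS.values(), R_XY))


def r2_on_others(j):
    """R^2 from regressing predictor j on the remaining predictors."""
    unit = [1.0 if i == j else 0.0 for i in range(len(PREDICTORS))]
    inverse_jj = solve(R_XX, unit)[j]  # j-th diagonal entry of the inverse
    return 1 - 1 / inverse_jj


COLLINEARITY = {name: r2_on_others(j) for j, name in enumerate(PREDICTORS)}

for name in PREDICTORS:
    print(f"{name:10s} Beta={BETAS[name]:6.3f}  "
          f"R2 on other predictors={COLLINEARITY[name]:.3f}")
print(f"overall R2={R2:.3f}")
```

The computed Beta values should agree with the Beta column of the full-model output up to rounding, which is a useful sanity check that the correlation matrix was read correctly. A predictor whose $R^2$ on the others is high is the kind of variable the notes warn will have an inflated standard error.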
