# INTROBUSINESSSTAT STAT102

Penn

These 24 pages of class notes were uploaded by Orval Funk on Monday, September 28, 2015. The notes belong to STAT102 at the University of Pennsylvania, taught by Staff in Fall.

Date Created: 09/28/15

## Lecture 14 (STAT 102): Outliers and influential observations

Outline:

- Outliers and influential observations in simple linear regression (review)
- Outliers and influential observations in multiple linear regression
- Leverage plots

### Outliers and influential points in simple regression

Does the age at which a child begins to talk predict a score on a test of mental ability at a later age? The file gesell.JMP contains data on the age at first word (x) and the Gesell Adaptive Score (y), an ability test taken at a later age.

[Figure: scatterplot of Score against Age.]

- Child 18 is an outlier in the x direction, so it is a leverage point and potentially influential.
- Child 19 is a regression outlier.

### Outliers in simple linear regression

An outlier is an observation that is unusually small or large. Three types of outliers appear in scatterplots:

- an outlier in the x direction;
- an outlier in the y direction;
- an outlier from the regression line of the scatterplot (the residual has large magnitude).

Several possibilities need to be investigated when an outlier is observed:

- There was an error in recording the value.
- The point is not representative of the population of interest.
- The observation is valid.

Identify outliers from the scatterplot.

### Leverage and influential points

- An observation has high leverage if it is an outlier in the x direction.
- An observation is influential if removing it would markedly change the least squares line.
- Observations that have high leverage and are outliers tend to be influential.

### Assessing influence in simple linear regression

To assess whether a point is influential, fit the least squares line with and without the point (exclude the row to fit it without the point) and see how much of a difference it makes. Child 18 is highly influential; child 19 is not highly influential.

[Figure: Score against Age with the fitted lines for the full data, without child 18 and without child 19, plus a plot of the residuals of the full data against predicted Score.]

| Data | RSquare | Fitted line | Comment |
|---|---|---|---|
| Full data | 0.41 | Score = 109.87 - 1.127 Age | |
| Without child 19 | 0.57 | Score = 109.30 - 1.193 Age | Child 19: possible outlier |
| Without child 18 | 0.11 | Score = 105.63 - 0.779 Age | Child 18: high leverage and influential |

#### Bivariate Fit of Score By Age (without influential point 18)

Linear Fit: Score = 105.63 - 0.78 Age. Summary of Fit: RSquare = 0.1121.

Analysis of Variance:

| Source | DF | Sum of Squares | Mean Square | F Ratio |
|---|---|---|---|---|
| Model | 1 | 280.5195 | 280.519 | 2.2740 |
| Error | 18 | 2220.4805 | 123.360 | |
| C Total | 19 | 2501.0000 | | |

Parameter Estimates (the Prob > F value, the intercept t ratio and both Prob>\|t\| values are truncated in the source):

| Term | Estimate | Std Error | t Ratio |
|---|---|---|---|
| Intercept | 105.62987 | 7.161928 | (truncated) |
| Age | -0.779221 | 0.516733 | -1.51 |

#### Bivariate Fit of Score By Age (all data)

Linear Fit: Score = 109.87 - 1.13 Age. Summary of Fit: RSquare = 0.41.

Analysis of Variance (the Model sum of squares, the Model mean square and Prob > F are truncated in the source):

| Source | DF | Sum of Squares | Mean Square | F Ratio |
|---|---|---|---|---|
| Model | 1 | 1604 (truncated) | (truncated) | 13.2018 |
| Error | 19 | 2308.5858 | 121.50 | |
| C Total | 20 | 3912.6667 | | |

Parameter Estimates:

| Term | Estimate | Std Error | t Ratio | Prob>\|t\| |
|---|---|---|---|---|
| Intercept | 109.87384 | 5.067802 | 21.68 | (truncated) |
| Age | -1.126989 | 0.310172 | -3.63 | 0.0018 |

Conclusion: it is not clear at all that scores and ages are related for normal children.

### How to identify outliers and influential points in multiple regression: the leverage plot

#### Pollution example

The data set pollution.JMP provides information about the relationship between pollution and mortality for 60 cities between 1959 and 1961. The variables are:

- y = MORT: total age-adjusted mortality in deaths per 100,000 population
- PRECIP: mean annual precipitation in inches
- EDUC: median number of school years completed for persons 25 and older
- NONWHITE: percentage of the 1960 population that is nonwhite
- NOX: relative pollution potential of NOX (related to the amount of tons of NOX emitted per day per square kilometer)
- SO2: relative pollution potential of SO2

Based on the previous study, we will use PRECIP, EDUC, NONWHITE, LogNOX and SO2 to predict MORT.

a. We will use stepwise selection, guided by the effect tests, to add predictors to or delete predictors from the model.
b. Under Analyze > Fit Model, assign MORT to Y. Add PRECIP, EDUC, NONWHITE, LogNOX and SO2 to Construct Model Effects.
c. Choose Stepwise under Personality, then Run Model.

We will check or uncheck each variable according to the F ratio statistics. The final model is chosen based on the R squares and the p-values. Usually only variables which are significant should stay in the final model. Here are the steps.

#### Stepwise Fit (Response: MORT)

Stepwise Regression Control: Prob to Enter = 0.250, Prob to Leave = 0.100.

Current Estimates:

| SSE | DFE | MSE | RSquare | RSquare Adj | Cp | AIC |
|---|---|---|---|---|---|---|
| 76259.74 | 55 | 1386.5 | 0.6659 | 0.6416 | 6.685792 | 438.8534 |

| Parameter | Entered | Estimate | nDF | SS | "F Ratio" | "Prob>F" |
|---|---|---|---|---|---|---|
| Intercept | yes | 999.316169 | 1 | 0 | 0.000 | 1.0000 |
| PRECIP | yes | 1.61112351 | 1 | 8965.8 | 6.466 | 0.0138 |
| EDUC | yes | -15.773367 | 1 | 7056.097 | 5.089 | 0.0281 |
| NONWHITE | yes | 3.06092577 | 1 | 34458.46 | 24.852 | 0.0000 |
| LogNOX | no | | 1 | 3613.212 | 2.686 | 0.1071 |
| SO2 | yes | 0.3271823 | 1 | 21102.28 | 15.219 | 0.0003 |

Step History:

| Step | Parameter | Action | "Sig Prob" | Seq SS | RSquare | Cp |
|---|---|---|---|---|---|---|
| 1 | NONWHITE | Entered | 0.0000 | 94595.56 | 0.4144 | 43.366 |
| 2 | EDUC | Entered | 0.0000 | 33848.33 | 0.5627 | 20.206 |
| 3 | SO2 | Entered | 0.0030 | 14603.66 | 0.6267 | 11.35 |
| 4 | PRECIP | Entered | 0.0138 | 8965.8 | 0.6659 | 6.6858 |

#### Our final model

Summary of Fit: RSquare = 0.665928.

Analysis of Variance:

| Source | DF | Sum of Squares | Mean Square | F Ratio | Prob > F |
|---|---|---|---|---|---|
| Model | 4 | 152013.34 | 38003.3 | 27.4087 | <.0001 |
| Error | 55 | 76259.74 | 1386.5 | | |
| C Total | 59 | 228273.08 | | | |

Parameter Estimates:

| Term | Estimate | Std Error | t Ratio | Prob>\|t\| |
|---|---|---|---|---|
| Intercept | 999.31617 | 92.07861 | 10.85 | <.0001 |
| PRECIP | 1.6111235 | 0.633579 | 2.54 | 0.0138 |
| EDUC | -15.77337 | 6.992113 | -2.26 | 0.0281 |
| NONWHITE | 3.0609258 | 0.614004 | 4.99 | <.0001 |
| SO2 | 0.3271823 | 0.083867 | 3.90 | 0.0003 |

Effect Tests:

| Source | Nparm | DF | Sum of Squares | F Ratio | Prob > F |
|---|---|---|---|---|---|
| PRECIP | 1 | 1 | 8965.800 | 6.4663 | 0.0138 |
| EDUC | 1 | 1 | 7056.097 | 5.0890 | 0.0281 |
| NONWHITE | 1 | 1 | 34458.462 | 24.8521 | <.0001 |
| SO2 | 1 | 1 | 21102.285 | 15.2194 | 0.0003 |

[Figure: Residual by Predicted Plot (MORT Residual against MORT Predicted).]

### Outliers in multiple regression

- Outliers in terms of multiple regression are observations with large residuals.
- If the residuals come from a normal distribution, then a residual with absolute value larger than about 2.6 × RMSE (about 96 for this model) is expected only 1% of the time.
- Investigate observations with residuals of large magnitude.
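The 1% rule of thumb for residuals can be checked numerically. The sketch below assumes normally distributed residuals whose standard deviation is estimated by the root mean square error, and it uses the MSE of 1386.5 reported for the final model. The function name `residual_cutoff` is an illustrative choice, not something from JMP.

```python
from math import sqrt
from statistics import NormalDist


def residual_cutoff(mse, tail_prob=0.01):
    """Residual magnitude exceeded with probability tail_prob under normality.

    mse is the error mean square of the fitted model; its square root (RMSE)
    is used as the estimate of the residual standard deviation.
    """
    # Two-sided cutoff: half of tail_prob in each tail of the standard normal.
    z = NormalDist().inv_cdf(1 - tail_prob / 2)
    return z * sqrt(mse)


# MSE taken from the final pollution model (Error mean square = 1386.5).
cutoff = residual_cutoff(1386.5)
print(f"Flag residuals larger than about {cutoff:.0f} in absolute value")
```

The normal quantile here is about 2.58, which the rule of thumb rounds to 2.6.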
[Figure: Residual by Predicted Plot (MORT Residual against MORT Predicted). Labeled points include Albany, NY and New Orleans, LA; two further labels are illegible.]

- Residual plot of MORT vs PRECIP, EDUC, NONWHITE and SO2.
- Four places shown on the plot show some large residuals.
- Notice that residual plots for multiple regression plot the residuals against the predicted values.

### Leverage in multiple regression

- In a simple regression, a point has high leverage if it is an outlier in x.
- In a multiple regression, we will identify leverage points for each predictor.
- We use leverage plots to identify high-leverage and influential points for each regression coefficient.
- High-leverage observations for a certain x variable may affect the estimated value of that coefficient.

### Leverage plots

A leverage plot is a simple regression view of a multiple regression coefficient.

- For X_j: plot the residuals of y regressed on all predictors except X_j against the residuals of X_j regressed on the rest of the X's. Both axes are recentered at their means.
- The slope is the coefficient for that variable in the multiple regression.
- The p-value is the same as the effect test p-value.
- Distances from the points to the least squares line are the multiple regression residuals.
- Leverage plots are useful for identifying, relative to X_j, outliers, leverage points and influential points.
- Use them the same way as in a simple regression.

For the pollution data, the final model is MORT on PRECIP, EDUC, NONWHITE and SO2 (RSquare = 0.665928), with effect-test p-values 0.0138, 0.0281, <.0001 and 0.0003 respectively.

[Figure: Residual by Predicted Plot for the final model (MORT Residual against MORT Predicted).]
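The leverage-plot construction described above (residuals of y against residuals of X_j, both recentered at their means) can be sketched with NumPy. This is a minimal sketch on synthetic data, not the pollution data, and the helper name `leverage_pairs` is hypothetical. JMP's own plotting conventions may differ in detail. The sketch only illustrates two properties listed in the notes: the slope equals the multiple regression coefficient, and the residuals match those of the full model.

```python
import numpy as np


def leverage_pairs(X, y, j):
    """Return (x_leverage, y_leverage) for column j of X.

    X holds the predictors without an intercept column.
    """
    n = X.shape[0]
    # Design matrix with an intercept and every predictor except X_j.
    others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    # Residuals of y and of X_j after regressing each on the other predictors.
    y_res = y - others @ np.linalg.lstsq(others, y, rcond=None)[0]
    x_res = X[:, j] - others @ np.linalg.lstsq(others, X[:, j], rcond=None)[0]
    # Recenter both axes at the original means.
    return x_res + X[:, j].mean(), y_res + y.mean()


# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 60
X = rng.normal(size=(n, 3))
X[:, 1] += 0.5 * X[:, 0]  # make two predictors correlated
y = 10 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=n)

# Full multiple regression fit.
full = np.column_stack([np.ones(n), X])
beta = np.linalg.lstsq(full, y, rcond=None)[0]

# Simple regression of the y leverage values on the x leverage values.
x_lev, y_lev = leverage_pairs(X, y, j=1)
slope, intercept = np.polyfit(x_lev, y_lev, 1)

print("multiple regression coefficient:", beta[2])
print("leverage-plot slope:            ", slope)
```

Plotting `y_lev` against `x_lev` and looking for points far from the line, or far out along the x axis, mirrors how the notes read a leverage plot.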
### Leverage plots for the final model

[Figure: leverage plots (MORT Leverage Residuals against each predictor's leverage) for PRECIP (P = 0.0138), EDUC (P = 0.0281), NONWHITE (P < .0001) and SO2 (P = 0.0003). New Orleans, LA is labeled in the NONWHITE and SO2 panels.]

### Reproducing the NONWHITE leverage plot

Whole-model fit (left), Summary of Fit:

| RSquare | RSquare Adj | Root Mean Square Error | Mean of Response | Observations |
|---|---|---|---|---|
| 0.665928 | 0.641631 | 37.23628 | 940.3568 | 60 |

[Figure: NONWHITE leverage plot (P < .0001), with New Orleans, LA labeled.]

Bivariate fit of "Y Leverage of NONWHITE for MORT" by "X Leverage of NONWHITE for MORT" (right):

Linear Fit: Y Leverage of NONWHITE for MORT = 904.02358 + 3.0609258 × X Leverage of NONWHITE for MORT.

| RSquare | RSquare Adj | Root Mean Square Error | Mean of Response | Observations |
|---|---|---|---|---|
| 0.311227 | 0.299351 | 36.26049 | 940.3568 | 60 |

| Source | DF | Sum of Squares | Mean Square | F Ratio |
|---|---|---|---|---|
| Model | 1 | 34458.46 | 34458.5 | 26.2077 |
| Error | 58 | 76259.74 | 1314.8 | |
| C Total | 59 | 110718.20 | | |

| Term | Estimate | Std Error | t Ratio |
|---|---|---|---|
| Intercept | 904.02358 | 8.502028 | 106.33 |
| X Leverage of NONWHITE for MORT | 3.0609258 | 0.597914 | 5.12 |

- The output from the whole-model fit is on the left, together with the leverage plot for NONWHITE.
- We can reproduce the leverage plot with Analyze > Fit Model > Save Columns > Effect Leverage Pairs, then fit the Y leverage to the X leverage in a simple regression (shown on the right).
- Notice that the coefficient for NONWHITE (3.0609258) is the same in both outputs.

### Interpretation of leverage plots

The enlarged observation, New Orleans, is a moderate outlier, and it is somewhat leveraged for estimating the coefficients of both SO2 and NONWHITE, and possibly of EDUC. Since New Orleans is both moderately highly leveraged and an outlier, we suspect that it might be influential.

### Final model with and without New Orleans

Summary of Fit:

| | RSquare | RSquare Adj | Root Mean Square Error | Mean of Response | Observations |
|---|---|---|---|---|---|
| With New Orleans | 0.665928 | 0.641631 | 37.23628 | 940.3568 | 60 |
| Without New Orleans | 0.656139 | 0.630668 | 35.50292 | 937.4297 | 59 |

Analysis of Variance, with New Orleans:

| Source | DF | Sum of Squares | Mean Square | F Ratio | Prob > F |
|---|---|---|---|---|---|
| Model | 4 | 152013.34 | 38003.3 | 27.4087 | <.0001 |
| Error | 55 | 76259.74 | 1386.5 | | |
| C Total | 59 | 228273.08 | | | |

Analysis of Variance, without New Orleans:

| Source | DF | Sum of Squares | Mean Square | F Ratio | Prob > F |
|---|---|---|---|---|---|
| Model | 4 | 129877.83 | 32469.5 | 25.7601 | <.0001 |
| Error | 54 | 68064.71 | 1260.5 | | |
| C Total | 58 | 197942.54 | | | |

Parameter Estimates, with New Orleans:

| Term | Estimate | Std Error | t Ratio | Prob>\|t\| |
|---|---|---|---|---|
| Intercept | 999.31617 | 92.07861 | 10.85 | <.0001 |
| PRECIP | 1.6111235 | 0.633579 | 2.54 | 0.0138 |
| EDUC | -15.77337 | 6.992113 | -2.26 | 0.0281 |
| NONWHITE | 3.0609258 | 0.614004 | 4.99 | <.0001 |
| SO2 | 0.3271823 | 0.083867 | 3.90 | 0.0003 |

Parameter Estimates, without New Orleans:

| Term | Estimate | Std Error | t Ratio | Prob>\|t\| |
|---|---|---|---|---|
| Intercept | 967.76738 | 88.65992 | 10.92 | <.0001 |
| PRECIP | 1.6309136 | 0.604135 | 2.70 | 0.0093 |
| EDUC | -12.8732 | 6.762958 | -1.90 | 0.0623 |
| NONWHITE | 2.6542287 | 0.606761 | 4.37 | <.0001 |
| SO2 | 0.3675916 | 0.081518 | 4.51 | <.0001 |

Effect Tests (each source has Nparm = 1 and DF = 1):

| Source | Sum of Squares (with) | F Ratio (with) | Prob > F (with) | Sum of Squares (without) | F Ratio (without) | Prob > F (without) |
|---|---|---|---|---|---|---|
| PRECIP | 8965.800 | 6.4663 | 0.0138 | 9185.898 | 7.2877 | 0.0093 |
| EDUC | 7056.097 | 5.0890 | 0.0281 | 4566.969 | 3.6233 | 0.0623 |
| NONWHITE | 34458.462 | 24.8521 | <.0001 | 24119.568 | 19.1356 | <.0001 |
| SO2 | 21102.285 | 15.2194 | 0.0003 | 25630.014 | 20.3339 | <.0001 |

- Removing New Orleans has some impact on the coefficients for SO2, EDUC and NONWHITE.
- The difference is quite noticeable for the EDUC coefficient, less so for the others. We will stop here for this model.

### Influential points can have an extreme impact on the analysis

- We might have used an alternative model.
- Because of the importance of NOX and SO2, we might have chosen the final model to be MORT vs PRECIP, NONWHITE, EDUC, log NOX and log SO2.
- Notice that log NOX is not significant. One could still leave it in the model so that we can better see whether it has an effect.

Whole model, Summary of Fit: RSquare = 0.688278.

Effect Tests:

| Source | Sum of Squares | F Ratio | Prob > F |
|---|---|---|---|
| PRECIP | 10171.388 | 7.7188 | 0.0075 |
| EDUC | 5886.913 | 4.4674 | 0.0392 |
| NONWHITE | 27051.227 | 20.5285 | <.0001 |
| LogNOX | 1085.691 | 0.8239 | 0.3681 |
| LogSO2 | 6062.217 | 4.6005 | 0.0365 |

[Figure: Residual by Predicted Plot (MORT Residual against MORT Predicted), with New Orleans, LA labeled.]

[Figure: leverage plots for PRECIP, Log NOX (P = 0.3681), EDUC, Log SO2 and NONWHITE (P < .0001).]

The enlarged observation, New Orleans, is an outlier for estimating each coefficient and is highly leveraged for estimating the coefficients of interest on log NOX and log SO2. Since New Orleans is both highly leveraged and an outlier, we expect it to be influential.

### Multiple regression with and without New Orleans

Summary of Fit:

| | RSquare | RSquare Adj | Root Mean Square Error | Mean of Response | Observations |
|---|---|---|---|---|---|
| With New Orleans | 0.688278 | 0.659415 | 36.30065 | 940.3568 | 60 |
| Without New Orleans | 0.724661 | 0.698686 | 32.06752 | 937.4297 | 59 |

Analysis of Variance, with New Orleans:

| Source | DF | Sum of Squares | Mean Square | F Ratio | Prob > F |
|---|---|---|---|---|---|
| Model | 5 | 157115.28 | 31423.1 | 23.8462 | <.0001 |
| Error | 54 | 71157.80 | 1317.7 | | |
| C Total | 59 | 228273.08 | | | |

Analysis of Variance, without New Orleans:

| Source | DF | Sum of Squares | Mean Square | F Ratio | Prob > F |
|---|---|---|---|---|---|
| Model | 5 | 143441.28 | 28688.3 | 27.8980 | <.0001 |
| Error | 53 | 54501.26 | 1028.3 | | |
| C Total | 58 | 197942.54 | | | |

Parameter Estimates, with New Orleans:

| Term | Estimate | Std Error | t Ratio | Prob>\|t\| |
|---|---|---|---|---|
| Intercept | 940.6541 | 94.05424 | 10.00 | <.0001 |
| PRECIP | 1.9467286 | 0.700696 | 2.78 | 0.0075 |
| EDUC | -14.66406 | 6.937846 | -2.11 | 0.0392 |
| NONWHITE | 3.028953 | 0.668519 | 4.53 | <.0001 |
| Log NOX | 6.7159712 | 7.39895 | 0.91 | 0.3681 |
| Log SO2 | 11.35814 | 5.295487 | 2.14 | 0.0365 |

Parameter Estimates, without New Orleans:

| Term | Estimate | Std Error | t Ratio | Prob>\|t\| |
|---|---|---|---|---|
| Intercept | 852.3761 | 85.9328 | 9.92 | <.0001 |
| PRECIP | 1.3633298 | 0.635732 | 2.14 | 0.0366 |
| EDUC | -5.666948 | 6.52378 | -0.87 | 0.3889 |
| NONWHITE | 3.0396794 | 0.590566 | 5.15 | <.0001 |
| Log NOX | -9.898442 | 7.730645 | -1.28 | 0.2060 |
| Log SO2 | 26.032584 | 5.931083 | 4.39 | <.0001 |

Removing New Orleans has a large impact on the coefficients of log NOX, EDUC and log SO2. In particular, it reverses the sign of the log NOX coefficient.
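The with/without comparison used throughout these notes (refit after excluding one row and compare the coefficients) can be sketched as follows. The data are synthetic and chosen only to mimic one high-leverage point that sits off the trend. They are not the Gesell or pollution data, and the helper names `fit_line` and `coefficient_change` are hypothetical.

```python
import numpy as np


def fit_line(x, y):
    """Least squares line; returns (intercept, slope)."""
    slope, intercept = np.polyfit(x, y, 1)
    return intercept, slope


def coefficient_change(x, y, i):
    """Change in (intercept, slope) when observation i is excluded."""
    keep = np.arange(len(x)) != i
    b0_all, b1_all = fit_line(x, y)
    b0_wo, b1_wo = fit_line(x[keep], y[keep])
    return b0_wo - b0_all, b1_wo - b1_all


# Synthetic example: five points near y = 2x plus one point far out in x
# that does not follow the trend (high leverage and an outlier).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.0])

for i in range(len(x)):
    d_intercept, d_slope = coefficient_change(x, y, i)
    print(f"drop row {i}: intercept change {d_intercept:+.3f}, "
          f"slope change {d_slope:+.3f}")
```

Dropping the last row changes the slope far more than dropping any of the others, which is the pattern the notes describe for child 18 and for New Orleans.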
