# Applied Regression Analysis STAT 462

Penn State

These class notes were uploaded by Hilbert Denesik on Sunday, November 1, 2015. They belong to STAT 462 at Pennsylvania State University, taught by Staff in the Fall, and have received 36 views since upload. For similar materials see /class/233128/stat-462-pennsylvania-state-university in Statistics at Pennsylvania State University.


## MULTICOLLINEARITY

F. Chiaromonte

Related to model selection (choosing among the pool of available predictors, and terms derived from them, in the data set) are the questions:

- What is the relative importance of different terms: what are the sign and magnitude of their effects on $y$, and what are their contributions to explaining the variability of $y$?
- Can a term be dropped from (should a term be added to) the model because its contribution is small (large)?

When the terms are not correlated with one another, the answers are straightforward. E.g., comparing

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i \quad\text{and}\quad y_i = \beta_0 + \beta_1 x_{1i} + \varepsilon_i,$$

if $\operatorname{cor}(x_1, x_2) = 0$, then the LS coefficient of $x_1$ ($b_1$ is the same in both models) and the response variability explained by $x_1$, alone and together with $x_2$, are the same:

$$SSR(x_1 \mid x_2) = SSE(x_2) - SSE(x_1, x_2) = SSR(x_1)$$

However, when terms are correlated with one another, things become complicated:

- The LS coefficient and the contribution to explaining the $y$ variability can change depending on which other terms are considered with $x_j$ in the model (e.g. $SSR(x_j \mid x_h,\ h \neq j)$): they are *context dependent*.
- $b_j$ can change in magnitude, and even in sign, depending on the presence of terms correlated with $x_j$. The interpretation of $b_j$ as the average change in $y$ when $x_j$ increases by one unit "with all other terms held constant" becomes ambiguous: in practice, can you hold the other terms constant while changing $x_j$?
- The SSR attributable to $x_j$ can decrease, but also increase, depending on the presence of terms correlated with $x_j$. E.g., it decreases if both $x_j$ and the other terms are correlated with $y$ and with one another; it increases if $x_j$ is not correlated with $y$ per se, but is correlated with other terms which in turn are correlated with $y$.

In addition, when terms are correlated with one another, the sampling variances of the LS regression coefficients, and therefore their standard errors, increase:

$$\sigma^2(b_j) = \sigma^2\big[(X'X)^{-1}\big]_{jj}, \qquad se(b_j) = s\sqrt{\big[(X'X)^{-1}\big]_{jj}}$$

Our estimates become less accurate. When correlations among terms are very strong, the inversion of $X'X$ becomes numerically unstable ($\det(X'X)$ close to 0), so our estimates $b = (X'X)^{-1}X'Y$ are not just very variable, they are poorly determined: many different $b$ vectors provide very similar LS fits to the data.

Because both the elements of $b$ and their standard errors are affected when terms are correlated, it is hard to use the t-ratios

$$t_j = \frac{b_j}{se(b_j)}$$

as indicators of importance: many p-values for individual t-tests can be non-significant although some terms are obviously relevant and the regression is significant as a whole (overall F-test), and p-values can change dramatically when dropping/adding terms. For example:

The regression equation is Systol = 22.5 + 1.31 Years + 4.50 Weight + 0.0381 Years^2 + 0.0504 Weight^2 + 0.0503 Years×Weight

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 22.51 | 11.55 | 1.95 | 0.060 |
| Years | 1.308 | 1.919 | 0.68 | 0.500 |
| Weight | 4.499 | 3.906 | 1.15 | 0.258 |
| Years^2 | 0.03809 | 0.01388 | 2.74 | 0.010 |
| Weight^2 | 0.05042 | 0.03325 | 1.52 | 0.139 |
| Years×Weight | 0.05032 | 0.03247 | 1.55 | 0.131 |

S = 9.46320, R-Sq = 54.8%, R-Sq(adj) = 47.9%

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 5 | 3576.21 | 715.24 | 7.99 | 0.000 |
| Residual Error | 33 | 2955.22 | 89.55 | | |
| Total | 38 | 6531.44 | | | |

To diagnose, look at pairwise correlations among terms: scatterplot matrix and correlation coefficients matrix.

[Matrix plot of x1, x2, x3.] Correlations: cor(x2, x1) = 0.674; cor(x3, x1) = 0.830; cor(x3, x2) = 0.853.

However, these diagnostics are incomplete, because the real issue is the presence of *linear interdependencies* among the terms. These can be strong even when pairwise correlations are relatively weak. Given the model

$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_j x_j + \dots + \beta_{p-1} x_{p-1} + \varepsilon,$$

consider each term as a linear function of all the others, fitting regressions of the type

$$x_j = a_0 + \sum_{\ell \neq j} a_\ell x_\ell + \mathrm{err}_j \;\Rightarrow\; R_j^2 = R^2(x_j;\, x_\ell,\ \ell \neq j), \qquad j = 1, \dots, p-1,$$

the share of the variability of $x_j$ explained by a linear form in the other terms. The variance of the regression coefficient satisfies

$$\sigma^2(b_j) = \sigma^2\big[(X'X)^{-1}\big]_{jj} \;\propto\; VIF_j = \frac{1}{1 - R_j^2} \quad \text{(variance inflation factor)}$$

Rules of thumb: serious multicollinearity if $VIF_j \geq 10$ for some $j$, and/or $\overline{VIF} = \frac{1}{p-1}\sum_{j=1}^{p-1} VIF_j \geq 10$.

**Simulated example:** $y_i = 0 + x_{1i} + x_{2i} + x_{3i} + \varepsilon_i$, with $\varepsilon_i$ iid $N(0,1)$, $x_{2i} = x_{1i} + $ small Gaussian noise, and $x_{3i} = 2x_{1i} + $ small Gaussian noise.

The regression equation is y = 0.115 + 1.31 X1 + 0.109 X2 + 1.38 X3

| Predictor | Coef | SE Coef | T | P | VIF |
|---|---|---|---|---|---|
| Constant | 0.1148 | 0.3523 | 0.33 | 0.745 | |
| X1 | 1.312 | 1.031 | 1.27 | 0.204 | 2241.45 |
| X2 | 0.1093 | 0.9971 | 0.11 | 0.913 | 2075.26 |
| X3 | 1.3798 | 0.7323 | 1.88 | 0.061 | 4543.78 |

S = 1.01361, R-Sq = 94.4%, R-Sq(adj) = 94.3%

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 3 | 3411.1 | 1137.0 | 1106.70 | 0.000 |
| Residual Error | 196 | 201.4 | 1.0 | | |
| Total | 199 | 3612.4 | | | |

Correlations: cor(X2, X1) = 0.995; cor(X3, X1) = 0.998; cor(X3, X2) = 0.998. [Scatterplot matrix of X1, X2, X3.]

**One remedy: dropping terms as needed.** E.g., in the simulated example we have:

The regression equation is y = 0.051 + 2.75 X1 + 1.23 X2

| Predictor | Coef | SE Coef | T | P | VIF |
|---|---|---|---|---|---|
| Constant | 0.0507 | 0.3529 | 0.14 | 0.886 | |
| X1 | 2.7457 | 0.7001 | 3.92 | 0.000 | 102.060 |
| X2 | 1.2299 | 0.7037 | 1.75 | 0.082 | 102.060 |

S = 1.02015, R-Sq = 94.3%, R-Sq(adj) = 94.3%; Regression: DF 2, SS 3407.4, MS 1703.7, F 1637.08, P 0.000; Residual Error: DF 197, SS 205.0, MS 1.0; Total: DF 199, SS 3612.4.

The regression equation is y = 0.107 + 3.96 X1

| Predictor | Coef | SE Coef | T | P | VIF |
|---|---|---|---|---|---|
| Constant | 0.1073 | 0.3532 | 0.30 | 0.762 | |
| X1 | 3.96326 | 0.06965 | 56.90 | 0.000 | 1.000 |

S = 1.02543, R-Sq = 94.2%, R-Sq(adj) = 94.2%; Regression: DF 1, SS 3404.2, MS 3404.2, F 3237.51, P 0.000; Residual Error: DF 198, SS 208.2, MS 1.1; Total: DF 199, SS 3612.4.

**More sophisticated remedies:** orthogonalize the terms at the outset (but the new terms are linear combinations of the original ones: harder to interpret); fit a ridge regression.

**Important:** multicollinearity does not affect *prediction*. Fitted values, their sampling variability and stability, are not affected; it's just that very similar fitted values can be produced by very different models.

## MODEL SELECTION

We have a pool of available predictors in the data set, and an even larger pool of terms derived from them (e.g. powers, other transformations, interaction terms). How do we select a good regression model? We want to:

- explain a large share of the variability of the response;
- do so with a model that is parsimonious and easy to interpret;
- make sure that what is left unexplained does not have a systematic component — e.g. curvature in residuals vs. fitted values (or included predictors), suggesting omission of important terms; or trends in residuals vs. omitted predictors, suggesting they may be important.
**Quantitative criteria to evaluate a model with $p$ parameters:**

- **$R_p^2$** — share of explained variability, a function of the error sum of squares:

$$R_p^2 = \frac{SSR_p}{SSTot} = 1 - \frac{SSE_p}{SSTot}$$

  It always increases (or stays constant) when adding terms to a regression: when do the marginal improvements become negligible?

- **Adjusted $R^2$** — a function of the mean square error of the regression:

$$R_{adj,p}^2 = 1 - \frac{SSE_p/(n-p)}{SSTot/(n-1)} = 1 - \frac{MSE_p}{MSTot}$$

  It can decrease when adding terms that give negligible improvements: when does it peak?

- **Mallows' criterion:**

$$C_p = \frac{SSE_p}{MSE_{ALL}} - n + 2p$$

  We want small values (small error sum of squares), and values near $p$, indicative of small bias — if the model comprising ALL terms is assumed to be unbiased, i.e. to leave no systematic components unaccounted for.

**More on Mallows' criterion.** If the model comprising ALL terms is unbiased, $C_p$ is an estimate of

$$\frac{1}{\sigma^2}\sum_{i=1}^{n} E\big(\hat{y}_i - E(y \mid x_i)\big)^2 = \frac{1}{\sigma^2}\sum_{i=1}^{n}\Big[\big(E\hat{y}_i - E(y \mid x_i)\big)^2 + \operatorname{Var}(\hat{y}_i)\Big],$$

the total expected squared error of the fits relative to the error variance, containing a bias part and a variance part. For $C_p$ to be an effective criterion, we need the model in ALL terms to be good. If the model in $p$ terms is unbiased, $E(C_p) \approx p$. By construction, $C_P = P$, where $P - 1$ is the largest number of terms (excluding the intercept).

Many more criteria that combine the ability to explain/predict with parsimony exist: information criteria (AIC, BIC), PRESS (the prediction sum of squares), etc.

**How do we use these criteria to select an effective model?** If we have $P$ terms available ($P - 1$ terms plus the intercept), we can build $2^{P-1}$ models — a very large number even for moderate $P$ (e.g. $2^{10} = 1024$). Computer software can fit all such models, compute one or more criteria on each, and then report a few best-performing models for each size (**BEST SUBSETS PROCEDURE**). These can be further explored using diagnostics, possibly refined, and evaluated for interpretability.

Commonly used are also **STEPWISE PROCEDURES**:

- Forward: starting from the intercept only, add the most predictive term at each step.
- Backward: starting from all terms, drop the least predictive term at each step.
- Stepwise: at each step, add the most predictive term given the ones already in, then drop the least predictive term given the expanded collection.

In all procedures, it is important to understand that the contribution of a term is always evaluated in the context of the other terms included in the model. If the terms are interdependent (correlated), the picture is complex.

**Example (Peru.mtw).** Response: Systol. Predictors: Age, Years, Weight, Height, Chin, Forearm, Calf, Pulse. Stat > Regression > Best Subsets, loading the predictors as such, to select a good group of predictors to use with their linear terms only. Options: print 2 models of each size.

| Vars | R-Sq | R-Sq(adj) | Mallows Cp | S |
|---|---|---|---|---|
| 1 | 27.2 | 25.2 | 8.7 | 11.338 |
| 1 | 7.4 | 4.9 | 20.5 | 12.784 |
| 2 | 42.1 | 38.9 | 1.7 | 10.251 |
| 2 | 33.1 | 29.4 | 7.1 | 11.018 |
| 3 | 45.1 | 40.4 | 1.9 | 10.120 |
| 3 | 42.9 | 38.0 | 3.3 | 10.323 |
| 4 | 47.9 | 41.8 | 2.2 | 10.002 |
| 4 | 45.8 | 39.4 | 3.5 | 10.204 |
| 5 | 49.1 | 41.3 | 3.6 | 10.041 |
| 5 | 48.2 | 40.4 | 4.1 | 10.124 |
| 6 | 49.7 | 40.3 | 5.2 | 10.131 |
| 6 | 49.1 | 39.5 | 5.5 | 10.195 |
| 7 | 49.9 | 38.6 | 7.0 | 10.270 |
| 7 | 49.7 | 38.4 | 7.1 | 10.291 |
| 8 | 50.0 | 36.6 | 9.0 | 10.435 |

[In the original output, X marks indicate which of Age, Years, Weight, Height, Chin, Forearm, Calf, Pulse enter each model.]

Correlations among the predictors:

|  | Age | Years | Weight | Height | Chin | Forearm | Calf |
|---|---|---|---|---|---|---|---|
| Years | 0.588 | | | | | | |
| Weight | 0.432 | 0.481 | | | | | |
| Height | 0.056 | 0.073 | 0.450 | | | | |
| Chin | 0.158 | 0.222 | 0.562 | 0.008 | | | |
| Forearm | 0.055 | 0.143 | 0.544 | 0.069 | 0.638 | | |
| Calf | 0.005 | 0.001 | 0.392 | 0.003 | 0.516 | 0.736 | |
| Pulse | 0.091 | 0.237 | 0.312 | 0.008 | 0.223 | 0.422 | 0.209 |

[Scatterplot: Forearm vs. Age.]

The regression equation is Systol = 50.3 + 0.572 Years + 1.35 Weight

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 50.32 | 15.82 | 3.18 | 0.003 |
| Years | 0.5718 | 0.1879 | 3.04 | 0.004 |
| Weight | 1.3541 | 0.2672 | 5.07 | 0.000 |

S = 10.2512, R-Sq = 42.1%, R-Sq(adj) = 38.9%

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 2 | 2748.3 | 1374.1 | 13.08 | 0.000 |
| Residual Error | 36 | 3783.2 | 105.1 | | |
| Total | 38 | 6531.4 | | | |

[Residual plots vs. Years, Weight, and fitted values, with regression and lowess smooths.]
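The best-subsets criteria above can be computed by brute force for small term pools. The following is a sketch (not the notes' software; data and names are made up) that evaluates $R^2$, adjusted $R^2$, and Mallows' $C_p$ for every subset:

```python
# Sketch: best-subsets evaluation of R^2, adjusted R^2, and Mallows' Cp,
# as defined above, on simulated data where the third column is irrelevant.
import itertools
import numpy as np

def fit_sse(X, y):
    """SSE of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ beta) ** 2))

def best_subsets(X, y):
    n, P = X.shape
    sst = float(np.sum((y - y.mean()) ** 2))
    mse_all = fit_sse(X, y) / (n - P - 1)     # MSE of the model with ALL terms
    rows = []
    for k in range(1, P + 1):
        for cols in itertools.combinations(range(P), k):
            sse = fit_sse(X[:, cols], y)
            p = k + 1                          # parameters incl. intercept
            r2 = 1 - sse / sst
            r2adj = 1 - (sse / (n - p)) / (sst / (n - 1))
            cp = sse / mse_all - n + 2 * p
            rows.append((cols, r2, r2adj, cp))
    return rows

rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 3))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=n)
for cols, r2, r2adj, cp in best_subsets(X, y):
    print(cols, round(r2, 3), round(r2adj, 3), round(cp, 1))
```

Note that, by construction, the full model's $C_p$ equals the number of parameters, as stated above.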
Stat > Regression > Best Subsets, loading Years, Weight, Years^2, Weight^2, and Years×Weight — the full set of terms for a 2nd-order model — to select a good group of terms to address curvature. Options: print 3 models of each size. (It is often claimed that one should not drop lower-order terms while retaining higher-order ones in the same variable.)

| Vars | R-Sq | R-Sq(adj) | Mallows Cp | S |
|---|---|---|---|---|
| 1 | 27.3 | 25.4 | 18.0 | 11.325 |
| 1 | 27.2 | 25.2 | 18.1 | 11.338 |
| 1 | 0.8 | 0.0 | 37.4 | 13.235 |
| 2 | 43.5 | 40.4 | 8.2 | 10.124 |
| 2 | 42.8 | 39.6 | 8.7 | 10.191 |
| 2 | 42.1 | 38.9 | 9.2 | 10.251 |
| 3 | 52.9 | 48.9 | 3.4 | 9.3760 |
| 3 | 51.4 | 47.3 | 4.4 | 9.5190 |
| 3 | 51.1 | 46.9 | 4.7 | 9.5536 |
| 4 | 54.1 | 48.7 | 4.5 | 9.3884 |
| 4 | 52.9 | 47.4 | 5.3 | 9.5085 |
| 4 | 51.6 | 45.9 | 6.3 | 9.6424 |
| 5 | 54.8 | 47.9 | 6.0 | 9.4632 |

[In the original output, X marks indicate which of Years, Weight, Years^2, Weight^2, Years×Weight enter each model.]

The regression equation is Systol = 22.5 + 1.31 Years + 4.50 Weight + 0.0381 Years^2 + 0.0504 Weight^2 + 0.0503 Years×Weight

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 22.51 | 11.55 | 1.95 | 0.060 |
| Years | 1.308 | 1.919 | 0.68 | 0.500 |
| Weight | 4.499 | 3.906 | 1.15 | 0.258 |
| Years^2 | 0.03809 | 0.01388 | 2.74 | 0.010 |
| Weight^2 | 0.05042 | 0.03325 | 1.52 | 0.139 |
| Years×Weight | 0.05032 | 0.03247 | 1.55 | 0.131 |

S = 9.46320, R-Sq = 54.8%, R-Sq(adj) = 47.9%; Regression: DF 5, SS 3576.21, MS 715.24, F 7.99, P 0.000; Residual Error: DF 33, SS 2955.22, MS 89.55; Total: DF 38, SS 6531.44.

[Residual plots vs. Years, Weight, and fitted values.]

Correlations among the terms:

|  | Years | Weight | Years^2 | Weight^2 |
|---|---|---|---|---|
| Weight | 0.481 | | | |
| Years^2 | 0.939 | 0.545 | | |
| Weight^2 | 0.499 | 0.997 | 0.568 | |
| Years×Weight | 0.979 | 0.620 | 0.957 | 0.642 |

[Scatterplot matrix of Years, Weight, Years^2, Weight^2, Years×Weight.]
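The strong correlations between a predictor and its own square or interaction, as in the table above, are typical of polynomial terms. One standard remedy, not shown in the notes, is centering the predictor before forming higher-order terms; a sketch with made-up data:

```python
# Sketch (assumption: centering as a remedy, not from the notes): the
# correlation between x and x^2 typically drops sharply once x is centered.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(20, 60, size=200)          # an age-like toy predictor
r_raw = np.corrcoef(x, x ** 2)[0, 1]       # correlation of x with x^2

xc = x - x.mean()                          # centered predictor
r_centered = np.corrcoef(xc, xc ** 2)[0, 1]

print(round(r_raw, 3), round(r_centered, 3))
```

Centering changes the coefficients' interpretation (they now refer to deviations from the mean) but leaves the fitted values of the polynomial model unchanged.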
## EXAMPLES FOR THE GENERAL LINEAR MODEL

Salary.mtw (Minitab Student 12 subdirectory). Study faculty salary in 1991 ("1991") as a function of various predictor variables, among them years of employment ("years", derived from the starting year "StartYr") and beginning salary ("Begin").

**Model 1: linear form in years and beginning salary.**

The regression equation is 1991 = 10477 + 1922 years + 1.42 Begin

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 10477 | 1774 | 5.91 | 0.000 |
| years | 1922.15 | 49.87 | 38.55 | 0.000 |
| Begin | 1.42001 | 0.08058 | 17.62 | 0.000 |

S = 1922.27, R-Sq = 93.5%, R-Sq(adj) = 93.4%

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 2 | 8934313979 | 4467156989 | 1208.93 | 0.000 |
| Residual Error | 168 | 620781226 | 3695126 | | |
| Total | 170 | 9555095205 | | | |

[Residual plots vs. years, Begin, and fitted values, with regression and lowess smooths; normal probability plot of residuals: N = 171, AD = 1.31, p-value < 0.005.]

**Model 2: full 2nd-order polynomial in years and beginning salary.**

The regression equation is 1991 = 22975 + 2940 years + 1.83 Begin + 32.4 years^2 + 0.000012 Begin^2 + 0.0150 years×Begin

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 22975 | 12140 | 1.89 | 0.060 |
| years | 2939.6 | 696.1 | 4.22 | 0.000 |
| Begin | 1.834 | 1.015 | 1.81 | 0.073 |
| years^2 | 32.38 | 10.72 | 3.02 | 0.003 |
| Begin^2 | 0.00001208 | 0.00002184 | 0.55 | 0.581 |
| years×Begin | 0.01498 | 0.02752 | 0.54 | 0.587 |

Not all second-order terms are significant; we may want to drop some.

S = 1507.25, R-Sq = 96.1%, R-Sq(adj) = 96.0%; Regression: DF 5, SS 9180248879, MS 1836049776, F 808.19, P 0.000; Residual Error: DF 165, SS 374846326, MS 2271796; Total: DF 170, SS 9555095205.

[Residual plots vs. years, Begin, and fitted values; normal probability plot of residuals.]

**Model 3: 2nd order, but dropping the square of Begin.**

The regression equation is 1991 = 16663 + 2606 years + 1.28 Begin + 27.8 years^2 + 0.0290 years×Begin

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 16663 | 4125 | 4.04 | 0.000 |
| years | 2606.4 | 347.5 | 7.50 | 0.000 |
| Begin | 1.2814 | 0.1783 | 7.19 | 0.000 |
| years^2 | 27.813 | 6.825 | 4.08 | 0.000 |
| years×Begin | 0.02902 | 0.01058 | 2.74 | 0.007 |

Now all terms are highly significant, with essentially the same explained variability.

S = 1504.09, R-Sq = 96.1%, R-Sq(adj) = 96.0%; Regression: DF 4, SS 9179554307, MS 2294888577, F 1014.41, P 0.000; Residual Error: DF 166, SS 375540898, MS 2262295; Total: DF 170, SS 9555095205.

[Residual plots vs. years, Begin, and fitted values; normal probability plot of residuals.]

**Model 4: linear form in years, which can differ between males and females** (C = 0 if female, 1 if male).

The regression equation is 1991 = 18826 + 1145 years + 2330 C + 255 years×C

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 18826 | 1159 | 16.24 | 0.000 |
| years | 1144.60 | 64.89 | 17.64 | 0.000 |
| C | 2330 | 1625 | 1.43 | 0.153 |
| years×C | 255.4 | 857.7 | 0.30 | 0.766 |

S = 3127.15, R-Sq = 82.9%, R-Sq(adj) = 82.6%; Regression: DF 3, SS 7921992122, MS 2640664041, F 270.03, P 0.000; Residual Error: DF 167, SS 1633103083, MS 9779060; Total: DF 170, SS 9555095205.

The difference in intercept is mildly significant; the difference in slope is not.

[Scatterplot of 1991 salary vs. years; residual plots vs. years and fitted values; boxplot of residuals by C; normal probability plot of residuals.]
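A general-linear-model fit like Model 4 can be reproduced by building the design matrix explicitly with the indicator and its interaction. This sketch uses simulated data, not the salary data from the notes:

```python
# Sketch: fitting y = b0 + b1*x + b2*C + b3*(x*C) with a 0/1 indicator C,
# so each group gets its own intercept and slope (as in Model 4 above).
# The data here are simulated; all names and values are made up.
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.uniform(0, 10, size=n)
C = (np.arange(n) % 2).astype(float)      # 0/1 group indicator
# true model: group 0: y = 1 + 2x ; group 1: y = 3 + 2.5x
y = 1 + 2 * x + C * (2 + 0.5 * x) + rng.normal(scale=0.3, size=n)

# design matrix: intercept, x, indicator, interaction
A = np.column_stack([np.ones(n), x, C, x * C])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(b, 2))   # approximately [1, 2, 2, 0.5]
```

The coefficient on C shifts the intercept for group 1, and the coefficient on x*C shifts its slope, mirroring the interpretation given for Model 4.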
## SIMPLE LINEAR REGRESSION

This course is about REGRESSION ANALYSIS:

- constructing quantitative descriptions of the statistical association between $y$ (response variable) and $x$ (predictor, or explanatory, variable), based on the sample data;
- introducing models, to interpret estimates and inferences on the parameters of these descriptions in relation to the underlying population.

MULTIPLE regression: when we consider more than one predictor variable.

**Fitting a line through bivariate sample data, as a descriptor.** (Running example: $y$ = log length ratio, $x$ = log large insertion ratio.)

Least squares fit: find the line that minimizes the sum of squared vertical distances from the sample points. With $y = \beta_0 + \beta_1 x$ the generic equation of a line, the objective function is

$$\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2$$

Normal equations (derivatives of the objective function set to zero):

$$\sum_{i=1}^{n} y_i = n\beta_0 + \beta_1\sum_{i=1}^{n} x_i, \qquad \sum_{i=1}^{n} x_i y_i = \beta_0\sum_{i=1}^{n} x_i + \beta_1\sum_{i=1}^{n} x_i^2$$

The solution is unique:

$$b_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1\bar{x}$$

$\hat{y}_i = b_0 + b_1 x_i$ is the fitted value for sample point $i = 1, \dots, n$, and $e_i = y_i - \hat{y}_i$ is the residual.

Geometric properties of the least-squares line:

$$\sum_{i=1}^{n} e_i = 0, \qquad \sum_{i=1}^{n} e_i^2 = \min, \qquad \sum_{i=1}^{n} x_i e_i = 0, \qquad \bar{y} = b_0 + b_1\bar{x}$$

[Scatterplot: log length ratio vs. log large insertion ratio, with fitted line.]

**Simple linear regression MODEL.** Assume the sample data are generated as

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n$$

with the $x_i$ fixed (or conditioned on), and the $\varepsilon_i$, $i = 1, \dots, n$, random errors such that:

- $E(\varepsilon_i) = 0$ for all $i$ — no systematic component;
- $\operatorname{var}(\varepsilon_i) = \sigma^2$ for all $i$ — constant variance;
- $\operatorname{cor}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j$ — uncorrelated.

The values of $y$ given various values of $x$ scatter about a line, with constant variance and no correlations among the departures from the line carried by different observations. Quite simplistic, but very useful in many applications. Note: the distribution of the errors is unspecified for now.

If we assumed a bell-shaped distribution for the errors — which we will do later — here is how the population picture would look:

$$E(y) = \beta_0 + \beta_1 x, \qquad \operatorname{var}(y_i) = \sigma^2, \qquad \operatorname{cor}(y_i, y_j) = 0$$

[Illustration: normal distributions of $y$ (Hours) about the regression line at various values of $x$ (Number of Bids Prepared).]

Interpretation of the parameters:

- $\beta_1$ (slope): change in the mean of $y$ when $x$ changes by one unit.
- $\beta_0$ (intercept): if $x = 0$ is a meaningful value, the mean of $y$ when $x = 0$.
- $\sigma^2$ (error variance): scatter of $y$ about the regression line.
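The closed-form least-squares solution above can be sketched in a few lines of plain Python (tiny made-up data, not the genome example):

```python
# Sketch of the least-squares formulas above:
#   b1 = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2),  b0 = ybar - b1*xbar

def ls_line(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

x = [0, 1, 2, 3]
y = [1, 3, 5, 7]           # points exactly on the line y = 1 + 2x
print(ls_line(x, y))        # → (1.0, 2.0)
```

For noisy data the geometric properties above still hold: the residuals sum to zero and are orthogonal to the $x$ values.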
Under the assumptions of our simple linear regression model, the slope and intercept of the least-squares line are point estimates of the population slope and intercept, with the following very important properties.

**GAUSS–MARKOV THEOREM.** Under the conditions of the simple linear regression model, $b_1$ and $b_0$:

- are unbiased estimates for $\beta_1$ and $\beta_0$: $E(b_1) = \beta_1$, $E(b_0) = \beta_0$;
- are the most accurate (smallest MSE, i.e. variance) among all unbiased estimates that can be computed as *linear functions* of the response values.

Linearity:

$$b_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \sum_{i=1}^{n} k_i y_i, \quad k_i = \frac{x_i - \bar{x}}{\sum_{j=1}^{n}(x_j - \bar{x})^2}; \qquad b_0 = \bar{y} - b_1\bar{x} = \sum_{i=1}^{n}\Big(\frac{1}{n} - k_i\bar{x}\Big) y_i$$

**Point estimation of the error variance.** With SSE the error sum of squares,

$$s^2 = MSE = \frac{SSE}{n-2} = \frac{1}{n-2}\sum_{i=1}^{n}(y_i - b_0 - b_1 x_i)^2$$

the MSE for the regression line. The d.o.f. of SSE are $n - 2$: the two parameters of the line act as constraints. It is unbiased for $\sigma^2$: $E(s^2) = \sigma^2$.

**Point estimation of the mean response** $E(y)$ at $x$: $\hat{y}_x = b_0 + b_1 x$.

**Example.** Simple linear regression for $y$ = log length ratio (between human and chicken DNA) on $x$ = log large insertion ratio, as sampled on $n = 100$ genome windows. Estimates from least squares — line parameters: intercept 0.19210, slope 0.21777; error variance 0.033 on 98 d.o.f. Mean responses: $\hat{y}(1) = 0.41$, $\hat{y}(2) = 0.63$ — would you trust this?

[Scatterplot: log length ratio vs. log large insertion ratio.]

**Example.** Simple linear regression for $y$ = mortality rate due to malignant skin melanoma (per 10 million people) on $x$ = latitude, as sampled on 49 US states:

| State | LAT | MORT |
|---|---|---|
| 1. Alabama | 33.0 | 219 |
| 2. Arizona | 34.5 | 160 |
| 3. Arkansas | 35.0 | 170 |
| 4. California | 37.5 | 182 |
| 5. Colorado | 39.0 | 149 |
| … | … | … |
| 49. Wyoming | … | … |

Estimates from least squares — line parameters: intercept 389.19, slope −5.977; error variance 365.57 on 47 d.o.f. Mean responses: $\hat{y}(30) = 209.88$, $\hat{y}(40) = 150.11$ — would you trust this?

[Scatterplot: mortality vs. latitude.]

**Maximum likelihood estimation under normality.** With $x_i$, $i = 1, \dots, n$, fixed (or conditioned on), assume the error distribution $\varepsilon_i$, $i = 1, \dots, n$, iid $N(0, \sigma^2)$. Then $y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$, independent, $i = 1, \dots, n$. The likelihood function is

$$L(\beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n}\frac{1}{(2\pi\sigma^2)^{1/2}}\exp\Big(-\frac{1}{2\sigma^2}(y_i - \beta_0 - \beta_1 x_i)^2\Big) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\Big(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2\Big)$$
gt same as from least square fit 70 y 91 J n n 2 52 1 yib0b1xi2 Sz n i1 n F Chiaromonte Some remarks a strong statistical association between y and X does not automatically imply causation eg a functional relationship of linear form X can proxy the real causing variable perhaps is a spurious fashion ln observational studies x likewise y is not controlled by the researchers we condition on the observed values of x In experimental studies x is controlled we can consider the values of x as fixed although assignment of x levels to units may be arranged at random in an experimental design Experimental design facilitates causality assessment Extrapolating a statistical association eg a regression line outside the range of the data on which it was fitted is dangerous we don t really know how the association would be shaped where we didn t get to look With an experimental design we can make sure we cover the range that is of interest F Chiaromonte 13 INFERENCE ON REGRESSION COEFFICIENTS F Chiaromonte NORMAL SIMPLE LINEAR REGRESSION MODEL yi o 1 xi 8i xvi 2 Ln fixed or condition on 81439 1n random errors st 81 N0O 2Vi independent Under this scenario we consider inference standard errors confidence interval and testing for B1 slope B0 intercept F Chiaromonte INFERENCE FOR THE SLOPE iQr WrW EXWJMn b1 2 n n Z kiyi 206139 if 206139 if i1 i1 i1 x f h7L J xMAAa 206139 x2 j1 It follows that 2 1 N N 1 O Emez F Chiaromonte vAarltblgtns seltb1gt S 09 f2 ice 402 and b1 l seb1 quot 2 basis for confidence interval and tests F Chiaromonte 4 For instance 1o level Confidence Interval for the slope 91 i tn21 05 2seb1 Testinq Ho 810 vs Ha Bl Test statistic b t 1 tn2 under Ho 5611 39tobs tobs Pvalue area in the two tais F Chiaromonte INFERENCE FOR THE INTERCEPT n 1 n n 0 y b1x ZZyi Zkix yi Zkiyi i1 i1 i1 21 fxi4 0f yiNlt o lxpazgt Zoe m2 j1 It follows that g 0 N 002 l 1 1 2 x1 if J1 F Chiaromonte 2 2 varbo52 1 n x 1 x n 202 fgt2 Zoe 2 c2 F1 F1 b0 0 basis for confidence interval and tests F 
For instance, the $(1-\alpha)$-level confidence interval for the intercept is

$$b_0 \pm t_{n-2}(1 - \alpha/2)\, se(b_0)$$

Testing $H_0: \beta_0 = 0$ vs. $H_a: \beta_0 \neq 0$ — the test statistic is

$$t = \frac{b_0}{se(b_0)} \sim t_{n-2} \text{ under } H_0$$

with p-value the area in the two tails beyond $-|t_{obs}|$ and $|t_{obs}|$.

**Example (SCORES.MTW data set).** Regression Analysis: Second versus First.

The regression equation is Second = 22.4 + 0.755 First

| Predictor | Coef | SE Coef |
|---|---|---|
| Constant | 22.439 | 10.22 |
| First | 0.7546 | 0.1420 |

S = 11.5131, R-Sq = 49.4%, R-Sq(adj) = 47.7%

[Fitted line plot: Second = 22.44 + 0.7546 First.]

The output provides: standard errors for the estimates of intercept and slope; observed values of the t test statistics (coefficient over se); and the corresponding p-values for $H_0: \beta = 0$ vs. $H_a: \beta \neq 0$, obtained under a T distribution with d.o.f. = $n - 2$.

95% confidence intervals in Minitab: Calc > Probability Distributions > t; Inverse cumulative probability (do not worry about the non-centrality parameter — it is 0, as by default); degrees of freedom $n - 2 = 31 - 2 = 29$; input constant $1 - 0.025 = 0.975$.

Inverse Cumulative Distribution Function. Student's t distribution with 29 DF: $P(X < x) = 0.975$ at $x = 2.04523$.

$$b_0 \pm t_{n-2}(0.975)\, se(b_0) = 22.47 \pm 2.045 \times 10.22, \qquad b_1 \pm t_{n-2}(0.975)\, se(b_1) = 0.755 \pm 2.045 \times 0.142$$

**Remarks.**

- If the errors (and hence the y-values at given x's) depart from normality, the Student T's are not, rigorously speaking, the right reference distributions to use in inference. But some departure is tolerated, and if $n$ is large, asymptotic normality holds.
- The interpretation of inferences (confidence intervals, p-values) is conditional on the $x$ levels. The spread of the $x$ levels affects the standard errors of the slope and intercept estimates.
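The slope inference above can be sketched end to end in plain Python. The data are simulated (all names and true parameter values are made up), with $n = 31$ so that the d.o.f. are 29 and we can reuse the critical value $t_{0.975,29} \approx 2.045$ quoted in the Minitab output; the code also contrasts the unbiased $s^2 = SSE/(n-2)$ with the downward-biased MLE $SSE/n$:

```python
# Sketch: se(b1) = s / sqrt(Sxx), the t statistic b1/se(b1),
# and a 95% CI b1 +/- t * se(b1), on simulated data.
import math
import random

random.seed(4)
n = 31
x = [i / 3 for i in range(n)]                       # fixed design, 0..10
y = [0.5 + 1.2 * xi + random.gauss(0, 0.4) for xi in x]

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s2 = sse / (n - 2)                                  # unbiased MSE
sigma2_mle = sse / n                                # ML estimate (biased down)

se_b1 = math.sqrt(s2 / sxx)
t_stat = b1 / se_b1                                 # test of H0: beta1 = 0
tcrit = 2.045                                       # t_{0.975, 29}, from the notes
ci = (b1 - tcrit * se_b1, b1 + tcrit * se_b1)
print(round(b1, 3), round(se_b1, 3), round(t_stat, 1), ci)
```

Since the interval is centered at $b_1$, it always contains the point estimate; here the t statistic is far beyond the critical value, so $H_0: \beta_1 = 0$ is rejected.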
