Introduction to Statistics (STAT 201)
These 18-page class notes were uploaded by Kamren McLaughlin on Monday, October 26, 2015. The notes belong to STAT 201 at the University of Tennessee, Knoxville, taught by Ramon Leon in Fall. Since upload, they have received 11 views. For similar materials see /class/229890/stat-201-university-of-tennessee-knoxville in Statistics at the University of Tennessee, Knoxville.
Unit 5: Simple Linear Regression and Correlation
Statistics 201: Introductory Statistics
Ramón V. León
4/21/2008, Lecture 5

Introductory Remarks
- Regression analysis is a method for studying the relationship between two or more numerical variables.
- In regression analysis, one of the variables is regarded as a response, outcome, or dependent variable; the other variables are regarded as predictor, explanatory, or independent variables.
- An empirical model approximately captures the main features of the relationship between the response variable and the key predictor variables.
- Sometimes it is not clear which of the two variables should be the response. In this case correlation analysis is used.
- In this unit we consider only two variables.

A Probabilistic Model for Simple Linear Regression
- The $Y_i$ are independent $N(\mu_i, \sigma^2)$ with $\mu_i = \beta_0 + \beta_1 x_i$.
- The variability about the regression line is constant: $\sigma^2$ does not depend on $x$.
- The predictor variable is regarded as nonrandom because it is assumed to be set by the investigator.
[Figure 10.1: Simple linear regression model; normal curves with common variance centered on the line $\mu_{Y|x} = \beta_0 + \beta_1 x$.]

Reflections on the Method of Tire Data Collection (Experimental Design)

Mileage (1000 miles):    0      4      8     12     16     20     24     28     32
Groove depth (mils):  394.33 329.50 291.00 255.17 229.33 204.83 179.00 163.83 150.33

- Was one tire used in the experiment, or nine? (Read the book.)
  - What effect would this answer have on the experimental error?
  - What effect would it have on our ability to generalize to all tires of this type?
- Was the data collected on one car or on several cars? Does it matter?
- Was the data collected in random order or in the order given? Does it matter?
- Is there a possible confounding problem?

Example 10.1: Tread Wear vs. Mileage
- Use the Fit Y by X platform in JMP's Analyze menu.
[Scatter plot of groove depth vs. mileage, with the Bivariate Fit menu options: Show Points, Fit Mean, Fit Line, Fit Polynomial.]
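The tread wear data above can be entered directly for computation. Below is a minimal Python sketch (not part of the original slides; variable names are my own) that stores the nine observations and computes the sample means used throughout this unit:

```python
# Tread wear data from Example 10.1: mileage in 1000 miles,
# groove depth in mils (transcribed from the table above).
mileage = [0, 4, 8, 12, 16, 20, 24, 28, 32]
depth = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33]

n = len(mileage)
x_bar = sum(mileage) / n  # mean mileage
y_bar = sum(depth) / n    # mean groove depth (JMP's "Mean of Response")

print(n, x_bar, round(y_bar, 2))  # 9 16.0 244.15
```

The mean response, 244.15 mils, matches JMP's Mean of Response in the fit reports.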
Least Squares Line Fit

Parameter Estimates (JMP):
Term                   Estimate   Std Error   t Ratio   Prob>|t|
Intercept              360.6367   11.69        30.85    <.0001
Mileage in 1000 Miles   -7.2806    0.6138     -11.86    <.0001

Linear Fit: Groove Depth (mils) = 360.6367 - 7.2806 x Mileage (1000 miles)
[Scatter plot with the fitted least squares line.]

Illustration: Least Squares Line Fit
- The least squares line is the line that minimizes the sum of the squared lengths of the vertical segments from the data points to the line.
[Figure: a trial line vs. the least squares line over the mileage data, with the vertical residual segments drawn in.]

Mathematics of Least Squares
Find the line, i.e., the values of $\beta_0$ and $\beta_1$, that minimizes the sum of squared deviations
$$Q = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2.$$
How? Solve for the values of $\beta_0$ and $\beta_1$ for which
$$\frac{\partial Q}{\partial \beta_0} = 0 \quad \text{and} \quad \frac{\partial Q}{\partial \beta_1} = 0.$$

Predict Mean Groove Depth for a Given Mileage
For $x = 25$ (25,000 miles), the predicted depth is
$$\hat{y} = \hat\beta_0 + \hat\beta_1 x = 360.64 - 7.281 \times 25 = 178.62 \text{ mils}.$$

Goodness of Fit of the LS Line
- Fitted values: $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$, $i = 1, \ldots, n$.
- Residuals: $e_i = y_i - \hat{y}_i$, the lengths of the segments between observed and fitted values.
- In JMP: Linear Fit menu > Save Predicteds, Save Residuals.

Coefficient of Determination
$$\text{SST} = \text{SSR} + \text{SSE}, \qquad r^2 = \frac{\text{SSR}}{\text{SST}}$$
$r^2$ is the proportion of the variation in $y$ that is accounted for by the regression on $x$.

Summary of Fit (JMP):
RSquare                  0.95261
RSquare Adj              0.9458
Root Mean Square Error  19.02
Mean of Response       244.15
Observations             9

Estimation of $\sigma^2$
$$s^2 = \frac{\text{SSE}}{n-2} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2}$$
This estimate has $n-2$ degrees of freedom because two unknown parameters are estimated.

Analysis of Variance (JMP):
Source    DF   Sum of Squares   Mean Square   F Ratio
Model      1   50887.2          50887.2       140.71
Error      7    2531.5            361.6       Prob > F  <.0001
C. Total   8   53418.7
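The least squares estimates, the prediction at x = 25, and the goodness-of-fit quantities above can all be reproduced from the formulas on these slides. A minimal Python sketch (my own illustration, not part of the original lecture):

```python
import math

# Tread wear data (Example 10.1): mileage in 1000 miles, groove depth in mils.
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
Sxx = sum((xi - x_bar) ** 2 for xi in x)
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = Sxy / Sxx           # least squares slope
b0 = y_bar - b1 * x_bar  # least squares intercept

# Goodness of fit: SST = SSR + SSE, r^2 = SSR / SST = 1 - SSE / SST.
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r2 = 1 - sse / sst
s = math.sqrt(sse / (n - 2))  # root mean square error, n - 2 df

print(round(b0, 4), round(b1, 4))  # 360.6367 -7.2806
print(round(b0 + b1 * 25, 2))      # predicted depth at 25,000 miles: 178.62
print(round(r2, 5), round(s, 2))   # 0.95261 19.02
```

The printed values agree with the JMP report: slope -7.2806, intercept 360.6367, r² = 0.95261, and root mean square error 19.02.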
Statistical Inference for $\beta_0$ and $\beta_1$

$$\frac{\hat\beta_0 - \beta_0}{\text{SE}(\hat\beta_0)} \sim t_{n-2} \qquad \text{and} \qquad \frac{\hat\beta_1 - \beta_1}{\text{SE}(\hat\beta_1)} \sim t_{n-2}$$

100(1 - α)% confidence intervals:
$$\hat\beta_1 \pm t_{n-2,\alpha/2}\,\text{SE}(\hat\beta_1) = -7.281 \pm t_{7,.025} \times 0.614 = -7.281 \pm 2.365 \times 0.614 = (-8.73, -5.83)$$
$$\hat\beta_0 \pm t_{n-2,\alpha/2}\,\text{SE}(\hat\beta_0) = 360.64 \pm t_{7,.025} \times 11.69 = (333.00, 388.28)$$

Parameter Estimates (JMP):
Term                   Estimate   Std Error   t Ratio   Prob>|t|   Lower 95%   Upper 95%
Intercept              360.6367   11.69        30.85    <.0001     332.997     388.276
Mileage in 1000 Miles   -7.2806    0.6138     -11.86    <.0001      -8.732      -5.829

Test of Hypothesis for $\beta_0$ and $\beta_1$
$H_0\colon \beta_1 = 0$ vs. $H_1\colon \beta_1 \neq 0$. Reject $H_0$ at level α if
$$|t| = \left|\frac{\hat\beta_1}{\text{SE}(\hat\beta_1)}\right| > t_{n-2,\alpha/2}.$$
Here $|t| = |-7.2806 / 0.6138| = 11.86 > t_{7,.025} = 2.365$, so the slope is significantly different from zero.

Prediction of a Future y or Its Mean
- For a fixed value of x, are we trying to predict the average value of y, or the value of a future observation of y?
- Do I want to predict the average selling price of all 4000-square-foot houses in my neighborhood? Or do I want to predict the particular future selling price of my 4000-square-foot house?
- Which prediction is subject to the most error?
[Figure 10.5: How confidence intervals and prediction intervals vary with x; both bands are narrowest at $\bar{x}$, and the prediction interval is wider than the confidence interval.]

JMP: Prediction of the Mean of y
[Scatter plot with confidence bands about the fitted line; Linear Fit menu: Confid Curves Fit, Set Alpha Level.]

JMP: Prediction of a Future Value of y
[Scatter plot with prediction bands; Linear Fit menu: Confid Curves Indiv, Set Alpha Level.]
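The standard errors, t statistic, and confidence intervals above follow directly from s and $S_{xx}$. A short Python sketch (my own illustration, using the slide's rounded table value $t_{7,.025} = 2.365$):

```python
import math

# Tread wear data (Example 10.1).
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
Sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / Sxx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))  # estimate of sigma

se_b1 = s / math.sqrt(Sxx)                       # SE(beta1-hat)
se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / Sxx)  # SE(beta0-hat)

t1 = b1 / se_b1  # t statistic for H0: beta1 = 0

# 95% CIs with t_{7,.025} = 2.365, the value used on the slide.
t_crit = 2.365
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)

print(round(se_b1, 4), round(se_b0, 2))  # 0.6138 11.69
print(round(t1, 2))                      # -11.86
print([round(v, 2) for v in ci_b1])      # [-8.73, -5.83]
print([round(v, 2) for v in ci_b0])      # [332.99, 388.28]
```

Since |t| = 11.86 exceeds 2.365, the test of H0: beta1 = 0 rejects at the 5% level, matching the JMP p-value of <.0001.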
Formulas for Confidence and Prediction Intervals

A 100(1 - α)% CI for $\mu^*$, the mean of $Y$ at $x = x^*$, is given by
$$\hat\mu^* \pm t_{n-2,\alpha/2}\, s\, \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}} \qquad (10.18)$$
where $\hat\mu^* = \hat\beta_0 + \hat\beta_1 x^*$ and $s = \sqrt{\text{MSE}}$ is the estimate of σ.

A 100(1 - α)% PI for a future observation $Y^*$ at $x = x^*$ is given by
$$\hat{Y}^* \pm t_{n-2,\alpha/2}\, s\, \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}} \qquad (10.19)$$
where $\hat{Y}^* = \hat\beta_0 + \hat\beta_1 x^*$.

Compare with the prediction interval of Chapter 7: a 100(1 - α)% PI for a future observation $X \sim N(\mu, \sigma^2)$ is given by
$$\bar{x} \pm t_{n-1,\alpha/2}\, s\, \sqrt{1 + \frac{1}{n}}.$$

Confidence and Prediction Intervals with JMP
- Use the Fit Model platform, not the Fit Y by X platform.
- Save Columns > Prediction Formula, Predicted Values, Mean Confidence Interval (CI for the mean μ), Indiv Confidence Interval (PI for a future Y).

Prediction for the Mean of Y or a Future Observation of Y (x = 25)
- The point estimate (prediction) is the same in both cases: 178.62 mils.
- But the error bands are different:
  - Narrower for the mean: CI (158.73, 198.51).
  - Wider for a future value: PI (129.44, 227.80).

Regression Diagnostics: Residuals
- The residuals $e_i = y_i - \hat{y}_i$ can be viewed as the "leftovers": they are the estimates of the random errors $\epsilon_i$.
- As such, the residuals are vital for checking the model assumptions.
- Residual plots: if the model is correct, the residuals should have no structure; that is, they should look random in their plots.

Basic Principle Underlying Residual Plots
If the assumed model is correct, then the residuals should be randomly scattered around 0 and should show no obvious systematic pattern.
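Formulas (10.18) and (10.19) can be evaluated at x* = 25 to reproduce the intervals quoted above. A Python sketch (my own illustration; last-digit differences from the slide come from the slide rounding s and t before multiplying):

```python
import math

# Tread wear data (Example 10.1).
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
Sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / Sxx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))

x_star = 25
y_hat = b0 + b1 * x_star  # same point estimate for both intervals
t_crit = 2.365            # t_{7,.025}, as on the slide

# Eq. (10.18): CI for the mean of Y at x*.
half_ci = t_crit * s * math.sqrt(1 / n + (x_star - x_bar) ** 2 / Sxx)
# Eq. (10.19): PI for a future observation of Y at x*; note the extra "1 +".
half_pi = t_crit * s * math.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / Sxx)

print(round(y_hat, 2))                                       # 178.62
print(round(y_hat - half_ci, 2), round(y_hat + half_ci, 2))  # approx. 158.74 198.51
print(round(y_hat - half_pi, 2), round(y_hat + half_pi, 2))  # approx. 129.45 227.8
```

The extra "1 +" under the square root is exactly why the prediction interval is wider than the confidence interval for the mean.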
Residuals in JMP
- In the Fit Y by X platform: Linear Fit menu > Save Predicteds, Save Residuals, Plot Residuals.
- In the Fit Model platform: Save Columns > Residuals.

Checking for Linearity
- If the plot of residuals against x (or against the predicted values) shows a systematic curved pattern, a nonlinear function of x should be fitted rather than $E(Y) = \beta_0 + \beta_1 x$.
- Residuals may be plotted against x or against the predicted values $\hat{y}$. What are the advantages of the plot against the predicted values?
[Residual plots for the tread wear linear fit: residuals vs. mileage and residuals vs. predicted groove depth; both show a clear curved pattern.]

Tread Wear: Quadratic Model
- JMP: Fit Y by X > Fit Polynomial > Degree 2 (Bivariate Fit of Groove Depth in mils by Mileage in 1000 Miles).
- Polynomial Fit, Degree 2 (JMP centers the quadratic term at the mean mileage, 16):
  Groove Depth (mils) = 342.331 - 7.2806 x Mileage + 0.1716 x (Mileage - 16)^2

Residuals for the Quadratic Model
- The residuals from the quadratic fit have a random pattern: no systematic structure remains.
- The plots are obtained with Save Predicteds / Save Residuals in Fit Y by X; the residual plot can also be obtained using the Plot Residuals option.

Quadratic Model: Summary of Fit (JMP)
RSquare                  0.9961   (R² was 0.95261 for the model linear in x)
RSquare Adj              0.9948
Root Mean Square Error   5.91
Mean of Response       244.15
Observations             9
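This quadratic fit can be reproduced without matrix algebra: because the mileage values are equally spaced and symmetric about their mean, the centered predictors $(x - \bar{x})$ and $(x - \bar{x})^2 - \overline{(x - \bar{x})^2}$ are orthogonal, so each coefficient is a simple projection. A Python sketch (my own construction, not JMP's algorithm):

```python
# Tread wear data (Example 10.1).
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Centered predictors: u = x - x_bar and v = u^2 - mean(u^2).
# With equally spaced, symmetric x's, sum(u) = sum(v) = sum(u*v) = 0,
# so the quadratic model can be fitted by two independent projections.
u = [xi - x_bar for xi in x]
u2_bar = sum(ui ** 2 for ui in u) / n
v = [ui ** 2 - u2_bar for ui in u]

c1 = sum(ui * yi for ui, yi in zip(u, y)) / sum(ui ** 2 for ui in u)  # linear coef
c2 = sum(vi * yi for vi, yi in zip(v, y)) / sum(vi ** 2 for vi in v)  # quadratic coef
a0 = y_bar - c1 * x_bar - c2 * u2_bar  # intercept of the centered JMP form

# Fitted values of y = a0 + c1*x + c2*(x - x_bar)^2, and R^2.
fitted = [a0 + c1 * xi + c2 * (xi - x_bar) ** 2 for xi in x]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sst = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1 - sse / sst

print(round(a0, 2), round(c1, 4), round(c2, 4))  # 342.33 -7.2806 0.1716
print(round(r2, 4))                              # 0.9961
```

Note that the linear coefficient is unchanged from the simple linear fit (-7.2806); that is a consequence of the orthogonality, not a general fact.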
Analysis of Variance (quadratic fit):
Source    DF   Sum of Squares   Mean Square   F Ratio
Model      2   53209.5          26604.7       762.8
Error      6     209.3             34.9       Prob > F  <.0001
C. Total   8   53418.7

Parameter Estimates (quadratic fit):
Term                             Estimate   Std Error   t Ratio   Prob>|t|
Intercept                        342.3309   4.267        80.2     <.0001
Mileage in 1000 Miles             -7.2806   0.1906      -38.2     <.0001
(Mileage in 1000 Miles - 16)^2     0.1716   0.0210        8.16    0.0002

Checking for Constant Variance: Plot of Residuals vs. Predicted Values
- If the constant-variance assumption is incorrect, often Var(Y) is some function of E(Y).
[Figure 10.7: Plots of residuals $e_i$ vs. $\hat{y}_i$ corresponding to different functional relationships between Var(Y) and E(Y): (a) constant variance; (b) variance proportional to E(Y); (c) variance proportional to $[E(Y)]^2$.]

Other Model Diagnostics Based on Residuals
- Checking for normality of errors: do a normal plot of the residuals.
- Warning: don't plot the response y itself on a normal plot.
  - That plot has no meaning when one has regressors.
  - Don't transform the data on the basis of that plot.
  - Many students make this mistake in their project. Don't be one of them.

Data Transformations
- If the functional relationship between x and y is known, it may be possible to find a linearizing transformation analytically:
  - For $y = a x^b$, take logs on both sides: $\log y = \log a + b \log x$.
  - For $y = a e^{bx}$, take logs on both sides: $\log y = \log a + b x$.
- If we are fitting an empirical model, then a linearizing transformation can be found by trial and error, using the scatter plot as a guide.
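The log transformations above can be checked numerically. A Python sketch using synthetic data generated from $y = a e^{bx}$ with illustrative values a = 3.0 and b = -0.2 (my own choice, not from the tread wear data); fitting a straight line to (x, log y) recovers both parameters:

```python
import math

# Synthetic data generated from y = a * exp(b * x) with a = 3.0, b = -0.2
# (illustrative values, not from the tread wear example).
a_true, b_true = 3.0, -0.2
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [a_true * math.exp(b_true * xi) for xi in x]

# Linearize: log y = log a + b * x, then fit a straight line to (x, log y).
z = [math.log(yi) for yi in y]
n = len(x)
x_bar = sum(x) / n
z_bar = sum(z) / n
b = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / \
    sum((xi - x_bar) ** 2 for xi in x)   # slope = b
log_a = z_bar - b * x_bar                # intercept = log a
a = math.exp(log_a)

print(round(a, 6), round(b, 6))  # 3.0 -0.2
```

With real (noisy) data the recovered values would only approximate a and b, and the least squares assumptions then apply to the errors on the log scale, not the original scale.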