INTRODUCTION TO REGRESSION AND STATISTICAL COMPUTING
These 49 pages of class notes were uploaded by Miss Sabina Grimes on Monday, October 19, 2015. The notes belong to STAT 410 at Rice University, taught by Staff in Fall. Since upload, they have received 22 views. For similar materials see /class/225039/stat-410-rice-university in Statistics at Rice University.
Fitting Polynomial Regression to Int'l Airline Traffic
David W. Scott, Dept. of Statistics, Rice U., Houston, Texas
Stat 410, October 4, 2005
http://www.stat.rice.edu/~scottdw

[Figure: the International Airline Traffic series with fitted polynomial regressions, one panel per polynomial order 1 through 5.]

Stat 410: Properties of a Regression Line
Dr. D. Scott, September 1, 2005

The regression prediction is

    Ŷ = b0 + b1 X.

The intercept b0 is the prediction at X = 0, but it means little if no data were collected around X = 0.

Re-centering:

    Ŷ = b0 + b1 X = (b0 + b1 X̄) + b1 (X − X̄),

but b0 = Ȳ − b1 X̄, so that

    Ŷ = Ȳ + b1 (X − X̄).

Notes: The point (X̄, Ȳ) is on the regression line. If X is 1 unit more than X̄, then Ŷ is b1 units more than Ȳ.

Here are the maximum likelihood estimates of the variance, covariance, and correlation:

    var(x) = (1/n) Σ_i (x_i − x̄)²
    cov(x, y) = (1/n) Σ_i (x_i − x̄)(y_i − ȳ) = (1/n) Σ_i x_i y_i − x̄ ȳ
    corr(x, y) = ρ̂ = cov(x, y) / sqrt(var(x) var(y)).

The correlation coefficient ρ is dimensionless and satisfies −1 ≤ ρ ≤ 1.

Properties of the residuals and predictions:

    Σ e_i = Σ (Y_i − Ŷ_i)
          = Σ (Y_i − b0 − b1 X_i)
          = nȲ − n b0 − n b1 X̄
          = nȲ − n(Ȳ − b1 X̄) − n b1 X̄ = 0.

Hence Σ Ŷ_i = Σ Y_i (same average).

Less obvious: e_i and X_i are uncorrelated.

    cov(e, X) = (1/n) Σ (e_i − ē)(X_i − X̄)
              = (1/n) Σ e_i X_i − ē X̄
              = (1/n) Σ e_i X_i,   since ē = 0.

Continuing,

    (1/n) Σ e_i X_i = (1/n) Σ X_i (Y_i − b0 − b1 X_i)
                    = (1/n) Σ X_i Y_i − b0 X̄ − b1 (1/n) Σ X_i²
                    = (1/n) Σ X_i Y_i − (Ȳ − b1 X̄) X̄ − b1 (1/n) Σ X_i²
                    = cov(x, y) − b1 ((1/n) Σ X_i² − X̄²)
                    = cov(x, y) − b1 var(x) = 0,

since b1 = cov(x_i, y_i) / var(x_i).

What is the big deal? If the two quantities X and Y are uncorrelated, then their covariance is also 0, and hence so is b1. Thus the best linear predictor is Ŷ = Ȳ + b1 (X − X̄) = Ȳ.

Finally, e and Ŷ are uncorrelated:

    cov(e, Ŷ) = (1/n) Σ e_i Ŷ_i − ē Ȳ   (recall the mean of the Ŷ_i is Ȳ)
              = (1/n) Σ e_i (b0 + b1 X_i)
              = b0 ē + b1 (1/n) Σ e_i X_i = 0,

since ē = 0 and e, X are uncorrelated.
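The residual identities above are easy to check numerically. Here is a minimal sketch in Python/NumPy; the data are simulated purely for illustration (they are not the airline series, and the coefficients 2.0 and 0.5 are invented):

```python
import numpy as np

# Simulated data for illustration only (not the airline traffic series).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=50)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=50)

# MLE-style (divide-by-n) moments, as in the notes.
var_x = np.mean((x - x.mean()) ** 2)
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# Least squares line: b1 = cov(x, y) / var(x), b0 = ybar - b1 * xbar.
b1 = cov_xy / var_x
b0 = y.mean() - b1 * x.mean()

e = y - (b0 + b1 * x)                            # residuals

sum_e = e.sum()                                  # should be 0
cov_ex = np.mean(e * x) - e.mean() * x.mean()    # cov(e, x): should be 0
on_line = b0 + b1 * x.mean() - y.mean()          # (xbar, ybar) on the line: should be 0
```

Each of `sum_e`, `cov_ex`, and `on_line` is zero up to floating-point round-off, matching the derivations above.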
Family Regression Examples
Dr. Scott, August 29, 2005

The multivariate normal distribution has a number of closed-form expressions of interest. We'll not pursue them all here, but for your interest here is one of the more powerful results (cf. Stat 541, Multivariate Statistics). In class we will generally assume that X is chosen, then Y measured. Here we assume both X and Y are random, drawn from a multivariate normal distribution N(μ, Σ), with density

    f(z) = |2πΣ|^(−1/2) exp( −(1/2) (z − μ)ᵗ Σ⁻¹ (z − μ) ).

More explicitly, f(z) = f(x, y) ~ N(μ, Σ) can be decomposed as follows:

    μ = ( μ_x )        Σ = ( Σ_xx  Σ_xy )
        ( μ_y ),           ( Σ_yx  Σ_yy ).

Here the marginal densities are also normal: f(x) ~ N(μ_x, Σ_xx) and f(y) ~ N(μ_y, Σ_yy).

Now it is always the case that f(z) = f(x, y) = f(x) f(y|x). What is f(y|x)? It is an amazing algebraic fact that

    f(y|x) ~ N(μ_{y|x}, Σ_{y|x}),

where

    μ_{y|x} = μ_y + Σ_yx Σ_xx⁻¹ (x − μ_x)
    Σ_{y|x} = Σ_yy − Σ_yx Σ_xx⁻¹ Σ_xy.

The simple linear regression case corresponds to both x and y one-dimensional, where

    Σ = ( σ_x²       ρ σ_x σ_y )
        ( ρ σ_x σ_y  σ_y²      ),

so that

    Σ_{y|x} = Σ_yy − Σ_yx Σ_xx⁻¹ Σ_xy = σ_y² − (ρ σ_x σ_y)² / σ_x² = σ_y² (1 − ρ²).

As Pearson discovered, the correlation between any pair of parent/child heights is exactly ρ = 1/2. The same is true of siblings. The correlation between mother and father is approximately ρ = 1/4. The standard deviation of height is about 2.5 inches; call it σ.

If you know one parent's height, then 1 − ρ² = 3/4 and

    σ_{y|x} = σ sqrt(1 − ρ²) = σ sqrt(3/4) ≈ 0.866 σ,

which is a 13.4% reduction in uncertainty.

How much can you improve your prediction if you know both parents' heights? In standard units, with x = (mother, father) and y = child,

    Σ_xx = ( 1    1/4 )      Σ_yx = ( 1/2  1/2 ).
           ( 1/4  1   ),

Using the formula for Σ_{y|x}: Σ_xx⁻¹ Σ_xy = (0.4, 0.4)ᵗ, so

    σ²_{y|x} = 1 − (1/2)(0.4) − (1/2)(0.4) = 0.6   (in units of σ_y²),

or σ_{y|x} = σ_y sqrt(0.6) ≈ 0.77 σ_y, a 23% reduction in uncertainty.

If you add information about the heights of your siblings to your parents', that will also improve your prediction accuracy. That includes your younger siblings! In standard units, a variance of 1 becomes

    0.75 with one parent,
    0.6 with two parents,
    0.5833 with two parents and one sibling,
    0.5714 with two parents and two siblings,
    0.5625 with two parents and three siblings.

You can also use the children's heights to predict your parents' heights. Thus you can choose to predict things that are clearly not causal. It is an easily forgotten lesson, but correlation does not prove causation. However, lack of correlation does support the lack of causation, at least in a linear sense.
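The family-height variance reductions listed above can be reproduced directly from the conditional-variance formula Σ_{y|x} = Σ_yy − Σ_yx Σ_xx⁻¹ Σ_xy. A sketch in Python/NumPy, working in standard units with the correlations from the notes (parent–child and sibling–sibling 1/2, mother–father 1/4); the helper `cond_var` is ours, not from the course:

```python
import numpy as np

def cond_var(corr_xx, corr_yx):
    """Var(y | x) in standard units: 1 - Sigma_yx Sigma_xx^{-1} Sigma_xy."""
    corr_xx = np.asarray(corr_xx, dtype=float)
    corr_yx = np.asarray(corr_yx, dtype=float)
    return 1.0 - corr_yx @ np.linalg.solve(corr_xx, corr_yx)

# One parent: correlation with the child is 1/2.
v1 = cond_var([[1.0]], [0.5])                          # 0.75

# Two parents: each correlates 1/2 with the child, 1/4 with each other.
v2 = cond_var([[1.0, 0.25],
               [0.25, 1.0]], [0.5, 0.5])               # 0.6

# Two parents and one sibling (sibling correlates 1/2 with child and parents).
v3 = cond_var([[1.0, 0.25, 0.5],
               [0.25, 1.0, 0.5],
               [0.5, 0.5, 1.0]], [0.5, 0.5, 0.5])      # 7/12 = 0.5833...
```

These match the 0.75, 0.6, and 0.5833 quoted in the notes; adding further relatives extends the pattern.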
Visualizing Correlations
Dr. Scott, Stat 410, October 13, 2005

For our model Y = Xβ + e, the least squares criterion is

    SS(β) = (Y − Xβ)ᵗ (Y − Xβ) = YᵗY − 2 βᵗXᵗY + βᵗXᵗXβ.

The least squares coefficient solves

    0 = ∇SS = −2XᵗY + 2XᵗXβ   ⟹   β̂ = (XᵗX)⁻¹ XᵗY.

Hessian: ∇∇ᵗSS = 2XᵗX, which is positive definite ⟹ minimizer.

Statistical Accuracy:

    E β̂ = (XᵗX)⁻¹ Xᵗ E Y = (XᵗX)⁻¹ XᵗX β = β
    Cov β̂ = Cov(AY), where A = (XᵗX)⁻¹Xᵗ:
        A (σ² I) Aᵗ = σ² (XᵗX)⁻¹ XᵗX (XᵗX)⁻¹ = σ² (XᵗX)⁻¹.

Cute Example: with e_k = (0, …, 0, 1, 0, …, 0)ᵗ,

    Var(β̂_k) = Var(e_kᵗ β̂) = Var(e_kᵗ A Y)
             = σ² e_kᵗ A Aᵗ e_k = σ² e_kᵗ (XᵗX)⁻¹ e_k = σ² [(XᵗX)⁻¹]_kk,

as we have seen before.

Familiar Example, p = 1: Y = Xβ + e with X = 1 (a column of ones). Thus

    β̂ = (XᵗX)⁻¹ XᵗY = (1/n) Σ y_i = ȳ,   and   σ²_β̂ = σ² (XᵗX)⁻¹ = σ²/n,

as usual.

Looking at the Criterion:

    min_β  g(β) = eᵗe = Σ_{i=1}^n (y_i − β x_i)²   (see sketch).

Taylor's series:

    g(β) = g(β̂) + (β − β̂) g′(β̂) + (1/2)(β − β̂)² g″(β̂) + ⋯

With g(β) = Σ (y_i − β x_i)²: g′(β) = −2 Σ x_i (y_i − β x_i), which is 0 at β̂, and g″(β) = 2 Σ x_i². Therefore

    g(β) = g(β̂) + (Σ x_i²)(β − β̂)².

Note:

    Σ (y_i − β x_i)² = Σ (y_i − β̂ x_i + (β̂ − β) x_i)²
                     = Σ (y_i − β̂ x_i)² + 2(β̂ − β) Σ x_i (y_i − β̂ x_i) + (β̂ − β)² Σ x_i²
                     = g(β̂) + (β − β̂)² Σ x_i²,

since the cross term vanishes at β̂ — so the quadratic expansion holds exactly!

Compare n = 100 and n = 400 (see sketches). Tentative conclusion: steeper criterion ⟹ more accurate parameters.

Multivariate: g(β) = eᵗe, Y = Xβ + e, β̂ ~ N(β, σ²(XᵗX)⁻¹).

Multivariate Taylor's series:

    g(β) = g(β̂) + (β − β̂)ᵗ ∇g(β̂) + (1/2)(β − β̂)ᵗ ∇∇ᵗg(β̂) (β − β̂).

Here

    g(β) = YᵗY − 2βᵗXᵗY + βᵗXᵗXβ
    ∇g = −2XᵗY + 2XᵗXβ.

The Hessian is ∇ of ∇ᵗg, so ∇ᵗg(β) = −2YᵗX + 2βᵗXᵗX and ∇∇ᵗg = 2XᵗX.

For our least squares problem,

    g(β) = g(β̂) + (1/2)(β − β̂)ᵗ 2(XᵗX)(β − β̂)
         = (Y − Xβ̂)ᵗ(Y − Xβ̂) + (β − β̂)ᵗ XᵗX (β − β̂)   (see sketch).

Facts about Positive Definite Matrices. A = XᵗX is symmetric. Look at the quadratic form:

    yᵗAy = yᵗXᵗXy = wᵗw ≥ 0   ∀y,   where w = Xy.

Look at eigenvalues/eigenvectors:

    A v_k = λ_k v_k,   k = 1, …, p,   with v_kᵗv_k = 1 and v_jᵗv_k = 0, j ≠ k.

Assume λ₁ > λ₂ > ⋯ > λ_p. Consider

    0 ≤ v_kᵗ A v_k = λ_k v_kᵗ v_k = λ_k;

when X has full rank the inequality is strict, so in fact all the λ_k > 0.

Definitions: A symmetric matrix A is p.d. if all λ_k > 0; A is p.s.d. if all λ_k ≥ 0; A is n.d. if all λ_k < 0; A is n.s.d. if all λ_k ≤ 0; A is indefinite if some λ_k > 0 and some λ_k < 0.

Level Sets:

    g(β) − g(β̂) = (β − β̂)ᵗ A (β − β̂).

Find values of β satisfying (β − β̂)ᵗ A (β − β̂) = c (see sketch). Suppose

    A = diag(a₁, …, a_p).

By inspection A e_k = a_k e_k, so the eigenvectors are the coordinate axes. Level sets satisfy

    yᵗAy = Σ_{k=1}^p a_k y_k² = c,

which is an ellipse (see sketch).

Note A v_k = λ_k v_k ⟹ v_k = λ_k A⁻¹ v_k ⟹ A⁻¹ v_k = λ_k⁻¹ v_k, so the eigenvectors of A and A⁻¹ are the same, while the eigenvalues are reciprocals of each other.
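Several of these claims — the normal-equation solution β̂ = (XᵗX)⁻¹XᵗY, the positive definiteness of A = XᵗX, and the reciprocal eigenvalues of A and A⁻¹ — can be verified numerically. A sketch with simulated data (the design matrix, β, and σ are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
Y = X @ beta + rng.normal(0.0, 0.5, size=n)

A = X.T @ X                                   # the Hessian of SS is 2*A

# beta_hat = (X^t X)^{-1} X^t Y, computed via a linear solve.
beta_hat = np.linalg.solve(A, X.T @ Y)

# Agrees with the library least squares solver.
beta_lstsq = np.linalg.lstsq(X, Y, rcond=None)[0]

# A is symmetric positive definite: all eigenvalues > 0.
lam = np.linalg.eigvalsh(A)

# Eigenvalues of A^{-1} are the reciprocals of those of A.
lam_inv = np.linalg.eigvalsh(np.linalg.inv(A))
```

Solving the normal equations with a linear solve (rather than forming (XᵗX)⁻¹ explicitly) is the standard numerically safer route.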
Next, find the point y on the level set in the direction v₁; thus y has the form α v₁:

    c = α² v₁ᵗ A v₁ = α² λ₁,   so that   α = sqrt(c/λ₁).

Since λ₁ is the largest eigenvalue, this is the shortest axis of the ellipse. Also, changes in β in that direction give the quickest increase in the criterion function, BUT β̂ is most accurate in that direction.

In general, points y_k = α_k v_k on the level set satisfy

    α_k² v_kᵗ A v_k = c   ⟹   α_k = sqrt(c/λ_k),   i.e.,   y_k = sqrt(c/λ_k) v_k.

In ℝ², see the sketch of g about g(β̂).

Recall that Cov(β̂) = σ²(XᵗX)⁻¹. Thus

    Var(wᵗβ̂) = wᵗ σ²(XᵗX)⁻¹ w.

Look in the direction w = v_k:

    σ_w² = σ² v_kᵗ (XᵗX)⁻¹ v_k = σ²/λ_k,

therefore std(v_kᵗβ̂) = σ/sqrt(λ_k) (see sketch).

Or get the same result by recalling that β̂ ~ N(β, σ²(XᵗX)⁻¹) has level sets

    (β̂ − β)ᵗ [σ²(XᵗX)⁻¹]⁻¹ (β̂ − β) = c.

Look in the direction β̂ − β = α_k v_k; then

    α_k² v_kᵗ [σ²(XᵗX)⁻¹]⁻¹ v_k = c   ⟹   α_k² v_kᵗ (XᵗX) v_k / σ² = c   ⟹   α_k² λ_k / σ² = c,

so

    α_k² = σ² c / λ_k   ⟹   α_k = σ sqrt(c/λ_k)

(see sketch; note the same orientation in the end).

THE END. Well, now for the computer demos!
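The course demos themselves are not included in these notes. As a small stand-in, the sketch below (Python/NumPy; the design matrix, σ, and simulation sizes are invented here) checks the closing claim that the standard deviation of β̂ along eigen-direction v_k of XᵗX is σ/sqrt(λ_k), both analytically and by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma = 300, 2, 1.0
# Correlated columns so the level-set ellipse is tilted.
X = rng.normal(size=(n, p)) @ np.array([[2.0, 0.0], [1.5, 0.5]])
A = X.T @ X

lam, V = np.linalg.eigh(A)            # ascending eigenvalues, orthonormal columns

# Analytic check: Var(v_k^t beta_hat) = sigma^2 v_k^t (X^t X)^{-1} v_k = sigma^2 / lambda_k.
var_along = np.array([sigma**2 * (v @ np.linalg.solve(A, v)) for v in V.T])

# Monte Carlo stand-in for the class demo: simulate Y, refit, project onto each v_k.
beta = np.array([1.0, -1.0])
B = np.empty((2000, p))
for r in range(B.shape[0]):
    Yr = X @ beta + rng.normal(0.0, sigma, size=n)
    B[r] = np.linalg.solve(A, X.T @ Yr)
mc_std = (B @ V).std(axis=0)          # empirical std along each eigen-direction
```

`var_along` equals σ²/λ_k to machine precision, and `mc_std` agrees with σ/sqrt(λ_k) up to Monte Carlo error; the direction with the largest eigenvalue (the steepest direction of the criterion) indeed has the most accurate coefficient.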