Applied Regression Analysis
Applied Regression Analysis STAT 51200
These 9 pages of class notes were uploaded by Bailey Macejkovic on Saturday, September 19, 2015. The notes belong to STAT 51200 at Purdue University, taught by Staff in Fall.
Date Created: 09/19/15
Stat 512 Topic 1
(Topic01.doc, 8/22/05 11:34 AM)

Topic Overview

This topic will cover:
- Course Overview & Policies
- SAS
- KNNL Chapter 1 (much should be review) - Simple linear regression
- KNNL Chapter 2 - Inference in simple linear regression, Prediction intervals and Confidence bands, ANOVA tables, General Linear Test

Class Website

[URL not recoverable from the source]

Class Policies

Refer to handout.

Overview

We will cover
- simple linear regression (SLR) - Chapters 1-5
- multiple regression (MLR) - Chapters 6-11
- analysis of variance (ANOVA) - Chapters 16-25

The emphasis will be on applying selected practical tools in SAS rather than on the mathematical manipulations. We want to understand the theory so that we can apply it appropriately. Some of the material on SLR will be review, but our goal with SLR is to be able to generalize the methods to MLR.

References

- Text: Applied Linear Statistical Models, 4th ed., by Neter, Kutner, Nachtsheim, and Wasserman (KNNL)
- SAS System for Regression by Freund and Littell
- SAS/STAT User's Guide, Vol. 1 and 2
- SAS System for Elementary Analysis by Schlotzhauer
- SAS Help menus

Getting Help with SAS

- Statistical Consulting Service, Math B5. Hours: 10-4, M through F. [URL not recoverable from the source]
- MW Room: help with SAS and Excel for multiple Stat courses. Hours: 7-9, M through Th, starting second week of classes; staffed with graduate student TAs.

SAS

SAS is the program we will use to perform data analysis for this class. I will often give examples from SAS in class. The commands are meant to be all together as one program, but it will be easier to understand if I show each command followed by its output. The programs will be available for you to download from the website.

I will use the following font convention in the lecture notes:
- SAS input will look like this (Courier New).
- SAS output will look like this (SAS Monospace). I will usually have to edit the output somewhat to get it to fit on the page of notes.
- My own comments will be in regular Times (printout) or Arial (lecture) fonts like this.

Let me know if you get confused about what is input, output, or my comments.

You should run the SAS programs yourself to see the real output and experiment with changing the commands to learn how they work. I will tell you the names of all SAS files I use in these notes. If the notes differ from the SAS file, take the SAS file to be correct, since there may be cut-and-paste errors.

There is a tutorial in SAS to help you get started: Help -> Getting Started with SAS Software. You should spend some time before next week getting comfortable with SAS (see HW 0).

For today, don't worry about the detailed syntax of the commands. Just try to get a sense of what is going on.

Example: Price Analysis for Diamond Rings in Singapore

Variables
- response variable: price in Singapore dollars (Y)
- explanatory variable: weight of diamond in carats (X)

Goals
- create a scatterplot of the data
- fit a regression line
- predict the price of a sale for a 0.43 carat diamond ring

SAS Data Step

File diamond.sas on website.

One way to input data in SAS is to just type or paste it in. In this case, we have a sequence of ordered pairs (weight, price). The last case has a weight of 0.43 and a missing price (a period), and the second data step drops it.

    data diamonds;
      input weight price @@;
      cards;
    .17 355 .16 328 .17 350 .18 325 .25 642 .16 342 .15 322 .19 485
    .21 483 .15 323 .18 462 .28 823 .16 336 .20 498 .23 595 .29 860
    .12 223 .26 663 .25 750 .27 720 .18 468 .16 345 .17 352 .16 332
    .17 353 .18 438 .17 318 .18 419 .17 346 .15 315 .17 350 .32 918
    .32 919 .15 298 .16 339 .16 338 .23 595 .23 553 .17 345 .33 945
    .25 655 .35 1086 .18 443 .25 678 .25 675 .15 287 .26 693 .15 316
    .43 .
    ;
    data diamonds1;
      set diamonds;
      if price ne .;

Syntax Notes
- Each statement must end with a semicolon.
- There is no output from this statement, but information does appear in the log window.
- Often you will obtain data from an existing SAS file or import it from another file, such as a spreadsheet. Examples showing how to do this will come later.

SAS Proc Print

Now we want to see what the data look like.

    proc print data=diamonds;
    run;

    Obs    weight    price
      1     0.17      355
      2     0.16      328
      3     0.17      350
    ...
     47     0.26      693
     48     0.15      316
     49     0.43        .

SAS Proc Gplot

We want to plot the data as a scatterplot, using circles to represent data points and adding a smoothing curve to see if it looks linear. The symbol statement "v=circle" (v stands for "value") lets us do this. The symbol statement "i=sm70" will add a smooth line using splines (interpolation = smooth). These are options which stay on until you turn them off. In order for the smoothing to work properly, we need to sort the data by the X variable.

    proc sort data=diamonds1; by weight;
    symbol1 v=circle i=sm70;
    title1 'Diamond Ring Price Study';
    title2 'Scatter plot of Price vs. Weight with Smoothing Curve';
    axis1 label=('Weight (Carats)');
    axis2 label=(angle=90 'Price (Singapore $$)');
    proc gplot data=diamonds1;
      plot price*weight / haxis=axis1 vaxis=axis2;
    run;

[Figure: Diamond Ring Price Study, scatter plot of Price vs. Weight with Smoothing Curve. Horizontal axis: Weight (Carats); vertical axis: Price (Singapore $$).]

Now we want to use simple linear regression to fit a line through the data. We use the symbol option "i=rl", meaning "interpolation regression line" (that's an "L", not a one).

    symbol1 v=circle i=rl;
    title2 'Scatter plot of Price vs. Weight with Regression Line';
    proc gplot data=diamonds1;
      plot price*weight / haxis=axis1 vaxis=axis2;
    run;

[Figure: Diamond Ring Price Study, scatter plot of Price vs. Weight with Regression Line. Horizontal axis: Weight (Carats); vertical axis: Price (Singapore $$).]

SAS Proc Reg

We use proc reg (regression) to estimate a regression line and calculate predictors and residuals from the straight line. We tell it what the data are, what the model is, and what options we want.

    proc reg data=diamonds;
      model price=weight / p r;
      output out=diag p=pred r=resid;
      id weight;
    run;

                           Analysis of Variance
                                Sum of          Mean
    Source             DF      Squares        Square    F Value    Pr > F
    Model               1      2098596       2098596    2069.99    <.0001
    Error              46        46636    1013.81886
    Corrected Total    47      2145232

    Root MSE            31.84052    R-Square    0.9783
    Dependent Mean     500.08333    Adj R-Sq    0.9778
    Coeff Var            6.36704

                           Parameter Estimates
                        Parameter      Standard
    Variable     DF      Estimate         Error    t Value    Pr > |t|
    Intercept     1    -259.62591      17.31886     -14.99      <.0001
    weight        1    3721.02485      81.78588      45.50      <.0001

    proc print data=diag;
    run;

                               Output Statistics
                     Dep Var    Predicted       Std Error                 Std Error
    Obs    weight      price        Value    Mean Predict     Residual     Residual
      1     0.17    355.0000     372.9483          5.3786     -17.9483       31.383
      2     0.16    328.0000     335.7381          5.8454      -7.7381       31.299
      3     0.17    350.0000     372.9483          5.3786     -22.9483       31.383
      4     0.18    325.0000     410.1586          5.0028     -85.1586       31.445
      5     0.25    642.0000     670.6303          5.9307     -28.6303       31.283
    ...
     46     0.15    287.0000     298.5278          6.3833     -11.5278       31.194
     47     0.26    693.0000     707.8406          6.4787     -14.8406       31.174
     48     0.15    316.0000     298.5278          6.3833      17.4722       31.194
     49     0.43           .         1340         19.0332            .            .

The last row answers the prediction goal: the fitted line predicts a price of about 1340 Singapore dollars for a 0.43 carat ring.

Simple Linear Regression

Why Use It?
- Descriptive purposes (cause/effect relationships)
- Control (often of cost)
- Prediction of outcomes

Data for Simple Linear Regression
- Observe $i = 1, 2, \ldots, n$ pairs of variables (explanatory, response)
- Each pair is often called a case or a data point
- $Y_i$ = $i$th response variable
- $X_i$ = $i$th explanatory variable

Simple Linear Regression Model

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ for $i = 1, 2, \ldots, n$

Simple Linear Regression Model Parameters
- $\beta_0$ is the intercept.
- $\beta_1$ is the slope.
- $\varepsilon_i$ are independent, normally distributed random errors with mean 0 and variance $\sigma^2$, i.e., $\varepsilon_i \sim N(0, \sigma^2)$.

Features of Simple Linear Regression Model
- Individual observations: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
- Since the $\varepsilon_i$ are random, the $Y_i$ are also random, and $E(Y_i) = \beta_0 + \beta_1 X_i + E(\varepsilon_i) = \beta_0 + \beta_1 X_i$.
- $\mathrm{Var}(Y_i) = 0 + \mathrm{Var}(\varepsilon_i) = \sigma^2$.
- Since $\varepsilon_i$ is normally distributed, $Y_i \sim N(\beta_0 + \beta_1 X_i, \sigma^2)$ (see A.36, p. 1319).

Fitted Regression Equation and Residuals

We must estimate the parameters $\beta_0, \beta_1, \sigma^2$ from the data. The estimates are denoted $b_0, b_1, s^2$. These give us the fitted or estimated regression line

$\hat{Y}_i = b_0 + b_1 X_i$

where
- $b_0$ is the estimated intercept.
- $b_1$ is the estimated slope.
- $\hat{Y}_i$ is the estimated mean for $Y$ when the predictor is $X_i$ (i.e., the point on the fitted line).
- $e_i$ is the residual for the $i$th case (the vertical distance from the data point to the fitted regression line). Note that $e_i = Y_i - \hat{Y}_i = Y_i - (b_0 + b_1 X_i)$.

Using SAS to plot the residuals (Diamond Example)

When we called PROC REG earlier, we assigned the residuals to the name "resid" and placed them in a new data set called "diag". We now plot them vs. X.

    symbol1 v=circle i=NONE;
    title2 color=blue 'Residual Plot';
    axis2 label=(angle=90 'Residual');
    proc gplot data=diag;
      plot resid*weight / haxis=axis1 vaxis=axis2 vref=0;
      where price ne .;
    run;

[Figure: Diamond Ring Price Study, Residual Plot. Horizontal axis: Weight (Carats); vertical axis: Residual, with a reference line at 0.]

Notice there does not appear to be any obvious pattern in the residuals. We'll talk a lot more about diagnostics later, but for now, you should know that looking at residual plots is an important way to check assumptions.

Least Squares
- We want to find the "best" estimates $b_0$ and b_1$.
- They will minimize the sum of the squared residuals $\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( Y_i - (b_0 + b_1 X_i) \right)^2$.
- Use calculus: take the derivative with respect to $b_0$ and with respect to $b_1$, set the two resulting equations equal to zero, and solve for $b_0$ and $b_1$ (see NKNW pgs. 19-20).

Least Squares Solution

These are the best estimates for $\beta_1$ and $\beta_0$:

$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{SS_{XY}}{SS_X}$

$b_0 = \bar{Y} - b_1 \bar{X}$

- These are also maximum likelihood estimators (see NKNW pp. 30-35).
- These estimates are the "best" because they are unbiased and have minimum variance.

Maximum Likelihood

$Y_i \sim N(\beta_0 + \beta_1 X_i, \sigma^2)$

$f_i = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \left( \frac{Y_i - \beta_0 - \beta_1 X_i}{\sigma} \right)^2 \right\}$

$L = f_1 \times f_2 \times \cdots \times f_n$ (likelihood function)

Find values for $\beta_0$ and $\beta_1$ which maximize $L$. These are the SAME as the least squares estimators $b_0$ and $b_1$.

Estimation of $\sigma^2$

We also need to estimate $\sigma^2$ with $s^2$. We use the sum of the squared residuals, SSE, divided by the degrees of freedom $n - 2$:

$s^2 = \frac{\sum (Y_i - \hat{Y}_i)^2}{n - 2} = \frac{\sum e_i^2}{n - 2} = \frac{SSE}{df_E} = MSE$

$s = \sqrt{s^2} = \text{Root MSE}$

where $SSE = \sum e_i^2$ is the sum of squared residuals or "errors", and MSE stands for "mean squared error".

There will be other estimated variances for other quantities, and these will be denoted $s^2\{\text{other quantity}\}$, e.g., $s^2\{b_1\}$. Without any $\{\}$, $s^2$ refers to the value above, that is, the estimated variance of the residuals.

Identifying these things in the SAS Output

In the Analysis of Variance table from PROC REG shown earlier, SSE = 46636 is the Error sum of squares with 46 degrees of freedom, MSE = 1013.81886 is the Error mean square, and s = Root MSE = 31.84052.
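The least squares formulas and the estimate of the error variance can be checked numerically against the PROC REG output for the diamond data. The sketch below is written in Python rather than SAS, purely as an illustration; the course itself uses SAS. The names `DIAMONDS` and `fit_slr` are hypothetical, and the weights are entered with the decimal points shown in the PROC PRINT output (0.17, 0.16, and so on). The case with the missing price (0.43 carats) is left out of the fit, as in the data set diamonds1.

```python
import math

# (weight in carats, price in Singapore dollars) for the 48 complete cases.
# Assumption: decimal points restored to match the PROC PRINT listing.
DIAMONDS = [
    (0.17, 355), (0.16, 328), (0.17, 350), (0.18, 325), (0.25, 642), (0.16, 342),
    (0.15, 322), (0.19, 485), (0.21, 483), (0.15, 323), (0.18, 462), (0.28, 823),
    (0.16, 336), (0.20, 498), (0.23, 595), (0.29, 860), (0.12, 223), (0.26, 663),
    (0.25, 750), (0.27, 720), (0.18, 468), (0.16, 345), (0.17, 352), (0.16, 332),
    (0.17, 353), (0.18, 438), (0.17, 318), (0.18, 419), (0.17, 346), (0.15, 315),
    (0.17, 350), (0.32, 918), (0.32, 919), (0.15, 298), (0.16, 339), (0.16, 338),
    (0.23, 595), (0.23, 553), (0.17, 345), (0.33, 945), (0.25, 655), (0.35, 1086),
    (0.18, 443), (0.25, 678), (0.25, 675), (0.15, 287), (0.26, 693), (0.15, 316),
]


def fit_slr(pairs):
    """Least squares fit of Y = b0 + b1*X. Returns (b0, b1, sse, mse)."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    # b1 = SS_XY / SS_X and b0 = Ybar - b1 * Xbar
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    ss_x = sum((x - x_bar) ** 2 for x, _ in pairs)
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    # Residuals e_i = Y_i - (b0 + b1 * X_i); SSE is their sum of squares.
    residuals = [y - (b0 + b1 * x) for x, y in pairs]
    sse = sum(e ** 2 for e in residuals)
    mse = sse / (n - 2)  # s^2, with n - 2 degrees of freedom
    return b0, b1, sse, mse


b0, b1, sse, mse = fit_slr(DIAMONDS)
root_mse = math.sqrt(mse)          # s
predicted_043 = b0 + b1 * 0.43     # fitted price for a 0.43 carat ring

print(f"b0 = {b0:.5f}, b1 = {b1:.5f}")
print(f"SSE = {sse:.2f}, MSE = {mse:.5f}, Root MSE = {root_mse:.5f}")
print(f"Predicted price at 0.43 carats = {predicted_043:.2f}")
```

Run as written, the estimates should agree with the PROC REG output above: an intercept of about -259.63, a slope of about 3721.02, and a Root MSE of about 31.84. The predicted price for the 0.43 carat ring comes out near 1340 Singapore dollars.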