# Applied Regression Analysis STAT 51200

Purdue


This 46 page Class Notes was uploaded by Bailey Macejkovic on Saturday September 19, 2015. The Class Notes belongs to STAT 51200 at Purdue University taught by Dabao Zhang in Fall. Since its upload, it has received 35 views. For similar materials see /class/207942/stat-51200-purdue-university in Statistics at Purdue University.


Date Created: 09/19/15

Statistics 512: Applied Regression Analysis. Purdue University, Professor Dabao Zhang, Spring 2009. January 12, 2009.

### Overview

We will cover

- simple linear regression (SLR)
- multiple linear regression (MLR)
- analysis of variance (ANOVA)

Emphasis will be placed on using selected practical tools (such as SAS) rather than on mathematical manipulations. We want to understand the theory so that we can apply it appropriately. Some of the material on SLR will be review, but our goal with SLR is to be able to generalize the methods to MLR.

### Course Information

- Class: Section 3, MWF 2:30-3:20pm at REC 121
- Text: *Applied Linear Statistical Models*, 5th edition, by Kutner, Neter, Nachtsheim, and Li
- Recommended: *Applied Statistics and the SAS Programming Language*, 5th edition, by Cody and Smith
- Professor: Dabao Zhang, MATH 534
- Office Hours: MW 3:30pm-4:30pm, or by appointment, or phone (46046), or email (zhangdb@stat.purdue.edu)
- Evaluation: Problem sets will be assigned more or less weekly. They will typically be due on Friday. Refer to the handout about specific evaluation policies.

### Lecture Notes

- Available as MS Word or PDF
- Usually (hopefully) prepared a week in advance
- Not comprehensive. Be prepared to take notes.
- One to two chapters per week
- Ask questions if you're confused

### Webpage

http://www.stat.purdue.edu/~zhangdb/stat512

- Announcements
- Lecture Notes
- Homework Assignments
- Data Sets and SAS files
- General handouts (please see immediately): Course Information
### Mailing List

I will very occasionally send reminders or announcements through email.

### Blackboard Vista

- Holds solutions documents
- Monitor grades
- Information restricted to enrolled students
- Discussion groups

### Exam and Classroom Notes

- One midterm exam has been scheduled on March 5, 2009, 8-10pm. Please check your schedule and make sure that it works for you. Please notify me one week in advance of any conflict.
- If the lecture viewing schedule is not realistic for homework deadlines, please let me know as soon as possible.
- In class, please try to make sure I hear your question.
- Chatting with your neighbors may disturb others; please be courteous to your classmates.

### Getting Help with SAS

SAS is the program we will use to perform data analysis for this class. Learning to use SAS will be a large part of the course.

Several sources for help:

- SAS Help Files (not always best)
- World Wide Web (look up the syntax in your favorite search engine)
- SAS Getting Started (in the SAS Files section of the class website) and Tutorials
- Statistical Consulting Service
- Wednesday Evening Help Sessions
- *Applied Statistics and the SAS Programming Language*, 5th edition, by Cody and Smith; most relevant material in Chapters 1, 2, 5, 7, and 9
- Your instructor

Statistical Consulting Service: Math B5, hours 10-4, M through F, http://www.stat.purdue.edu/scs

Off-campus students: If DACS doesn't work for you, fill out a license agreement (online in the SAS folder) and mail or fax it to Pro Ed. Disks will be sent to you. I need the license agreements, or notification that you're sending a license agreement, by the end of the first week of classes.

### Evening Computer Labs

- SC 283
- Help with SAS for multiple Stat courses
- Hours: 7pm-9pm Wednesdays
- Starting the second week of classes
- Staffed with a graduate student TA

### SAS Examples in These Notes

I will often give examples from SAS in class. The programs used in lecture (and any other programs you should need) will be available for you to download from the website.

I will usually have to edit the output somewhat to get it to fit on the page of notes. You should run the SAS programs yourself to see the real output and experiment with changing the commands to learn how they work. Let me know if you get confused about what is input, output, or my comments. I will tell you the names of all SAS files I use in these notes. If the notes differ from the SAS file, take the SAS file to be correct, since there may be cut-and-paste errors.

There is a tutorial in SAS to help you get started: Help > Getting Started with SAS Software.

You should spend some time before next week getting comfortable with SAS. For today, don't worry about the detailed syntax of the commands. Just try to get a sense of what is going on.

### Example: Price Analysis for Diamond Rings in Singapore

Variables

- response variable: price in Singapore dollars (Y)
- explanatory variable: weight of diamond in carats (X)

Goals

- Create a scatterplot
- Fit a regression line
- Predict the price of a sale for a 0.43 carat diamond ring

### SAS Data Step

File `diamond.sas` on website.
One way to input data in SAS is to type or paste it in. In this case, we have a sequence of ordered pairs (weight, price).

    data diamonds;
      input weight price @@;
      cards;
    .17 355 .16 328 .17 350 .18 325 .25 642 ...
    .15 287 .26 693 .15 316 .43 .
    ;

    data diamonds1;
      set diamonds;
      if price ne .;

The listing is abbreviated here; `diamond.sas` contains all 48 observed pairs followed by one extra case (weight 0.43, price missing) that is used for prediction.

Syntax Notes

- Each line must end with a semicolon.
- There is no output from this statement, but information does appear in the log window.
- Often you will obtain data from an existing SAS file or import it from another file, such as a spreadsheet. Examples showing how to do this will come later.

### SAS proc print

Now we want to see what the data look like.

    proc print data=diamonds;
    run;

| Obs | weight | price |
|-----|--------|-------|
| 1   | 0.17   | 355   |
| 2   | 0.16   | 328   |
| 3   | 0.17   | 350   |
| ... | ...    | ...   |
| 47  | 0.26   | 693   |
| 48  | 0.15   | 316   |
| 49  | 0.43   | .     |

### SAS proc gplot

We want to plot the data as a scatterplot, using circles to represent data points and adding a curve to see if it looks linear. The symbol statement `v=circle` (`v` stands for "value") lets us do this. The symbol statement `i=sm70` will add a smooth line using splines (interpolation = smooth). These are options which stay on until you turn them off. In order for the smoothing to work properly, we need to sort the data by the X variable.

    proc sort data=diamonds1;
      by weight;
    symbol1 v=circle i=sm70;
    title1 'Diamond Ring Price Study';
    title2 'Scatter plot of Price vs. Weight with Smoothing Curve';
    axis1 label=('Weight (Carats)');
    axis2 label=(angle=90 'Price (Singapore $)');
    proc gplot data=diamonds1;
      plot price*weight / haxis=axis1 vaxis=axis2;
    run;

[Figure: Diamond Ring Price Study. Scatter plot of Price vs. Weight with Smoothing Curve; horizontal axis: Weight (Carats).]

Now we want to use simple linear regression to fit a line through the data. We use the symbol option `i=rl`, meaning interpolation = regression line (that's an "L", not a one).

    symbol1 v=circle i=rl;
    title2 'Scatter plot of Price vs. Weight with Regression Line';
    proc gplot data=diamonds1;
      plot price*weight / haxis=axis1 vaxis=axis2;
    run;

[Figure: Diamond Ring Price Study. Scatter plot of Price vs. Weight with Regression Line; horizontal axis: Weight (Carats).]

### SAS proc reg

We use `proc reg` (regression) to estimate a regression line and calculate predictors and residuals from the straight line. We tell it what the data are, what the model is, and what options we want.

    proc reg data=diamonds;
      model price=weight / clb p r;
      output out=diag p=pred r=resid;
      id weight;
    run;

Analysis of Variance

| Source          | DF | Sum of Squares | Mean Square | F Value |
|-----------------|----|----------------|-------------|---------|
| Model           | 1  | 2098596        | 2098596     | 2069.99 |
| Error           | 46 | 46636          | 1013.81886  |         |
| Corrected Total | 47 | 2145232        |             |         |

| Statistic      | Value     | Statistic | Value  |
|----------------|-----------|-----------|--------|
| Root MSE       | 31.84052  | R-Square  | 0.9783 |
| Dependent Mean | 500.08333 | Adj R-Sq  | 0.9778 |
| Coeff Var      | 6.36704   |           |        |

Parameter Estimates

| Variable  | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|-----------|----|--------------------|----------------|---------|------------|
| Intercept | 1  | -259.62591         | 17.31886       | -14.99  | <.0001     |
| weight    | 1  | 3721.02485         | 81.78588       | 45.50   | <.0001     |

    proc print data=diag;
    run;

Output Statistics

| Obs | weight | Dep Var price | Predicted Value | Std Error Mean Predict | Residual | Std Error Residual |
|-----|--------|---------------|-----------------|------------------------|----------|--------------------|
| 1   | 0.17   | 355.0000      | 372.9483        | 5.3786                 | -17.9483 | 31.383             |
| 2   | 0.16   | 328.0000      | 335.7381        | 5.8454                 | -7.7381  | 31.299             |
| 3   | 0.17   | 350.0000      | 372.9483        | 5.3786                 | -22.9483 | 31.383             |
| 4   | 0.18   | 325.0000      | 410.1586        | 5.0028                 | -85.1586 | 31.445             |
| 5   | 0.25   | 642.0000      | 670.6303        | 5.9307                 | -28.6303 | 31.283             |
| ... | ...    | ...           | ...             | ...                    | ...      | ...                |
| 46  | 0.15   | 287.0000      | 298.5278        | 6.3833                 | -11.5278 | 31.194             |
| 47  | 0.26   | 693.0000      | 707.8406        | 6.4787                 | -14.8406 | 31.174             |
| 48  | 0.15   | 316.0000      | 298.5278        | 6.3833                 | 17.4722  | 31.194             |

### Simple Linear Regression: Why Use It?

- Descriptive purposes (cause/effect relationships)
- Control (often of cost)
- Prediction of outcomes

### Data for Simple Linear Regression

- Observe $i = 1, 2, \ldots, n$ pairs of variables (explanatory, response)
- Each pair is often called a case or a data point
- $Y_i$ = $i$th response
- $X_i$ = $i$th explanatory variable

### Simple Linear Regression Model

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \quad \text{for } i = 1, 2, \ldots, n$$

Simple Linear Regression Model Parameters

- $\beta_0$ is the intercept
- $\beta_1$ is the slope
- $\epsilon_i$ are independent, normally distributed random errors with mean 0 and variance $\sigma^2$, i.e., $\epsilon_i \sim N(0, \sigma^2)$

### Features of Simple Linear Regression Model

- Individual observations: $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$
- Since the $\epsilon_i$ are random, the $Y_i$ are also random, and

$$E(Y_i) = \beta_0 + \beta_1 X_i + E(\epsilon_i) = \beta_0 + \beta_1 X_i$$

$$\mathrm{Var}(Y_i) = 0 + \mathrm{Var}(\epsilon_i) = \sigma^2$$

Since $\epsilon_i$ is normally distributed, $Y_i \sim N(\beta_0 + \beta_1 X_i, \sigma^2)$ (see A.4, page 1302).

### Fitted Regression Equation and Residuals

We must estimate the parameters $\beta_0, \beta_1, \sigma^2$ from the data. The estimates are denoted $b_0, b_1, s^2$. These give us the fitted or estimated regression line

$$\hat{Y}_i = b_0 + b_1 X_i$$

where

- $b_0$ is the estimated intercept
- $b_1$ is the
estimated slope
- $\hat{Y}_i$ is the estimated mean for $Y$ when the predictor is $X_i$ (i.e., the point on the fitted line)
- $e_i$ is the residual for the $i$th case (the vertical distance from the data point to the fitted regression line). Note that $e_i = Y_i - \hat{Y}_i = Y_i - (b_0 + b_1 X_i)$.

### Using SAS to Plot the Residuals (Diamond Example)

When we called `proc reg` earlier, we assigned the residuals to the name `resid` and placed them in a new data set called `diag`. We now plot them vs. X.

    proc gplot data=diag;
      plot resid*weight / haxis=axis1 vaxis=axis2 vref=0;
      where price ne .;
    run;

[Figure: Diamond Ring Price Study. Residual Plot; horizontal axis: Weight (Carats).]

Notice there does not appear to be any obvious pattern in the residuals. We'll talk a lot more about diagnostics later, but for now you should know that looking at residual plots is an important way to check assumptions.

### Least Squares

- Want to find "best" estimators $b_0$ and $b_1$
- Will minimize the sum of the squared residuals

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - (b_0 + b_1 X_i)\right)^2$$

- Use calculus: take the derivative with respect to $b_0$ and with respect to $b_1$, set the two resulting equations equal to zero, and solve for $b_0$ and $b_1$ (see KNNL, pages 17-18)

### Least Squares Solution

These are the best estimates for $\beta_1$ and $\beta_0$:

$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{SS_{XY}}{SS_X}$$

$$b_0 = \bar{Y} - b_1 \bar{X}$$

- These are also maximum likelihood estimators (MLE); see KNNL, pages 26-32.
- This estimate is the "best" because it is unbiased (its expected value is equal to the true value) with minimum variance.

### Maximum Likelihood

$$Y_i \sim N(\beta_0 + \beta_1 X_i, \sigma^2)$$

$$f_i = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2}\left(\frac{Y_i - \beta_0 - \beta_1 X_i}{\sigma}\right)^2\right\}$$

$$L = f_1 \times f_2 \times \cdots \times f_n \quad \text{(likelihood function)}$$

Find values for $\beta_0$ and $\beta_1$ which
maximize $L$. These are the SAME as the least squares estimators $b_0$ and $b_1$.

### Estimation of $\sigma^2$

We also need to estimate $\sigma^2$ with $s^2$. We use the sum of squared residuals, SSE, divided by the degrees of freedom $n - 2$:

$$s^2 = \frac{\sum (Y_i - \hat{Y}_i)^2}{n - 2} = \frac{\sum e_i^2}{n - 2} = \frac{SSE}{df_E} = MSE$$

$$s = \sqrt{s^2} = \text{Root MSE}$$

where $SSE = \sum e_i^2$ is the sum of squared residuals or "errors", and MSE stands for "mean squared error".

There will be other estimated variances for other quantities, and these will also be denoted $s^2$, e.g., $s^2\{b_1\}$. Without any $\{\}$, $s^2$ refers to the value above, that is, the estimated variance of the residuals.

### Identifying these things in the SAS output

Analysis of Variance

| Source          | DF | Sum of Squares | Mean Square | F Value |
|-----------------|----|----------------|-------------|---------|
| Model           | 1  | 2098596        | 2098596     | 2069.99 |
| Error           | 46 | 46636          | 1013.81886  |         |
| Corrected Total | 47 | 2145232        |             |         |

| Statistic      | Value     | Statistic | Value  |
|----------------|-----------|-----------|--------|
| Root MSE       | 31.84052  | R-Square  | 0.9783 |
| Dependent Mean | 500.08333 | Adj R-Sq  | 0.9778 |
| Coeff Var      | 6.36704   |           |        |

Parameter Estimates

| Variable  | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|-----------|----|--------------------|----------------|---------|------------|
| Intercept | 1  | -259.62591         | 17.31886       | -14.99  | <.0001     |
| weight    | 1  | 3721.02485         | 81.78588       | 45.50   | <.0001     |

### Review of Statistical Inference for Normal Samples

This should be review! In Statistics 503/511 you learned how to construct confidence intervals and do hypothesis tests for the mean of a normal distribution, based on a random sample. Suppose we have an iid random sample $W_1, \ldots, W_n$ from a normal distribution. (Usually I would use the symbol $X$ or $Y$, but I want to keep the context general and not use the symbols we use for regression.)

We have $W_i \overset{iid}{\sim} N(\mu, \sigma^2)$, where $\mu$ and $\sigma^2$ are unknown.

- $\bar{W} = \dfrac{\sum W_i}{n}$, the sample mean
- $SS_W = \sum (W_i - \bar{W})^2$, the sum
of squares for $W$
- $s^2\{W\} = \dfrac{\sum (W_i - \bar{W})^2}{n - 1} = \dfrac{SS_W}{n - 1}$, the sample variance
- $s\{W\} = \sqrt{s^2\{W\}}$, the sample standard deviation
- $s\{\bar{W}\} = \dfrac{s\{W\}}{\sqrt{n}}$, the standard error of the mean

From these definitions we obtain

$$\bar{W} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

$$T = \frac{\bar{W} - \mu}{s\{\bar{W}\}} \text{ has a } t\text{-distribution with } n - 1 \text{ df (in short, } T \sim t_{n-1}\text{)}$$

This leads to inference:

- confidence intervals for $\mu$
- significance tests for $\mu$

### Confidence Intervals

We are $100(1 - \alpha)\%$ confident that the following interval contains $\mu$:

$$\bar{W} \pm t_c\, s\{\bar{W}\} = \left[\bar{W} - t_c\, s\{\bar{W}\},\ \bar{W} + t_c\, s\{\bar{W}\}\right]$$

where $t_c = t_{n-1}(1 - \alpha/2)$, the upper $(1 - \alpha/2)$ percentile of the $t$ distribution with $n - 1$ degrees of freedom, and $1 - \alpha$ is the confidence level (e.g., $0.95 = 95\%$, so $\alpha = 0.05$).

### Significance Tests

To test whether $\mu$ has a specific value, we use a t-test (one sample, non-directional).

- $H_0: \mu = \mu_0$ vs. $H_a: \mu \neq \mu_0$
- $t = \dfrac{\bar{W} - \mu_0}{s\{\bar{W}\}}$ has a $t_{n-1}$ distribution under $H_0$
- Reject $H_0$ if $|t| \geq t_c$, where $t_c = t_{n-1}(1 - \alpha/2)$
- p-value $= \mathrm{Prob}_{H_0}(|T| > |t|)$, where $T \sim t_{n-1}$

The p-value is twice the area in the upper tail of the $t_{n-1}$ distribution above the observed $|t|$. It is the probability of observing a test statistic at least as extreme as what was actually observed, when the null hypothesis is really true. We reject $H_0$ if $p \leq \alpha$. (Note that this is basically the same as, and actually more general than, having $|t| \geq t_c$.)
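
The least squares formulas ($b_1 = SS_{XY}/SS_X$, $b_0 = \bar{Y} - b_1\bar{X}$, $MSE = SSE/(n-2)$) and the one-sample t procedures reviewed above can be checked by hand on a small data set. The following is a minimal Python sketch, not the course's SAS code. The function names `fit_slr`, `t_statistic`, and `t_interval`, the toy data, and the hard-coded critical value are assumptions made purely for illustration; they do not come from the notes or from `diamond.sas`.

```python
import math


def fit_slr(x, y):
    """Least squares fit of Y = b0 + b1*X. Returns (b0, b1, mse)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_x                 # slope: SSXY / SSX
    b0 = y_bar - b1 * x_bar           # intercept: Ybar - b1 * Xbar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    mse = sse / (n - 2)               # s^2, with n - 2 degrees of freedom
    return b0, b1, mse


def standard_error_of_mean(w):
    """s{W-bar} = s{W} / sqrt(n), with s^2{W} = SSW / (n - 1)."""
    n = len(w)
    w_bar = sum(w) / n
    ss_w = sum((wi - w_bar) ** 2 for wi in w)
    return math.sqrt(ss_w / (n - 1)) / math.sqrt(n)


def t_statistic(w, mu0):
    """One-sample t statistic for H0: mu = mu0."""
    w_bar = sum(w) / len(w)
    return (w_bar - mu0) / standard_error_of_mean(w)


def t_interval(w, t_crit):
    """Confidence interval W-bar +/- t_c * s{W-bar}; t_crit is supplied by the caller."""
    w_bar = sum(w) / len(w)
    half_width = t_crit * standard_error_of_mean(w)
    return w_bar - half_width, w_bar + half_width


# Toy data (hypothetical, chosen only so the arithmetic is easy to follow).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, mse = fit_slr(x, y)
print("b0 =", round(b0, 4), "b1 =", round(b1, 4), "Root MSE =", round(math.sqrt(mse), 4))

# Hypothetical normal sample with n = 5, so df = n - 1 = 4.
w = [4.8, 5.1, 5.0, 5.3, 4.8]
T_CRIT_95_DF4 = 2.776  # assumed: t_4(0.975), taken from a t table rather than computed
t_obs = t_statistic(w, mu0=4.8)
lower, upper = t_interval(w, T_CRIT_95_DF4)
print("t =", round(t_obs, 4), "95% CI =", (round(lower, 4), round(upper, 4)))
print("reject H0 at alpha = 0.05:", abs(t_obs) >= T_CRIT_95_DF4)
```

In this toy sample the hypothesized value 4.8 lies inside the 95% interval and $|t| < t_c$, so the test does not reject $H_0$; the interval and the two-sided test always agree in this way. The critical value is passed in rather than computed so that the sketch needs only the standard library.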
