Applied Regression Analysis

by: Bailey Macejkovic

Applied Regression Analysis STAT 51200


This 27 page Class Notes was uploaded by Bailey Macejkovic on Saturday September 19, 2015. The Class Notes belongs to STAT 51200 at Purdue University taught by Bowei Xi in Fall. Since its upload, it has received 44 views. For similar materials see /class/207934/stat-51200-purdue-university in Statistics at Purdue University.



Date Created: 09/19/15
Statistics 512: Applied Linear Models
Topic 1

Topic Overview

This topic will cover
• Course Overview & Policies
• SAS
• KNNL Chapter 1 (emphasis on Sections 1.3, 1.6, and 1.7; much should be review): simple linear regression; method of least squares (LS)
• KNNL Chapter 2 (emphasis on Sections 2.1-2.9): inference in simple linear regression, prediction intervals and confidence bands, ANOVA tables, general linear test

Class Website
http://www.stat.purdue.edu/~xbw/courses/stat512/512.htm

Class Policies
Refer to handout.

Overview
We will cover
• simple linear regression (KNNL Chapters 1-5)
• multiple regression (KNNL Chapters 6-11)
• analysis of variance (ANOVA) (KNNL Chapters 15-25)
and possibly throw in some other stuff for fun.

Emphasis will be placed on using selected practical tools (such as SAS) rather than on mathematical manipulations. We want to understand the theory so that we can apply it appropriately. Some of the material on SLR will be review, but our goal with SLR is to be able to generalize the methods to MLR.

SAS
SAS is the program we will use to perform data analysis for this class. Learning to use SAS will be a large part of the course.

Getting Help with SAS
Several sources for help:
• SAS Help Files (not always best)
• World Wide Web (look up the syntax in your favorite search engine)
• SAS Getting Started (in SAS Files section of class website) and Tutorials
• Statistical Consulting Service
• Evening Help Sessions
• Applied Statistics and the SAS Programming Language, 5th edition, by Cody and Smith; most relevant material in Chapters 1, 2, 5, 7, and 9
• Your instructor

Statistical Consulting Service
Math 1175
Hours 10-4, M through F
http://www.stat.purdue.edu/scs/

I will often give examples from SAS in class. The programs used in lecture (and any other programs you should need) will be available for you to download from the website. I will usually have to edit the output somewhat to get it to fit on the page of notes. You should run the SAS programs yourself to see the real output and experiment with changing the
commands to learn how they work. Let me know if you get confused about what is input, output, or my comments. I will tell you the names of all SAS files I use in these notes. If the notes differ from the SAS file, take the SAS file to be correct, since there may be cut-and-paste errors. There is a tutorial in SAS to help you get started: Help > Getting Started with SAS Software. You should spend some time before next week getting comfortable with SAS (see HW #0). For today, don't worry about the detailed syntax of the commands. Just try to get a sense of what is going on.

Example (Price Analysis for Diamond Rings in Singapore)

Variables
• response variable: price in Singapore dollars (Y)
• explanatory variable: weight of diamond in carats (X)

Goals
• Create a scatterplot
• Fit a regression line
• Predict the price of a sale for a 0.43 carat diamond ring

SAS Data Step
File diamond.sas on website.
One way to input data in SAS is to type or paste it in. In this case, we have a sequence of ordered pairs (weight, price). The last pair, weight 0.43 with a missing price, is the ring whose price we want to predict.

    data diamonds;
      input weight price @@;
      cards;
    .17 355 .16 328 .17 350 .18 325 .25 642 .16 342 .15 322 .19 485
    .21 483 .15 323 .18 462 .28 823 .16 336 .20 498 .23 595 .29 860
    .12 223 .26 663 .25 750 .27 720 .18 468 .16 345 .17 352 .16 332
    .17 353 .18 438 .17 318 .18 419 .17 346 .15 315 .17 350 .32 918
    .32 919 .15 298 .16 339 .16 338 .23 595 .23 553 .17 345 .33 945
    .25 655 .35 1086 .18 443 .25 678 .25 675 .15 287 .26 693 .15 316
    .43 .
    ;
    * diamonds1 keeps only the cases with an observed price;
    data diamonds1;
      set diamonds;
      if price ne .;

Syntax Notes
• Each line must end with a semi-colon.
• There is no output from this statement, but information does appear in the log window.
• Often you will obtain data from an existing SAS file or import it from another file, such as a spreadsheet. Examples showing how to do this will come later.

SAS proc print
Now we want to see what the data look like.

    proc print data=diamonds;
    run;

    Obs  weight  price
      1    0.17    355
      2    0.16    328
      3    0.17    350
    ...
     47    0.26    693
     48    0.15    316
     49    0.43      .

SAS proc gplot
We want to plot the data as a scatterplot, using circles to represent data points and adding a curve to see if it looks linear. The
symbol statement "v = circle" (v stands for "value") lets us do this. The symbol statement "i = sm70" will add a smooth line using splines (interpolation = smooth). These are options which stay on until you turn them off. In order for the smoothing to work properly, we need to sort the data by the X variable.

    proc sort data=diamonds1; by weight;
    symbol1 v=circle i=sm70;
    title1 'Diamond Ring Price Study';
    title2 'Scatter plot of Price vs. Weight with Smoothing Curve';
    axis1 label=('Weight (Carats)');
    axis2 label=(angle=90 'Price (Singapore $$)');
    proc gplot data=diamonds1;
      plot price*weight / haxis=axis1 vaxis=axis2;
    run;

[Figure: Diamond Ring Price Study. Scatter plot of Price (Singapore $$) vs. Weight (Carats) with Smoothing Curve]

Now we want to use the simple linear regression to fit a line through the data. We use the symbol option "i = rl", meaning "interpolation = regression line" (that's an "L", not a one).

    symbol1 v=circle i=rl;
    title2 'Scatter plot of Price vs. Weight with Regression Line';
    proc gplot data=diamonds1;
      plot price*weight / haxis=axis1 vaxis=axis2;
    run;

[Figure: Diamond Ring Price Study. Scatter plot of Price (Singapore $$) vs. Weight (Carats) with Regression Line]

SAS proc reg
We use proc reg (regression) to estimate a regression line and calculate predictors and residuals from the straight line. We tell it what the data are, what the model is, and what options we want.

    proc reg data=diamonds;
      model price=weight / clb p r;
      output out=diag p=pred r=resid;
      id weight;
    run;

    Analysis of Variance
                                Sum of         Mean
    Source             DF      Squares       Square    F Value    Pr > F
    Model               1      2098596      2098596    2069.99    <.0001
    Error              46        46636   1013.81886
    Corrected Total    47      2145232

    Root MSE           31.84052    R-Square    0.9783
    Dependent Mean    500.08333    Adj R-Sq    0.9778
    Coeff Var           6.36704

    Parameter Estimates
                      Parameter     Standard
    Variable    DF     Estimate        Error    t Value    Pr > |t|
    Intercept    1   -259.62591     17.31886     -14.99      <.0001
    weight       1   3721.02485     81.78588      45.50      <.0001

    proc print data=diag;
    run;

    Output Statistics
                  Dep Var  Predicted     Std Error            Std Error
    Obs  weight     price      Value  Mean Predict  Residual   Residual
      1    0.17  355.0000   372.9483        5.3786  -17.9483     31.383
      2    0.16  328.0000   335.7381        5.8454   -7.7381     31.299
      3    0.17  350.0000   372.9483        5.3786  -22.9483     31.383
      4    0.18  325.0000   410.1586        5.0028  -85.1586     31.445
      5    0.25  642.0000   670.6303        5.9307  -28.6303     31.283
    ...
     46    0.15  287.0000   298.5278        6.3833  -11.5278     31.194
     47    0.26  693.0000   707.8406        6.4787  -14.8406     31.174
     48    0.15  316.0000   298.5278        6.3833   17.4722     31.194

Simple Linear Regression

Why Use It?
• Descriptive purposes (cause/effect relationships)
• Control (often of cost)
• Prediction of outcomes

Data for Simple Linear Regression
• Observe $i = 1, 2, \ldots, n$ pairs of variables (explanatory, response)
• Each pair often called a case or a data point
• $Y_i$ = ith response
• $X_i$ = ith explanatory variable

Simple Linear Regression Model
$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$ for $i = 1, 2, \ldots, n$

Simple Linear Regression Model Parameters
• $\beta_0$ is the intercept.
• $\beta_1$ is the slope.
• $\epsilon_i$ are independent, normally distributed random errors with mean 0 and variance $\sigma^2$, i.e., $\epsilon_i \sim N(0, \sigma^2)$

Features of Simple Linear Regression Model
• Individual Observations: $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$
• Since $\epsilon_i$ are random, $Y_i$ are also random and
    $E(Y_i) = \beta_0 + \beta_1 X_i + E(\epsilon_i) = \beta_0 + \beta_1 X_i$
    $Var(Y_i) = 0 + Var(\epsilon_i) = \sigma^2$
  Since $\epsilon_i$ is Normally distributed, $Y_i \sim N(\beta_0 + \beta_1 X_i, \sigma^2)$ (See A.4, page 1302)

Fitted Regression Equation and Residuals
We must estimate the parameters $\beta_0$, $\beta_1$, $\sigma^2$ from the data. The estimates are denoted $b_0$, $b_1$, $s^2$. These give us the fitted or estimated regression line $\hat{Y}_i = b_0 + b_1 X_i$, where
• $b_0$ is the estimated intercept.
• $b_1$ is the estimated slope.
• $\hat{Y}_i$ is the estimated mean for Y when the predictor is $X_i$ (i.e., the point on the fitted line).
• $e_i$ is the residual for the ith case (the vertical distance from the data point to the fitted regression line). Note that $e_i = Y_i - \hat{Y}_i = Y_i - (b_0 + b_1 X_i)$.

Using SAS to Plot the Residuals (Diamond Example)
When we called proc reg earlier, we assigned the residuals to the name "resid" and placed them in a new data set called "diag". We now plot them vs. X.

    proc gplot data=diag;
      plot resid*weight / haxis=axis1 vaxis=axis2 vref=0;
      where price ne .;
    run;

Notice there does not appear to be any obvious pattern in the
residuals. We'll talk a lot more about diagnostics later, but for now, you should know that looking at residual plots is an important way to check assumptions.

[Figure: Diamond Ring Price Study. Residual Plot: Residual vs. Weight (Carats)]

Least Squares
• Want to find "best" estimators $b_0$ and $b_1$.
• Will minimize the sum of the squared residuals $\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - (b_0 + b_1 X_i))^2$.
• Use calculus: take the derivative with respect to $b_0$ and with respect to $b_1$, then set the two resulting equations equal to zero and solve for $b_0$ and $b_1$ (see KNNL, pages 17-18).

Least Squares Solution
• These are the best estimates for $\beta_1$ and $\beta_0$.
    $b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{SS_{XY}}{SS_X}$
    $b_0 = \bar{Y} - b_1 \bar{X}$
• These are also maximum likelihood estimators (MLE); see KNNL, pages 26-32.
• This estimate is the "best" because it is unbiased (its expected value is equal to the true value) with minimum variance.

Maximum Likelihood
    $Y_i \sim N(\beta_0 + \beta_1 X_i, \sigma^2)$
    $f_i = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2} \left(\frac{Y_i - \beta_0 - \beta_1 X_i}{\sigma}\right)^2\right)$
    $L = f_1 \times f_2 \times \cdots \times f_n$ (likelihood function)
Find values for $\beta_0$ and $\beta_1$ which maximize L. These are the SAME as the least squares estimators $b_0$ and $b_1$.

Estimation of $\sigma^2$
We also need to estimate $\sigma^2$ with $s^2$. We use the sum of squared residuals, SSE, divided by the degrees of freedom $n - 2$.
    $s^2 = \frac{\sum (Y_i - \hat{Y}_i)^2}{n - 2} = \frac{\sum e_i^2}{n - 2} = \frac{SSE}{df_E} = MSE$
    $s = \sqrt{s^2}$ = Root MSE
where $SSE = \sum e_i^2$ is the sum of squared residuals or "errors", and MSE stands for "mean squared error".
There will be other estimated variances for other quantities, and these will also be denoted $s^2$, e.g. $s^2\{b_1\}$. Without any braces, $s^2$ refers to the value above, that is, the estimated variance of the residuals.

Identifying these things in the SAS output

    Analysis of Variance
                                Sum of         Mean
    Source             DF      Squares       Square    F Value    Pr > F
    Model               1      2098596      2098596    2069.99    <.0001
    Error              46        46636   1013.81886
    Corrected Total    47      2145232

    Root MSE           31.84052    R-Square    0.9783
    Dependent Mean    500.08333    Adj R-Sq    0.9778
    Coeff Var           6.36704

    Parameter Estimates
                      Parameter     Standard
    Variable    DF     Estimate        Error    t Value    Pr > |t|    95% Confidence Limits
    Intercept    1   -259.62591     17.31886     -14.99      <.0001    -294.48696   -224.76486
    weight       1   3721.02485     81.78588      45.50      <.0001    3556.39841   3885.65129

Review of Statistical Inference for Normal Samples
This should be review! In Statistics 503/511 you learned how to construct confidence intervals and do hypothesis tests for the mean of a normal distribution, based on a random sample. Suppose we have an iid random sample $W_1, \ldots, W_n$ from a normal distribution. (Usually, I would use the symbol X or Y, but I want to keep the context general and not use the symbols we use for regression.)

We have $W_i \sim_{iid} N(\mu, \sigma^2)$, where $\mu$ and $\sigma^2$ are unknown.
    $\bar{W} = \frac{\sum W_i}{n}$ = sample mean
    $SS_W = \sum (W_i - \bar{W})^2$ = sum of squares for W
    $s^2\{W\} = \frac{\sum (W_i - \bar{W})^2}{n - 1} = \frac{SS_W}{n - 1}$ = sample variance (estimating variance of entire population)
    $s\{W\} = \sqrt{s^2\{W\}}$ = sample standard deviation
    $s\{\bar{W}\} = \frac{s\{W\}}{\sqrt{n}}$ = standard error of the mean (standard deviation of mean)
and from these definitions, we obtain
    $\bar{W} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
    $T = \frac{\bar{W} - \mu}{s\{\bar{W}\}}$ has a t-distribution with $n - 1$ df (in short, $T \sim t_{n-1}$)
This leads to inference:
• confidence intervals for $\mu$
• significance tests for $\mu$

Confidence Intervals
We are $100(1 - \alpha)\%$ confident that the following interval contains $\mu$:
    $\bar{W} \pm t_c s\{\bar{W}\} = [\bar{W} - t_c s\{\bar{W}\},\ \bar{W} + t_c s\{\bar{W}\}]$
where $t_c = t_{n-1}(1 - \alpha/2)$, the upper $(1 - \alpha/2)$ percentile of the t distribution with $n - 1$ degrees of freedom, and $1 - \alpha$ is the confidence level (e.g. 0.95 = 95%, so $\alpha = 0.05$).

Significance Tests
To test whether $\mu$ has a specific value, we use a t-test (one sample, non-directional).
• $H_0: \mu = \mu_0$ vs $H_a: \mu \neq \mu_0$
• $t = \frac{\bar{W} - \mu_0}{s\{\bar{W}\}}$ has a $t_{n-1}$ distribution under $H_0$.
• Reject $H_0$ if $|t| \geq t_c$, where $t_c = t_{n-1}(1 - \alpha/2)$.
• p-value = $Prob_{H_0}(|T| > |t|)$, where $T \sim t_{n-1}$.
The p-value is twice the area in the upper tail of the $t_{n-1}$ distribution above the observed $|t|$. It is the probability of observing a test statistic at least as extreme as what was actually observed, when the null hypothesis is really true. We reject $H_0$ if $p \leq \alpha$. (Note that this is basically the same as, and actually more general than, having $|t| \geq t_c$.)

Important Notational Comment
The text says "conclude $H_a$" if t is in the rejection region ($|t| \geq t_c$), otherwise "conclude $H_0$". This is shorthand for the following:
• "conclude $H_a$" means "there is sufficient evidence in the data to conclude that $H_0$ is false, and so we assume that $H_a$ is true."
• "conclude $H_0$" means "there
is insufficient evidence in the data to conclude that either $H_0$ or $H_a$ is true or false, so we default to assuming that $H_0$ is true."
Notice that a failure to reject $H_0$ does not mean that there was any evidence in favor of $H_0$.
NOTE: In this course, $\alpha = 0.05$ unless otherwise specified.

Section 2.1: Inference about $\beta_1$
    $b_1 \sim N(\beta_1, \sigma^2\{b_1\})$, where $\sigma^2\{b_1\} = \frac{\sigma^2}{SS_X}$
    $t = \frac{b_1 - \beta_1}{s\{b_1\}}$, where $s\{b_1\} = \sqrt{\frac{s^2}{SS_X}}$
    $t \sim t_{n-2}$ if $\beta_1 = 0$
According to our discussion above for W, you therefore know how to obtain CI's and t-tests for $\beta_1$. (I'll go through it now but not in the future.) There is one important difference: the degrees of freedom (df) here are $n - 2$, not $n - 1$, because we are also estimating $\beta_0$.

Confidence Interval for $\beta_1$
• $b_1 \pm t_c s\{b_1\}$
• where $t_c = t_{n-2}(1 - \alpha/2)$, the upper $100(1 - \alpha/2)$ percentile of the t distribution with $n - 2$ degrees of freedom
• $1 - \alpha$ is the confidence level.

Significance Tests for $\beta_1$
• $H_0: \beta_1 = 0$ vs $H_a: \beta_1 \neq 0$
• $t = \frac{b_1 - 0}{s\{b_1\}}$
• Reject $H_0$ if $|t| \geq t_c$, $t_c = t_{n-2}(1 - \alpha/2)$
• p-value = $Prob(|T| > |t|)$, where $T \sim t_{n-2}$

Inference for $\beta_0$
• $b_0 \sim N(\beta_0, \sigma^2\{b_0\})$, where $\sigma^2\{b_0\} = \sigma^2 \left[\frac{1}{n} + \frac{\bar{X}^2}{SS_X}\right]$
• $t = \frac{b_0 - \beta_0}{s\{b_0\}}$; for $s\{b_0\}$, replace $\sigma^2$ by $s^2$ and take the square root:
    $s\{b_0\} = s \sqrt{\frac{1}{n} + \frac{\bar{X}^2}{SS_X}}$
• $t \sim t_{n-2}$

Confidence Interval for $\beta_0$
• $b_0 \pm t_c s\{b_0\}$
• where $t_c = t_{n-2}(1 - \alpha/2)$ and $1 - \alpha$ is the confidence level.

Significance Tests for $\beta_0$
• $H_0: \beta_0 = 0$ vs $H_a: \beta_0 \neq 0$
• $t = \frac{b_0 - 0}{s\{b_0\}}$
• Reject $H_0$ if $|t| \geq t_c$, $t_c = t_{n-2}(1 - \alpha/2)$
• p-value = $Prob(|T| > |t|)$, where $T \sim t_{n-2}$

Notes
• The normality of $b_0$ and $b_1$ follows from the fact that each is a linear combination of the $Y_i$, themselves each independent and normally distributed.
• For $b_1$, see KNNL, page 42.
• For $b_0$, try this as an exercise.
• Often the CI and significance test for $\beta_0$ are not of interest.
• If the $\epsilon_i$ are not normal but are approximately normal, then the CI's and significance tests are generally reasonable approximations.
• These procedures can easily be modified to produce one-sided confidence intervals and significance tests.
• Because $\sigma^2\{b_1\} = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$, we can make this quantity small by making $\sum (X_i - \bar{X})^2$ large, i.e. by spreading out the $X_i$'s.

SAS proc reg
Here is how to get the parameter estimates in SAS. (Still using diamond.sas.) The option "clb" asks SAS to give you confidence limits for the parameter estimates $b_0$
and $b_1$.

    proc reg data=diamonds;
      model price=weight / clb;

    Parameter Estimates
                      Parameter     Standard
    Variable    DF     Estimate        Error    t Value    Pr > |t|    95% Confidence Limits
    Intercept    1   -259.62591     17.31886     -14.99      <.0001    -294.48696   -224.76486
    weight       1   3721.02485     81.78588      45.50      <.0001    3556.39841   3885.65129

Points to Remember
• What is the default value of $\alpha$ that we use in this class?
• What is the default confidence level that we use in this class?
• Suppose you could choose the X's. How would you choose them if you wanted a precise estimate of the slope? intercept? both?

Summary of Inference
• $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$
• $\epsilon_i \sim N(0, \sigma^2)$ are independent, random errors

Parameter Estimators
    For $\beta_1$: $b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
    For $\beta_0$: $b_0 = \bar{Y} - b_1 \bar{X}$
    For $\sigma^2$: $s^2 = \frac{\sum (Y_i - b_0 - b_1 X_i)^2}{n - 2}$

95% Confidence Intervals for $\beta_0$ and $\beta_1$
• $b_1 \pm t_c s\{b_1\}$
• $b_0 \pm t_c s\{b_0\}$
• where $t_c = t_{n-2}(1 - \alpha/2)$, the $100(1 - \alpha/2)$ upper percentile of the t distribution with $n - 2$ degrees of freedom

Significance Tests for $\beta_0$ and $\beta_1$
    $H_0: \beta_0 = 0$, $H_a: \beta_0 \neq 0$: $t = \frac{b_0}{s\{b_0\}} \sim t_{n-2}$ under $H_0$
    $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$: $t = \frac{b_1}{s\{b_1\}} \sim t_{n-2}$ under $H_0$
Reject $H_0$ if the p-value is small (< 0.05).

KNNL Section 2.3: Power
The power of a significance test is the probability that the null hypothesis will be rejected when, in fact, it is false. This probability depends on the particular value of the parameter in the alternative space. When we do power calculations, we are trying to answer questions like the following:
"Suppose that the parameter $\beta_1$ truly has the value 1.5, and we are going to collect a sample of a particular size n and with a particular $SS_X$. What is the probability that, based on our (not yet collected) data, we will reject $H_0$?"

Power for $\beta_1$
• $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$
• $t = \frac{b_1}{s\{b_1\}}$
• $t_c = t_{n-2}(1 - \alpha/2)$
• for $\alpha = 0.05$, we reject $H_0$ when $|t| \geq t_c$
• so we need to find $P(|t| \geq t_c)$ for arbitrary values of $\beta_1 \neq 0$
• when $\beta_1 = 0$, the calculation gives $\alpha$ ($H_0$ is true)
• $t \sim t_{n-2}(\delta)$, the noncentral t distribution: a t-distribution not centered at 0
• $\delta = \frac{\beta_1}{\sigma\{b_1\}}$ is the noncentrality parameter: it represents on a "standardized" scale how far from true $H_0$ is (kind of like "effect size")
• We need to assume values for $\sigma^2\{b_1\}$ and n
• KNNL uses tables, see pages
50-51
• we will use SAS

Example of Power for $\beta_1$
• Response Variable: Work Hours
• Explanatory Variable: Lot Size
• See page 19 for details of this study, pages 50-51 for details regarding power
• We assume $\sigma^2 = 2500$, $n = 25$, and $SS_X = 19800$, so we have $\sigma^2\{b_1\} = \frac{\sigma^2}{\sum (X_i - \bar{X})^2} = 0.1263$.
• Consider $\beta_1 = 1.5$.
• We now can calculate $\delta = \frac{\beta_1}{\sigma\{b_1\}}$
• with $t \sim t_{n-2}(\delta)$, we want to find $P(|t| \geq t_c)$
• We use a function that calculates the cumulative distribution function (cdf) for the noncentral t distribution.

See program nknw055.sas for the power calculations.

    data a1;
      n=25; sig2=2500; ssx=19800; alpha=.05;
      sig2b1=sig2/ssx; df=n-2;
      beta1=1.5;
      delta=abs(beta1)/sqrt(sig2b1);
      tstar=tinv(1-alpha/2,df);
      power=1-probt(tstar,df,delta)+probt(-tstar,df,delta);
      output;
    proc print data=a1; run;

    Obs   n   sig2    ssx   alpha   sig2b1   df   beta1    delta    tstar     power
      1  25   2500  19800    0.05  0.12626   23     1.5   4.2213  2.06866   0.98121

    data a2;
      n=25; sig2=2500; ssx=19800; alpha=.05;
      sig2b1=sig2/ssx; df=n-2;
      do beta1=-2.0 to 2.0 by .05;
        delta=abs(beta1)/sqrt(sig2b1);
        tstar=tinv(1-alpha/2,df);
        power=1-probt(tstar,df,delta)+probt(-tstar,df,delta);
        output;
      end;
    proc print data=a2; run;
    title1 'Power for the slope in simple linear regression';
    symbol1 v=none i=join;
    proc gplot data=a2; plot power*beta1; run;

Figure 1: Power of the t-test for detecting different values of $\beta_1$

Section 2.4: Estimation of $E(Y_h)$
• $E(Y_h) = \mu_h = \beta_0 + \beta_1 X_h$, the mean value of Y for the subpopulation with $X = X_h$.
• We will estimate $E(Y_h)$ with $\hat{Y}_h = \hat{\mu}_h = b_0 + b_1 X_h$.
• KNNL uses $\hat{Y}_h$ to denote this estimate; we will use the symbols $\hat{Y}_h = \hat{\mu}_h$ interchangeably.
• See equation (2.28) on page 52.

Theory for Estimation of $E(Y_h)$
$\hat{Y}_h$ is normal with mean $\mu_h$ and variance $\sigma^2\{\hat{Y}_h\} = \sigma^2 \left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$.
• The normality is a consequence of the fact that $b_0 + b_1 X_h$ is a linear combination of the $Y_i$'s.
• The variance has two components: one for the intercept and one for the slope. The variance associated with the slope depends on the distance $X_h - \bar{X}$. The estimation is more accurate near $\bar{X}$.
• See KNNL pages 52-55 for details.

Application of the Theory
We estimate $\sigma^2\{\hat{Y}_h\}$ with $s^2\{\hat{Y}_h\} = s^2 \left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$. It follows that $t = \frac{\hat{Y}_h - E(Y_h)}{s\{\hat{Y}_h\}} \sim t_{n-2}$; proceed as usual.

95% Confidence Interval for
$E(Y_h)$
$\hat{Y}_h \pm t_c s\{\hat{Y}_h\}$, where $t_c = t_{n-2}(0.975)$.
NOTE: Significance tests can be performed for $\hat{Y}_h$, but they are rarely used in practice.

Example
See program nknw060.sas for the estimation of subpopulation means. The option "clm" to the model statement asks for confidence limits for the mean $\hat{Y}_h$.

    data a1;
      infile 'H:\Stat512\Datasets\Ch01ta01.dat';
      input size hours;
    data a2; size=65; output;
             size=100; output;
    data a3; set a1 a2;
    proc print data=a3; run;
    proc reg data=a3;
      model hours=size / clm;
      id size;
    run;

                   Dep Var   Predicted      Std Error
    Obs   size       hours       Value   Mean Predict        95% CL Mean
     25     70    323.0000    312.2800         9.7647   292.0803   332.4797
     26     65           .    294.4290         9.9176   273.9129   314.9451
     27    100           .    419.3861        14.2723   389.8615   448.9106

Section 2.5: Prediction of $Y_{h(new)}$
We wish to construct an interval into which we predict the next observation (for a given $X_h$) will fall.
• The only difference (operationally) between this and $E(Y_h)$ is that the variance is different.
• In prediction, we have two variance components: (1) variance associated with the estimation of the mean response $\hat{Y}_h$ and (2) variability in a single observation taken from the distribution with that mean.
• $Y_{h(new)} = \beta_0 + \beta_1 X_h + \epsilon$ is the value for a new observation with $X = X_h$.

We estimate $Y_{h(new)}$ starting with the predicted value $\hat{Y}_h$. This is the center of the confidence interval, just as it was for $E(Y_h)$. However, the width of the CI is different because they have different variances.
    $Var(Y_{h(new)}) = Var(\hat{Y}_h) + Var(\epsilon)$
    $s^2\{pred\} = s^2\{\hat{Y}_h\} + s^2$
    $s^2\{pred\} = s^2 \left[1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$
    $\frac{Y_{h(new)} - \hat{Y}_h}{s\{pred\}} \sim t_{n-2}$
$s\{pred\}$ denotes the estimated standard deviation of a new observation with $X = X_h$. It takes into account variability in estimating the mean $\hat{Y}_h$ as well as variability in a single observation from a distribution with that mean.

Notes
The procedure can be modified for the mean of m observations at $X = X_h$ (see 2.39a on page 60). Standard error is affected by how far $X_h$ is from $\bar{X}$ (see Figure 2.3). As was the case for the mean response, prediction is more accurate near $\bar{X}$.

See program nknw065.sas for the prediction interval example. The "cli" option to the model statement asks SAS to give confidence limits for an
individual observation (cf. clb and clm).

    data a1;
      infile 'H:\Stat512\Datasets\Ch01ta01.dat';
      input size hours;
    data a2; size=65; output;
             size=100; output;
    data a3; set a1 a2;
    proc reg data=a3;
      model hours=size / cli;
    run;

                   Dep Var   Predicted      Std Error
    Obs   size       hours       Value   Mean Predict      95% CL Predict
     25     70    323.0000    312.2800         9.7647   209.2811   415.2789
     26     65           .    294.4290         9.9176   191.3676   397.4904
     27    100           .    419.3861        14.2723   314.1604   524.6117

Notes
• The standard error (Std Error Mean Predict) given in this output is the standard error of $\hat{Y}_h$, not $s\{pred\}$. (That's why the word mean is in there.) The CL Predict label tells you that the confidence interval is for the prediction of a new observation.
• The prediction interval for $Y_{h(new)}$ is wider than the confidence interval for $\hat{Y}_h$ because it has a larger variance.

Section 2.6: Working-Hotelling Confidence Bands for Entire Regression Line
• This is a confidence limit for the whole line at once, in contrast to the confidence interval for just one $\hat{Y}_h$ at a time.
• Regression line $b_0 + b_1 X_h$ describes $E(Y_h)$ for a given $X_h$.
• We have a 95% CI for $E(Y_h)$ (estimated by $\hat{Y}_h$) pertaining to a specific $X_h$.
• We want a confidence band for all $X_h$.
• The confidence limit is given by $\hat{Y}_h \pm W s\{\hat{Y}_h\}$, where $W^2 = 2 F_{2, n-2}(1 - \alpha)$. Since we are doing all values of $X_h$ at once, it will be wider at each $X_h$ than CI's for individual $X_h$.
• The boundary values define a hyperbola.
• The theory for this comes from the joint confidence region for $(\beta_0, \beta_1)$, which is an ellipse (see Stat 524).
• We are used to constructing CI's with t's, not W's. Can we fake it?
• We can find a new, smaller $\alpha$ for $t_c$ that would give the same result, a kind of "effective alpha" that takes into account that you are estimating the entire line.
• We find $W^2$ for our desired $\alpha$, and then find the effective $\alpha_t$ to use with $t_c$ that gives $W(\alpha) = t_c(\alpha_t)$.

Confidence Band for Regression Line
See program nknw067.sas for the regression line confidence band.

    data a1;
      n=25; alpha=.10; dfn=2; dfd=n-2;
      w2=2*finv(1-alpha,dfn,dfd);
      w=sqrt(w2);
      alphat=2*(1-probt(w,dfd));
      tstar=tinv(1-alphat/2,dfd); output;
    proc print data=a1; run;

Note: 1-probt(w, dfd) gives the area under the t-distribution to the right of w. We have to double that to get the total area in both tails.

    Obs   n   alpha   dfn   dfd       w2        w     alphat    tstar
      1  25     0.1     2    23  5.09858  2.25800   0.033740  2.25800

    data a2;
      infile 'H:\System\Desktop\CH01TA01.DAT';
      input size hours;
    symbol1 v=circle i=rlclm97;
    proc gplot data=a2;
      plot hours*size;

Figure 2: Working-Hotelling 95% Confidence Region for the Line

Estimation of $E(Y_h)$ Compared to Prediction of $Y_h$
    $\hat{Y}_h = b_0 + b_1 X_h$
    $s^2\{\hat{Y}_h\} = s^2 \left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$
    $s^2\{pred\} = s^2 \left[1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$

See the program nknw067x.sas for the clm (mean) and cli (individual) plots.

    data a1;
      infile 'H:\System\Desktop\CH01TA01.DAT';
      input size hours;

Confidence intervals:

    symbol1 v=circle i=rlclm95;
    proc gplot data=a1;
      plot hours*size;

Figure 3: 95% Confidence Intervals for the Mean

Prediction intervals:

    symbol1 v=circle i=rlcli95;
    proc gplot data=a1;
      plot hours*size;
    run;

Section 2.7: Analysis of Variance (ANOVA) Table
• Organizes results arithmetically
• Total sum of squares in Y is $SS_Y = \sum (Y_i - \bar{Y})^2$
• Partition this into two sources
    Model (explained by regression)
    Error (unexplained / residual)
    $Y_i - \bar{Y} = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})$
    $\sum (Y_i - \bar{Y})^2 = \sum (Y_i - \hat{Y}_i)^2 + \sum (\hat{Y}_i - \bar{Y})^2$
  (cross terms cancel: see page 65)

Total Sum of Squares
• Consider ignoring $X_h$ to predict $E(Y_h)$. Then the best predictor would be the sample mean $\bar{Y}$.
• SST is the sum of squared deviations from this predictor: $SST = SS_Y = \sum (Y_i - \bar{Y})^2$.
• The total degrees of freedom is $df_T = n - 1$.
• $MST = SST / df_T$
• MST is the usual estimate of the variance of Y if there are no explanatory variables, also known as $s^2\{Y\}$.
• SAS uses the term Corrected Total for this source. "Uncorrected" is $\sum Y_i^2$. The term "corrected" means that we subtract off the mean $\bar{Y}$ before squaring.

Model Sum of Squares
• $SSM = \sum (\hat{Y}_i - \bar{Y})^2$
• The model degrees of freedom is $df_M = 1$, since one parameter (slope) is estimated.
• $MSM = SSM / df_M$
• KNNL uses the word regression for what SAS calls model.
• So SSR (KNNL) is the same as SS Model (SAS). I prefer to use the terms SSM and $df_M$ because R stands for regression, residual, and reduced (later), which I find confusing.

Figure 4:
Prediction Intervals

Error Sum of Squares
• $SSE = \sum (Y_i - \hat{Y}_i)^2$
• The error degrees of freedom is $df_E = n - 2$, since estimates have been made for both slope and intercept.
• $MSE = SSE / df_E$
• $MSE = s^2$ is an estimate of the variance of Y taking into account (or conditioning on) the explanatory variable(s).

ANOVA Table for SLR

    Source               df       SS                                MS
    Model (Regression)   1        $\sum (\hat{Y}_i - \bar{Y})^2$    $SSM / df_M$
    Error                n - 2    $\sum (Y_i - \hat{Y}_i)^2$        $SSE / df_E$
    Total                n - 1    $\sum (Y_i - \bar{Y})^2$          $SST / df_T$

Note about degrees of freedom
Occasionally, you will run across a reference to "degrees of freedom" without specifying whether this is model, error, or total. Sometimes it will be clear from context, and although that is sloppy usage, you can generally assume that if it is not specified, it means error degrees of freedom.

Expected Mean Squares
• MSM, MSE are random variables
• $E(MSM) = \sigma^2 + \beta_1^2 SS_X$
• $E(MSE) = \sigma^2$
• When $H_0: \beta_1 = 0$ is true, then $E(MSM) = E(MSE)$.
• This makes sense, since in that case $\hat{Y}_i = \bar{Y}$.

F-test
• $F = MSM / MSE \sim F_{df_M, df_E} = F_{1, n-2}$
• See KNNL, pages 69-70
• When $H_0: \beta_1 = 0$ is false, MSM tends to be larger than MSE, so we would want to reject $H_0$ when F is large.
• Generally our decision rule is to reject the null hypothesis if $F \geq F_c = F_{df_M, df_E}(1 - \alpha) = F_{1, n-2}(0.95)$
• In practice, we use p-values (and reject $H_0$ if the p-value is less than $\alpha$).
• Recall that $t = b_1 / s\{b_1\}$ tests $H_0: \beta_1 = 0$. It can be shown that $t_{df}^2 = F_{1, df}$. The two approaches give the same p-value; they are really the same test.
• Aside: When $H_0: \beta_1 = 0$ is false, F has a noncentral F distribution; this can be used to calculate power.

ANOVA Table

    Source    df       SS     MS     F          p
    Model     1        SSM    MSM    MSM/MSE    p
    Error     n - 2    SSE    MSE
    Total     n - 1

See the program nknw073.sas for the program used to generate the other output used in this lecture.

    data a1;
      infile 'H:\System\Desktop\CH01TA01.DAT';
      input size hours;
    proc reg data=a1;
      model hours=size;
    run;

    Analysis of Variance
                                Sum of         Mean
    Source             DF      Squares       Square    F Value    Pr > F
    Model               1       252378       252378     105.88    <.0001
    Error              23        54825   2383.71562
    Corrected Total    24       307203

    Parameter Estimates
                      Parameter     Standard
    Variable    DF     Estimate        Error    t Value    Pr > |t|
    Intercept    1     62.36586     26.17743       2.38      0.0259
    size         1      3.57020      0.34697      10.29      <.0001

Note that $t^2 = 10.29^2 = 105.88 = F$.

Section 2.8: General Linear Test
• A different view of the same problem (testing $\beta_1 = 0$). It may seem redundant now, but the concept is extremely useful in MLR.
• We want to compare two models:
    $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$ (full model)
    $Y_i = \beta_0 + \epsilon_i$ (reduced model)
  Compare using the error sum of squares.

Let SSE(F) be the SSE for the Full model, and let SSE(R) be the SSE for the Reduced model.
    $F = \frac{(SSE(R) - SSE(F)) / (df_{E(R)} - df_{E(F)})}{SSE(F) / df_{E(F)}}$
Compare to the critical value $F_c = F_{df_{E(R)} - df_{E(F)},\ df_{E(F)}}(1 - \alpha)$ to test $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$.

Test in Simple Linear Regression
    $SSE(R) = \sum (Y_i - \bar{Y})^2 = SST$
    $SSE(F) = SST - SSM$ (the usual SSE)
    $df_{E(R)} = n - 1$, $df_{E(F)} = n - 2$, $df_{E(R)} - df_{E(F)} = 1$
    $F = \frac{(SST - SSE) / 1}{SSE / (n - 2)} = \frac{MSM}{MSE}$ (same test as before)
This approach (full vs. reduced) is more general, and we will see it again in MLR.

Pearson Correlation
$\rho$ is the usual correlation coefficient (estimated by r).
• It is a number between -1 and +1 that measures the strength of the linear relationship between two variables:
    $r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$
• Notice that
    $r = b_1 \sqrt{\frac{\sum (X_i - \bar{X})^2}{\sum (Y_i - \bar{Y})^2}} = b_1 \frac{s_X}{s_Y}$
  Testing $H_0: \beta_1 = 0$ is similar to testing $H_0: \rho = 0$.

$R^2$ and $r^2$
• $R^2$ is the ratio of explained and total variation: $R^2 = SSM / SST$
• $r^2$ is the square of the correlation between X and Y:
    $r^2 = b_1^2 \frac{\sum (X_i - \bar{X})^2}{\sum (Y_i - \bar{Y})^2} = \frac{SSM}{SST}$
In SLR, $r^2$ and $R^2$ are the same thing. However, in MLR they are different (there will be a different r for each X variable, but only one $R^2$). $R^2$ is often multiplied by 100 and thereby expressed as a percent. In MLR, we often use the adjusted $R^2$, which has been adjusted to account for the number of variables in the model (more in Chapter 6).

                                Sum of         Mean
    Source             DF      Squares       Square    F Value    Pr > F
    Model               1       252378       252378     105.88    <.0001
    Error              23        54825         2383
    C Total            24       307203

    R-Square    0.8215    (= SSM/SST = 1 - SSE/SST = 252378/307203)
    Adj R-sq    0.8138    (= 1 - MSE/MST = 1 - 2383/(307203/24))
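
The arithmetic behind the last table can be checked with a short data step. This is only a sketch, not one of the course programs: the data set name anova_check and all variable names are made up for illustration, and the sums of squares are typed in as printed in the output above, so the results should agree with the table only up to rounding (F near 105.88, R-square near 0.8215, adjusted R-square near 0.8138).

```sas
* Hypothetical check, not one of the course programs: recompute F, R-square,
  and adjusted R-square from the sums of squares printed in the ANOVA table;
data anova_check;
  n   = 25;                          * number of cases;
  ssm = 252378;                      * model sum of squares, as printed;
  sse = 54825;                       * error sum of squares, as printed;
  sst = ssm + sse;                   * corrected total sum of squares;
  dfm = 1; dfe = n - 2; dft = n - 1; * model, error, and total df;
  msm = ssm / dfm;
  mse = sse / dfe;
  mst = sst / dft;
  f      = msm / mse;                * F statistic for H0 that beta1 is 0;
  pvalue = 1 - probf(f, dfm, dfe);   * upper tail area of F with dfm and dfe df;
  rsq    = ssm / sst;                * R-square = SSM/SST;
  adjrsq = 1 - mse / mst;            * adjusted R-square = 1 - MSE/MST;
run;
proc print data=anova_check; run;
```

Because t squared equals F in SLR, sqrt(f) from this step should also land close to the t value of 10.29 reported for the slope.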

