
# LINEAR REGRESSION ANALYSIS M 374G


This 9-page set of class notes was uploaded by Reyes Glover on Sunday, September 6, 2015. The notes belong to M 374G at the University of Texas at Austin, taught by Staff in Fall, and have received 52 views since upload.


Date Created: 09/06/15

## M 374G/384G, Fall 2004: SELECTING TERMS (Supplement to Section 11.5)

Consider a regression problem where $E(Y \mid x) = \eta^T u$ is the correct model for the mean function. Often such a model has too many terms to be usable. Can some terms be deleted without important loss of information?

One problem that might result from dropping terms is that the resulting mean estimator might be biased. For example, if the correct model is

$$E(Y \mid x) = \eta_0 + \eta_1 u_1 + \eta_2 u_2 + \cdots + \eta_{k-1} u_{k-1},$$

where $\eta_{k-1} \ne 0$, and we fit the model

$$E(Y \mid x) = \gamma_0 + \gamma_1 u_1 + \gamma_2 u_2 + \cdots + \gamma_{k-2} u_{k-2}$$

by least squares to get fitted values $\hat{y}_i$, then, since the least squares estimates are unbiased for the model used,

$$E(\hat{y}_i) = \gamma_0 + \gamma_1 u_{i1} + \gamma_2 u_{i2} + \cdots + \gamma_{k-2} u_{i,k-2},$$

which might not be the same as $\eta_0 + \eta_1 u_{i1} + \eta_2 u_{i2} + \cdots + \eta_{k-1} u_{i,k-1} = E(Y \mid x_i)$. The difference between the expected value of the estimate and the parameter being estimated is called the bias of the estimator:

$$\mathrm{bias}(\hat{y}_i) = E(\hat{y}_i) - E(Y \mid x_i),$$

where $\hat{y}_i$ is the estimate from the submodel. However, dropping terms might also reduce the variance; sometimes having biased estimates is the lesser of two evils. (Try drawing a picture to illustrate this.)

One way to address this problem is to evaluate the model by a measure that includes both bias and variance: the mean squared error. The mean squared error of a fitted value is the expected value of the square of the error between the fitted value for the submodel and the true conditional mean at $x_i$:

$$\mathrm{MSE}(\hat{y}_i) = E\big[(\hat{y}_i - E(Y \mid x_i))^2\big].$$

Notes: 1. $\mathrm{MSE}(\hat{y}_i)$ is defined like the sampling variance of $\hat{y}_i$; thus, if $\hat{y}_i$ is an unbiased estimator of $E(Y \mid x_i)$, then $\mathrm{MSE}(\hat{y}_i) = \mathrm{Var}(\hat{y}_i)$. 2. Do not confuse this with another use of MSE to denote $\mathrm{RSS}/\mathrm{df}$, the Mean Square for Residuals on a regression ANOVA table. We would like $\mathrm{MSE}(\hat{y}_i)$ to be small.

To understand MSE better, we will examine, for fixed $i$, the variance of $\hat{y}_i - E(Y \mid x_i)$:

$$\mathrm{Var}\big(\hat{y}_i - E(Y \mid x_i)\big) = E\big[(\hat{y}_i - E(Y \mid x_i))^2\big] - \big(E[\hat{y}_i - E(Y \mid x_i)]\big)^2 = \mathrm{MSE}(\hat{y}_i) - \mathrm{bias}(\hat{y}_i)^2.$$

Also, since $E(Y \mid x_i)$ is constant, $\mathrm{Var}(\hat{y}_i - E(Y \mid x_i)) = \mathrm{Var}(\hat{y}_i)$. Thus

$$\mathrm{MSE}(\hat{y}_i) = \mathrm{Var}(\hat{y}_i) + \mathrm{bias}(\hat{y}_i)^2,$$

so MSE really is a combined measure of variance and bias.

Now (see Section 10.1.5) the sampling variance of $\hat{\eta}_j$ in the submodel is

$$\mathrm{Var}(\hat{\eta}_j) = \frac{\sigma^2}{S_{U_j U_j}} \cdot \frac{1}{1 - R_j^2},$$

where $S_{U_j U_j}$ is defined like $S_{XX}$ and $R_j^2$ is the coefficient of multiple determination for the regression of $u_j$ on the other terms in the model. Notice that the first factor is independent of the other terms. Adding a term usually increases $R_j^2$; deleting one usually decreases $R_j^2$. Thus adding a term usually increases $\mathrm{Var}(\hat{\eta}_j)$, and deleting a term usually decreases $\mathrm{Var}(\hat{\eta}_j)$, i.e., gives a more precise estimate of $\eta_j$. Since $\hat{y}_i$ is a linear combination of the $\hat{\eta}_j$'s, the effect will be the same for $\mathrm{Var}(\hat{y}_i)$.

Summarizing: deleting a term typically decreases $\mathrm{Var}(\hat{y}_i)$ but increases bias. So we want to play these effects off against each other by minimizing $\mathrm{MSE}(\hat{y}_i)$. But we need to do this minimization for all $i$'s, so we consider the total mean squared error

$$J = \sum_{i=1}^{n} \mathrm{MSE}(\hat{y}_i) = \sum_{i=1}^{n} \big[\mathrm{Var}(\hat{y}_i) + \mathrm{bias}(\hat{y}_i)^2\big].$$

We want this to be small. Since $J$ involves the parameters $E(Y \mid x_i)$, we need to estimate it. It works better to estimate the total normed mean squared error $\Gamma = J/\sigma^2$, where $\sigma^2$ is, as usual, the conditional variance of the full model. Remember that $\hat{y}_i$ is the fitted value for the submodel, so $\Gamma$ depends on the submodel. To emphasize this, we will denote $\Gamma$ by $\Gamma_I$, where $I$ is the set of terms retained in the submodel. If the submodel is unbiased, then

$$\Gamma_I \, \sigma^2 = \sum_{i=1}^{n} \mathrm{Var}(\hat{y}_i).$$

Now, appropriate calculations show that

$$\frac{1}{\sigma^2} \sum_{i=1}^{n} \mathrm{Var}(\hat{y}_i) = k_I,$$

the number of terms in $I$, whether or not the submodel is unbiased. (Try doing the calculation for $k_I = 2$, i.e., when the submodel is a simple linear regression model, using the formula for $\mathrm{Var}(\hat{y})$ in that case.) This implies that an unbiased submodel has $\Gamma_I = k_I$; thus having $\Gamma_I$ close to $k_I$ suggests that the submodel is unbiased.

Summarizing: a good submodel has $\Gamma_I$ (i) small, to get small total error, and (ii) near $k_I$, to get small bias.

Putting these together gives

$$(\Gamma_I - k_I)\,\sigma^2 = \sum_{i=1}^{n} \mathrm{bias}(\hat{y}_i)^2.$$

It turns out that $(n - k_I)(\hat{\sigma}_I^2 - \hat{\sigma}^2)$, where $\hat{\sigma}_I^2$ is the estimated conditional variance of the submodel, is an appropriate estimator for $\sum_{i=1}^{n} \mathrm{bias}(\hat{y}_i)^2$, so the statistic

$$C_I = k_I + (n - k_I)\,\frac{\hat{\sigma}_I^2 - \hat{\sigma}^2}{\hat{\sigma}^2}$$

is an estimator of $\Gamma_I$. $C_I$ is called Mallow's $C_I$ statistic; it is sometimes called $C_p$, where $p = k_I$. Some algebraic manipulation results in the alternate formulation

$$C_I = k_I + (n - k_I)\frac{\hat{\sigma}_I^2}{\hat{\sigma}^2} - (n - k_I) = \frac{\mathrm{RSS}_I}{\hat{\sigma}^2} + 2k_I - n.$$

Thus we can use Mallow's statistic to help identify good candidates for submodels by looking for submodels where $C_I$ is both (i) small, suggesting small total error, and (ii) $\le k_I$, suggesting small bias.
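The formula $C_I = \mathrm{RSS}_I/\hat{\sigma}^2 + 2k_I - n$ can be checked numerically. The sketch below (a minimal illustration, not from the notes; the data are simulated and the helper name `rss` is invented) fits every submodel of a three-term regression by ordinary least squares and computes $C_I$ for each, estimating $\hat{\sigma}^2$ from the full model.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: the true mean uses u1 and u2 only; u3 is irrelevant.
n = 60
U = rng.normal(size=(n, 3))                      # columns u1, u2, u3
y = 1.0 + 2.0 * U[:, 0] + 0.5 * U[:, 1] + rng.normal(scale=1.0, size=n)

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, _, _, _ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return resid @ resid, X1.shape[1]            # RSS_I and k_I (terms incl. intercept)

# sigma^2 is estimated from the full model: RSS_full / (n - k_full)
rss_full, k_full = rss(U, y)
sigma2_hat = rss_full / (n - k_full)

# C_I = RSS_I / sigma2_hat + 2*k_I - n for every nonempty subset of terms
for I in itertools.chain.from_iterable(
        itertools.combinations(range(3), r) for r in (1, 2, 3)):
    rss_I, k_I = rss(U[:, list(I)], y)
    C_I = rss_I / sigma2_hat + 2 * k_I - n
    print(f"terms {I}: k_I = {k_I}, C_I = {C_I:.2f}")
```

Note one built-in sanity check: for the full model, $\mathrm{RSS}_I/\hat{\sigma}^2 = n - k_I$ by construction, so $C_I = k_I$ exactly, consistent with an unbiased model having $\Gamma_I = k_I$.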
### Comments

1. Mallow's statistic is provided by many software packages in some model-selection routine. Arc gives it in both Forward selection and Backward elimination. Other software (e.g., Minitab) may use different procedures for forward selection and backward elimination but give Mallow's statistic in another routine.
2. Since $C_I$ is a statistic, it will have sampling variability. It might happen, for example, that $C_I$ is negative, which would suggest small bias. It also might happen that $C_I$ is larger than $k_I$ even when the model is unbiased, but there is no way to distinguish this situation from a case where there is bias but $C_I$ happens to be less than $k_I$.

## M 374G/384G: INTRODUCTION TO SMOOTHING

One aspect of regression is to see how the "center" of the conditional distributions varies as a function of the explanatory variable, e.g., to express $E(Y \mid X = x)$ as a function of $x$. A smooth is a curve constructed to go through, or close to, all points $(x, E(Y \mid X = x))$ (a "mean smooth"), or through, or close to, all points $(x, \mathrm{med}(Y \mid X = x))$ (a "median smooth").

Example: In the fish data we have seen both a median smooth (transparency) and a lowess mean smooth (constructed by Arc). Note: the median smooth was easy to construct for the fish data, since there were just a few values of the explanatory variable.

Example: In trying to construct a median smooth for the haystack data, we need to choose the number of "slices", introducing the idea of a smoothing parameter.

Notes: 1. What does the haystack smooth help us see in the data? 2. Arc also has a "slide smooth" function, illustrating how a parameter is involved in creating a smooth.

The lowess (locally weighted scatterplot smoother) smooth can be found in most statistical software. Outline of how the lowess curve is calculated:

- Start with data points $(x_1, y_1), \ldots, (x_n, y_n)$.
- Select a smoothing parameter $f$ between 0 and 1. We'll use $f = 0.5$ for illustration.
- For each $i$:
    a. Look at the fraction $f$ of the data with $x$-values closest to $x_i$ (the half if $f = 1/2$, the quarter if $f = 1/4$, etc.).
    b. Fit a line to these points using weighted least squares (we may talk about this later), in a way that gives more weight to points with $x$ closest to $x_i$.
    c. Replace $y_i$ with $y_i'$, the $y$-value of the point on this line corresponding to $x_i$. So $y_i'$ "adjusts" $y_i$ to be influenced by nearby data points.
- After doing this separately for each $i$, repeat the procedure using the points $(x_i, y_i')$, so the effect of points away from the trend will probably be less.
- After a few iterations of this process, connect all the current "adjusted" points.
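The outline above can be sketched in code. This is a simplified illustration, not production lowess (standard implementations also downweight points with large residuals between passes); the function name `lowess_sketch` and the tricube weight choice are assumptions for the sketch.

```python
import numpy as np

def lowess_sketch(x, y, f=0.5, iterations=3):
    """Simplified lowess following the outline above: each pass replaces
    every y_i with the fitted value at x_i of a weighted least-squares
    line through the fraction f of points whose x-values are nearest x_i."""
    x = np.asarray(x, dtype=float)
    y_adj = np.asarray(y, dtype=float).copy()
    n = len(x)
    k = max(2, int(np.ceil(f * n)))              # number of nearest points used
    for _ in range(iterations):
        y_new = np.empty(n)
        for i in range(n):
            d = np.abs(x - x[i])
            idx = np.argsort(d)[:k]              # the fraction f closest in x
            h = d[idx].max() or 1.0              # neighborhood radius
            w = (1 - (d[idx] / h) ** 3) ** 3     # tricube: nearer points weigh more
            # weighted least-squares line through the neighborhood
            W = np.diag(w)
            A = np.column_stack([np.ones(k), x[idx]])
            beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y_adj[idx])
            y_new[i] = beta[0] + beta[1] * x[i]  # adjusted value y_i'
        y_adj = y_new                            # repeat using the adjusted points
    return y_adj
```

One property worth noticing: if the data already lie exactly on a line, each local weighted fit recovers that line, so the smooth reproduces the data unchanged; the smoothing only matters when there is local curvature or noise.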
