# Regression Analysis STA 108

These 10 pages of class notes were uploaded by Carmen Mayer on Tuesday, September 8, 2015. The notes belong to STA 108 at University of California - Davis, taught by Staff in Fall. Since upload, they have received 43 views. For similar materials see /class/191918/sta-108-university-of-california-davis in Statistics at University of California - Davis.

Date Created: 09/08/15

## 1. One-tailed and two-tailed statistical tests

If the null and alternative hypotheses are

$$H_0: \beta_1 \le \beta_{10}, \qquad H_a: \beta_1 > \beta_{10},$$

the test is one-tailed, because the alternative hypothesis specifies that the parameter $\beta_1$ is strictly greater than $\beta_{10}$. Similarly, if we have

$$H_0: \beta_1 \ge \beta_{10}, \qquad H_a: \beta_1 < \beta_{10},$$

this is also a one-tailed test. If the hypotheses are

$$H_0: \beta_1 = \beta_{10}, \qquad H_a: \beta_1 \ne \beta_{10},$$

the test is two-tailed, because in the alternative hypothesis we want to show that $\beta_1$ is either larger or smaller than $\beta_{10}$.

## 2. Test statistic and decision rule

For $H_0: \beta_1 = \beta_{10}$ against $H_a: \beta_1 < \beta_{10}$ (or $H_a: \beta_1 > \beta_{10}$), we calculate the test statistic

$$t^* = \frac{b_1 - \beta_{10}}{s\{b_1\}}.$$

Decision rule when $H_a: \beta_1 < \beta_{10}$:

- If $t^* \ge -t(1-\alpha;\, n-2)$, conclude $H_0$.
- If $t^* < -t(1-\alpha;\, n-2)$, conclude $H_a$.

Decision rule when $H_a: \beta_1 > \beta_{10}$:

- If $t^* \le t(1-\alpha;\, n-2)$, conclude $H_0$.
- If $t^* > t(1-\alpha;\, n-2)$, conclude $H_a$.

## 3. p-value

Definition: the p-value for a specified statistical test is the probability of observing a value of the test statistic that is at least as contradictory to the null hypothesis, and as supportive of the alternative hypothesis, as the value actually computed from the sample data.

For example, suppose we want to test ($n = 16$, $\alpha = 0.05$)

$$H_0: \beta_1 = 5, \qquad H_a: \beta_1 > 5,$$

and the test statistic is $t^* = 2.3$. Since the test is one-tailed and the alternative hypothesis of interest is $H_a: \beta_1 > 5$, values of the test statistic even more contradictory to $H_0$ than the one observed are values larger than $t = 2.3$. Therefore the p-value for this test is

$$p\text{-value} = P(t \ge 2.3) = 0.019,$$

which is the area under the $t$ distribution with $n - 2$ degrees of freedom to the right of $t = 2.3$. In other words, the probability of observing a $t$ value as large as 2.3 is only 0.019 if in fact the true value of $\beta_1$ is 5. Because $0.019 < \alpha = 0.05$, we conclude that the test result is "statistically significant": it disagrees with the null hypothesis $H_0: \beta_1 = 5$ and favors $H_a: \beta_1 > 5$.

As a result:

1. If the test is one-tailed, the p-value is equal to the tail area beyond $t$ in the same direction as the alternative hypothesis. For example, if $H_a$ is of the form $>$, the p-value is the area to the right of, or above, the observed $t$ value.
2. If the test is two-tailed, the p-value is equal to twice the tail area beyond the observed $t$ value in the direction of the sign of $t$. In other words, if $t$ is positive, the p-value is twice the area to the right of, or above, the observed $t$ value.

## Handout 8: Matrix approach to regression analysis

Recall that the simple linear regression model is given by

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, \ldots, n,$$

where $(X_1, Y_1), \ldots, (X_n, Y_n)$ are the observations, and $\varepsilon_1, \ldots, \varepsilon_n$ are independent with mean zero and variance $\sigma^2$. This model can be rewritten as $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where

$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix}, \quad \boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \quad \boldsymbol{\varepsilon} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$

### Random vectors and matrices

A random vector is a vector of random variables. So if $Y_1$, $Y_2$ and $Y_3$ are random variables, then $\mathbf{Y} = (Y_1, Y_2, Y_3)'$ is a random vector. If the means of $Y_1$, $Y_2$ and $Y_3$ are $\mu_1$, $\mu_2$ and $\mu_3$, then the mean of the random vector $\mathbf{Y}$ is $E(\mathbf{Y}) = (\mu_1, \mu_2, \mu_3)'$. The variance-covariance matrix of the random vector $\mathbf{Y}$ is defined to be

$$\sigma^2\{\mathbf{Y}\} = Var(\mathbf{Y}) = \begin{pmatrix} Var(Y_1) & Cov(Y_1, Y_2) & Cov(Y_1, Y_3) \\ Cov(Y_2, Y_1) & Var(Y_2) & Cov(Y_2, Y_3) \\ Cov(Y_3, Y_1) & Cov(Y_3, Y_2) & Var(Y_3) \end{pmatrix}.$$

### Some basic results

Let $\mathbf{Y}$ be a random vector and $\mathbf{A}$ a matrix, and let $\mathbf{W} = \mathbf{A}\mathbf{Y}$. Note that the number of columns of $\mathbf{A}$ must equal the number of rows of $\mathbf{Y}$ for $\mathbf{W}$ to be defined. Then the following is true.

Fact: (a) $E(\mathbf{W}) = \mathbf{A}E(\mathbf{Y})$, (b) $Var(\mathbf{W}) = \mathbf{A}\,Var(\mathbf{Y})\,\mathbf{A}'$.

Example: Let $\mathbf{Y} = (Y_1, Y_2, Y_3)'$ be a random vector with mean vector $(1, 0, 3)'$ and variance-covariance matrix

$$Var(\mathbf{Y}) = \begin{pmatrix} 1 & .5 & 0 \\ .5 & 2 & .1 \\ 0 & .1 & 1 \end{pmatrix}.$$

Let $\mathbf{A} = \begin{pmatrix} 1 & 2 & 1 \\ 1 & -1 & 0 \end{pmatrix}$. Then

$$\mathbf{W} = \mathbf{A}\mathbf{Y} = \begin{pmatrix} Y_1 + 2Y_2 + Y_3 \\ Y_1 - Y_2 \end{pmatrix},$$

so $W_1 = Y_1 + 2Y_2 + Y_3$ and $W_2 = Y_1 - Y_2$. Then

$$E(\mathbf{W}) = \mathbf{A}E(\mathbf{Y}) = \begin{pmatrix} 1 & 2 & 1 \\ 1 & -1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix} = \begin{pmatrix} 4 \\ 1 \end{pmatrix},$$

so $E(W_1) = 4$ and $E(W_2) = 1$. Now

$$Var(\mathbf{W}) = \mathbf{A}\,Var(\mathbf{Y})\,\mathbf{A}' = \begin{pmatrix} 1 & 2 & 1 \\ 1 & -1 & 0 \end{pmatrix} \begin{pmatrix} 1 & .5 & 0 \\ .5 & 2 & .1 \\ 0 & .1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 2 & -1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 12.4 & -2.6 \\ -2.6 & 2 \end{pmatrix}.$$

Hence $Var(W_1) = 12.4$, $Cov(W_1, W_2) = -2.6$ and $Var(W_2) = 2$.

### Simple linear regression model

In vector-matrix form, the simple linear regression model is $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\mathbf{Y}$ is $n \times 1$, $\mathbf{X}$ is $n \times 2$, $\boldsymbol{\beta}$ is $2 \times 1$ and $\boldsymbol{\varepsilon}$ is $n \times 1$. Note that $E(\boldsymbol{\varepsilon}) = \mathbf{0}$, where $\mathbf{0}$ is an $n \times 1$ vector of zeros, and $Var(\boldsymbol{\varepsilon}) = \sigma^2\mathbf{I}$, where $\mathbf{I}$ is the $n \times n$ identity matrix. We can also say that $E(\mathbf{Y}) = \mathbf{X}\boldsymbol{\beta}$ and $Var(\mathbf{Y}) = \sigma^2\mathbf{I}$.

### Normal equations

Calculations will show that

$$\mathbf{X}'\mathbf{X} = \begin{pmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{pmatrix}, \qquad \mathbf{X}'\mathbf{Y} = \begin{pmatrix} \sum Y_i \\ \sum X_i Y_i \end{pmatrix}.$$

The normal equations are

$$n b_0 + \left(\sum X_i\right) b_1 = \sum Y_i, \qquad \left(\sum X_i\right) b_0 + \left(\sum X_i^2\right) b_1 = \sum X_i Y_i.$$

Clearly, the normal equations can be rewritten as $\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{Y}$. So $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$.

### Fitted values and residuals

The vector of fitted values is $\hat{\mathbf{Y}} = \mathbf{X}\mathbf{b}$, and the vector of residuals is

$$\mathbf{e} = \mathbf{Y} - \mathbf{X}\mathbf{b} = \mathbf{Y} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = (\mathbf{I} - \mathbf{H})\mathbf{Y},$$

where $\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$. The matrix $\mathbf{H}$ is symmetric and $\mathbf{H}\mathbf{H} = \mathbf{H}$ (a matrix with the last property is called an idempotent matrix).

Recall that the vector of residuals $\mathbf{e}$ estimates the vector of unknown errors $\boldsymbol{\varepsilon}$. We know that $E(\boldsymbol{\varepsilon}) = \mathbf{0}$ and $Var(\boldsymbol{\varepsilon}) = \sigma^2\mathbf{I}$. For the vector of residuals, it can be shown that $E(\mathbf{e}) = \mathbf{0}$ and $Var(\mathbf{e}) = (\mathbf{I} - \mathbf{H})\sigma^2$.

### Analysis of variance

$$SSTO = \sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - n\bar{Y}^2 = \mathbf{Y}'\mathbf{Y} - n\bar{Y}^2,$$

$$SSE = \sum (Y_i - \hat{Y}_i)^2 = (\mathbf{Y} - \hat{\mathbf{Y}})'(\mathbf{Y} - \hat{\mathbf{Y}}) = (\mathbf{Y} - \mathbf{X}\mathbf{b})'(\mathbf{Y} - \mathbf{X}\mathbf{b}) = \mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y},$$

$$SSR = \sum (\hat{Y}_i - \bar{Y})^2 = SSTO - SSE = \mathbf{b}'\mathbf{X}'\mathbf{Y} - n\bar{Y}^2.$$

Note that all the sums of squares are quadratic forms in $\mathbf{Y}$.

### Inference

Recall that $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$. Here is an important fact.

Fact: (i) $E(\mathbf{b}) = \boldsymbol{\beta}$, (ii) $Var(\mathbf{b}) = (\mathbf{X}'\mathbf{X})^{-1}\sigma^2$, and (iii) $s^2\{\mathbf{b}\} = (\mathbf{X}'\mathbf{X})^{-1}MSE$.

Here is another important fact that comes up in a lot of cases.

Fact: Let $\theta = \mathbf{u}'\boldsymbol{\beta}$, where $\mathbf{u}$ is a known $2 \times 1$ vector. Then $\hat{\theta} = \mathbf{u}'\mathbf{b}$. We then have

(i) $E(\hat{\theta}) = \theta$,
(ii) $Var(\hat{\theta}) = \mathbf{u}'\,Var(\mathbf{b})\,\mathbf{u} = \mathbf{u}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{u} \times \sigma^2$,
(iii) $s^2\{\hat{\theta}\} = \mathbf{u}'\,s^2\{\mathbf{b}\}\,\mathbf{u} = \mathbf{u}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{u} \times MSE$.

### Housing data example

For the housing data we have $n = 19$ and

$$\mathbf{X}'\mathbf{X} = \begin{pmatrix} 19 & 298.67 \\ 298.67 & 4735.74 \end{pmatrix}, \qquad \mathbf{X}'\mathbf{Y} = \begin{pmatrix} 1429 \\ 22583.13 \end{pmatrix},$$

$$\sum Y_i^2 = \mathbf{Y}'\mathbf{Y} = 108032, \qquad \bar{X} = 15.719, \qquad \bar{Y} = 75.211.$$

Then

$$(\mathbf{X}'\mathbf{X})^{-1} = \begin{pmatrix} 6.1084 & -.3852 \\ -.3852 & .0245 \end{pmatrix}, \qquad \mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \begin{pmatrix} 28.981 \\ 2.941 \end{pmatrix}.$$

Fitted regression line: $\hat{Y} = 28.981 + 2.941X$.

$$SSTO = \sum (Y_i - \bar{Y})^2 = \mathbf{Y}'\mathbf{Y} - n\bar{Y}^2 = 108032 - 19\bar{Y}^2 = 556.08,$$

$$SSE = \mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y} = 108032 - 107828.83 = 203.17,$$

$$SSR = SSTO - SSE = 352.91, \qquad MSE = \frac{SSE}{n - 2} = \frac{203.17}{17} = 11.9512.$$

$$s^2\{\mathbf{b}\} = (\mathbf{X}'\mathbf{X})^{-1}MSE = \begin{pmatrix} 73.0020 & -4.6040 \\ -4.6040 & .2929 \end{pmatrix}.$$

So $s^2\{b_0\} = 73.0020$, $s^2\{b_1\} = .2929$ and $s\{b_0, b_1\} = -4.6040$.

### Estimation of the mean response at $X_h = 18.5$

Let $\mathbf{X}_h = (1, X_h)' = (1, 18.5)'$, so

$$\hat{Y}_h = b_0 + b_1 X_h = \mathbf{X}_h'\mathbf{b} = 83.390,$$

$$s^2\{\hat{Y}_h\} = \mathbf{X}_h'\,s^2\{\mathbf{b}\}\,\mathbf{X}_h = 2.8938, \qquad s\{\hat{Y}_h\} = 1.7011.$$

A 95% confidence interval for the mean selling price at $X_h = 18.5$ is $\hat{Y}_h \pm t(.975;\, n-2)\,s\{\hat{Y}_h\}$, i.e. $83.390 \pm 2.110(1.7011)$, i.e. $83.39 \pm 3.59$, i.e. $(79.80,\ 86.98)$.

### Prediction interval when the house size is $X_h = 18.5$

With $\mathbf{X}_h = (1, 18.5)'$ and $\hat{Y}_h = b_0 + b_1 X_h = \mathbf{X}_h'\mathbf{b} = 83.390$ as before,

$$s^2\{pred\} = MSE + s^2\{\hat{Y}_h\} = MSE + \mathbf{X}_h'\,s^2\{\mathbf{b}\}\,\mathbf{X}_h = 11.9512 + 2.8938 = 14.8450, \qquad s\{pred\} = 3.8529.$$

A 95% prediction interval for the selling price at $X_h = 18.5$ is $\hat{Y}_h \pm t(.975;\, n-2)\,s\{pred\}$, i.e. $83.39 \pm 2.110(3.8529)$, i.e. $83.39 \pm 8.13$, i.e. $(75.26,\ 91.52)$.
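
The housing-data calculations in the matrix approach can be traced with a short Python sketch. It is only an illustration under stated assumptions: the summary quantities (X'X, X'Y, Y'Y, n = 19, X_h = 18.5) are read from the scanned notes with decimal places reconstructed, the critical value 2.110 for t(.975; 17) is taken from the notes rather than computed, and the helper names `inv2x2`, `matvec` and `dot` are hypothetical.

```python
import math

def inv2x2(m):
    """Invert a 2x2 matrix stored as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("X'X is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    """Matrix-vector product m v."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def dot(u, v):
    """Inner product u'v."""
    return sum(x * y for x, y in zip(u, v))

# Summary quantities as reconstructed from the notes (assumed decimal places).
n = 19
XtX = [[19.0, 298.67], [298.67, 4735.74]]
XtY = [1429.0, 22583.13]
YtY = 108032.0
t_crit = 2.110  # t(.975; n - 2), value quoted in the notes

XtX_inv = inv2x2(XtX)
b = matvec(XtX_inv, XtY)                # b = (X'X)^{-1} X'Y
sse = YtY - dot(b, XtY)                 # SSE = Y'Y - b'X'Y
mse = sse / (n - 2)                     # MSE = SSE / (n - 2)
s2_b = [[mse * v for v in row] for row in XtX_inv]  # s^2{b} = (X'X)^{-1} MSE

x_h = [1.0, 18.5]                       # X_h = (1, X_h)'
y_hat_h = dot(x_h, b)                   # fitted mean response X_h'b
s2_mean = dot(x_h, matvec(s2_b, x_h))   # s^2{Y_hat_h} = X_h' s^2{b} X_h
s2_pred = mse + s2_mean                 # s^2{pred} = MSE + s^2{Y_hat_h}

half_ci = t_crit * math.sqrt(s2_mean)
half_pi = t_crit * math.sqrt(s2_pred)
conf_int = (y_hat_h - half_ci, y_hat_h + half_ci)
pred_int = (y_hat_h - half_pi, y_hat_h + half_pi)

print("b =", [round(v, 3) for v in b])
print("MSE =", round(mse, 4))
print("95% CI for mean response:", [round(v, 2) for v in conf_int])
print("95% prediction interval:", [round(v, 2) for v in pred_int])
```

Because SSE is a small difference of two large numbers, the rounded summary inputs make the computed SSE and MSE differ from the hand-calculated values in the notes in the second decimal place; the coefficients and both intervals should agree with the notes to about two decimals.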
