# Statistical Methods and Computing 22S 105

This 7 page Class Notes was uploaded by Vance Bode Sr. on Friday October 23, 2015. The Class Notes belongs to 22S 105 at University of Iowa taught by Mary Cowles in Fall. Since its upload, it has received 22 views. For similar materials see /class/228075/22s-105-university-of-iowa in Natural Sciences and Mathematics at University of Iowa.

Date Created: 10/23/15

22S:105 Statistical Methods and Computing

Introduction to Inference for Regression

Lecture 22, Apr. 18, 2008

Kate Cowles, 374 SH, 335-0727, kcowles@stat.uiowa.edu

### Simple Linear Regression

- If a scatterplot suggests a linear relationship between 2 variables, we want to summarize the relationship by drawing a straight line on the plot.
- A regression line summarizes the relationship between a response variable and an explanatory variable.
  - Both variables must be quantitative.
- Definition: A regression line is a straight line that describes how a response variable $Y$ changes as an explanatory variable $X$ changes.
  - It is often used to predict the value of $Y$ that corresponds to a given value of $X$.

### Idea of linear regression

- We are considering a population for which a response variable and an explanatory variable are of interest.
- Example:
  - population: adult Americans
  - response variable: systolic blood pressure (sbp)
  - explanatory variable: age
- Each value of the explanatory variable defines a subpopulation of the whole population.
  - Example: subpopulations are all 21-yr-olds, all 22-yr-olds, etc.
- Each of the subpopulations has its own mean of the response variable, $\mu_{Y|X=x}$.
  - Example: population mean sbp in 21-yr-old Americans is some fixed but unknown number $\mu_{Y|X=21}$.
- The means for all these subpopulations lie on a straight line.

### Other ideas of linear regression

- The distribution of the response variable in each subpopulation is normal.
  - Example: sbp in 21-yr-old Americans has a normal distribution; sbp in 61-yr-old Americans also follows a normal distribution, but with a different mean $\mu_{Y|X=61}$.
- The standard deviation of the response variable is the same in all the subpopulations.

### What's so great about all this?

We can describe the means of all the subpopulations by describing one straight line.

- It takes only 2 numbers to specify a straight line.
- We can use sample data to estimate these 2 numbers.
- The estimated line summarizes the relationship between the two variables in our sample data.
  - This is similar to how $\bar{x}$ summarizes sample values of a single variable.
- We can use the estimated line to predict future values of the response variable based on the explanatory variable.

### The population regression line

- We can write the population regression line as

$$\mu_{Y|X=x} = \alpha + \beta x$$

- $\alpha$ and $\beta$ are unknown population parameters.
- $\beta$ is the slope of the line.
  - For a 1-unit increase in $X$, we would expect a change of $\beta$ units in $Y$.
  - Slope is "rise over run."
- $\alpha$ is the intercept of the line.
  - This is $\mu_{Y|X=0}$.
  - Often the notion of a subpopulation for which $X = 0$ is not meaningful. Example: there are no adults of age 0.
  - In these cases, consider the intercept to be the number that makes the line fit correctly in the range of observed $X$ values.

### Example: Powerboats and manatees in Florida

Data on powerboat registrations (in 1000s) in Florida and the number of manatees killed by boats in Florida:

| OBS | YEAR | POWERBT | KILLED |
|-----|------|---------|--------|
| 1 | 1977 | | |
| 2 | 1978 | 460 | 21 |
| 3 | 1979 | 481 | 24 |
| 4 | 1980 | 498 | 16 |
| 5 | 1981 | 513 | 24 |
| 6 | 1982 | 512 | 20 |
| 7 | 1983 | 526 | 15 |
| 8 | 1984 | 559 | 34 |
| 9 | 1985 | 585 | 33 |
| 10 | 1986 | 614 | 33 |
| 11 | 1987 | 645 | 39 |
| 12 | 1988 | 675 | 43 |
| 13 | 1989 | 711 | 50 |
| 14 | 1990 | 719 | 47 |

[Scatterplot: KILLED (vertical axis, 0 to 60) plotted against POWERBT (horizontal axis, 400 to 800).]

### Using sample data to estimate the intercept and slope

- We will write an estimated regression line based on sample data as

$$\hat{y} = a + bx$$

- $a$ is the estimated intercept, and $b$ is the estimated slope.
- Example: the estimated regression line for the manatees-and-powerboats problem is

$$\hat{y} = -41.4 + 0.125x$$

- This means that for a 1-unit increase in powerboat registrations we would expect 0.125 more manatees to be killed.
  - Since we are measuring powerboat registrations in 1000s, this means that for every additional 1000 powerboat registrations, we expect 0.125 more manatees to be killed.
- Note that it makes no sense in this problem to say that the intercept $-41.4$ is the number of manatees that we would expect to be killed in a year when there were no powerboat registrations.
- An estimated regression line is meaningful only for the range of $X$ values actually observed.
  - In the manatee problem, this is 450 to 725 thousand.
  - The estimated intercept makes the linear relationship come out right over this range of $X$ values.

### Prediction using an estimated regression line

Example: What is the predicted number of manatees killed in a year when there are 600 thousand powerboat registrations?

$$\hat{y} = -41.4 + 0.125(600) = 33.6$$

### Notation

Recall:

- $y_i$ is the observed value of the response variable for subject $i$.
- $\hat{y}_i$ is the value predicted by the regression line for subject $i$:

$$\hat{y}_i = a + b x_i$$

- A residual is the difference between an observed value and a predicted value of the response variable:

$$\text{residual}_i = y_i - \hat{y}_i$$

### Least squares: choosing the best estimated line

$a$ and $b$ are estimated by choosing a line as follows:

- for each observed value $y_i$ in the sample data, compute the distance from $y_i$ to the line
- square each of the distances
- add up all the squared distances
- choose the line that makes the sum of these squared distances the smallest

### How well does the regression line predict the response variable?

- The coefficient of determination, or $R^2$, is
  - the square of the correlation coefficient between the response variable and the explanatory variable
  - the proportion of the variability among the observed values of the response variable that is explained by the linear regression
- Example: in the manatee data, $R^2 = 0.8864$.
  - 88.6% of the variability in number of manatee deaths is explained by number of powerboat registrations.

### Inference about the slope and intercept

- The least squares estimates of the intercept and slope based on our data are the point estimates of the population intercept and slope.
  - $a$ is the point estimate of the population intercept $\alpha$.
  - $b$ is the point estimate of the population slope $\beta$.
- As usual, we also need to estimate the variability in our point estimates in order to compute confidence intervals and carry out hypothesis tests; i.e., we need the standard errors of $a$ and $b$.
  - These depend on the sample standard deviation of the data.

### $s_{y|x}$: the sample standard deviation from regression

- This is the estimate of the common $\sigma_{y|x}$ in all the subpopulations.

$$s_{y|x} = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n} \text{residual}_i^2} = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

- $n - 2$ is the degrees of freedom. Recall that $\hat{y}_i = a + b x_i$. That is, there are two estimated quantities, $a$ and $b$, involved in calculating the $\hat{y}_i$'s. The degrees of freedom is the sample size $n$ minus the number of estimated quantities that are involved in calculating the sample standard deviation.

### Confidence intervals for the regression slope

- The population slope $\beta$ usually is the parameter in which we are most interested in regression.
- We need not only the point estimate $b$ but also an interval that expresses the amount of uncertainty in the estimate.
- As usual, the form of the confidence interval is

$$\text{estimate} \pm t^{*}\, SE_{\text{estimate}}, \qquad \text{that is,} \qquad b \pm t^{*}\, SE_b$$

- The standard error of the least-squares slope $b$ is

$$SE_b = \frac{s_{y|x}}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$

- For a two-sided, level $C$ confidence interval, $t^{*}$ is the upper $(1-C)/2$ cutoff for a $t$ distribution with $n - 2$ degrees of freedom.

### Example: the manatee data

    proc reg data = manatee ;
    model killed = powerbt / clb ;
    run ;

The `clb` option prints confidence intervals for the regression coefficients.

Model: MODEL1. Dependent Variable: KILLED.

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value |
|--------|----|----------------|-------------|---------|
| Model | 1 | 1711.97866 | 1711.97866 | 93.615 |
| Error | 12 | 219.44991 | 18.28749 | |
| C Total | 13 | 1931.42857 | | |

| Statistic | Value | Statistic | Value |
|-----------|-------|-----------|-------|
| Root MSE | 4.27639 | R-square | 0.8864 |
| Dep Mean | 29.42857 | Adj R-sq | 0.8769 |
| C.V. | 14.53141 | | |

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | T for H0: Parameter=0 | Prob > \|T\| |
|----------|----|--------------------|----------------|-----------------------|--------------|
| INTERCEP | 1 | -41.430439 | 7.41221723 | -5.589 | 0.0001 |
| POWERBT | 1 | 0.124862 | 0.01290497 | 9.675 | 0.0001 |

| Variable | DF | 95% Confidence Limits (lower) | 95% Confidence Limits (upper) |
|----------|----|-------------------------------|-------------------------------|
| Intercept | 1 | -57.58027 | -25.28060 |
| powerbt | 1 | 0.09674 | 0.15298 |

- $s_{y|x} = 4.276$
- The estimated slope is $b = 0.1249$.
- $SE_b = 0.0129$
- To construct a 95% confidence interval for the unknown population slope, we need the upper 0.025 cutoff for a $t$ distribution with $n - 2 = 12$ degrees of freedom.
  - From Table C, this is 2.179.
- So our 95% confidence interval is

$$0.1249 \pm 2.179(0.0129) = 0.1249 \pm 0.02811 = (0.0968,\ 0.1530)$$

- We are 95% confident that the unknown population slope lies in this interval.

### Testing the hypothesis of no linear relationship

- We often want to test the null hypothesis that there is no linear relationship between the explanatory variable and the response variable:

$$H_0: \beta = 0$$

- If the slope is 0, the regression line is horizontal. This says that the means of all the subpopulations are the same. That is, there is no linear relationship (no correlation) between the two variables.
- Usually the alternative hypothesis of interest is two-sided:

$$H_a: \beta \neq 0$$

- The test statistic is a $t$ statistic:

$$t = \frac{b}{SE_b}$$

- The p-value is obtained by comparing the observed value of the $t$ statistic to a $t$ distribution with $n - 2$ degrees of freedom.

### Example: the manatee data

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | T for H0: Parameter=0 | Prob > \|T\| |
|----------|----|--------------------|----------------|-----------------------|--------------|
| INTERCEP | 1 | -41.430439 | 7.41221723 | -5.589 | 0.0001 |
| POWERBT | 1 | 0.124862 | 0.01290497 | 9.675 | 0.0001 |

- Let's carry out the hypothesis test at the $\alpha = 0.05$ significance level.
- The $t$ statistic value is 9.675, and the p-value is less than 0.0001.
- Therefore, we would have had less than 1 chance in 10,000 of obtaining sample data that produced a $t$ statistic this far away from 0 or farther if the true population slope was 0.
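The arithmetic in the manatee example can be re-checked with a few lines of Python. This is a minimal sketch rather than part of the lecture: it does not refit the regression, it only plugs in the intercept, slope, and standard error reported in the SAS output and the Table C cutoff of 2.179. The function names (`predict`, `slope_confidence_interval`, `slope_t_statistic`) are made up for illustration.

```python
# Values copied from the SAS output in the notes; nothing is re-estimated
# from the raw data here.
A = -41.430439      # estimated intercept a
B = 0.124862        # estimated slope b
SE_B = 0.01290497   # standard error of the slope, SE_b
T_STAR = 2.179      # upper 0.025 cutoff, t distribution with 12 df (Table C)


def predict(x, a=A, b=B):
    """Predicted response y-hat = a + b*x (x in thousands of registrations)."""
    return a + b * x


def slope_confidence_interval(b=B, se_b=SE_B, t_star=T_STAR):
    """Confidence interval b +/- t* * SE_b for the population slope."""
    margin = t_star * se_b
    return b - margin, b + margin


def slope_t_statistic(b=B, se_b=SE_B):
    """t statistic for H0: beta = 0, computed as b / SE_b."""
    return b / se_b


if __name__ == "__main__":
    low, high = slope_confidence_interval()
    print(f"predicted deaths at 600 thousand registrations: {predict(600):.1f}")
    print(f"95% CI for the slope: ({low:.4f}, {high:.4f})")
    print(f"t statistic for H0: beta = 0: {slope_t_statistic():.3f}")
```

Because the unrounded coefficients are used, the prediction at 600 thousand registrations comes out slightly below the 33.6 obtained with the rounded line $\hat{y} = -41.4 + 0.125x$. The interval endpoints and the $t$ statistic should agree with the SAS output to the digits shown there.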
