# STATISTICAL METHODS II STAT 516

This 4 page Class Notes was uploaded by Shane Marks on Monday October 26, 2015. The Class Notes belongs to STAT 516 at University of South Carolina - Columbia taught by B. Habing in Fall. Since its upload, it has received 43 views. For similar materials see /class/229660/stat-516-university-of-south-carolina-columbia in Statistics at University of South Carolina - Columbia.

Date Created: 10/26/15

## Topics Covered in Chapters 7 and 8 (possibly incomplete list)

### Chapter 7: Simple Linear Regression

- The regression model equation, as given on page 290.
- The four assumptions for linear regression, as given on pages 291-292.
- How to visualize the linear regression model (Figure 7.2 on page 292).
- Independent vs. dependent variable.
- Regression and causality.
- Regression and extrapolation.
- Difference between the model parameters and their estimates ($\beta_0$, $\beta_1$, $\varepsilon$ vs. $\hat{\beta}_0$, $\hat{\beta}_1$, $e$).
- Regression to the mean, and why we look at slices of x.
- The estimates of $\beta_0$ and $\beta_1$ are obtained by minimizing the sum of the squared residuals (SSE).
- The analysis of variance table and $\sqrt{MSE}$:
  - TSS is the total error/variation in y without taking x into account.
  - SSE is the error/variation when we use the line we got from the data.
  - $SSR = TSS - SSE$ is the amount of error we explained by using the regression line.
  - The MS are like variances. TMS (which doesn't show up on the table) is the variance of y, or what the variance of the errors would be for a flat regression line. MSE is the estimate of the variance of the errors when the regression line is used.
  - Because we assumed the residuals are normal, we can use the MS to do an F test.
  - How to find the mean squares if you are given the sum of squares and degrees of freedom.
  - How the degrees of freedom are found: for total and error, it is the sample size minus the number of parameters estimated; for the regression, it is the df for the total minus the df for the error.
  - $F = MSR/MSE$.
  - When we accept or reject $\beta_1 = 0$ based on the ANOVA table.
- Interpreting $\hat{\beta}_1$, $\sqrt{MSE}$, and the p-value.
- The t-test and confidence interval for $\beta_1$, and how to find $\sqrt{MSE/S_{xx}}$ from the SAS output.
- The predicted value for y at a given x.
- The confidence interval for the mean response (regression line) $\mu_{y|x}$ (pages 304-305).
- The prediction interval for a new observation $y_{y|x}$ (pages 304-305).
- What the different parts of the standard error for $\hat{\mu}_{y|x}$ and $\hat{y}_{y|x}$ are due to. Why isn't the variance for the predicted value of y given x just the MSE?
- The correlation as a single-number summary of regression:
  - r is the slope of the regression line after adjusting for the scale of y and x.
  - $r^2$ (the coefficient of determination) is the percent of variation in y explained by the regression.
  - Larger $r^2$ implies a larger F statistic.
  - A feeling for what $r^2$ value different data sets will have by looking at the y vs. x plot.
- How to use the residual vs. predicted plot to check that the linear form is appropriate (mean of errors is zero) and that the errors have constant variance (both at each level of the independent variable).
- How to use the Q-Q plots to check that the errors are approximately normally distributed.
- That linear regression is robust, and what that means.
- Reading the SAS output.
- NOT: The test and confidence interval for the correlation coefficient on page 318.

### Chapter 8: Multiple Regression

- Uses m independent variables instead of 1, so that there are $m + 1$ $\beta$s.
- The coefficients in multiple regression reflect the change in the value of the dependent variable when one of the independent variables changes and the rest stay the same. This is sometimes impossible.
- That the ANOVA table, $\sqrt{MS_{res}}$, and $r^2$ work the same as in simple linear regression.
- The hypotheses that the ANOVA table, Type I Tests, and Type III Tests test (in the supplement to 8.3).
- Using the logarithm or square root of the dependent variable to stabilize variance (not in text).
- Taking the logarithm of x and the logarithm of y changes the linear regression into a multiplicative one (pp. 375-378).
- Multicollinearity is when several of the predictor variables are highly correlated. This can lead to (1) parameter estimates being unintuitively negative, and (2) concluding a variable doesn't influence the dependent variable when it actually does. Note that multicollinearity doesn't violate any of the assumptions, though. The tests actually work; they just don't answer the question we "want them to".
- VIF can be used to measure the multicollinearity of a variable. No variance inflation is a value of 1. A VIF of 10 or more is commonly taken to mean that there is severe multicollinearity. In reality, much smaller values can be associated with trouble.
- Why getting more data can help reduce multicollinearity.
- Why scaling things appropriately (for rate of inflation, population, etc.) can help reduce multicollinearity.
- Why $r^2$ is bad for comparing models with different numbers of x variables, and why we need to have an adjusted $r^2$.
- Using Mallows' $C_p$ to remove some variables. If the model with all of the variables is good, then any model with $C_p$ near $p + 1$ or less (p terms left in the model) should be unbiased (e.g. an OK choice). You might then choose one with as few variables as possible if concerned with being easy to explain, or choose one with the biggest adjusted $r^2$ if you want to predict accurately, and you might have to keep certain variables in or out for political reasons.
- Potential (Leverage, Hat Diagonal): h measures the potential of the observation to change the regression based only on the x values. That is, it indicates how similar the observation's x values are to the other observations. It indicates a possibly troublesome x value if it is much larger than the other values. Rule of thumb is to be concerned if $h > 2(m + 1)/n$ (H in SAS).
- Externally Studentized Residuals: the residuals rescaled to take into account the MSE, so that they don't depend on the scale of y, and so that they are calculated without the observation in question. They should be distributed similarly to a standard normal distribution, so that values larger than 2 should be uncommon and values larger than 3 should be rare (RT in SAS).
- Influence (DFFITS): combines the leverage and studentized residual and gives a measure of how much removing the observation would change the model. Values greater than $2\sqrt{(m + 1)/n}$ are typically taken to indicate removal will have a significant effect (F in SAS).
- Why a group of 2 or more points might be outliers but not show up as having leverage or influence.
- What you can do if you have an outlier.
- NOT: The material in the text for 8.2 and 8.3 (as opposed to the supplement, which you do need to know).
- NOT: The partial correlation on pages 364-365.
- NOT: Polynomial regression on pages 370-374 (although it's an interesting topic and you should read about it).

## Topics Covered in Chapters 1 to 6 (possibly incomplete list)

### Chapter 2: Simple Linear Regression

- The regression model equation, as given on page 14.
- The four assumptions for linear regression, as given on page 14 or in class.
- Independent vs. dependent variable.
- Regression and causality (e.g. the storks bringing babies example).
- Regression and extrapolation.
- The estimates of $\beta_0$ and $\beta_1$ are obtained by minimizing the sum of the squared residuals.
- The analysis of variance table:
  - $SS_{tot}$ is the error when we assume $H_0$ is true (the error in Y without taking X into account).
  - $SS_{res}$ (SSE) is the error when we use the line we got from the data.
  - $SS_{reg} = SS_{tot} - SS_{res}$ is the amount of error we explained by using the regression line.
  - The MS are like variances. $MS_{tot}$ is the estimated error variance $s_y^2$ when $\beta_1 = 0$. $MS_{res}$ (MSE) is the estimated error variance $s_{y \cdot x}^2$ when the regression line is used.
  - Because we assumed the residuals are normal, we can use the MS to do an F test.
  - Finding the mean squares given the sum of squares and degrees of freedom.
  - How the degrees of freedom are found: for total and error, it is the sample size minus the number of parameters estimated; for the regression, it is the df for the total minus the df for the error.
  - $F = MS_{reg}/MS_{res}$.
  - When we accept or reject $\beta_1 = 0$ based on the ANOVA table.
  - How we find the SS for the ANOVA table.
  - How the MS relate to the variances of the errors.
- The t-test and confidence interval for $\beta_1$, and how to find $s_{b_1}$ from the SAS output.
- The predicted value of y given x.
- The confidence interval for the line of means (for the regression line), pages 37-38.
- The confidence interval for a new observation (for predicting an individual), pages 38-39.
- What the different parts of the variances $s_{\hat{Y}}$ and $s_{Y_{new}}$ are due to. Why isn't the variance for the predicted value of y given x just the MSE?
- The correlation as a single-number summary of regression:
  - Larger $r^2$ implies larger F and smaller $\sqrt{MS_{res}}$.
  - $r^2$ (the coefficient of determination) is the percent of variation in the dependent variable that is explained by the regression using the independent variable.
  - r is the slope of the regression line after adjusting for the scale of y and x.
- How to use the residual vs. predicted plot to check that the linear form is appropriate (mean of the errors is zero) and that the errors have constant variance (both at each level of the independent variable).
- How to use the Q-Q plots to check that the errors are approximately normally distributed.
- That linear regression is robust and will still generally perform well if there are small violations of the assumptions.
- NOT: Comparing two slopes (pp. 27-28).

### Chapter 3: Multiple Regression

- Uses k independent variables instead of 1, so that there are $k + 1$ $\beta$s.
- The coefficients in multiple regression reflect the change in the value of the dependent variable when one of the independent variables changes and the rest stay the same. This is sometimes impossible.
- That the ANOVA table, $\sqrt{MS_{res}}$, and $R^2$ work the same as in simple linear regression.
- The hypotheses that the ANOVA table, Type I Tests, and Type III Tests test (in the supplement).
- NOT: Dummy Variables, Matrix Notation, Polynomial Regression, or Interactions (pp. 73-106).

### Chapter 4: Outliers and Transformations

- Using the logarithm of the dependent variable to stabilize variance.
- Taking the logarithm of x and the logarithm of y changes the linear regression into a multiplicative one.
- The common transformations to fix the case of nonlinearity (the diagram put up in class).
- What you can do if you have an outlier.
- Potential (Leverage, Hat Diagonal): h measures the potential of the observation to change the regression based only on the x values. It indicates a possibly troublesome x value if it is much larger than the other values. Rule of thumb is to be concerned if $h > 2(k + 1)/n$ (H).
- Externally Studentized Residuals: the residuals rescaled to take into account the MSE, so that they don't depend on the scale of y, and so that they are calculated without the observation in question. They should be distributed similarly to a standard normal distribution, so that values larger than 2 should be uncommon and values larger than 3 should be rare (RT).
- Influence (Cook's D): combines the leverage and internally studentized residual and gives a measure of how much removing the observation would change the model. Values greater than 1 are possibly worth some additional attention, and values > 4 seriously affect the model (D).
- Why a group of 2 or more points might be outliers but not show up as having leverage or influence.

### Chapter 5: Multicollinearity

- Multicollinearity is when several of the predictor variables are highly correlated. This can lead to (1) parameters being unintuitively negative, and (2) concluding a variable doesn't influence the dependent variable when it actually does. Note that multicollinearity doesn't violate any of the assumptions, though. The tests actually work; they just don't answer the question we "want them to".
- VIF can be used to measure the multicollinearity of a variable. No variance inflation is a value of 1. A VIF of 4 should be of concern, and a VIF of 10 or more indicates that there is very severe multicollinearity.
- Why getting more data can help reduce multicollinearity (it's in the readings).
- NOT: Centering (pp. 203-210), Principal Components (pp. 219-237).

### Chapter 6: Model Selection

- Why we need to have an adjusted $R^2$.
- Using Mallows' $C_p$ to select a good model. To avoid bias, we want $C_p$ to be near $k + 1$ for k terms left in.
- That a good regression model depends on more than having good fit statistics. It (i) has to be simple enough to understand and (ii) has to make sense to the users.
- NOT: PRESS (pp. 251-252).
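
As a rough illustration of how the simple linear regression quantities in these notes fit together, here is a minimal Python sketch (the course itself uses SAS, so this is a swapped-in tool, not the course's method). It computes the least-squares estimates, the ANOVA sums of squares (TSS, SSE, SSR), the mean squares, the F statistic, $r^2$, the standard error of the slope, and the hat-diagonal leverage values. The function name `simple_regression_anova` and the five data points are hypothetical and chosen only for the example.

```python
import numpy as np


def simple_regression_anova(x, y):
    """Least-squares fit of y = b0 + b1*x plus the ANOVA-table quantities.

    Hypothetical helper written for illustration only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size

    # Least-squares estimates (these minimize the sum of squared residuals).
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    fitted = b0 + b1 * x

    # Sums of squares: total, error, and regression (SSR = TSS - SSE).
    tss = np.sum((y - y.mean()) ** 2)
    sse = np.sum((y - fitted) ** 2)
    ssr = tss - sse

    # Degrees of freedom: n minus the number of parameters estimated.
    df_total = n - 1   # only the mean is estimated
    df_error = n - 2   # intercept and slope are estimated
    df_reg = df_total - df_error

    # Mean squares, F statistic, and r^2.
    msr = ssr / df_reg
    mse = sse / df_error
    f_stat = msr / mse
    r2 = ssr / tss

    # Standard error of the slope, sqrt(MSE / Sxx), and its t statistic.
    se_b1 = np.sqrt(mse / sxx)
    t_b1 = b1 / se_b1

    # Hat diagonal (leverage) for simple regression; depends only on x.
    leverage = 1.0 / n + (x - x.mean()) ** 2 / sxx

    return {
        "b0": b0, "b1": b1, "tss": tss, "sse": sse, "ssr": ssr,
        "msr": msr, "mse": mse, "f": f_stat, "r2": r2,
        "se_b1": se_b1, "t_b1": t_b1, "leverage": leverage,
    }


# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
result = simple_regression_anova(x, y)

# Rule-of-thumb leverage cutoff 2(m + 1)/n with m = 1 predictor.
cutoff = 2 * (1 + 1) / len(x)
flagged = result["leverage"] > cutoff
```

With a single predictor, the F statistic from the ANOVA table equals the square of the slope's t statistic, which is one way to sanity-check the output. The leverage values depend only on x, which matches the description of the hat diagonal in the notes.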
