# Week 5 Notes AMS 5

UCSC


## About this Document

These 4 pages of class notes were uploaded by Sandy Nguyen on Monday, October 26, 2015. The notes belong to AMS 5 at University of California - Santa Cruz, taught by Prof. Bruno Mendes in Fall 2015.

Date Created: 10/26/15

## Chapter 9: More Correlation

Tuesday, October 20, 2015, 5:48 PM

### Correlation Coefficient

- $r$ is a pure number, without units, and is not affected by:
  1. Interchanging the two variables
  2. Adding or subtracting a constant to or from all the values of one variable
  3. Multiplying or dividing all the values of one variable by a positive constant
- $r$ is the average of the products of x and y after both have been converted to standard units.
- The products do not depend on the order of the factors, which is why interchanging the variables leaves $r$ unchanged.
- Because x and y are converted to standard units before $r$ is calculated, the value of $r$ is independent of the units used.
- The correlation coefficient measures linear association only.

### Correlation Coefficient and Standard Deviation

- How a scatter diagram looks depends on the SDs.
- The correlation coefficient measures clustering in relative terms, that is, relative to the SD.

### Utilizing the Correlation Coefficient

- Use it only for "football-shaped" scatter diagrams.
- Outliers can alter the value of $r$.

### Ecological Correlation

- Based on rates and averages; often used in political science and sociology.
- Ecological correlations often tend to overstate the strength of an association.
- There is a considerable amount of variation between individuals. Taking the rates or averages of groups removes some of that variation and makes it seem like there is more clustering.
- Data on a group level = ecological data.
- With data on a group level, we cannot really make claims about subgroups or individuals.
- Conclusions about individuals require data on individuals.

### Summary

- The relationship between two variables x and y can be summarized by each average, each SD, and the correlation coefficient $r$.
- The correlation coefficient $r$ measures linear association between x and y.
- $r = \text{average of } (x \text{ in standard units}) \times (y \text{ in standard units})$
- $r$ is always between $-1$ and $1$.
- The closer $r$ is to $1$ or $-1$, the stronger the linear association.
  - The closer $r$ is to $0$, the weaker the linear association.
- $r$ can only be used for "football-shaped" scatter diagrams.
- Ecological data compares data on the group level.

## Chapter 10: Regression

Tuesday, October 20, 2015, 6:29 PM

### Regression

- Applies to two correlated variables: with knowledge of the value of one variable, we can make better predictions about the value of the other variable.

### The Regression Line

- The regression line for y on x estimates the average value of y corresponding to each value of x.
  - An increase of 1 SD in x results, on average, in an increase of only $r$ SDs in y.
- If the graph of averages follows a straight line, that line is the regression line.
  - The regression line is a smoothed version of the graph of averages.
- It is the line that goes through the point of averages and has slope $r \times \text{SD}_y / \text{SD}_x$.
- It should not be used when there is a non-linear association between the variables.

[Figure: scatter diagram with two regression lines. Red line: x on y, begins on the horizontal x axis. Black line: y on x, begins on the vertical y axis.]

### Regression Method

- Regression method: a way of using the correlation coefficient $r$ to estimate the average value of y for each value of x.
- Regression estimate: the resulting value of y.

### Regression Line Equation

- Standard equation, where y is the prediction variable and x is the given variable:

$$y = \text{slope} \times x + \text{intercept}$$

- Slope:

$$\text{slope} = r \times \frac{\text{SD}_y}{\text{SD}_x}$$

- Intercept:

$$\text{intercept} = \text{average}_y - \text{slope} \times \text{average}_x$$

### Calculating the Regression Estimate

Method 1:

1. Convert x to standard units, that is, find the z-value $z_x$ of x.
2. Multiply by the correlation coefficient: $r \times z_x = z_y$.
3. Convert $z_y$ back to the original units of y: $z_y \times \text{SD}_y + \text{average}_y$.

Method 2:

1. Find the slope of the regression line.
2. Find the intercept of the regression line.
3. Use the standard equation for the regression line.

### Regression for Percentiles and Percentile Ranks

- The regression method is adapted to work without the value of x; instead, use the percentile of x.
- We are not interested in finding y; instead, we look for the percentile rank of y.

1. Find $z_x$ by using the normal table at the back of the textbook.
2. Calculate $r \times z_x = z_y$.
3. Convert $z_y$ to a percentile rank by using the normal table.

- Do not use the average or SD. Only use the normal table and the correlation coefficient, since everything is calculated in standard units.
- Use the normal table only if the scatter diagram is "football-shaped".

### Regression Fallacy

- Regression effect: in virtually all test-retest situations, the bottom group on the first test will, on average, show some improvement on the second test.

## Chapter 11: RMS Error for Regression

Tuesday, October 20, 2015, 6:47 PM

- The rms error for the regression line is used to determine how precise estimates are when using the regression method to predict the value of y from a given value of x.
- Actual values differ from predictions due to errors.
- Error = actual value − predicted value.
- The error is simply the distance of a point above or below the regression line.
- The rms error is smaller than the SD of y.
  - The regression line gets closer to the points than the horizontal line does.
  - It is smaller by the factor $\sqrt{1 - r^2}$.
- The rms error for the regression line of y on x:

$$\text{rms error} = \sqrt{1 - r^2} \times \text{SD}_y$$

  - The units for the rms error are the same as the units for the variable being predicted.
- 68-95 rule:
  - Approximately 68% of the points on a scatter diagram are within 1 rms error of the regression line.
  - Approximately 95% of the points on a scatter diagram are within 2 rms errors of the regression line.

### RMS Error and the Correlation Coefficient

- Both the rms error of the regression line and the correlation coefficient describe the spread or clustering of points around the regression line.
- $r$ measures clustering.
  - If $r$ is close to $1$ or $-1$, the points are tightly clustered.
  - It measures clustering relative to the SD.
- The rms error measures the distance of points from the regression line.
  - If the rms error is small, the points are closer to the line.
  - It measures spread in the original units of y.

### Summary

- The regression line can be specified by two descriptive statistics: the slope and the intercept.
- The slope of the regression line for y on x is the average change in y per unit change in x.
- The intercept of the regression line equals the regression estimate for y when x = 0.
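The formulas in these notes (r as the average of products in standard units, the slope and intercept of the regression line, the two methods for the regression estimate, the percentile version, and the rms error) can be sketched in Python as follows. This is a minimal illustration, not part of the original notes: the data set `xs`, `ys` is made up, all function names are hypothetical, the SD is assumed to be the one that divides by n (`statistics.pstdev`), and `statistics.NormalDist` stands in for the normal table at the back of the textbook.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev


def standard_units(values):
    """Convert values to standard units: (value - average) / SD."""
    avg, sd = mean(values), pstdev(values)  # assumption: SD divides by n
    return [(v - avg) / sd for v in values]


def correlation(xs, ys):
    """r = average of the products of x and y in standard units."""
    zx, zy = standard_units(xs), standard_units(ys)
    return mean([a * b for a, b in zip(zx, zy)])


def regression_line(xs, ys):
    """Return (slope, intercept) of the regression line for y on x."""
    slope = correlation(xs, ys) * pstdev(ys) / pstdev(xs)
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept


def estimate_method_1(x, xs, ys):
    """Method 1: convert x to standard units, multiply by r, convert back."""
    z_x = (x - mean(xs)) / pstdev(xs)
    z_y = correlation(xs, ys) * z_x
    return z_y * pstdev(ys) + mean(ys)


def estimate_method_2(x, xs, ys):
    """Method 2: plug x into y = slope * x + intercept."""
    slope, intercept = regression_line(xs, ys)
    return slope * x + intercept


def percentile_rank_estimate(percentile_x, r):
    """Percentile version: percentile of x -> z_x -> r * z_x -> percentile rank.

    NormalDist replaces the printed normal table (a substitution made for
    this sketch); it only makes sense for football-shaped scatter diagrams.
    """
    normal = NormalDist()
    z_x = normal.inv_cdf(percentile_x / 100)
    return 100 * normal.cdf(r * z_x)


def rms_error(xs, ys):
    """rms error of the regression line for y on x: sqrt(1 - r^2) * SD_y."""
    r = correlation(xs, ys)
    return sqrt(1 - r ** 2) * pstdev(ys)


def rms_of_errors(xs, ys):
    """Same quantity computed directly from error = actual - predicted."""
    slope, intercept = regression_line(xs, ys)
    errors = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return sqrt(mean([e ** 2 for e in errors]))


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print("r:", correlation(xs, ys))
print("slope, intercept:", regression_line(xs, ys))
print("estimate at x = 4:", estimate_method_1(4, xs, ys))
print("rms error:", rms_error(xs, ys))
print("percentile rank for 90th percentile, r = 0.5:",
      percentile_rank_estimate(90, 0.5))
```

Both estimation methods should give the same value for any x, and `rms_error` should agree with `rms_of_errors`. Swapping `xs` and `ys`, or rescaling one of them by a positive constant, should leave `correlation` unchanged, in line with the properties listed in Chapter 9.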

