Linear Regression and Correlation
ENGR 0020: Probability and Statistics for Engineers I
These 6 pages of class notes were uploaded by Emily Binakonsky on Friday, April 3, 2015. They belong to ENGR 0020: Probability and Statistics for Engineers I at the University of Pittsburgh, taught by Maryam Mofrad in Spring 2015.
1. Simple Linear Regression and Correlation

a. Introduction to Linear Regression

Linear regression is the area of statistics that deals with investigating the relationship between two or more variables related in a nondeterministic way. An example of this type of relationship is the linear relationship between velocity and acceleration, \(v = u + a\,\Delta t\).

Relationships are unknown and typically too complicated to be explained by only a few explanatory variables, so we use statistical models to approximate them. The objective of regression analysis is to use information about one variable to study and make inferences about another variable.

For the simple linear regression of \(x\) and \(Y\): the simplest deterministic mathematical relationship between \(x\) and \(y\) is the linear relationship
\[ y = \beta_0 + \beta_1 x. \]

[Figure: a linear relationship \(y = \beta_0 + \beta_1 x\), with intercept \(\beta_0\) and slope \(\beta_1\).]

b. The Simple Linear Regression (SLR) Model

If \(x\) and \(y\) are not deterministically related, then for a fixed value of \(x\) the value of \(y\) is random. The variable whose value is fixed by the experimenter, denoted \(x\), is the independent (predictor, explanatory) variable. For a fixed \(x\), the second variable is a random variable \(Y\), with observed value \(y\), referred to as the dependent (response) variable.

There exist parameters \(\beta_0\), \(\beta_1\), and \(\sigma^2\) such that, for any fixed value of \(x\), the dependent variable \(Y\) is related to \(x\) through the model
\[ Y = \beta_0 + \beta_1 x + \varepsilon, \]
where \(\varepsilon\) is a normal random variable (the random deviation) with \(E(\varepsilon) = 0\) and \(\operatorname{Var}(\varepsilon) = \sigma^2\).

[Figure: data points scattered about the true regression line \(\mu_{Y \mid x} = \beta_0 + \beta_1 x\).]

c. Least Squares and the Fitted Model

An important aspect of regression analysis is estimating the parameters \(\beta_0\) and \(\beta_1\), i.e., computing the estimated regression coefficients. The method of estimation is the least squares method. The fitted regression line is
\[ \hat{y} = b_0 + b_1 x. \]
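The SLR model can be illustrated by simulating observations from \(Y = \beta_0 + \beta_1 x + \varepsilon\) with normal errors. This is a minimal sketch, not part of the original notes; the parameter values below are arbitrary assumptions chosen for illustration.

```python
import random

random.seed(0)

# Assumed (illustrative) true parameters of the SLR model.
beta0, beta1, sigma = 2.0, 0.5, 1.0

# x is fixed by the experimenter; for each x, Y = beta0 + beta1*x + eps
# is random, with eps ~ N(0, sigma^2) the random deviation.
x_vals = [float(i) for i in range(1, 11)]
y_vals = [beta0 + beta1 * x + random.gauss(0.0, sigma) for x in x_vals]

# For each fixed x, E(Y) = beta0 + beta1*x: the true regression line
# gives the mean response about which the observations scatter.
mean_line = [beta0 + beta1 * x for x in x_vals]
```

Each simulated \(y\) differs from the mean line only by its random deviation, which is exactly the nondeterministic relationship the model describes.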
Here \(\hat{y}\) is the predicted (fitted) value, and \(b_0\) and \(b_1\) are the estimated regression coefficients. The \(i\)th residual is \(e_i = y_i - \hat{y}_i\), the vertical deviation of the point \((x_i, y_i)\) from the line \(\hat{y} = b_0 + b_1 x\).

The sum of the squared vertical deviations from the points \((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\) is
\[ \mathrm{SSE} = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n (y_i - \hat{y}_i)^2 = \sum_{i=1}^n \bigl(y_i - (b_0 + b_1 x_i)\bigr)^2. \]

1. Estimating the Regression Coefficients

Minimizing SSE gives
\[ b_1 = \frac{n \sum_{i=1}^n x_i y_i - \left(\sum_{i=1}^n x_i\right)\left(\sum_{i=1}^n y_i\right)}{n \sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}, \qquad b_0 = \frac{\sum_{i=1}^n y_i - b_1 \sum_{i=1}^n x_i}{n} = \bar{y} - b_1 \bar{x}. \]

a. Create a scatter plot first, then apply the formulas.
b. Keep at least 4 places after the decimal.
c. The LS method is a way to find the line that minimizes the distance between the observed \(y\)-values and the \(y\)-values on the line.
d. The best-fit line is then the one having the smallest possible sum of squared deviations from the line.

2. Fitted Values and Residuals

The fitted (predicted) values \(\hat{y}_1, \ldots, \hat{y}_n\) are obtained by substituting \(x_1, \ldots, x_n\) into the equation of the estimated regression line: \(\hat{y}_i = b_0 + b_1 x_i\). The residuals are the vertical deviations \(e_i = y_i - \hat{y}_i\), \(i = 1, \ldots, n\).

d. Properties of the Least Squares Estimators

Values of \(b_0\) and \(b_1\) based on a given sample of \(n\) observations are only estimates of the true parameters \(\beta_0\) and \(\beta_1\), and the resulting estimates differ from experiment to experiment. These different estimates may be viewed as values assumed by the random variables \(B_0\) and \(B_1\), while \(b_0\) and \(b_1\) are specific realizations.

With the values of \(x\) held fixed, the values of \(B_0\) and \(B_1\) depend on the variation in the values of \(y\), i.e., in the random variables \(Y_i\), \(i = 1, 2, \ldots, n\).

Means and variances of the estimators:

\[ B_1 = \frac{\sum_{i=1}^n (x_i - \bar{x}) Y_i}{\sum_{i=1}^n (x_i - \bar{x})^2} \]
1. Mean: \(\mu_{B_1} = \beta_1\)
2. Variance: \(\operatorname{Var}(B_1) = \sigma_{B_1}^2 = \dfrac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2}\)

For \(B_0\):
1. Mean: \(\mu_{B_0} = \beta_0\)
2. Variance: \(\operatorname{Var}(B_0) = \sigma_{B_0}^2 = \dfrac{\sigma^2 \sum_{i=1}^n x_i^2}{n S_{xx}}\)

Estimation of \(\sigma^2\):
\[ S_{xx} = \sum_{i=1}^n (x_i - \bar{x})^2, \qquad S_{yy} = \sum_{i=1}^n (y_i - \bar{y})^2, \qquad S_{xy} = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}), \]
\[ \mathrm{SSE} = \sum_{i=1}^n (y_i - \hat{y}_i)^2 = S_{yy} - b_1 S_{xy}, \qquad \text{where } b_1 = \frac{S_{xy}}{S_{xx}}. \]

The \(\sigma^2\) parameter tells us the amount of variability
inherent in the regression model:
1. A large spread tells us that the data points are not tight around the regression line.
2. A small spread tells us that the data points are close to the regression line.

The estimate of \(\sigma^2\) is
\[ s^2 = \frac{\mathrm{SSE}}{n-2} = \frac{S_{yy} - b_1 S_{xy}}{n-2}. \]

The error sum of squares, SSE, can be interpreted as a measure of how much variation in \(y\) is left unexplained by the model; that is, how much cannot be attributed to a linear relationship and is instead attributed to variation in \(y\) itself.

e. Inferences Concerning the Regression Coefficients

The assumptions of the simple linear regression model imply that the standardized variable
\[ T = \frac{B_1 - \beta_1}{S/\sqrt{S_{xx}}} \]
has a \(t\) distribution with \(n-2\) degrees of freedom.

1. Confidence Interval for the Slope \(\beta_1\):
\[ b_1 - t_{\alpha/2} \frac{S}{\sqrt{S_{xx}}} < \beta_1 < b_1 + t_{\alpha/2} \frac{S}{\sqrt{S_{xx}}}, \]
where \(t_{\alpha/2}\) is a value of the \(t\) distribution with \(n-2\) degrees of freedom.

2. Hypothesis Testing on the Slope
   a. Null hypothesis: \(H_0\colon \beta_1 = \beta_{10}\)
   b. Test statistic value: \(t = \dfrac{b_1 - \beta_{10}}{S/\sqrt{S_{xx}}} \sim t_{n-2}\)
   c. Alternative hypotheses and rejection regions (level-\(\alpha\) test):
      \(H_a\colon \beta_1 > \beta_{10}\): reject if \(t \ge t_{\alpha,\,n-2}\)
      \(H_a\colon \beta_1 < \beta_{10}\): reject if \(t \le -t_{\alpha,\,n-2}\)
      \(H_a\colon \beta_1 \ne \beta_{10}\): reject if \(t \le -t_{\alpha/2,\,n-2}\) or \(t \ge t_{\alpha/2,\,n-2}\)

Statistical Inference on the Intercept

1. The assumptions of the SLR model imply that the standardized variable
\[ T = \frac{B_0 - \beta_0}{S \sqrt{\sum_{i=1}^n x_i^2 / (n S_{xx})}} \]
has a \(t\) distribution with \(n-2\) degrees of freedom.

2. Confidence Interval for the Intercept \(\beta_0\):
\[ b_0 - t_{\alpha/2}\, S \sqrt{\frac{\sum_{i=1}^n x_i^2}{n S_{xx}}} < \beta_0 < b_0 + t_{\alpha/2}\, S \sqrt{\frac{\sum_{i=1}^n x_i^2}{n S_{xx}}}, \]
where \(t_{\alpha/2}\) is a value of the \(t\) distribution with \(n-2\) degrees of freedom.

3. Hypothesis Testing on the Intercept
   a. Null hypothesis: \(H_0\colon \beta_0 = \beta_{00}\)
   b. Test statistic value: \(t = \dfrac{b_0 - \beta_{00}}{S \sqrt{\sum_{i=1}^n x_i^2/(n S_{xx})}} \sim t_{n-2}\)

4. Alternative hypotheses and rejection regions (level-\(\alpha\) test):
   \(H_a\colon \beta_0 > \beta_{00}\): reject if \(t \ge t_{\alpha,\,n-2}\)
   \(H_a\colon \beta_0 < \beta_{00}\): reject if \(t \le -t_{\alpha,\,n-2}\)
   \(H_a\colon \beta_0 \ne \beta_{00}\): reject if \(t \le -t_{\alpha/2,\,n-2}\) or \(t \ge t_{\alpha/2,\,n-2}\)

f. A Measure of Quality of Fit: the Coefficient of Determination

1. The quantity \(R^2\) is called the coefficient of determination.
2. It measures the proportion of variability explained by the fitted model:
\[ R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \]
where
\[ \mathrm{SSE} = \sum_{i=1}^n (y_i - \hat{y}_i)^2 \ \text{(the error sum of squares)}, \qquad \mathrm{SST} = \sum_{i=1}^n (y_i - \bar{y})^2 \ \text{(the total corrected sum of squares)}. \]
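The estimation and inference steps above can be sketched end to end in Python. This is a minimal illustration, not from the notes: the x/y data values are made up, and SciPy is assumed to be available for \(t\)-distribution quantiles.

```python
from scipy import stats

# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 3.6, 4.4, 5.2]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
Sxx = sum((xi - x_bar) ** 2 for xi in x)
Syy = sum((yi - y_bar) ** 2 for yi in y)
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares estimates: b1 = Sxy/Sxx, b0 = y_bar - b1*x_bar.
b1 = Sxy / Sxx
b0 = y_bar - b1 * x_bar

# Error sum of squares SSE = Syy - b1*Sxy and s^2 = SSE/(n-2).
SSE = Syy - b1 * Sxy
s = (SSE / (n - 2)) ** 0.5

# 95% confidence interval for the slope: b1 +/- t_{0.025, n-2} * s/sqrt(Sxx).
t_crit = stats.t.ppf(0.975, df=n - 2)
half_width = t_crit * s / Sxx ** 0.5
ci = (b1 - half_width, b1 + half_width)

# Two-sided t test of H0: beta1 = 0.
t_stat = b1 / (s / Sxx ** 0.5)
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)

# Coefficient of determination, with SST = Syy.
R2 = 1 - SSE / Syy

print(f"b1={b1:.4f}, b0={b0:.4f}, R^2={R2:.4f}")
```

For this particular data the slope comes out to about 0.77 and \(R^2\) is close to 1, so nearly all the variation in \(y\) is explained by the fitted line; with real data, the confidence interval and \(p\)-value would be reported alongside the point estimates.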