Elementary Statistical Methods, STAT 30100
These 26 pages of class notes were uploaded by Bailey Macejkovic on Saturday, September 19, 2015. The notes are for STAT 30100 at Purdue University, taught by James Dobbin in Fall 2015.
Lecture 10, Chapter 13: Two-Way Analysis of Variance

Two-way ANOVA compares the means of populations that are classified two ways, or the mean responses in two-factor experiments.

Examples

1. The strength of concrete depends upon the formula used to prepare it. An experiment compares six different mixtures. Nine specimens of concrete are poured from each mixture. Three of these specimens are subjected to 0 cycles of freezing and thawing, three are subjected to 100 cycles, and three specimens are subjected to 500 cycles. The strength of each specimen is then measured.

2. Four methods for teaching sign language are to be compared. Sixteen students in special education and sixteen students majoring in other areas are the subjects for the study. Within each group they are randomly assigned to the methods. Scores on a final exam are compared.

Why is it better to do a two-way ANOVA than to just do two separate one-way ANOVAs?

- It is more efficient to study two factors simultaneously rather than separately. Your sample size does not have to be as large, so experiments with several factors are an efficient use of resources.
- We can reduce the residual variation in a model by including a second factor thought to influence the response variable (a lurking variable). We are reducing σ and increasing the power of the test.
- We can investigate the interactions between factors.

Assumptions for two-way ANOVA

1. We have two factors. We have I factor levels for the first factor (call it factor A) and J factor levels for the second factor (call it factor B). We have I × J combinations of individual factor levels.
2. We have independent SRSs of size n_ij from each of the I × J populations.
3. Each of the I × J populations is normally distributed.
4. Each of the I × J populations has the same standard deviation σ.

Model for the two-way ANOVA

Let x_ijk represent the kth observation from the population having factor A at level i and factor B at level j. The statistical model is

    x_ijk = μ_ij + ε_ijk    for i = 1, ..., I; j = 1, ..., J; and k = 1, ..., n_ij,
where the deviations ε_ijk are from an N(0, σ) distribution.

Examples: For the two examples above, identify the response variable and both factors, and state the number of levels for each factor (I and J) and the total number of observations N.

1. The strength of concrete depends upon the formula used to prepare it.
2. Four methods for teaching sign language are to be compared. Sixteen students in special education and sixteen students majoring in other areas are the subjects for the study.

General form of the two-way ANOVA table

Source            | Degrees of freedom      | Sum of squares | Mean square       | F        | p-value
A (main effect)   | DFA = I − 1             | SSA            | MSA = SSA/DFA     | MSA/MSE  | from F(DFA, DFE)
B (main effect)   | DFB = J − 1             | SSB            | MSB = SSB/DFB     | MSB/MSE  | from F(DFB, DFE)
A × B interaction | DFAB = (I − 1)(J − 1)   | SSAB           | MSAB = SSAB/DFAB  | MSAB/MSE | from F(DFAB, DFE)
Error (within)    | DFE = N − IJ            | SSE            | MSE = SSE/DFE     |          |
Total             | DFT = N − 1             | SST            | MST = SST/DFT     |          |

The three hypotheses that are tested are:
- H0: the main effect of factor A is zero
- H0: the main effect of factor B is zero
- H0: the interaction between A and B is zero

Doing the ANOVA is not sufficient. Also look at graphs of the marginal means of the combinations to interpret the results.

[Graphical examples: no effect; an effect of A; an effect of B; an interaction between A and B.]

Example (Exercise 13.14): One way to repair serious wounds is to insert some material as a scaffold for the body's repair cells to use as a template for new tissue. Scaffolds made from extracellular material (ECM) are particularly promising for this purpose. Because they are made from biological material, they serve as an effective scaffold and are then reabsorbed. One study compared 6 types of scaffold material: three of these were ECMs, and the other three were made of inert materials. There were 3 mice used per scaffold type. The response measure was the % of glucose phosphate isomerase (Gpi) cells in the region of the wound. A large value is good, indicating that there are many bone marrow cells sent
by the body to repair the tissue. Here are the data for 2 weeks, 4 weeks, and 8 weeks after the repair (Gpi %):

Material | 2 weeks | 4 weeks | 8 weeks
ECM1     | 70      | 55      | 60
ECM1     | 75      | 70      | 65
ECM1     | 65      | 70      | 65
ECM2     | 60      | 60      | 60
ECM2     | 65      | 65      | 70
ECM2     | 70      | 65      | 60
ECM3     | 80      | 75      | 70
ECM3     | 60      | 70      | 80
ECM3     | 75      | 75      | 70
MAT1     | 50      | 20      | 15
MAT1     | 45      | 25      | 25
MAT1     | 50      | 25      | 25
MAT2     |  5      |  5      | 10
MAT2     | 10      | 10      |  5
MAT2     | 15      |  5      |  5
MAT3     | 30      | 10      |  5
MAT3     | 25      | 15      | 15
MAT3     | 25      | 10      | 10

(a) Make a table giving the sample size, mean, and standard deviation for each of the material-by-time combinations. Is it reasonable to pool the variances?

Because the sample sizes in this experiment are very small, we expect a large amount of variability in the sample standard deviations. Although they vary more than we would prefer, we will proceed with the ANOVA.

SPSS: Data > Split File, select "Compare groups," and move one categorical variable (material) into the groups box. Then Analyze > Compare Means > Means; move the other categorical variable (weeks) into the independent list and gpi into the dependent list.

Descriptive Statistics (dependent variable: gpi)

material | weeks   | Mean  | Std. Deviation | N
ECM1     | 2 weeks | 70.00 |  5.000         |  3
         | 4 weeks | 65.00 |  8.660         |  3
         | 8 weeks | 63.33 |  2.887         |  3
         | Total   | 66.11 |  6.009         |  9
ECM2     | 2 weeks | 65.00 |  5.000         |  3
         | 4 weeks | 63.33 |  2.887         |  3
         | 8 weeks | 63.33 |  5.774         |  3
         | Total   | 63.89 |  4.167         |  9
ECM3     | 2 weeks | 71.67 | 10.408         |  3
         | 4 weeks | 73.33 |  2.887         |  3
         | 8 weeks | 73.33 |  5.774         |  3
         | Total   | 72.78 |  6.180         |  9
MAT1     | 2 weeks | 48.33 |  2.887         |  3
         | 4 weeks | 23.33 |  2.887         |  3
         | 8 weeks | 21.67 |  5.774         |  3
         | Total   | 31.11 | 13.411         |  9
MAT2     | 2 weeks | 10.00 |  5.000         |  3
         | 4 weeks |  6.67 |  2.887         |  3
         | 8 weeks |  6.67 |  2.887         |  3
         | Total   |  7.78 |  3.632         |  9
MAT3     | 2 weeks | 26.67 |  2.887         |  3
         | 4 weeks | 11.67 |  2.887         |  3
         | 8 weeks | 10.00 |  5.000         |  3
         | Total   | 16.11 |  8.580         |  9
Total    | 2 weeks | 48.61 | 24.363         | 18
         | 4 weeks | 40.56 | 28.330         | 18
         | 8 weeks | 39.72 | 28.619         | 18
         | Total   | 42.96 | 26.961         | 54

(b) Make a table giving the sample size, mean, and standard deviation for each of the material types. Give a short summary of how Gpi depends on material.

SPSS: Data > Split File, remove the "Compare groups" setting. Then Analyze > Compare Means > Means; move material into the independent list and gpi into the dependent list.

Report (gpi)

material | Mean  | N | Std. Deviation
ECM1     | 66.11 | 9 | 6.009
ECM2     | 63.89 | 9 | 4.167
ECM3     | 72.78 |  9 |  6.180
MAT1     | 31.11 |  9 | 13.411
MAT2     |  7.78 |  9 |  3.632
MAT3     | 16.11 |  9 |  8.580
Total    | 42.96 | 54 | 26.961

(c) Make a table of the sample size, mean, and standard deviation for the weeks after the repair. Give a short summary of how Gpi depends on the weeks after repair.

SPSS: same as above, only move weeks into the independent box.

Report (gpi)

weeks   | Mean  | N  | Std. Deviation
2 weeks | 48.61 | 18 | 24.363
4 weeks | 40.56 | 18 | 28.330
8 weeks | 39.72 | 18 | 28.619
Total   | 42.96 | 54 | 26.961

(d) Run the analysis of variance. Report the F statistics, with degrees of freedom and P-values, for the main effects and the interaction. What are the hypotheses you are testing? What do you conclude?

SPSS: Analyze > General Linear Model > Univariate. Move the appropriate variables to the dependent and fixed-factor boxes. Select Model and do not remove the checkmark beside "Include intercept in model."

Tests of Between-Subjects Effects (dependent variable: gpi)

Source           | Type III Sum of Squares | df | Mean Square | F        | Sig.
Corrected Model  |  37609.259 (a)          | 17 |  2212.309   |   86.883 | .000
Intercept        |  99674.074              |  1 | 99674.074   | 3914.473 | .000
material         |  35659.259              |  5 |  7131.852   |  280.087 | .000
weeks            |    867.593              |  2 |   433.796   |   17.036 | .000
material * weeks |   1082.407              | 10 |   108.241   |    4.251 | .001
Error            |    916.667              | 36 |    25.463   |          |
Total            | 138200.000              | 54 |             |          |
Corrected Total  |  38525.926              | 53 |             |          |
(a) R Squared = .976 (Adjusted R Squared = .965)

(e) Make a plot of the means of the combinations. Describe the main features of the plot.

SPSS: again Analyze > General Linear Model > Univariate; move the appropriate variables to the dependent and fixed-factor boxes. To do plots, select the Plots box, move the factor variables to the appropriate horizontal-axis or lines boxes, click Add, then Continue, then OK.

Lecture Notes for Chapter 11 (Sections 11.1 and 11.2): Multiple Regression

In multiple linear regression, more than one explanatory variable is used to explain or predict a single response variable. Many of the ideas of linear regression (one explanatory variable, one response variable) carry over to multiple linear regression.
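Before leaving the two-way ANOVA example: the cell summaries in part (a) can be checked without SPSS. Here is a minimal Python sketch (not part of the original SPSS workflow; the helper `cell_summary` is my own name) that recomputes the ECM1 rows of the descriptive table from the raw data above.

```python
# Minimal sketch (not in the original notes): recompute part (a) cell
# summaries for the Gpi data using Python's statistics module.
from statistics import mean, stdev

def cell_summary(obs):
    """Return (n, mean, sample standard deviation) for one material-by-time cell."""
    return len(obs), round(mean(obs), 2), round(stdev(obs), 3)

# ECM1 observations by time, read off the data table above
ecm1 = {"2 weeks": [70, 75, 65], "4 weeks": [55, 70, 70], "8 weeks": [60, 65, 65]}

for weeks, obs in ecm1.items():
    # each line should agree with the corresponding SPSS row for ECM1
    print(weeks, cell_summary(obs))
```

The same loop over all six materials reproduces the whole table; `stdev` is the sample (n − 1) standard deviation, which is what SPSS reports.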
Multiple Linear Regression Model

The statistical model for multiple linear regression is

    y = β0 + β1 x1 + β2 x2 + ... + βp xp + ε,

where p is the number of explanatory variables in the model. The deviations (errors) ε are independent and normally distributed with mean 0 and standard deviation σ. The parameters of the model are β0, β1, ..., βp and σ.

So what do we do when we have more than one x variable?

1. Look at the variables individually. Graph each variable (stem plot, histogram); determine means, standard deviations, minimums, and maximums. Are there any outliers?

2. Look at the relationships between the variables using correlations and scatterplots. Do a scatterplot and determine a correlation for each pair of variables. To determine the correlations, enter all the variables (the y and all the x's) into SPSS, then select Analyze > Correlate > Bivariate. The higher the correlation between two variables (and the lower the Sig. 2-tailed), the better. This will help you determine which relationships between y and an x are strongest.

3. Do a regression to define the relationship of the variables. Start with all potential explanatory variables and the response variable; the regression results will indicate/confirm which relationships are strong.

For multiple linear regression, a least-squares procedure is used to estimate the parameters β0, β1, ..., βp and σ. The sample has n observations. Perform the multiple regression procedure on the data from the n observations. Let b0, b1, b2, ..., bp denote the estimators of the population parameters β0, β1, ..., βp. Another notation is bj, the jth estimator of βj, the jth population parameter, where j = 0, 1, 2, ..., p and p is the number of explanatory variables in the model.

For the ith observation, the predicted response is

    ŷi = b0 + b1 xi1 + b2 xi2 + ... + bp xip.

The ith residual, the difference between the observed and predicted response, is

    ei = observed response − predicted response = yi − ŷi.

The method of least squares minimizes Σ ei², that is, Σ (yi − ŷi)².

The parameter σ² measures the variability of the response about the
regression equation. It is estimated by

    s² = Σ ei² / (n − p − 1).

The quantity n − p − 1 is the degrees of freedom associated with s².

Confidence Intervals and Significance Tests for βj

A level C confidence interval for βj is

    bj ± t* SE_bj,

where SE_bj is the standard error of bj and t* is the value for the t(n − p − 1) density curve with area C between −t* and t*. Here bj is the jth estimator, with j running from 0 to p.

To test the hypothesis H0: βj = 0, compute the t statistic

    t = bj / SE_bj.

In terms of a random variable T having the t(n − p − 1) distribution, the P-value for a test of H0 against

    Ha: βj > 0 is P(T ≥ t),
    Ha: βj < 0 is P(T ≤ t),
    Ha: βj ≠ 0 is 2 P(T ≥ |t|).
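The least-squares machinery described above (the estimates b0, ..., bp, the residuals, and s²) can be computed directly from the normal equations X'X b = X'y. The following is an illustrative pure-Python sketch with made-up data, not data from the notes; the function names `solve` and `least_squares` are my own.

```python
# Illustrative sketch (made-up data): least-squares estimates via the
# normal equations X'X b = X'y, plus s^2 = sum(e_i^2) / (n - p - 1).

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def least_squares(rows, y):
    """Fit y = b0 + b1*x1 + ... + bp*xp; return (coefficients, s^2)."""
    X = [[1.0] + list(r) for r in rows]                 # prepend intercept column
    n, q = len(X), len(X[0])                            # q = p + 1
    XtX = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in range(q)] for a in range(q)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(q)]
    b = solve(XtX, Xty)
    resid = [y[i] - sum(b[j] * X[i][j] for j in range(q)) for i in range(n)]
    s2 = sum(e * e for e in resid) / (n - q)            # n - p - 1 degrees of freedom
    return b, s2

# Toy data generated exactly from y = 1 + 2*x1 + 3*x2, so the fit recovers
# b = (1, 2, 3) and the residuals (hence s^2) are essentially zero.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]
y = [1, 3, 4, 6, 14]
b, s2 = least_squares(rows, y)
print([round(v, 6) for v in b], round(s2, 6))
```

With real data the residuals are not zero, and s = sqrt(s²) is the regression standard error used in the confidence intervals and t statistics above.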
variable should be independent of all the other explanatory variables To check this plot residuals vs each of the explanatory variables What out for collinearity when any of the explanatory variables has a strong linear relationship with another explanatory variable Check those scatterplots of each x vs x the correlations and the R2 Normality The distribution of the residuals must be Normal for the t tests on the coefficients to follow the t distribution exactly To check the normality assumption make a normal probability plot of residuals 5 Re ne the model if necessary You are only interested in keeping the variables which have strong relationships So try deleting deleting the variable with the largest pvalue the weakest relationship and rerun the regression You may have to do this again and again each time deleting a variable with a weak relationship To determine which is the best model look at o the R2 it should not drop too much from the full model 0 pvalues should be lowest Any x variables left in the equations should have a significant pvalue from their t test of their coefficient confidence intervals should not contain 0 o s the standard deviation should be as small as possible 0 the Ftest statistic from AN OVA should be the largest o and pvalue from the AN OVA Ftest should be the smallest Lecture 13 Sections 111amp 112 Page 5 Example 1 As cheddar cheese matures a variety of chemical processes take place The taste of mature cheese is related to the concentration of several chemicals in the final product In a study of cheddar cheese from the La Trobe Valley ofVictoria Australia samples of cheese were analyzed for their chemical composition and were subjected to taste tests Data for one type of cheese manufacturing processes appears in below The variable Case is used to number the observations from 1 to 30 Taste is the response variable of interest The taste scores were obtained by combining the scores from several tasters Three chemicals whose concentrations 
were measured were acetic acid, hydrogen sulfide, and lactic acid. For acetic acid and hydrogen sulfide, natural log transformations were taken. Thus the explanatory variables are the transformed concentrations of acetic acid (Acetic) and hydrogen sulfide (H2S) and the untransformed concentration of lactic acid (Lactic). These data are based on experiments performed by G. T. Lloyd and E. H. Ramshaw of the CSIRO Division of Food Research, Victoria, Australia.

Case | Taste | Acetic | H2S    | Lactic
  1  | 12.3  | 4.543  |  3.135 | 0.86
  2  | 20.9  | 5.159  |  5.043 | 1.53
  3  | 39.0  | 5.366  |  5.438 | 1.57
  4  | 47.9  | 5.759  |  7.496 | 1.81
  5  |  5.6  | 4.663  |  3.807 | 0.99
  6  | 25.9  | 5.697  |  7.601 | 1.09
  7  | 37.3  | 5.892  |  8.726 | 1.29
  8  | 21.9  | 6.078  |  7.966 | 1.78
  9  | 18.1  | 4.898  |  3.850 | 1.29
 10  | 21.0  | 5.242  |  4.174 | 1.58
 11  | 34.9  | 5.740  |  6.142 | 1.68
 12  | 57.2  | 6.446  |  7.908 | 1.90
 13  |  0.7  | 4.477  |  2.996 | 1.06
 14  | 25.9  | 5.236  |  4.942 | 1.30
 15  | 54.9  | 6.151  |  6.752 | 1.52
 16  | 40.9  | 6.365  |  9.588 | 1.74
 17  | 15.9  | 4.787  |  3.912 | 1.16
 18  |  6.4  | 5.412  |  4.700 | 1.49
 19  | 18.0  | 5.247  |  6.174 | 1.63
 20  | 38.9  | 5.438  |  9.064 | 1.99
 21  | 14.0  | 4.564  |  4.949 | 1.15
 22  | 15.2  | 5.298  |  5.220 | 1.33
 23  | 32.0  | 5.455  |  9.242 | 1.44
 24  | 56.7  | 5.855  | 10.199 | 2.01
 25  | 16.8  | 5.366  |  3.664 | 1.31
 26  | 11.6  | 6.043  |  3.219 | 1.46
 27  | 26.5  | 6.458  |  6.962 | 1.72
 28  |  0.7  | 5.328  |  3.912 | 1.25
 29  | 13.4  | 5.802  |  6.685 | 1.08
 30  |  5.5  | 6.176  |  4.787 | 1.25

(a) Look at each variable individually using graphs and descriptive statistics. Any outliers?

SPSS: enter the data, then select Analyze > Descriptive Statistics > Explore; select the Plots and Statistics options.

[SPSS output omitted: descriptive statistics and plots for each variable.]

(b) Look at a scatterplot and a residual plot of taste versus Acetic, taste versus H2S, and taste versus Lactic. Do you see any problems?

[SPSS output omitted: scatterplots and residual plots.]

(c) Which explanatory variables (x's: Acetic, H2S, Lactic) are most strongly correlated with the response variable (y: taste)?

SPSS: select Analyze > Correlate > Bivariate.

Pearson correlations with taste (N = 30 for all pairs): Acetic r = .550 (Sig. 2-tailed .002), H2S r = .756 (.000), Lactic r = .704 (.000).
Correlations among the explanatory variables: Acetic-H2S r = .618, Acetic-Lactic r = .604, H2S-Lactic r = .645 (each with Sig. 2-tailed .000 and N = 30). All of these correlations are significant at the 0.01 level (2-tailed).

(d) Find the LSR line for predicting taste using the three variables Acetic, H2S, and Lactic.

Coefficients (dependent variable: taste)

Model      | B       | Std. Error | Beta | t      | Sig.
(Constant) | −28.877 | 19.735     |      | −1.463 | .155
Acetic     |    .328 |  4.460     | .012 |   .073 | .942
H2S        |   3.912 |  1.248     | .512 |  3.133 | .004
Lactic     |  19.671 |  8.629     | .367 |  2.280 | .031

Taste = −28.877 + 0.328 Acetic + 3.912 H2S + 19.671 Lactic

(e) Give the 95% confidence intervals for the regression coefficients of your explanatory variables. Do any of the intervals contain the point 0?

(f) Should all the explanatory variables be used in the model? What are the t statistics and p-values for the tests of the regression coefficients of your explanatory variables?

(g) What percent of the variation in taste is explained by the LSR line?

Model Summary (predictors: (Constant), Acetic, H2S, Lactic; dependent variable: taste)

Model | R    | R Square | Adjusted R Square | Std. Error of the Estimate
1     | .807 | .652     | .612              | 10.13071

ANOVA (same predictors and dependent variable)

Model      | Sum of Squares | df | Mean Square | F      | Sig.
Regression | 4994.476       |  3 | 1664.825    | 16.221 | .000
Residual   | 2668.411       | 26 |  102.631    |        |
Total      | 7662.887       | 29 |             |        |

65.2% of the variation in taste is explained by the LSR line.

(h) What is the value of s, the estimator of the standard deviation σ?

s = 10.13071

(i) Find the LSR line for predicting taste using the two variables H2S and Lactic.

Coefficients (dependent variable: taste)

Model      | B       | Std. Error | Beta | t      | Sig.
(Constant) | −27.592 |  8.982     |      | −3.072 | .005
H2S        |   3.946 |  1.136     | .516 |  3.475 | .002
Lactic     |  19.887 |  7.959     | .371 |  2.499 | .019

Taste = −27.592 + 3.946 H2S + 19.887 Lactic

(j) What percent of the variation in taste is explained by this LSR line?

Model Summary (predictors: (Constant), H2S, Lactic; dependent variable: taste)
Model | R    | R Square | Adjusted R Square | Std. Error of the Estimate
1     | .807 | .652     | .626              | 9.94236

ANOVA (same predictors and dependent variable)

Model      | Sum of Squares | df | Mean Square | F      | Sig.
Regression | 4993.921       |  2 | 2496.961    | 25.260 | .000
Residual   | 2668.965       | 27 |   98.851    |        |
Total      | 7662.887       | 29 |             |        |

65.2% of the variation in taste is explained by this LSR line.

(k) Did removing the explanatory variable Acetic improve the overall model?

(l) Is the assumption of model normality met?

[Figure: normal Q-Q plot of the unstandardized residuals (expected normal value vs. observed value).]

(m) The F statistic reported for the second model is F = 25.260. State the null and alternative hypotheses for this statistic. Give the degrees of freedom and the P-value for this test. What do you conclude?
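For part (m), the entries of the second model's ANOVA table can be checked against the formulas from the ANOVA-table section: MS = SS/df, F = MSM/MSE on (p, n − p − 1) = (2, 27) degrees of freedom, and R² = SSM/SST. Here is a small arithmetic sketch (not in the original notes; the variable names are my own) using the reported sums of squares.

```python
# Consistency check (not in the original notes) for the two-variable model's
# ANOVA table: MS = SS/df, F = MSM/MSE, R^2 = SSM/SST.
ssm, dfm = 4993.921, 2      # Regression (Model) row
sse, dfe = 2668.965, 27     # Residual (Error) row
sst = ssm + sse             # total sum of squares

msm = ssm / dfm             # mean square for the model
mse = sse / dfe             # mean square error; s = sqrt(mse)
f_stat = msm / mse          # F statistic on (2, 27) degrees of freedom
r_sq = ssm / sst            # squared multiple correlation

print(round(msm, 3), round(mse, 3), round(f_stat, 2), round(r_sq, 3))
```

The recomputed values match the table: MSE ≈ 98.851, F ≈ 25.26, and R² ≈ .652, so 65.2% of the variation in taste is explained by the two-variable model.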