Statistical Analysis

by: Mr. Alex Berge

Statistical Analysis STAT 401


These 3 pages of class notes were uploaded by Mr. Alex Berge on Friday, October 23, 2015. The notes belong to STAT 401 at the University of Idaho, taught by Christopher Williams in Fall.


Date Created: 10/23/15
1 Visualization: checking assumptions for ANOVA

As with regression analysis, it is important both to visualize the data and to assess the assumptions of the ANOVA model. We can use boxplots to visualize data for ANOVA models, and we can look at residual-by-predicted plots and normal plots to check the assumptions of homogeneous variance and normality. See the computer programs for the cuckoo data for details.

2 Multiple Comparisons

If the ANOVA null hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ is rejected, we often wish to learn more about which population means differ. For example, for the cuckoo egg data, we may wish to further address which species have different mean cuckoo egg lengths in their nests. This involves testing other hypotheses, such as $H_0: \mu_1 = \mu_2$, that involve a subset of the group means. The concern then arises about controlling the Type I error rate if we make many such tests.

There is a vast literature concerning multiple comparison tests, enough to generate entire texts and to occupy semester-length courses. We will cover only a small amount of this material, consisting of a few common procedures.

Three important questions that we must address when performing multiple comparison tests are:

1. Did we propose the hypotheses to be tested before collecting the data (a priori) or after looking at the data (post hoc)?
2. If the comparisons are a priori, do they involve independent (orthogonal) comparisons?
3. Are the comparisons pairwise, or do they involve more complicated hypotheses, such as a contrast among $\mu_1$, $\mu_2$, and $\mu_3$?

The best fitting line

In our previous lecture we considered $\hat{y}_i = 110 + 10 x_i$ to predict $y_i$ (calories) from $x_i$ (fat). One measure of how well this line fits the data is given by

$$\sum_{i=1}^{5} (y_i - \hat{y}_i)^2 = (y_1 - \hat{y}_1)^2 + (y_2 - \hat{y}_2)^2 + (y_3 - \hat{y}_3)^2 + (y_4 - \hat{y}_4)^2 + (y_5 - \hat{y}_5)^2,$$

which is called the sum of squared errors, or SSE. Note that the SSE is a function of the slope and intercept that we are using, so for the linear equation $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ we can write

$$SSE(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2.$$

For our line above for the cereal data we get

$$SSE(\beta_0 = 110, \beta_1 = 10) = (110 - 110)^2 + (110 - 120)^2 + (120 - 120)^2 + (130 - 130)^2 + (150 - 140)^2 = 200.$$

Is this the smallest SSE possible? Note that other criteria exist besides SSE for choosing a best fitting line.

We can use calculus to find which slope and intercept will yield the smallest SSE. The solutions, called the least squares estimators, are given by

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$

For the cereal data the results are

$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = (-1.4)(-14) + (-0.4)(-14) + (-0.4)(-4) + (0.6)(6) + (1.6)(26) = 72.00$$

and

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = (-1.4)^2 + (-0.4)^2 + (-0.4)^2 + (0.6)^2 + (1.6)^2 = 5.2,$$

so then $\hat{\beta}_1 = 72/5.2 \approx 13.85$ and $\hat{\beta}_0 = 124 - 13.85(1.4) \approx 104.6$, so the least squares line is $\hat{y} = 104.6 + 13.85x$. When we calculate the SSE for the least squares line we find it is equal to 123.1, so it does have a smaller SSE than the previous line.

The least squares estimates can be calculated in SAS in PROC REG (other SAS procedures can also be used). Look at the SAS code example and output for this lecture.

A best fitting line that had SSE = 0 would indicate that the line fit the data perfectly. A large value of SSE can result from either (i) much variation in the errors $\varepsilon_i$, or (ii) the assumed regression model being incorrect.
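The cereal calculations above can be checked with a short script. This is only a sketch in plain Python, not the SAS program the lecture refers to. The five (fat, calories) pairs are an assumption: they are inferred from the deviations and squared-error terms shown in the worked sums, since the data table itself does not appear in these notes. The function names `sse` and `least_squares` are illustrative.

```python
# Assumed cereal data, inferred from the worked sums in the notes:
# x = fat, y = calories.
fat = [0, 1, 1, 2, 3]
calories = [110, 110, 120, 130, 150]


def sse(intercept, slope, xs, ys):
    """Sum of squared errors for the line y_hat = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def least_squares(xs, ys):
    """Return (intercept, slope) that minimize the SSE for a straight line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope estimator.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


b0, b1 = least_squares(fat, calories)

print("SSE for the line 110 + 10x:", sse(110, 10, fat, calories))
print("Least squares intercept:", round(b0, 1))
print("Least squares slope:", round(b1, 2))
print("SSE for the least squares line:", round(sse(b0, b1, fat, calories), 1))
```

Under the assumed data, the first SSE should match the 200 computed by hand, and the least squares line should give the smaller SSE reported in the notes.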


