CMPTR PROCESSG DATA
CMPTR PROCESSG DATA STAT 479
This 13 page Class Notes was uploaded by Giovani Ullrich PhD on Saturday September 26, 2015. The Class Notes belongs to STAT 479 at Iowa State University taught by Mervyn Marasinghe in Fall. Since its upload, it has received 50 views. For similar materials see /class/214393/stat-479-iowa-state-university in Statistics at Iowa State University.
Date Created: 09/26/15
Stat 479, Fall 2010, Assignment 7
Chee Bing LOH

Question 1

[Scatterplot matrix, titled "Scatterplot Matrix for Air Pollution Data", and correlation matrix: output not reproduced.]

From the correlation matrix and the scatterplot matrix, x2 and x3 are the most highly correlated pair (ρ = 0.9553), followed by x6 and x5 (ρ = 0.4961), x1 and x6 (ρ = 0.4302), and x1 and x5 (ρ = 0.3863). The more highly correlated the independent variables are, the harder it is to determine how much of the variation in y each x is responsible for. Here x1, x5, and x6 are correlated with one another, and x2 and x3 are highly correlated. The pair x2 and x3 is therefore a likely problem, and x1, x5, and x6 could also cause multicollinearity when the full model is fitted.

Question 2

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variance Inflation
Intercept    1       111.72848           47.31810        2.36     0.0241         0
x1           1        -1.26794            0.62118       -2.04     0.0491         3.76400
x2           1         0.06492            0.01575        4.12     0.0002        14.70365
x3           1        -0.03928            0.01513       -2.60     0.0138        14.34083
x4           1        -3.18137            1.81502       -1.75     0.0887         1.25552
x5           1         0.51236            0.36276        1.41     0.1669         3.40492
x6           1        -0.05205            0.16201       -0.32     0.7500         3.44365

The coefficients of x1, x3, x4, and x6 are negative, while those of x2 and x5 are positive. Population size should increase the sulfur dioxide concentration, not decrease it, so a negative coefficient on the population variable runs against expectation. When the variables are multicollinear, it is difficult to determine which one is responsible for the variation in y. The more correlated the x variables are with each other, the larger the standard errors of the regression coefficients become, and the less likely the coefficients are to be statistically significant. The standard errors of x1 and x4 are large, and x5 and x6 are not significant in the model. The variance inflation factors also show high correlation between x2 and x3 and among x1, x5, and x6. The correlation between x2 and x3 makes the variance of their estimates about 14 times larger than it would be if the independent variables were uncorrelated. The same holds for x1, x5, and x6, with inflation factors of about 3.4 to 3.8.

Question 3

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variance Inflation
Intercept    1       123.11833           31.29070        3.93     0.0004         0
x1           1         1.61144            0.40137        4.01     0.0003         1.38455
x2           1         0.02548            0.00454        5.62     <.0001         1.07524
x4           1         3.63024            1.89234        1.92     0.0630         1.20243
x5           1         0.52423            0.22941        2.29     0.0283         1.19975

In the fitted reduced model, the standard errors of the coefficients are smaller than in the full model for every term except x4. The variance inflation factors are all small (below 1.4), so the variables in the reduced model can be treated as approximately uncorrelated with each other.

Question 4

[Partial regression residual plots from The REG Procedure, Model: MODEL1, one per variable in the reduced model: plots not reproduced.]

From the residual plots of each variable, the homogeneity of variance assumption appears to be violated for x1 and x5. There is no obvious pattern in the residual plots of x2 and x4, which suggests that a linear model is suitable for this data set.

Question 5

a) All possible models

R-Square     Cp        AIC         MSE          SSE
 0.5863    7.5586    227.5756    239.91145    9116.63526

I would choose the model with only x2 and x3. This model has the smallest Cp among the possible models. Its R-Square is 0.5863, so x2 and x3 explain 58.63% of the variation in y. Choosing the model with the fewest variables also reduces the dimensionality of the data set.

b) By backward deletion method

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2       12921         6460.6335     26.93    <.0001
Error              38    9116.6352        239.91145
Corrected Total    40       22038

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept       26.32508            3.84044        11273        46.99    <.0001
x2               0.08243            0.01470        7548.0237    31.46    <.0001
x3               0.05661            0.01430        3759.5224    15.67    0.0003

c) By stepwise method

The stepwise method selects the same model, x2 and x3, with an analysis of variance table and parameter estimates identical to those in (b).

SAS Code Used

    data air;
      infile "C:\Documents and Settings\airpollution.txt";
      input city $ y x1-x6;   /* city assumed to be a character variable */
    run;

    ods listing close;
    ods rtf file="C:\Documents and Settings\scatterplot.rtf" style=statistical;

    /* Question 1: scatterplot matrix */
    proc sgscatter data=air;
      title "Scatterplot Matrix for Air Data";
      matrix x1 x2 x3 x4 x5 x6;
    run;
    ods rtf close;

    /* Questions 1 and 2: correlations and full model with VIFs */
    proc reg data=air corr;
      model y = x1-x6 / vif;
    run;

    /* Questions 3 and 4: reduced model and partial regression plots */
    proc reg data=air;
      model y = x1 x2 x4 x5 / sse vif cp partial;
    run;

    /* Question 5: backward, stepwise, and all-possible-models selection */
    proc reg corr;
      model y = x1-x4 / selection=b sls=0.05;
      model y = x1-x4 / selection=stepwise sle=0.10 sls=0.05;
      model y = x1-x4 / selection=rsquare sse cp;
    run;

Example F13

SAS Program

    data solder;
      infile "C:\Documents and Settings\mervyn\My Documents\Classwork\stat479\fistuffexl7 22txt";
      input operator machine strength;
    run;

    proc mixed data=solder noclprint noinfo method=type3 cl;
      class machine operator;
      model strength = machine / ddfm=satterth;
      random operator machine*operator;
      lsmeans machine;
      /* coefficients after the | apply to the random effects */
      estimate 'BLUP-1 Oper 3' intercept 4 machine 1 1 1 1 |
               operator 0 0 4
               machine*operator 0 0 1 0 0 1 0 0 1 0 0 1 / divisor=4;
      estimate 'BLUP-2 Oper 3' intercept 4 machine 1 1 1 1 |
               operator 0 0 4 / divisor=4;
      estimate 'LSMEAN for Mach 1' intercept 3 machine 3 0 0 0 |
               operator 1 1 1
               machine*operator 1 1 1 0 0 0 0 0 0 0 0 0 / divisor=3;
      title 'Analysis of Strength of Solder in Computer Chips using PROC MIXED';
    run;

SAS Output

Analysis of Strength of Solder in Computer Chips using PROC MIXED
The Mixed Procedure

Type 3 Analysis of Variance

Source              DF   Sum of Squares   Mean Square
machine              3     12.458333        4.152778
operator             2    160.333333       80.166667
machine*operator     6     44.666667        7.444444
Residual            12     45.500000        3.791667

Source              Expected Mean Square                                        Error Term
machine             Var(Residual) + 2 Var(machine*operator) + Q(machine)        MS(machine*operator)
operator            Var(Residual) + 2 Var(machine*operator) + 8 Var(operator)   MS(machine*operator)
machine*operator    Var(Residual) + 2 Var(machine*operator)                     MS(Residual)
Residual            Var(Residual)

Source              Error DF   F Value   Pr > F
machine                 6        0.56    0.6619
operator                6       10.77    0.0103
machine*operator       12        1.96    0.1507
Residual

Covariance Parameter Estimates

Cov Parm            Estimate   Alpha     Lower      Upper
operator             9.0903    0.05    -10.5784    28.7590
machine*operator     1.8264    0.05     -2.6505     6.3032
Residual             3.7917    0.05      1.9497    10.3320

Fit Statistics

-2 Res Log Likelihood        100.7
AIC (smaller is better)      106.7
AICC (smaller is better)     108.2
BIC (smaller is better)      104.0

Type 3 Tests of Fixed Effects

Effect     Num DF   Den DF   F Value   Pr > F
machine       3        6       0.56    0.6619

Estimates

Label                Estimate   Standard Error   t Value   Pr > |t|
BLUP-1 Oper 3         21.071       0.6775        31.100    <.0001
BLUP-2 Oper 3         21.054       0.9343        22.534    <.0001
LSMEAN for Mach 1     20.683       0.7949        26.018    <.0001

Least Squares Means

Effect     machine   Estimate   Pr > |t|
machine       1       20.683    <.0001
machine       2       20.717    <.0001
machine       3       20.667    <.0001
machine       4       20.850    <.0001

The DF values for these two tables and the standard errors and t values of the least squares means are not legible in the source (residue: DF 127 6 78 12; t Value 10008 10025 10000 10089).
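As a check on the covariance parameter estimates in the output above, the Type 3 (method of moments) estimates can be recovered by equating each mean square to its expected mean square and solving. The sketch below assumes the multipliers 2 and 8 shown in the expected mean squares table; the symbols $\hat\sigma^2_e$, $\hat\sigma^2_{mo}$, and $\hat\sigma^2_o$ are notation introduced here for the residual, machine*operator, and operator variance components and do not appear in the original notes.

```latex
\begin{align*}
\hat\sigma^2_{e}  &= MS_{\text{Residual}} = 3.7917 \\[4pt]
\hat\sigma^2_{mo} &= \frac{MS_{\text{machine*operator}} - MS_{\text{Residual}}}{2}
                   = \frac{7.4444 - 3.7917}{2} \approx 1.8264 \\[4pt]
\hat\sigma^2_{o}  &= \frac{MS_{\text{operator}} - MS_{\text{machine*operator}}}{8}
                   = \frac{80.1667 - 7.4444}{8} \approx 9.0903 \\[4pt]
F_{\text{machine}} &= \frac{MS_{\text{machine}}}{MS_{\text{machine*operator}}}
                   = \frac{4.1528}{7.4444} \approx 0.56
\end{align*}
```

These values agree with the Estimate column of the covariance parameter table and with the F value reported for machine, which is tested against the machine*operator mean square and not the residual.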