Statistical Analysis STAT 401
This 23-page set of class notes was uploaded by Mr. Alex Berge on Friday, October 23, 2015. The notes belong to STAT 401 at the University of Idaho, taught by Brian Dennis in the Fall term.
Date Created: 10/23/15
Sampling distributions
The probability distribution of $Y$ serves as a model of a population of quantities. $\mu$, $\sigma^2$, $\pi$, etc. are parameters: constants (usually unknown) which characterize properties of the distribution.
$Y_1, Y_2, \ldots, Y_n$: a random sample (independent, identically distributed random variables).
Statistic: a quantity calculated from $Y_1, Y_2, \ldots, Y_n$ (and possibly known parameters), usually for the purpose of estimating an unknown parameter.
Examples of statistics:
1. $\bar{Y} = \frac{1}{n}(Y_1 + Y_2 + \cdots + Y_n)$ (sample mean)
2. $S^2 = \frac{(Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \cdots + (Y_n - \bar{Y})^2}{n - 1}$ (sample variance)
Statistics are themselves random variables with probability distributions.

TRUE FACTS about $\bar{Y}$:
1. If $Y$ has any probability distribution with mean $\mu$ and variance $\sigma^2$, then $E(\bar{Y}) = \mu$ and $V(\bar{Y}) = \sigma^2/n$.
2. If $Y$ has a normal$(\mu, \sigma^2)$ distribution, then $\bar{Y} \sim \text{normal}(\mu, \sigma^2/n)$.
3. Central Limit Theorem (CLT): if $Y$ has any probability distribution with mean $\mu$ and variance $\sigma^2$, then the distribution of $\bar{Y}$ converges to a normal$(\mu, \sigma^2/n)$ distribution as $n \to \infty$:
$$P\left(\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \le z\right) \to P(Z \le z), \quad \text{where } Z \sim \text{normal}(0, 1).$$
If $\bar{Y}$ is thought of as an estimate of $\mu$, this property is called asymptotic normality.
4. Law of Large Numbers (LLN): if $Y$ has any distribution with mean $\mu$ and variance $\sigma^2$, the probability that $\bar{Y}$ is within $\varepsilon$ of $\mu$ (where $\varepsilon > 0$) converges to 1 as $n \to \infty$:
$$P(|\bar{Y} - \mu| < \varepsilon) \to 1 \text{ as } n \to \infty.$$
In other words, the distribution of $\bar{Y}$ concentrates around $\mu$. If $\bar{Y}$ is thought of as an estimate of $\mu$, this property is called statistical consistency.

Variants of TRUE FACTS 1-3 for sums, $W = Y_1 + Y_2 + \cdots + Y_n = n\bar{Y}$:
1. $Y$ has any distribution with mean $\mu$ and variance $\sigma^2$: then $E(W) = n\mu$ and $V(W) = n\sigma^2$.
2. $Y \sim \text{normal}(\mu, \sigma^2)$: then $W \sim \text{normal}(n\mu, n\sigma^2)$.
3. $Y$ has any distribution with mean $\mu$ and variance $\sigma^2$: then the distribution of $W$ converges to a normal$(n\mu, n\sigma^2)$ distribution.

ex. Draw 10 students at random from UI and find out their SAT-math scores. In the nation, a randomly drawn SAT-math score has a normal distribution with a mean of 500 and a standard deviation of 100. If UI students were similar to students in the nation at large, what is the probability that $\bar{Y}$ would be greater than or equal to 560?
$$P(\bar{Y} \ge 560) = P\left(Z \ge \frac{560 - 500}{100/\sqrt{10}}\right) = P(Z \ge 1.90) \approx 0.029$$
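A minimal sketch of the SAT-math calculation, using only Python's standard library (`statistics.NormalDist`). The function name `prob_mean_at_least` is hypothetical; it standardizes $\bar{Y}$ with $\sigma/\sqrt{n}$ as in TRUE FACT 2 and reads the upper tail of the standard normal instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_at_least(threshold, mu, sigma, n):
    """P(Ybar >= threshold) when Y ~ normal(mu, sigma^2), so Ybar ~ normal(mu, sigma^2 / n)."""
    se = sigma / sqrt(n)               # standard deviation of Ybar
    z = (threshold - mu) / se          # standardized threshold
    return 1.0 - NormalDist().cdf(z)   # upper tail of the standard normal

# SAT-math example: n = 10 students, national mean 500, standard deviation 100
p = prob_mean_at_least(560, 500, 100, 10)
print(f"z = {(560 - 500) / (100 / sqrt(10)):.3f}, P = {p:.4f}")
```

Rounding $z$ to 1.90 before using a table gives 0.0287; the unrounded $z$ gives a value that differs only in the fourth decimal place.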
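CLT applied to the binomial distribution
Independent success/failure trials: $n$ trials, probability of success $\pi$ on each trial.
$I = 1$ if a trial is a success, $0$ if it is a failure:
$E(I) = 1 \cdot \pi + 0 \cdot (1 - \pi) = \pi$
$V(I) = (1 - \pi)^2 \pi + (0 - \pi)^2 (1 - \pi) = \pi(1 - \pi)$
$Y = I_1 + I_2 + \cdots + I_n \sim \text{binomial}(n, \pi)$
So $Y$ is a sum, and the distribution of $Y$ can be approximated by a normal$(n\pi, n\pi(1 - \pi))$ distribution. The approximation is good if $n\pi \ge 5$ and $n(1 - \pi) \ge 5$.

ex. Guess the suit of the top card in a shuffled deck. Repeat the shuffle/guess 100 times. What is the chance of 30 or more correct guesses?
$\pi = 0.25$, $n = 100$:
$$P(Y \ge 30) \approx P\left(Z \ge \frac{30 - 100(0.25)}{\sqrt{100(0.25)(0.75)}}\right) = P(Z \ge 1.15) = 0.125$$
A correction for continuity improves the normal approximation to the binomial:
$$P(Y \ge 30) \approx P\left(Z \ge \frac{29.5 - 100(0.25)}{\sqrt{100(0.25)(0.75)}}\right) = P(Z \ge 1.04) = 0.149$$
The true probability is 0.14954.

Multiple regression
Situation: more than one independent variable; we want to predict $Y$ from $x_1, x_2, \ldots, x_k$.
ex.
- The IRS predicts the amount of money to be recovered in an audit using, among other variables, the amount of deduction for charitable gifts, the amount of real estate losses, etc.
- A house appraiser predicts the sale price of a house based on square footage, number of bedrooms, average sale price in the neighborhood, etc.
Idea: the mean of $Y$ is taken to be a linear function of the predictor variables:
$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$
With just two predictor variables (not functionally dependent), this equation is a plane.
Model: $Y \sim \text{normal}(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k, \ \sigma^2)$
Different types of predictor variables:
- ordinary quantitative variables
- indicator variables. AOV is a regression: 3 treatments with means $\mu_1, \mu_2, \mu_3$.
  $x_1 = 1$ if the observation is from trt 1, 0 otherwise; $x_2 = 1$ if the observation is from trt 2, 0 otherwise.
  $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, so $\mu_1 = \beta_0 + \beta_1$, $\mu_2 = \beta_0 + \beta_2$, $\mu_3 = \beta_0$.
- nonlinear terms, e.g. $E(Y) = \beta_0 + \beta_1 x + \beta_2 x^2$. Note: this is still a linear statistical model, because the parameters appear linearly.
- interactions, e.g. $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
Estimates of the unknown parameters are conveniently represented with matrix notation. Your task: learn to multiply matrices, and learn what an inverse matrix is.
Data:
$y_1, x_{11}, x_{12}, \ldots, x_{1k}$
$y_2, x_{21}, x_{22}, \ldots, x_{2k}$
$\vdots$
$y_n, x_{n1}, x_{n2}, \ldots, x_{nk}$
Matrices:
$$X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}$$
$X$ is the $n \times (k+1)$ design matrix.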
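The indicator-variable example can be sketched in plain Python. The helper names (`design_matrix`, `least_squares`, etc.) are hypothetical, and the six responses are made-up illustrative numbers, not data from the notes. `design_matrix` builds the rows $(1, x_1, x_2)$ of $X$ with treatment 3 as the baseline, and `least_squares` solves the normal equations $X'X\beta = X'y$ (given with the matrix representations in these notes) by Gauss-Jordan elimination. With this coding the estimates should reproduce $\hat\beta_0 = \bar{y}_3$, $\hat\beta_1 = \bar{y}_1 - \bar{y}_3$, $\hat\beta_2 = \bar{y}_2 - \bar{y}_3$, matching $\mu_1 = \beta_0 + \beta_1$, $\mu_2 = \beta_0 + \beta_2$, $\mu_3 = \beta_0$.

```python
# Hypothetical helpers: build X for the 3-treatment indicator coding and solve
# the normal equations X'X b = X'y using plain lists (no external libraries).

def design_matrix(treatments):
    """One row [1, x1, x2] per observation; treatment 3 is the baseline (x1 = x2 = 0)."""
    return [[1.0, 1.0 if t == 1 else 0.0, 1.0 if t == 2 else 0.0] for t in treatments]

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Matrix product of a (r x m) and b (m x c), both stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]   # augmented matrix [a | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # row with the largest pivot
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:                                    # clear column c in every other row
                factor = m[r][c] / m[c][c]
                m[r] = [x - factor * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def least_squares(X, y):
    """beta_hat solving the normal equations X'X beta = X'y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * yi for x, yi in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)

# Made-up responses: treatment means are 6 (trt 1), 11 (trt 2), 3 (trt 3).
treatments = [1, 1, 2, 2, 3, 3]
y = [5.0, 7.0, 10.0, 12.0, 2.0, 4.0]
X = design_matrix(treatments)
beta_hat = least_squares(X, y)
print(beta_hat)  # approximately [3, 3, 8] = [ybar3, ybar1 - ybar3, ybar2 - ybar3]
```

Solving the system directly avoids forming $(X'X)^{-1}$ explicitly; for real data a numerical library routine such as NumPy's `numpy.linalg.lstsq` is the usual choice.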
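$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad \hat{\beta} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_k \end{pmatrix} \text{ (ML \& LS estimates)}$$
Matrix representations:
$\hat{y} = X\hat{\beta}$ (predicted values)
$\hat{e} = y - \hat{y}$ (residuals)
$SS_{residuals} = (y - X\hat{\beta})'(y - X\hat{\beta})$: the sum of squared errors, which is minimized at $\hat{\beta}$
For example, with $n = 4$ observations and $k = 2$ predictors:
$$X\hat{\beta} = \begin{pmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ 1 & x_{31} & x_{32} \\ 1 & x_{41} & x_{42} \end{pmatrix} \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix} = \begin{pmatrix} \hat{\beta}_0 + \hat{\beta}_1 x_{11} + \hat{\beta}_2 x_{12} \\ \hat{\beta}_0 + \hat{\beta}_1 x_{21} + \hat{\beta}_2 x_{22} \\ \hat{\beta}_0 + \hat{\beta}_1 x_{31} + \hat{\beta}_2 x_{32} \\ \hat{\beta}_0 + \hat{\beta}_1 x_{41} + \hat{\beta}_2 x_{42} \end{pmatrix}$$
$$X'X = \begin{pmatrix} 1 & 1 & 1 & 1 \\ x_{11} & x_{21} & x_{31} & x_{41} \\ x_{12} & x_{22} & x_{32} & x_{42} \end{pmatrix} \begin{pmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ 1 & x_{31} & x_{32} \\ 1 & x_{41} & x_{42} \end{pmatrix} = \begin{pmatrix} n & \sum x_{i1} & \sum x_{i2} \\ \sum x_{i1} & \sum x_{i1}^2 & \sum x_{i1}x_{i2} \\ \sum x_{i2} & \sum x_{i1}x_{i2} & \sum x_{i2}^2 \end{pmatrix}$$
$$(X'X)^{-1}(X'X) = I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
$X'X$: matrix of sums of squares and cross-products of the predictors; symmetric, $(k+1) \times (k+1)$.
$X'X\beta = X'y$: the normal equations. Minimizing the sum of squared errors results in a system of $k + 1$ linear equations in $k + 1$ unknowns.
$\hat{\beta} = (X'X)^{-1}X'y$: ML & LS estimates of the parameters $\beta_0, \beta_1, \ldots, \beta_k$.
$\hat{\sigma}^2 = \frac{1}{n - k - 1}(y - X\hat{\beta})'(y - X\hat{\beta})$: unbiased (adjusted ML) estimate of $\sigma^2$.
Note: different ways of writing the sum of squared residuals result from matrix-algebraic rearrangements:
$$(y - X\hat{\beta})'(y - X\hat{\beta}) = y'y - \hat{\beta}'X'y = y'(I - H)y$$
where $I$ is the identity matrix and $H = X(X'X)^{-1}X'$ is the hat matrix.

Inferences for multiple regression
The AOV table:
$$\begin{array}{llll} \text{source of variation} & SS & df & MS \\ \hline \text{regression} & SS_{regression} & k & SS_{regression}/k \\ \text{residual (error)} & SS_{residual} & n - k - 1 & SS_{residual}/(n - k - 1) \\ \text{total} & SS_{total} & n - 1 & \end{array}$$
$SS_{regression} = (X\hat{\beta} - \tfrac{1}{n}Jy)'(X\hat{\beta} - \tfrac{1}{n}Jy)$
$SS_{residual} = (y - X\hat{\beta})'(y - X\hat{\beta})$
$SS_{total} = (y - \tfrac{1}{n}Jy)'(y - \tfrac{1}{n}Jy)$
where $J$ is an $n \times n$ matrix of 1's, so $\tfrac{1}{n}Jy$ is the vector whose entries all equal $\bar{y}$.
Test of the model vs. the mean:
$H_0$: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$ (just the overall mean is used for predicting $Y$)
$H_a$: at least one $\beta_i \ne 0$ (all predictor variables are included in the model)
Test statistic: $f = \frac{SS_{regression}/k}{SS_{residual}/(n - k - 1)}$
Rejection region: reject $H_0$ if $f \ge f_\alpha$, where the F distribution has $k$ and $n - k - 1$ df.
Coefficient of determination:
$$r^2 = \frac{SS_{regression}}{SS_{total}} = 1 - \frac{SS_{residual}}{SS_{total}}$$
the proportional reduction in prediction error attained by using the multiple regression model instead of the overall mean.

Three-factor AOV model
Factors A, B, C with $a$, $b$, $c$ levels, respectively:
$$Y_{ijkm} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkm}$$
$\mu$: grand mean; $\alpha_i, \beta_j, \gamma_k$: main effects; $(\alpha\beta)_{ij}, (\alpha\gamma)_{ik}, (\beta\gamma)_{jk}$: second-order interactions; $(\alpha\beta\gamma)_{ijk}$: third-order interaction.
Effects and interactions summed over any subscript add to zero.
AOV table (balanced design, $n$ obs per cell):
$$\begin{array}{lllll} \text{source} & SS & df & MS & f \\ \hline A & SS_A & a - 1 & MS_A = SS_A/(a-1) & MS_A/MS_{error} \\ B & SS_B & b - 1 & MS_B = SS_B/(b-1) & MS_B/MS_{error} \\ C & SS_C & c - 1 & MS_C = SS_C/(c-1) & MS_C/MS_{error} \\ AB & SS_{AB} & (a-1)(b-1) & MS_{AB} = SS_{AB}/[(a-1)(b-1)] & MS_{AB}/MS_{error} \\ AC & SS_{AC} & (a-1)(c-1) & MS_{AC} = SS_{AC}/[(a-1)(c-1)] & MS_{AC}/MS_{error} \\ BC & SS_{BC} & (b-1)(c-1) & MS_{BC} = SS_{BC}/[(b-1)(c-1)] & MS_{BC}/MS_{error} \\ ABC & SS_{ABC} & (a-1)(b-1)(c-1) & MS_{ABC} = SS_{ABC}/[(a-1)(b-1)(c-1)] & MS_{ABC}/MS_{error} \\ \text{error} & SS_{error} & abc(n-1) & MS_{error} = SS_{error}/[abc(n-1)] & \\ \text{total} & SS_{total} & nabc - 1 & & \end{array}$$

Unbalanced design: effects are no longer orthogonal, which means that the SS for effects are no longer independent of each other. There is no unique SS for a factor, and the SS do not add up. The SS that can be attributed to a factor depend on which terms have already been entered.
ex. SS due to A alone + SS due to B alone do not necessarily equal the SS with both A and B included.
Think of the SS for a factor as the reduction in error SS that results from entering that factor.
Notation: $SSE(A, B)$ means the error sum of squares for the model with A and B included.
SAS: MODEL Y=A|B|C / SS1 SS2 SS3;
Type I SS is the sequential SS (terms entered in the order listed):
A: $SS_{total} - SSE(A)$
B: $SSE(A) - SSE(A, B)$
C: $SSE(A, B) - SSE(A, B, C)$
AB: $SSE(A, B, C) - SSE(A, B, C, AB)$
AC: $SSE(A, B, C, AB) - SSE(A, B, C, AB, AC)$
BC: $SSE(A, B, C, AB, AC) - SSE(A, B, C, AB, AC, BC)$
ABC: $SSE(A, B, C, AB, AC, BC) - SSE(A, B, C, AB, AC, BC, ABC)$
Type II SS compares the particular effect with all other terms that do not contain that particular effect:
A: $SSE(B, C, BC) - SSE(A, B, C, BC)$
B: $SSE(A, C, AC) - SSE(A, B, C, AC)$
C: $SSE(A, B, AB) - SSE(A, B, C, AB)$
AB: $SSE(A, B, C, AC, BC) - SSE(A, B, C, AB, AC, BC)$
AC: $SSE(A, B, C, AB, BC) - SSE(A, B, C, AB, AC, BC)$
BC: $SSE(A, B, C, AB, AC) - SSE(A, B, C, AB, AC, BC)$
ABC: $SSE(A, B, C, AB, AC, BC) - SSE(A, B, C, AB, AC, BC, ABC)$
Type III SS enters the particular effect last:
A: $SSE(B, C, AB, AC, BC, ABC) - SSE(A, B, C, AB, AC, BC, ABC)$
B: $SSE(A, C, AB, AC, BC, ABC) - SSE(A, B, C, AB, AC, BC, ABC)$
C: $SSE(A, B, AB, AC, BC, ABC) - SSE(A, B, C, AB, AC, BC, ABC)$
AB: $SSE(A, B, C, AC, BC, ABC) - SSE(A, B, C, AB, AC, BC, ABC)$
AC: $SSE(A, B, C, AB, BC, ABC) - SSE(A, B, C, AB, AC, BC, ABC)$
BC: $SSE(A, B, C, AB, AC, ABC) - SSE(A, B, C, AB, AC, BC, ABC)$
ABC: $SSE(A, B, C, AB, AC, BC) - SSE(A, B, C, AB, AC, BC, ABC)$
Type IV SS: same as type III if there are no empty cells.

Inferences for population variances
Sampling distribution of $S^2$: suppose $Y \sim \text{normal}(\mu, \sigma^2)$, $Y_1, Y_2, \ldots, Y_n$ is a random sample, and $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2$ is the sample variance. Then
$$\frac{(n-1)S^2}{\sigma^2} = \frac{(Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \cdots + (Y_n - \bar{Y})^2}{\sigma^2} \sim \text{chi-square}(n - 1),$$
the chi-square distribution with $n - 1$ degrees of freedom.
Data: $n$, $\bar{y}$, $s^2$. Below, $\chi^2_\alpha$ denotes the chi-square$(n-1)$ value with area $\alpha$ to its right.
$100(1-\alpha)\%$ CI for $\sigma^2$:
$$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2}}, \ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right)$$
Also, $\left(\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2}}}, \ \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}}\right)$ is a $100(1-\alpha)\%$ CI for $\sigma$.
Hypothesis tests: $H_0$: $\sigma^2 = \sigma_0^2$ ($\sigma_0^2$ a known constant)
Test statistic: $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$
$H_a$: $\sigma^2 > \sigma_0^2$; rejection region: reject $H_0$ if $\chi^2 \ge \chi^2_\alpha$
$H_a$: $\sigma^2 < \sigma_0^2$; rejection region: reject $H_0$ if $\chi^2 \le \chi^2_{1-\alpha}$
$H_a$: $\sigma^2 \ne \sigma_0^2$; rejection region: reject $H_0$ if $\chi^2 \le \chi^2_{1-\alpha/2}$ or $\chi^2 \ge \chi^2_{\alpha/2}$

The F distribution
If $U \sim \text{chi-square}(j)$ and $V \sim \text{chi-square}(k)$, and $U$ and $V$ are independent, then $\frac{U/j}{V/k}$ has an F distribution (named after R. A. Fisher) with $j$ and $k$ degrees of freedom. We write $\frac{U/j}{V/k} \sim F_{j,k}$.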
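The chi-square interval and two-sided test for a single variance, given above, can be sketched as follows. Python's standard library has no chi-square quantile function, so the critical values $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ (with $n - 1$ df) are passed in as if read from a table. The function names and the sample ($n = 10$, $s^2 = 4$, $\sigma_0^2 = 1$) are hypothetical; for 9 df and $\alpha = 0.05$ the table values are 19.023 and 2.700.

```python
from math import sqrt

def variance_ci(s2, n, chi2_upper, chi2_lower):
    """100(1 - alpha)% CI for sigma^2 from a normal sample.

    chi2_upper: chi^2_{alpha/2}, the chi-square(n - 1) value with area alpha/2 to its right
    chi2_lower: chi^2_{1-alpha/2}, the value with area 1 - alpha/2 to its right
    """
    ss = (n - 1) * s2                      # (n - 1) s^2
    return ss / chi2_upper, ss / chi2_lower

def sd_ci(s2, n, chi2_upper, chi2_lower):
    """CI for sigma: square roots of the endpoints of the CI for sigma^2."""
    lo, hi = variance_ci(s2, n, chi2_upper, chi2_lower)
    return sqrt(lo), sqrt(hi)

def variance_test_two_sided(s2, n, sigma0_sq, chi2_upper, chi2_lower):
    """H0: sigma^2 = sigma0^2 vs Ha: sigma^2 != sigma0^2; returns (statistic, reject H0?)."""
    stat = (n - 1) * s2 / sigma0_sq
    return stat, (stat <= chi2_lower or stat >= chi2_upper)

# Hypothetical sample: n = 10, s^2 = 4; 95% level uses the 9-df values 19.023 and 2.700
ci_var = variance_ci(4.0, 10, 19.023, 2.700)
ci_sd = sd_ci(4.0, 10, 19.023, 2.700)
stat, reject = variance_test_two_sided(4.0, 10, 1.0, 19.023, 2.700)
print(ci_var, ci_sd, stat, reject)
```

If SciPy is available, `scipy.stats.chi2.ppf` can supply the critical values instead of a table.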
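Two populations: one is normal$(\mu_1, \sigma_1^2)$, the other is normal$(\mu_2, \sigma_2^2)$. From each, a random sample will be taken, yielding $n_1, \bar{Y}_1, S_1^2$ and $n_2, \bar{Y}_2, S_2^2$. Then
$$\frac{\left[(n_1-1)S_1^2/\sigma_1^2\right]/(n_1-1)}{\left[(n_2-1)S_2^2/\sigma_2^2\right]/(n_2-1)} = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$$
Note: a property of the F distribution is that $f_{1-\alpha,\,j,\,k} = 1/f_{\alpha,\,k,\,j}$, where $f_{\alpha,\,j,\,k}$ is the value with area $\alpha$ to its right.
Data: $n_1, \bar{y}_1, s_1^2$; $n_2, \bar{y}_2, s_2^2$.
$100(1-\alpha)\%$ CI for $\sigma_1^2/\sigma_2^2$ (F values with $n_1 - 1$ and $n_2 - 1$ df):
$$\left(\frac{s_1^2}{s_2^2} \cdot \frac{1}{f_{\alpha/2}}, \ \frac{s_1^2}{s_2^2} \cdot \frac{1}{f_{1-\alpha/2}}\right)$$
Hypothesis test (Bartlett's test is a related test for comparing two or more variances):
$H_0$: $\sigma_1^2 = \sigma_2^2$
Test statistic: $f = s_1^2/s_2^2$
$H_a$: $\sigma_1^2 > \sigma_2^2$; rejection region: reject $H_0$ if $f \ge f_\alpha$
$H_a$: $\sigma_1^2 < \sigma_2^2$; rejection region: reject $H_0$ if $f \le f_{1-\alpha}$
$H_a$: $\sigma_1^2 \ne \sigma_2^2$; rejection region: reject $H_0$ if $f \le f_{1-\alpha/2}$ or $f \ge f_{\alpha/2}$
Note: $\chi^2$ and F tests for variances are not robust to departures of the populations from normality; t tests for means are robust to moderate departures from normality.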
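A sketch of the two-population F interval and two-sided test, again with critical values supplied by the caller as if read from an F table. It also uses the reciprocal property $f_{1-\alpha/2,\,j,\,k} = 1/f_{\alpha/2,\,k,\,j}$ to get the lower critical value from a table that lists only upper-tail values. The function names and the samples ($n_1 = 10$, $s_1^2 = 20$; $n_2 = 8$, $s_2^2 = 5$) are hypothetical; for $\alpha = 0.05$, the table values used are $f_{0.025,\,9,\,7} = 4.82$ and $f_{0.025,\,7,\,9} = 4.20$.

```python
def variance_ratio_ci(s1_sq, s2_sq, f_upper, f_lower):
    """100(1 - alpha)% CI for sigma1^2 / sigma2^2.

    f_upper: f_{alpha/2} and f_lower: f_{1-alpha/2}, both with (n1 - 1, n2 - 1) df.
    """
    ratio = s1_sq / s2_sq
    return ratio / f_upper, ratio / f_lower

def lower_f(f_upper_swapped_df):
    """f_{1-alpha/2, j, k} = 1 / f_{alpha/2, k, j} (reciprocal property of the F distribution)."""
    return 1.0 / f_upper_swapped_df

def variance_ratio_test_two_sided(s1_sq, s2_sq, f_upper, f_lower):
    """H0: sigma1^2 = sigma2^2 vs Ha: sigma1^2 != sigma2^2; returns (f, reject H0?)."""
    f = s1_sq / s2_sq
    return f, (f <= f_lower or f >= f_upper)

# Hypothetical samples: n1 = 10, s1^2 = 20; n2 = 8, s2^2 = 5; alpha = 0.05
f_up = 4.82             # f_{0.025} with (9, 7) df, from an F table
f_lo = lower_f(4.20)    # 1 / f_{0.025} with (7, 9) df
ci = variance_ratio_ci(20.0, 5.0, f_up, f_lo)
f, reject = variance_ratio_test_two_sided(20.0, 5.0, f_up, f_lo)
print(ci, f, reject)
```

Here $f = 4$ falls between $f_{0.975} \approx 0.238$ and $f_{0.025} = 4.82$, so $H_0$ is not rejected, consistent with the interval containing 1.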