Statistical Methods I
Statistical Methods I BTRY 6010
These 26 pages of class notes were uploaded by Casey Emmerich on Saturday, September 26, 2015. The notes belong to BTRY 6010 at Cornell University, taught by R. Strawderman in Fall. Since upload, they have received 60 views. For similar materials see /class/214344/btry-6010-cornell-university in Statistics at Cornell University.
Date Created: 09/26/15
Two-Sample Inference for Location Parameters (FCSM Chapter 6)
- Dependent samples (6.4)
- Sample size (6.6)

Two Sample Methods, BTRY 6010 & ILRST 6100

Example: Growth Hormones

The paper "Growth Hormone Increase During Sleep After Daytime Exercise" (1974, J. of Endocrinology, 473-478) reports results of an experiment involving 6 healthy male subjects. Blood samples were drawn from each participant during sleep on two different nights using a venous catheter:
- Night 1: no strenuous exercise the day before
- Night 2: strenuous exercise the day before

The data (hormone levels, mg/dl):

Subject   No Exercise   Exercise
1          8.5          13.6
2         12.6          14.7
3         21.6          31.8
4         19.4          20.0
5         14.7          19.2
6         13.6          17.3

- Is there evidence of a difference in growth hormone levels between nights after strenuous exercise and nights after no strenuous exercise?
- Two-sided test: $H_0: \mu_{ex} = \mu_{rest}$ vs. $H_a: \mu_{ex} \neq \mu_{rest}$

t Test (Exercise - No Exercise)

                 Assuming unequal variances   Assuming equal variances
Difference       4.367                        4.367
Std Err Dif      3.303                        3.303
t Ratio          1.322127                     1.322127
DF               9.124934                     10
Upper CL Dif     11.822                       11.725
Lower CL Dif     -3.089                       -2.992
Confidence       0.95                         0.95
Prob > |t|       0.2183                       0.2156
Prob > t         0.1092                       0.1078
Prob < t         0.8908                       0.8922

- The two-sample t-test shows no difference, whether or not we assume equal variances (p = 0.2183, p = 0.2156).
- What other assumptions have we made?

Wrong!
- Care is required here because the observations within each subject are certainly dependent: the observations from the exercise groups being compared (night 1 vs. night 2) come from the same subjects and are thus far from independent.
- In fact, notice that there is an increase in hormone level from night 1 to night 2 within each subject.
- The relationship between observations within a subject must be accounted for in the analysis.
- Appropriate analysis in this situation: a paired t-test, or matched-pairs t-test.
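The contrast between the two analyses can be checked numerically. The following is a minimal Python sketch, not the software output shown in the notes. It assumes the decimal points in the data table are restored (e.g. 126 read as 12.6) and that the first no-exercise value is 8.5, the value implied by the reported mean difference of 4.367 and the reported standard errors. The function names are illustrative only.

```python
import math
import statistics

# Hormone levels (mg/dl) with decimal points restored.
# Assumption: the first no-exercise value is 8.5, inferred from the reported
# mean difference (4.367) and standard errors; it is garbled in the notes.
rest = [8.5, 12.6, 21.6, 19.4, 14.7, 13.6]
exercise = [13.6, 14.7, 31.8, 20.0, 19.2, 17.3]


def two_sample_t(x, y):
    """t statistic and standard error treating x and y as independent samples."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se, se


def paired_t(x, y):
    """t statistic and standard error from a one-sample t-test on the differences x - y."""
    d = [xi - yi for xi, yi in zip(x, y)]
    se = statistics.stdev(d) / math.sqrt(len(d))
    return statistics.mean(d) / se, se


t_ind, se_ind = two_sample_t(exercise, rest)
t_pair, se_pair = paired_t(exercise, rest)
mean_d = statistics.mean(exercise) - statistics.mean(rest)

# 95% CI for the mean difference; 2.571 is the t critical value (5 df) used in the notes.
ci = (mean_d - 2.571 * se_pair, mean_d + 2.571 * se_pair)

print(f"two-sample: SE = {se_ind:.3f}, t = {t_ind:.3f}")
print(f"paired:     SE = {se_pair:.3f}, t = {t_pair:.3f}")
print(f"95% CI for mean difference: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Under these assumptions the sketch should reproduce the standard errors 3.303 (two-sample) and 1.346 (paired) reported in the notes. Because both groups have the same size, the pooled and unpooled two-sample standard errors coincide, so a single function covers both versions of the two-sample test statistic.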
Paired Data Methods
- Evaluation of the difference between two means using pairs of observations on a set of experimental units.
- If the unit is a single subject, each subject serves as its own control. However, these methods are more broadly applicable. Example: in twin studies, the unit is the pair of twins and paired data methods can be used to study within-pair differences. Here each treated unit has its own similar (correlated) control.
- Local control describes actions used to reduce experimental error, increase observational accuracy & establish the inference base of a study. These include technique, selection of units, and blocking (more later).

Basic idea
- Suppose we have 2n observations that can be sensibly paired with each other (e.g., pre- vs. post-tests, measurements on both eyes, measurements on twins, etc.).
- Let $y_{ij}$ represent the jth observation for the ith pair, where $i = 1, \ldots, n$ and $j = 1, 2$.
- Consider the difference in sample means:
  $$\bar{y}_1 - \bar{y}_2 = \frac{1}{n}\sum_{i=1}^{n} y_{i1} - \frac{1}{n}\sum_{i=1}^{n} y_{i2}$$
- Using a little bit of algebra:
  $$\bar{y}_1 - \bar{y}_2 = \frac{1}{n}\sum_{i=1}^{n} (y_{i1} - y_{i2}) = \frac{1}{n}\sum_{i=1}^{n} d_i = \bar{d}$$
- With paired data, a difference between sample averages can be naturally expressed as the average of the differences between the observations on the same pair.
- Assume that distinct pairs are independent; then a paired t-test amounts to using a standard one-sample t-test on the differences $d_1, \ldots, d_n$.

Example: Growth Hormones (cont.)

$$H_0: \mu_{ex} - \mu_{rest} = 0 \text{ vs. } H_a: \mu_{ex} - \mu_{rest} \neq 0 \quad\Leftrightarrow\quad H_0: \mu_d = 0 \text{ vs. } H_a: \mu_d \neq 0$$

[One-sample output for the differences: Mean 4.367, Std Dev 3.297, Std Err Mean 1.346, N 6; t test: t = 3.24, DF 5, p = 0.0228]

The result is very different from the two-sample t-test: it leads to the conclusion that there is indeed a difference at level $\alpha = 0.05$ (p = 0.0228).

- Compare "Difference" ($\bar{y}_1 - \bar{y}_2$) vs. "Mean" ($\bar{d}$) on Slides 4 & 9 (the two-sample and paired outputs): these are equal (Slide 8, the algebra above).
- But "Std Err Dif" and "Std Err Mean" on Slides 4 & 9 are actually quite different: 3.303 vs. 1.346. Note: the one-sample t-test is performed using
  $d_1, d_2, \ldots, d_n$:
  $$t = \frac{\bar{d}}{s_d/\sqrt{n}}, \qquad \text{Std Err Mean}(d) = \frac{s_d}{\sqrt{n}}, \qquad \text{where } s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (d_i - \bar{d})^2}$$
- Accounting for pairing here leads to a roughly 2.5-fold reduction in the standard error (about 6-fold in variance) compared to the usual two-sample t-test. The resulting test statistics are very different: 1.32 vs. 3.24.

Aside: Variance for dependent random variables
- Let X & Y be two random variables. In general, the variance of $X - Y$, or $\mathrm{Var}(X - Y)$, is given by
  $$\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) - 2\,\mathrm{Cov}(X, Y)$$
  where $\mathrm{Cov}(X, Y)$ denotes the covariance of X & Y.
- We say X & Y are
  - uncorrelated if $\mathrm{Cov}(X, Y) = 0$
  - correlated if $\mathrm{Cov}(X, Y) \neq 0$
  - positively correlated if $\mathrm{Cov}(X, Y) > 0$
  - negatively correlated if $\mathrm{Cov}(X, Y) < 0$
- Independence of X & Y implies $\mathrm{Cov}(X, Y) = 0$, but not vice versa.
- So if $\mathrm{Cov}(X, Y) = 0$, which is always true when X & Y are independent, then
  $$\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$$
  i.e., add variances (as with the two-sample t-statistic).

Hormone data
- Hormone levels rise within each subject across the two nights. A high level one night suggests a high level for the other.
- Let X & Y denote GH levels for exercise and no exercise for a random subject. The data suggest that $\mathrm{Cov}(X, Y) > 0$, i.e., positive correlation; if true, the formulas show that taking differences will decrease variance compared to the situation of X & Y independent.

Result: Exact Sampling Distribution of Normally Distributed Differences

Let $\bar{D}_n$ be the sample mean difference of two observations taken on a SRS of n independent pairs. Assume the differences follow a $N(\mu_d, \sigma_d^2)$ distribution. Then
$$\frac{\bar{D}_n - \mu_d}{S_d/\sqrt{n}} \sim t_{n-1}$$
If $n \geq 30$,
$$\frac{\bar{D}_n - \mu_d}{S_d/\sqrt{n}} \approx N(0, 1)$$
The result above forms the basis for CIs and hypothesis tests in a paired-data setting.

Testing: paired data, $\sigma$ unknown

Test statistic:
$$t = \frac{\bar{d} - d_0}{s_d/\sqrt{n}}$$
As usual we have three possible sets of hypotheses:
(i) $H_0: \mu_d \leq d_0$ vs. $H_a: \mu_d > d_0$; RR is $t > t_{n-1,\alpha}$ & $p = P(t_{n-1} > t)$
(ii) $H_0: \mu_d \geq d_0$ vs. $H_a: \mu_d < d_0$; RR is $t < -t_{n-1,\alpha}$ & $p = P(t_{n-1} < t)$
(iii) $H_0: \mu_d = d_0$ vs. $H_a: \mu_d \neq d_0$; RR is $|t| > t_{n-1,\alpha/2}$ & $p = 2P(t_{n-1} > |t|)$

Usual comments apply: If $n \geq 30$ we can use normal critical points in place of t. If the sample
size is very small and normality of the differences is suspect, use the t-test with skepticism or use nonparametric methods (Wilcoxon signed-rank).

CI calculation: paired data, $\sigma$ unknown

Growth hormone example: the 95% CI for $\mu_d$ on Slide 9 (the paired output) is obtained as
$$95\%\ \text{CI} = \left(4.367 - 2.571\,\frac{s_d}{\sqrt{6}},\ \ 4.367 + 2.571\,\frac{s_d}{\sqrt{6}}\right)$$
With $s_d/\sqrt{6} = 1.346$ (the Std Err Mean), this is approximately (0.91, 7.83).

Example: Growth Hormones

[Figure: distributions of the data with normal quantile plots]

Obligatory check: no obvious deviations from normality.

Example: Fertilized Tomatoes

Does a new fertilizer improve tomato yield? Concerns about the impact of differences in soil type, light, and moisture led to the following experiment:
- 30 plots of tomato plants, 10 plants per plot
- New & existing fertilizer applied within each plot (5 plants each, order randomized). Average yield for each fertilizer is computed within each plot.
- Should we use a paired or unpaired t-test to evaluate the difference in yields (in pounds)?

[Figure: distribution of the differences with a one-sample t-test (hypothesized value 0, test statistic 4.7095); bivariate fits of Avg Yield New by Plot, Avg Yield Old by Plot, and Difference by Plot]

Important to control for plot: variation in yield is partly explained by the plot variable.

Comments
- Pairing (matching) can be useful in observational studies as a way to control for confounding: the impact of measured and unmeasured variables that may be associated with both the response and the group variable. It serves to reduce uncontrolled sources of variability.
- Pairing is an example of blocking, a term originating from experimental design. Blocking serves to reduce the impact of uncontrolled sources
of variability on a comparison of treatments (e.g., new vs. existing fertilizer, post- vs. pre-exercise) by first creating blocks, i.e., groups of similar or relatively homogeneous units (such as tomato plants, plots, or subjects), and then assessing treatment effects by using within-block differences.
- Settings involving pairs of measurements represent a special case of the more general problem of repeated measurements (two or more measurements per unit/block). Such data can arise in multiple ways, e.g., multiple treatments per block, longitudinal data on each subject, and so on.
- As in the paired setting, one expects the measurements on one unit/block to be more correlated with each other than with measurements on different units/blocks.
- Methods of analysis must deal with the various levels of correlation that may exist; otherwise one can easily obtain incorrect assessments of sampling variability, leading to impaired statements of statistical significance and/or confidence levels, among other problems.

Sample Size Determination: Paired Samples

Use the one-sample formulas from before (FCSM p. 275). In each case n is the number of pairs.

$100(1-\alpha)\%$ CI for $\mu_d = \mu_1 - \mu_2$, where E is half the interval width:
$$n = \frac{z_{\alpha/2}^2\,\sigma_d^2}{E^2}$$
One-sided test for $\mu_d = \mu_1 - \mu_2$:
$$n = \frac{\sigma_d^2\,(z_\alpha + z_\beta)^2}{\Delta^2}$$
Two-sided test for $\mu_d = \mu_1 - \mu_2$:
$$n \approx \frac{\sigma_d^2\,(z_{\alpha/2} + z_\beta)^2}{\Delta^2}$$

Example: Paired Data Sample Size

In 1994 a study was proposed to examine whether a 1995 soil remediation would lead to a 10-year reduction in the heavy metal content of wood ($\mu$g/gram). The study planned to take a pair of measurements on n trees: once in 1995, once in 2005. It was considered desirable to detect a reduction of $\Delta = 4$ $\mu$g/gram. What sample size (i.e., number of trees) is required to detect this difference at a power of 90% and a level of significance $\alpha = 5\%$? Assume the population standard deviation of the concentration difference is $\sigma_d = 3$.

One-sided test: $\alpha = P(\text{Type I Error}) = 0.05$, $\beta = P(\text{Type II Error}) = 1 - \text{power} = 1 - 0.90 = 0.10$.
$$n = \frac{\sigma_d^2\,(z_\alpha + z_\beta)^2}{\Delta^2} = \frac{3^2\,(z_{0.05} + z_{0.10})^2}{4^2} = \frac{9\,(1.645 + 1.280)^2}{16} = 4.8125 \approx 5$$
Take pairs of measurements on n = 5 trees.

Sample Size: Two Independent Samples

The ideas are similar; the formulas are more complicated due to the presence of two sample sizes and two variances.

$100(1-\alpha)\%$ CI for the mean difference $\mu_1 - \mu_2$

Assume we desire to have $n_2 = m\,n_1$, where $m > 0$. Also assume $\sigma_1$ and $\sigma_2$ are known and we want a confidence interval no wider than 2E. Then use
$$n_1 = \frac{z_{\alpha/2}^2\left(\sigma_1^2 + \sigma_2^2/m\right)}{E^2}, \qquad n_2 = m\,n_1$$
For example, if m = 1 (i.e., we want $n_1 = n_2 = n$):
$$n = \frac{z_{\alpha/2}^2\left(\sigma_1^2 + \sigma_2^2\right)}{E^2}$$
If m = 1 and $\sigma_1 = \sigma_2 = \sigma$, we obtain the formula on page 274 of FCSM, where n is the size of each group:
$$n = \frac{2\sigma^2 z_{\alpha/2}^2}{E^2}$$

Tests for the mean difference $\mu_1 - \mu_2$

Again, assume we desire $n_2 = m\,n_1$, where $m > 0$, and that we wish to detect a difference of $\Delta = |\mu_1 - \mu_2|$. Suppose $\sigma_1$ and $\sigma_2$ are known and that we want Type I & II errors of $\alpha$ and $\beta$ (power $1 - \beta$). Then
$$n_1 = \frac{\left(\sigma_1^2 + \sigma_2^2/m\right)(z_\alpha + z_\beta)^2}{\Delta^2} \ \text{(one-sided)}, \qquad n_1 \approx \frac{\left(\sigma_1^2 + \sigma_2^2/m\right)(z_{\alpha/2} + z_\beta)^2}{\Delta^2} \ \text{(two-sided)}, \qquad n_2 = m\,n_1$$
For example, if m = 1 we obtain (one-sided case)
$$n_1 = n_2 = n = \frac{\left(\sigma_1^2 + \sigma_2^2\right)(z_\alpha + z_\beta)^2}{\Delta^2}$$
If m = 1 and $\sigma_1 = \sigma_2 = \sigma$, we again obtain the formulas on page 274 of FCSM. For example, in a 2-sided setting:
$$n_1 = n_2 = n \approx \frac{2\sigma^2\,(z_{\alpha/2} + z_\beta)^2}{\Delta^2}$$
Note: Formulas 9-11 on FCSM p. 283 repeat those given on p. 274; n refers to the size of each group. In contrast, formula 12, given for the paired data setting, specifies the total # of observations, not the number of pairs; compare to FCSM p. 275 and Slide 21 (the paired sample size formulas).
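The sample size formulas above lend themselves to a quick numerical check. The following is a minimal Python sketch, assuming the standard normal-theory forms $n = \sigma_d^2 (z_\alpha + z_\beta)^2 / \Delta^2$ for paired data and $n_1 = (\sigma_1^2 + \sigma_2^2/m)(z_\alpha + z_\beta)^2 / \Delta^2$ with $n_2 = m\,n_1$ for two independent samples. The function names are illustrative and do not come from FCSM or the notes.

```python
import math
from statistics import NormalDist


def z_upper(p):
    """Critical value z_p with upper-tail area p under the standard normal."""
    return NormalDist().inv_cdf(1.0 - p)


def n_pairs_for_test(sigma_d, delta, alpha, beta, two_sided=False):
    """Number of pairs needed to detect a mean difference delta (paired data)."""
    a = alpha / 2 if two_sided else alpha
    return sigma_d ** 2 * (z_upper(a) + z_upper(beta)) ** 2 / delta ** 2


def n1_for_test(sigma1, sigma2, delta, alpha, beta, m=1.0, two_sided=False):
    """Size of group 1 for two independent samples, with n2 = m * n1."""
    a = alpha / 2 if two_sided else alpha
    return (sigma1 ** 2 + sigma2 ** 2 / m) * (z_upper(a) + z_upper(beta)) ** 2 / delta ** 2


def n1_for_ci(sigma1, sigma2, half_width, alpha, m=1.0):
    """Size of group 1 so the CI for mu1 - mu2 has half-width at most half_width."""
    return z_upper(alpha / 2) ** 2 * (sigma1 ** 2 + sigma2 ** 2 / m) / half_width ** 2


# Tree example: sigma_d = 3, delta = 4, one-sided alpha = 0.05, power = 0.90.
n_raw = n_pairs_for_test(sigma_d=3, delta=4, alpha=0.05, beta=0.10)
n_trees = math.ceil(n_raw)  # round up to a whole number of pairs
print(f"raw n = {n_raw:.3f}, pairs needed = {n_trees}")
```

With unrounded normal quantiles the raw value comes out near 4.82 rather than the 4.8125 obtained from z = 1.645 and 1.28; both round up to 5 trees. Rounding up rather than to the nearest integer is the conservative choice, since rounding down would fall short of the target power.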