BUSINESS STRATEGY Mgmt HC 210
This 81 page Class Notes was uploaded by Isaias Gaylord on Saturday September 12, 2015. The Class Notes belongs to Mgmt HC 210 at University of California - Irvine taught by Staff in Fall. Since its upload, it has received 72 views. For similar materials see /class/201952/mgmt-hc-210-university-of-california-irvine in Health Care Mba at University of California - Irvine.
Date Created: 09/12/15
Statistics 210: Statistical Methods
Hal S. Stern
Department of Statistics, University of California, Irvine
sternh@uci.edu

• Dictionary definitions
  - statistic: a single term or datum; a quantity that is computed from a sample
  - statistics: a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data
• Points of emphasis
  - numbers with a context
  - inference from sample to population
  - interaction between statisticians and subject-area scientists

Statistics 210: Broad outline
• Data collection and randomization
• Comparative (two-population) studies
• Analysis of variance (> 2 samples)
• Blocking/pairing to reduce variance
• Simple linear regression
• Multiple regression
• Analysis of covariance (relation of regression and ANOVA)
• Factorial experiments (return to ANOVA)
• (and maybe) Nested designs, random effects

Stat 210: Prerequisites
• Calculus and linear algebra
• Introductory or basic statistics class: descriptive statistics, elementary probability, basic inference for means and proportions
• Undergraduate probability/statistics: random variables, probability distributions, moments, joint distributions, basic inference theory
• References for review of basic statistics (Stat 7)
  - The Basic Practice of Statistics (Moore)
  - Statistics (Freedman, Pisani, Purves, Adhikari)
• References for review of undergraduate probability/statistics (Stat 120ABC)
  - Mathematical Statistics and Data Analysis (Rice)
  - An Introduction to Mathematical Statistics and its Applications (Larsen, Marx)

Data collection
• Experiment vs. observational study
  - Experiment: investigator intervention to determine the level of one or more factors for each experimental unit (e.g., drug A or B)
  - Observational study: determination of factor levels is outside the investigator's control (e.g., smoking)
  - An experiment with randomized assignment of factor levels to experimental units has the potential to establish cause and effect
  - Experiments with nonrandom assignment, and observational studies, have a difficult time establishing cause and effect. Why? Confounding.
  - Observational
studies are still useful: study different groups, establish association (as a step on the causal chain)

Data collection
• Selection of experimental units from the population
  - Random sample from population: can draw inferences for population from sample
  - Nonrandom sample: no inference to population (ex: 1936 Literary Digest Poll)
  - Simple random sample of size $n$: every subset of size $n$ in the population has the same probability of being chosen
  - We assume simple random samples
  - Other viable approaches exist for obtaining a representative sample

Data collection
• Idealized story
  - define population of interest
  - random sample of units for study
  - random assignment of factor levels to experimental units
  - analyze data (making assumptions about the population as needed)
  - statistical inference about population
• Common compromises
  - observational study for ethical or other reasons (e.g., smoking)
  - samples of convenience (e.g., Psych 100 students)
  - these compromises are OK as long as limitations are realized

Comparative studies (two samples): Experiments
• Experiment: key feature is investigator intervention to change the level of one or more factors, keeping others fixed
• Cause/effect relationships can be determined if the experiment is done well
• Terminology
  - experimental unit: unit to which treatment is applied (person, animal, student, class)
  - factor: quantity/item thought to affect outcome (has different levels)
  - treatment: combination of levels of factors
  - response: outcome

Comparative studies (two samples): Experiments
• Principles of experimentation
  - control: eliminate factors not under study (e.g., use control group)
  - randomization: random assignment of treatment to experimental units
  - replication: repeat on many units
  - blocking/matching/pairing: divide experimental units into homogeneous groups and apply each treatment in each group
  - the first three of these are needed for causal inference

Comparative studies (two samples): Experiments
• Example: Salk Polio Vaccine 1954 field trial
  - two main designs: NFIP (this was not randomized) and a randomized design
  - NFIP: grades 1, 3 control; grade 2
with consent vaccine; grade 2 without consent control
  - problems with NFIP
    * grades 1 and 3 not valid controls (polio is contagious)
    * grade 2 no-consent not valid control (different types of people)
  - randomized: ask for consent from everyone
    * consent: randomly assign 1/2 to vaccine and 1/2 to control
    * no consent: not in study (other control)
  - double blind: controls receive placebo; doctors don't know assignment

Comparative studies (two samples): Experiments
• Example: Salk Polio Vaccine results

  Randomized
  group                  sample size   rate per 100K
  Treatment              200K          28
  Control                200K          71
  No consent             350K          46

  NFIP
  group                  sample size   rate per 100K
  Grade 2 (vaccine)      225K          25
  Grade 1, 3 (control)   725K          54
  Grade 2 (no consent)   125K          44

  - lines 1 and 3 are similar, as expected
  - line 2's are different (NFIP control includes some people that would not have consented)

Comparative studies (two samples): Experiments
• Example: Salk trial, role of randomization
  - Randomization actually plays two roles in statistical studies
    * removes effect of confounding factors
    * provides a basis for inference
  - How does randomization provide a basis for inference?
    * 56 children got polio in the treatment group; 142 children got polio in the control group
    * if the vaccine has no effect, then children will have the same outcome in either group (i.e., the same 198 children would get polio)
    * because we randomized, we can easily calculate the probability that 142 of these 198 would show up in one group by chance (1 in a billion); this shows the vaccine is effective

Comparative studies (two samples): Observational studies
• Observational studies
  - observe the population (no intervention)
  - rely on sample of population (random sample is best)
  - compare existing groups
  - causal inference is difficult
• Observational studies, example 1: Nurses Health Study
  - 10000 nurses (female, age 20-30 at start)
  - food intake diaries
  - find association between fat and heart disease
  - no control: genetics, exercise, weight, stress
  - useful info but not cause and effect

Comparative studies (two samples): Observational studies
• Observational studies,
example 2: Baltimore Housing Study (1950s)
  - 400 families in housing project: 2 deaths
  - 600 families in slums: 10 deaths
  - all 1000 had applied for the housing project
  - the 400 were selected in some way
  - difficult to reach any conclusion

Comparative studies (two samples): Notation for two-sample studies
• Scenarios
  - experiment with two treatments
  - observational study with two different populations
• Notation
  - sample $Y_{11}, Y_{12}, \ldots, Y_{1n_1}$ of size $n_1$ from population 1
  - summary statistics $\bar{Y}_1$ and $s_1$
  - population parameters for population 1: mean $\mu_1$, variance $\sigma_1^2$
  - corresponding notation for population 2: sample $Y_{21}, Y_{22}, \ldots, Y_{2n_2}$ of size $n_2$; summary statistics $\bar{Y}_2$ and $s_2$; population parameters $\mu_2$ and $\sigma_2^2$

Comparative studies (two samples): Inference questions
• Basic inference questions we address
  - point estimation: estimates for $\mu_1$, $\mu_2$, $\mu_1 - \mu_2$, etc.
  - interval estimation: confidence intervals for $\mu_1 - \mu_2$ and other quantities
  - tests of hypotheses: does $\mu_1 - \mu_2 = 0$?
• Also ask
  - are assumptions valid?
  - effects on inference if assumptions are not valid
  - methods for addressing failed assumptions

Comparative studies (two samples): Randomization inference
• Assume randomized experiment
• Why randomize?
  - reduce/eliminate bias (recall polio example)
  - creates a probability distribution under the hypothesis of no difference
• Randomization inference
  - under the null hypothesis (no treatment effect), each unit has the same response in either group
  - observed difference is $\bar{y}_1 - \bar{y}_2$
  - is the observed difference large compared to other randomizations?
  - very powerful approach (no modeling assumptions)
  - confidence interval is possible but hard
• Permutation test: same idea when treatment is not randomized

Comparative studies (two samples): Two-sample example, writing study
• experimental units: writers
• treatments: questionnaire before assignment (questions emphasize intrinsic or extrinsic rewards)
• random assignment: 24 intrinsic, 23 extrinsic
• response: average of 12 ratings on a 40-point scale
• data
  intrinsic: 12.0 12.0 12.9 13.6 16.6 17.2 17.5 18.2 19.1 19.3 19.8 20.3 20.5 20.6 21.3 21.6 22.1 22.2 22.6 23.1 24.0 24.3 26.7 29.7
  extrinsic: 5.0 5.4 6.1 10.9 11.8 12.0 12.3 14.8 15.0 16.8 17.2 17.2 17.4 17.5 18.5 18.7 18.7 19.2 19.5 20.7 21.2 22.1 24.0

Comparative studies (two samples): Two-sample example, writing study (cont'd)
• data summaries: $\bar{y}_1 = 19.88$, $s_1 = 4.44$, $\bar{y}_2 = 15.74$, $s_2 = 5.25$
• data display: histograms, boxplots, stem-and-leaf plots
• observed difference in means is 4.14
• 1000 randomizations under the null hypothesis
  - 5/1000 randomizations have values > 4.14 or less than -4.14
  - extremely unlikely to see a difference this big by chance
  - two-tailed P-value = .005
• conclusions
  - questionnaire on intrinsic rewards leads to more creative writing
  - no random sample: can't necessarily infer that this is true for a bigger population (hope sample is representative of bigger population)

Comparative studies (two samples): Model-based inference
• Suppose we assume
  $Y_{11}, \ldots, Y_{1n_1} \overset{iid}{\sim} N(\mu_1, \sigma^2)$
  $Y_{21}, \ldots, Y_{2n_2} \overset{iid}{\sim} N(\mu_2, \sigma^2)$
  - the two samples are independent
  - note that we are assuming constant variance
• Equivalent to the linear model $Y_{ij} = \mu_i + \epsilon_{ij}$ with $\epsilon_{ij} \sim N(0, \sigma^2)$ iid

Comparative studies (two samples): Model-based inference (cont'd)
• Some results
  - $\bar{Y}_1 - \bar{Y}_2$ is an estimator for $\mu_1 - \mu_2$ (unbiased, best linear unbiased)
  - $\bar{y}_1 - \bar{y}_2$ is an estimate for $\mu_1 - \mu_2$
  - $\mathrm{Var}(\bar{Y}_1 - \bar{Y}_2) = \sigma^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)$
  - $s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$ is a pooled estimator for the common variance $\sigma^2$
• Key result (under assumptions):
  $t = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}$
  where $t_{n_1 + n_2 - 2}$ refers to Student's t distribution with $n_1 + n_2 - 2$ df
  - t is symmetric like the normal distribution
  - t has longer tails
  - $t_\infty$ is the normal distribution

Comparative studies (two samples): Model-based inference (cont'd)
• $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ (follows from the t distribution result on the previous slide):
  $\bar{Y}_1 - \bar{Y}_2 \pm t_{n_1 + n_2 - 2,\, 1 - \alpha/2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
• classical (frequentist) interpretation
  - $\mu_1, \mu_2$ are fixed unknowns
  - $\bar{Y}_1, \bar{Y}_2, s_p$ are random variables
  - the confidence interval is a random interval
  - in repeated samples, 95% of such intervals contain the true value
  - a procedure with good long-term frequency properties
  - strictly speaking, can't say "95% probability" for one particular interval, but this casual interpretation is generally OK
• note: width depends on df, confidence level, $\sigma$, and sample size

Comparative studies
(two samples): Model-based inference (cont'd)
• Tests of hypotheses
  - null hypothesis $H_0: \mu_1 = \mu_2$ (no difference)
  - alternative hypothesis $H_a: \mu_1 \ne \mu_2$ (two-sided) or $\mu_1 > \mu_2$ or $\mu_1 < \mu_2$ (one-sided)
  - test statistic $t = \frac{\bar{Y}_1 - \bar{Y}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
  - P-value = probability of obtaining a value of the test statistic as big or bigger than the observed value if $H_0$ is true
    * small P-value means either (i) $H_0$ is true and we were very unlucky OR (ii) $H_0$ is false
    * can think of it as a measure of evidence
    * people often use .05 as a formal cutoff (BAD IDEA)
    * P-value is NOT the probability that $H_0$ is true
    * P-value says nothing directly about the alternative (it is calculated assuming $H_0$ is true)

Comparative studies (two samples): Model-based inference (cont'd)
• Two views of testing
  - Hypothesis testing (decision procedures)
    * Neyman-Pearson approach
    * fix $\alpha = \Pr(\text{reject } H_0 \text{ when true})$
    * develop rejection region, e.g., $\frac{|\bar{y}_1 - \bar{y}_2|}{s_p \sqrt{1/n_1 + 1/n_2}} > t_{n_1 + n_2 - 2,\, 1 - \alpha/2}$
    * conclusion is reject or not (no P-value!!)
  - Significance testing
    * Fisher
    * calculate test statistic
    * compute P-value
    * report P-value as evidence against the null

Comparative studies (two samples): Model-based inference (cont'd)
• Relationship of tests and confidence intervals
  - powerful idea: duality between tests and CIs
  - a two-sided test will reject at the .05 level (P-value less than .05) if and only if 0 is not in the 95% confidence interval
  - can be used, for example, to get randomization confidence intervals
    * consider a possible value for $\delta = \mu_1 - \mu_2$
    * subtract $\delta$ from every value in sample 1
    * perform a randomization test to determine if the $Y_{1j} - \delta$'s have a different mean than the $Y_{2j}$'s
    * if we don't reject $H_0$, put $\delta$ in the CI

Comparative studies (two samples): Writing study example (cont'd)
• Model-based inference
  - 95% CI: $4.14 \pm 2.014 \times 4.85 \sqrt{\frac{1}{24} + \frac{1}{23}} = (1.29, 6.99)$
  - t-test: $t = \frac{4.14}{4.85 \sqrt{\frac{1}{24} + \frac{1}{23}}} = 2.93$
  - P-value = .005 (2-sided) or .0025 (1-sided)
• Randomization vs. model-based inference
  - probability by randomization vs. probability by assumed population model
  - randomization requires no model for the population
  - randomization tests require more computation
  - randomization
CIs (based on the test/interval relationship) require even more computation
  - model-based test and CI are easy (an approximation to randomization inference)
  - model-based approach provides insight into study design (next slide)

Comparative studies (two samples): Study design
• Sample size for confidence intervals
  - half-width of CI (assuming $n_1 = n_2 = n$) is $t_{2(n-1),\, 1 - \alpha/2}\; s_p \sqrt{2/n}$
  - can find $n$ to achieve a specified half-width
  - one difficulty is that $n$ enters twice (df and sample size)
  - one idea: compute an initial guess $n_0$ using $z_{1 - \alpha/2}$ in place of the t critical value, and then improve the guess using $t_{2(n_0 - 1),\, 1 - \alpha/2}$

Comparative studies (two samples): Study design (cont'd)
• Power / sample size for test
  - type I error: reject $H_0$ when it is true
  - type II error: fail to reject $H_0$ when it is false
  - let $\alpha = \Pr(\text{type I error})$
  - let $\beta = \Pr(\text{type II error})$
  - $\alpha$ is known as the size / level / significance
  - $1 - \beta$ is known as the power (probability of correct rejection of $H_0$)
  - fix $\alpha$ (size, level, significance) in advance, e.g., at .05
  - for fixed $\alpha$, power (or $\beta$) is determined by the true effect size $\delta = \mu_1 - \mu_2$, the true variance $\sigma^2$, and the sample sizes
  - key formula (two-sided test with equal sample sizes): $n = 2 (z_{1 - \alpha/2} + z_{1 - \beta})^2 \sigma^2 / \delta^2$
  - two ideas
    * given the sample size, can find $\beta$ for different $\delta$'s (power function)
    * determine the sample size to achieve a desired $\beta$
  - Table B5 gives power for the t test BUT uses $\delta = \frac{|\mu_1 - \mu_2|}{\sigma \sqrt{2/n}}$

Comparative studies (two samples): Model diagnostics
• Model-based approach makes a number of assumptions
  - independent samples from each population
  - two populations are independent
  - equal variances
  - normal distribution
• Plan of attack
  - diagnose whether the assumption is valid or not (graphical and statistical tools)
  - understand the effects of violated assumptions
  - remedies: modify (transform) data or revisit the model

Comparative studies (two samples): Model diagnostics, independence
• Diagnosis
  - usually design the study to achieve independence
  - could fail if units are related (students in same class)
  - check by looking at residuals $Y_{ij} - \bar{Y}_i$ within possible clusters
  - check by looking at residuals vs. possibly relevant variables like time
• Effects
  - $\mathrm{Var}(\bar{Y}_1 - \bar{Y}_2) \ne \sigma^2 (1/n_1 + 1/n_2)$; t procedures are
in trouble
• Remedies
  - if clustering, reanalyze using the correct unit
  - if time effects, need new (time series) models

Comparative studies (two samples): Model diagnostics, equal variances
• Diagnosis
  - histogram of residuals in the two samples (not a powerful diagnostic tool)
  - test for equality of variance: F test
    * $F = s_1^2 / s_2^2$ has an F distribution with $n_1 - 1$, $n_2 - 1$ df (Table B4)
    * BUT the F test is very sensitive to the normal assumption, hence not recommended
  - Levene's test: t test comparing absolute deviations in the two groups (pp. 112-114 in text)
  - rule of thumb: OK if variances are within a factor of two
• Effects of unequal variances
  - minor if sample sizes are the same
  - can be important if $n_1 \ne n_2$ (worst case: small sample size with larger variance)
  - is a hypothesis about the means relevant here?

Comparative studies (two samples): Model diagnostics, equal variances
• Remedies for unequal variances
  - transformation
    * replace data $Y$ with $Z = g(Y)$
    * perform inference on the $Z$'s
    * choosing the transformation: trial and error ($\ln Y$ or $\sqrt{Y}$); Box-Cox family of transformations ($Y^\lambda$ for $\lambda \ne 0$ and $\ln Y$ for $\lambda = 0$), optimizing within the family by maximum likelihood estimation; transformation based on science (square root of area, cube root of volume)
    * interpretation of results can be harder (the mean of logarithms is not the logarithm of the mean)
    * questions: should we account for the transformation? are we snooping through the data to find significance?

Comparative studies (two samples): Model diagnostics, equal variances
• Remedies for unequal variances (cont'd)
  - approximate t distribution
    * use separate sample variances for the two samples; then
      $t = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$ is approximately $t_\nu$,
      where
      $\nu = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{(s_1^2 / n_1)^2}{n_1 - 1} + \frac{(s_2^2 / n_2)^2}{n_2 - 1}}$
      is the Cochran-Satterthwaite approximation, with $\min(n_1 - 1, n_2 - 1) \le \nu \le n_1 + n_2 - 2$
    * tests or CIs are based on this approximation

Comparative studies (two samples): Model diagnostics, normality
• Diagnosis
  - histogram of residuals (pool the residuals from the two samples if we believe equal variance; consider separately if not)
  - normal probability plot of
residuals
    * order the data (residuals here) from smallest to largest, say $X_{(1)}, \ldots, X_{(n)}$
    * find what we would expect if normal (from tables or an approximation)
    * Blom approximation: $z_{(i)} = \Phi^{-1}\left( \frac{i - .375}{n + .25} \right)$
    * scatterplot of $X_{(i)}$ vs. $z_{(i)}$ is a straight line if the data are normal
    * curves indicate non-normal tails
  - skewness $E(Y - \mu)^3 / \sigma^3$ (zero for normal)
  - kurtosis $E(Y - \mu)^4 / \sigma^4$ (three for normal); excess kurtosis = kurtosis - 3 (e.g., in SAS)

Comparative studies (two samples): Model diagnostics, normality (cont'd)
• Diagnosis
  - statistical tests
    * many exist (Shapiro-Wilk, Kolmogorov-Smirnov, Anderson-Darling, skewness, kurtosis)
    * Table B6 in the text gives critical values for the correlation of the normal probability plot (similar to Shapiro-Wilk)
• Effects of non-normal data
  - large samples: no problem (because of the CLT)
  - sensitive to outliers (not resistant)
  - problem if the two distributions have different shapes
  - if the two distributions have the same shape and equal sample sizes, then skewness is not a problem
  - if the two distributions have the same shape and unequal sample sizes, then skewness can be a problem

Comparative studies (two samples): Model diagnostics, normality (cont'd)
• Remedies for non-normal data
  - examine outliers
    * analyze data with and without them to see if conclusions change
    * remove only if one can argue that the observations are from a different population
  - transformation
    * discussed earlier under unequal variances

Comparative studies (two samples): Model diagnostics, normality (cont'd)
• Remedies for non-normal data
  - Nonparametric tests: Wilcoxon rank sum
    * combine the two samples and put in order, smallest to largest
    * replace each observation by its rank in the combined sample (1 = smallest, $n_1 + n_2$ = largest)
    * test statistic is the sum of ranks in sample 1
    * test statistic is approximately normal with mean $n_1 (n_1 + n_2 + 1)/2$ and variance $n_1 n_2 (n_1 + n_2 + 1)/12$
    * does not assume normality or equal variance
    * basically a test on the median, not the mean
    * very effective test; hard to get a CI

Oneway ANOVA (> 2 samples): Introduction
• Independent random samples from $r$ populations
• Examples
  - randomized experiment with
$r$ treatments
  - observational study with $r$ different groups
• Two sources of variation in measurements
  - variability among observations in a group
  - variability among groups
• Question: are differences among groups large relative to variation within groups?
• Randomization inference is possible but we won't discuss it
• Model-based inference (called the ANOVA model)

Oneway ANOVA (> 2 samples): Notation
• $Y_{ij}$ = $j$th observation in the $i$th sample, $i = 1, \ldots, r$; $j = 1, \ldots, n_i$
• Data summaries
  - for the $i$th sample
    * sample size $n_i$
    * sample mean $\bar{Y}_{i\cdot} = \frac{1}{n_i} \sum_j Y_{ij}$
    * sample standard deviation $s_i = \sqrt{\frac{1}{n_i - 1} \sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2}$
  - total sample size $N = \sum_i n_i$
  - overall mean $\bar{Y}_{\cdot\cdot} = \frac{1}{N} \sum_i \sum_j Y_{ij}$
  - pooled variance estimate $s_p^2 = \frac{1}{N - r} \sum_i (n_i - 1) s_i^2$

Oneway ANOVA (> 2 samples): Model
• Assume $Y_{ij} \sim N(\mu_i, \sigma^2)$
• Equivalent to $Y_{ij} = \mu_i + \epsilon_{ij}$ with $\epsilon_{ij} \sim N(0, \sigma^2)$ (known as the cell means model)
• Observations independent within group
• Observations independent across groups
• Alternative version of the model: $Y_{ij} = \mu + \alpha_i + \epsilon_{ij}$ with either $\sum_i \alpha_i = 0$ or $\sum_i n_i \alpha_i = 0$
  - known as the factor effects version
  - $\alpha_i$ measures the difference between the $i$th group mean and the overall mean (effect)

Oneway ANOVA (> 2 samples): ANOVA table
• Variation and sums of squares
  - $SS_{total} = \sum_i \sum_j (Y_{ij} - \bar{Y}_{\cdot\cdot})^2$ (sometimes known as total corrected for overall mean)
  - $SS_{between} = \sum_i \sum_j (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$ (variation among group means)
  - $SS_{within} = \sum_i \sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2$ (total within-group variation), sometimes called $SS_{error}$
  - let $e_{ij} = Y_{ij} - \bar{Y}_{i\cdot}$ denote the residual for the $ij$th observation (difference between observed and estimated/fitted value); note that $SS_{within} = \sum_i \sum_j e_{ij}^2$
  - key result: $SS_{total} = SS_{between} + SS_{within}$

Oneway ANOVA (> 2 samples): ANOVA table (cont'd)
• Sums of squares are recorded in the ANOVA table

  source of variation   degrees of freedom   sums of squares   mean square
  between groups        r - 1                SS_between        SS/df
  within group          N - r                SS_within         SS/df
  total                 N - 1                SS_total

• Note that MS are computed as the corresponding SS divided by the appropriate degrees of freedom (df)
• $MS_{within}$ is sometimes known as $MS_{error}$ or $MSE$
• Under the model
  - $E(MS_{within}) = \sigma^2$
  - $E(MS_{between}) = \sigma^2 + \frac{1}{r - 1} \sum_{i=1}^{r} n_i (\mu_i - \mu_\cdot)^2$, where $\mu_\cdot = \frac{1}{N} \sum_{i=1}^{r} n_i \mu_i$
  - $F = MS_{between} / MS_{within}$ will be near one if all $\mu_i$'s are equal and larger than one when they are not
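The sums of squares and F ratio defined above can be computed directly from raw data. The following is a minimal Python sketch and is not part of the original notes: the function name `one_way_anova` and the three small groups are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA quantities for a list of samples.

    `groups` is a list of lists; groups[i][j] plays the role of Y_ij.
    """
    r = len(groups)                       # number of groups
    sizes = [len(g) for g in groups]      # n_i
    N = sum(sizes)                        # total sample size
    means = [sum(g) / len(g) for g in groups]      # group means
    grand = sum(sum(g) for g in groups) / N        # overall mean

    # Between-group SS: each group mean counted n_i times.
    ss_between = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means))
    # Within-group SS: squared residuals about each group's own mean.
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    # Total SS: squared deviations about the overall mean.
    ss_total = sum((y - grand) ** 2 for g in groups for y in g)

    ms_between = ss_between / (r - 1)
    ms_within = ss_within / (N - r)
    return {
        "df_between": r - 1,
        "df_within": N - r,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_total,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Hypothetical data: three groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
result = one_way_anova(groups)
print(result)
```

For these made-up groups the decomposition can be verified by hand: the group means are 2, 3, and 6, the within-group sum of squares is 2 + 2 + 2 = 6, and the between and within pieces add up to the total. A P-value would additionally require the F distribution with `df_between` and `df_within` degrees of freedom, which this sketch leaves out.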
Oneway ANOVA (> 2 samples): ANOVA table (cont'd)
• Testing the null hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_r$
• Under the null
  - $MS_{within} \sim \sigma^2 \chi^2_{N - r} / (N - r)$
  - $MS_{between} \sim \sigma^2 \chi^2_{r - 1} / (r - 1)$
  - $MS_{between}$ and $MS_{within}$ are independent
  - $F = MS_{between} / MS_{within}$ has the central F distribution with $r - 1$ and $N - r$ degrees of freedom
  - P-values from Table B4
• Key point: rejecting the null is not the end of the analysis. It is not interesting to say we found a model that doesn't fit.
• We take a brief digression and then return

Oneway ANOVA (> 2 samples): Fixed effects and random effects
• Fixed effects
  - $r$ treatments are of direct interest
  - only treatments under consideration
  - e.g., two drugs, four pesticides
• Random effects
  - $r$ groups are just a sample from a population
  - real questions are about the population
  - same model with the additional assumption that $\mu_i \sim N(\mu, \sigma_\mu^2)$
  - don't estimate the $\mu_i$; estimate $\mu$ and $\sigma_\mu^2$
  - intraclass correlation $\sigma_\mu^2 / (\sigma_\mu^2 + \sigma^2)$

Oneway ANOVA (> 2 samples): Fixed effects and random effects (cont'd)
• Random effects example
  - sample 8 AP high school statistics classes (these are the groups/treatments)
  - sample 10 students in each class
  - give an intro stat course final exam to each
  - not just interested in these 8 classes
  - $\mu$ measures the overall effectiveness of AP statistics classes
  - $\sigma_\mu^2$ measures variability among AP classes
  - intraclass correlation measures variability among classes relative to variation among students
• For now we use fixed effects; we will make occasional comments about random effects and return to the topic late in the quarter

Oneway ANOVA (> 2 samples): Example
• Cash offers for cars (problem 16.10)
  - three groups of sellers (young, middle-aged, old)
  - 12 individuals in each group
  - each tried to sell the same used car
  - response is the price offered by the dealer (thousands)
  - observational study. Why?
• Data summaries

  group    n_i   mean    s_i
  young    12    21.50   1.73
  middle   12    27.75   1.29
  old      12    21.42   1.68

• ANOVA table

  source   df   SS       MS
  age      2    316.72   158.36
  error    33   82.17    2.49
  total    35   398.89

• $F = 158.36 / 2.49 = 63.6$; P-value < .001 (way less!); reject $H_0$
• Now what? Which groups are different
(obvious answer here)

Oneway ANOVA (> 2 samples): Comparisons and contrasts
• Assume we reject the hypothesis of equal means
• Types of questions/comparisons
  - inference for a single group mean
  - pairwise comparisons
  - linear combinations (contrasts as a special case)
• Inference for a single group mean
  - $100(1-\alpha)\%$ CI for $\mu_i$: $\bar{Y}_{i\cdot} \pm t_{N - r,\, 1 - \alpha/2} \sqrt{MS_{error} / n_i}$
  - usual analysis with the pooled variance estimate (and extra degrees of freedom)

Oneway ANOVA (> 2 samples): Comparisons and contrasts (cont'd)
• Pairwise comparisons
  - compare two means using the usual two-sample procedures, with the common variance $\sigma^2$ estimated by $MS_{error}$
  - $100(1-\alpha)\%$ CI for $\mu_i - \mu_j$: $\bar{Y}_{i\cdot} - \bar{Y}_{j\cdot} \pm t_{N - r,\, 1 - \alpha/2} \sqrt{MS_{error} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$
  - test that corresponds to the above interval
  - with all $n_i$ equal (to $n$), the quantity $t_{N - r,\, 1 - \alpha/2} \sqrt{2\, MS_{error} / n}$ is known as the LSD (least significant difference)
  - often see means listed in order, with underlining used to indicate similar means by LSD

Oneway ANOVA (> 2 samples): Comparisons and contrasts (cont'd)
• A possible problem
  - each pairwise comparison has type I error level $\alpha$ or confidence level $100(1-\alpha)\%$
  - we do $\binom{r}{2} = r(r-1)/2$ such comparisons
  - if $r$ is large, some significant differences are expected by chance even if all of the means are the same
  - known as the multiple comparisons problem
  - more to come on this subject

Oneway ANOVA (> 2 samples): Comparisons and contrasts (cont'd)
• Linear combinations / contrasts
  - interested in a linear combination $\gamma = \sum_i c_i \mu_i$
  - contrasts are the special case where $\sum_i c_i = 0$ (then $\gamma$ is zero if all $\mu_i$ are equal)
  - pairwise comparisons are also a special case ($c_i = 1$, $c_j = -1$, and other $c$'s are zero)
  - point estimate $\hat{\gamma} = \sum_i c_i \bar{Y}_{i\cdot}$
  - standard deviation $sd(\hat{\gamma}) = \sigma \sqrt{\sum_i c_i^2 / n_i}$
  - standard error $se(\hat{\gamma}) = \sqrt{MS_{error} \sum_i c_i^2 / n_i}$
  - $100(1-\alpha)\%$ CI: $\hat{\gamma} \pm t_{N - r,\, 1 - \alpha/2}\, se(\hat{\gamma})$
  - test $H_0: \gamma = 0$ by computing $t = \hat{\gamma} / se(\hat{\gamma})$; compare to the t distribution with $N - r$ df
  - the above test is of great interest if $c$ is a contrast
  - contrasts are 1-df tests of (usually) prespecified hypotheses
  - contrasts can be used to decompose $SS_{between}$; the sum of squares for contrast $c$ is $SS_c = \hat{\gamma}^2 / \sum_i (c_i^2 / n_i)$

Oneway ANOVA (> 2 samples): Comparisons and contrasts (cont'd)
• Where do contrasts come from? (illustrate assuming $r = 5$)
  - comparing subgroups
    * pairwise
comparison, e.g., $c = (1, -1, 0, 0, 0)$
    * groups 1, 2, 3 vs. groups 4, 5: $c = (\frac{1}{3}, \frac{1}{3}, \frac{1}{3}, -\frac{1}{2}, -\frac{1}{2})$, i.e., $\gamma = \frac{\mu_1 + \mu_2 + \mu_3}{3} - \frac{\mu_4 + \mu_5}{2}$
  - expected trends
    * linear trend in means: $c = (-2, -1, 0, 1, 2)$
    * quadratic trend in means: $c = (2, -1, -2, -1, 2)$
    * orthogonal polynomials like the above can be used to analyze ordered treatments
  - any hypothesis can become a contrast
    * if we expect means $\mu_1 = 3$, $\mu_2 = \mu_3 = 7$, $\mu_4 = \mu_5 = 4$, then subtracting the overall mean (5) yields weights $c = (-2, 2, 2, -1, -1)$

Oneway ANOVA (> 2 samples): Comparisons and contrasts (cont'd)
• Orthogonal contrasts
  - two contrasts given by weights $c$ and $b$ are orthogonal if $\sum_i b_i c_i / n_i = 0$
  - estimates for such contrasts are uncorrelated
  - $r - 1$ pairwise orthogonal contrasts will completely decompose $SS_{between}$
  - there is more than one possible choice for the $r - 1$ orthogonal contrasts
  - return to example (cash offers)
    * the 3 age groups have a natural ordering
    * natural to decompose $SS_{between}$ into linear and quadratic contributions
    * linear: $c = (-1, 0, 1)$, $SS_c = .04$, don't reject $H_0: \sum_i c_i \mu_i = 0$
    * quadratic: $c = (1, -2, 1)$, $SS_c = 316.68$, reject $H_0$
    * means are not equal and appear to follow a quadratic pattern (obvious from graph)

Oneway ANOVA (> 2 samples): Multiple comparisons
• A typical study has a relatively small number of planned comparisons or contrasts
• Often investigators consider a large number of unplanned comparisons (e.g., all pairwise comparisons among treatments)
• Problem is that there can be many such unplanned comparisons
• Need to adjust tests so that the experiment-wide type I error rate is reasonably small (to avoid falsely significant findings)
• What to do?
• Basic approach is to adjust the $t_{N - r,\, 1 - \alpha/2}$ quantity used in $100(1-\alpha)\%$ confidence intervals and tests of size $\alpha$

Oneway ANOVA (> 2 samples): Multiple comparison procedures
• Bonferroni
  - if we have $m$ tests/intervals, then use $\alpha/m$ instead of $\alpha$ in each test/interval
  - conservative (experiment-wide type I error rate $\le \alpha$)
  - easy to do
  - need to know the number of comparisons $m$
• Scheffe
  - works for any number of (actually all possible) tests/intervals
  - most conservative
  - still relatively easy: use $\sqrt{(r - 1) F_{r - 1,\, N - r,\, 1 - \alpha}}$ in place of $t_{N - r,\, 1 - \alpha/2}$
• F-protected LSD (weak): do
the overall F test first; if we reject $H_0$, then use LSD for pairwise comparisons/differences

Oneway ANOVA (> 2 samples): Multiple comparison procedures (cont'd)
• Tukey-Kramer (studentized range)
  - an exact solution for all pairwise comparisons
  - experiment-wide error rate is $\alpha$ over all of the possible comparisons
  - assumes equal sample size $n$ (conservative if not)
  - the distribution of $q(r, N - r) = \sqrt{n} \left( \max_i \bar{Y}_{i\cdot} - \min_i \bar{Y}_{i\cdot} \right) / s_p$ is in Table B9 (this is the most extreme pairwise comparison)
  - use $q_{r,\, N - r,\, 1 - \alpha} / \sqrt{2}$ in CIs
  - for tests, take $\sqrt{2}$ times the usual t-test statistics and compare to the $q$ distribution

Oneway ANOVA (> 2 samples): Multiple comparison procedures, a different look
• False discovery rate
  - previous approaches attempt to control the experimentwise error rate
  - most appropriate if there is serious concern that all null hypotheses are true
  - alternative is to control the percentage of discoveries (statistically significant results) that are false
  - FDR (Benjamini and Hochberg, JRSS B, 1995)
    * order P-values from smallest (most significant) to largest
    * find $k$ = largest index $i$ such that $P_{(i)} \le \frac{i}{m} q$, where $q$ is the desired FDR and $m$ is the number of tests
    * reject the hypotheses corresponding to the smallest $k$ P-values
  - Problem: FDR as above assumes independent tests
    * a later publication suggests replacing $q$ by $q / \sum_{j=1}^{m} (1/j)$ to correct for non-independence

Oneway ANOVA (> 2 samples): Model diagnostics
• Model assumptions
  - independent samples from each population
  - populations are independent
  - equal variances
  - normal distributions

Oneway ANOVA (> 2 samples): Model diagnostics, independence
• Similar to the two-sample case
• Diagnosis
  - usually design the study to achieve independence
  - could fail if units are related (students in same class)
  - check by looking at residuals $Y_{ij} - \bar{Y}_{i\cdot}$ within possible clusters
  - check by looking at residuals vs. possibly relevant variables like time
• Effects of non-independence
  - wrong variances for means
  - wrong error terms for tests
• Remedies for non-independence
  - include the source of the correlation in the model

Oneway ANOVA (> 2 samples): Model diagnostics, equal variances
• Diagnosis
  - histogram
of residuals in the samples
  - test for equality of variance: Bartlett's test
    * $M = (N - r) \log s_p^2 - \sum_i (n_i - 1) \log s_i^2$
    * $C = 1 + \frac{1}{3(r - 1)} \left( \sum_i \frac{1}{n_i - 1} - \frac{1}{N - r} \right)$
    * $X^2 = M / C$ is approximately $\chi^2$ on $r - 1$ df
    * other tests: Hartley, Levene
    * BUT the tests are very sensitive to the normal assumption, hence not recommended
  - rule of thumb: OK if variances are within a factor of four
• Effects of non-constant variance
  - minor if sample sizes are the same
  - can be important if sample sizes are very unequal
  - is a hypothesis about the means relevant here?

Oneway ANOVA (> 2 samples): Model diagnostics, equal variance (cont'd)
• Remedies for nonconstant variance
  - transformation
    * replace $Y_{ij}$ by $f(Y_{ij})$
    * rules of thumb: if data are all positive, use $\log Y_{ij}$; if data are proportions, use $\arcsin \sqrt{Y_{ij}}$; if data are counts, use $\sqrt{Y_{ij}}$
    * if $\mathrm{Var}(Y_{ij}) = g(E(Y_{ij}))$, then the variance-stabilizing transformation is $h(y) \propto \int g(y)^{-1/2} \, dy$
    * examples: if var $\propto$ mean, then $g(z) = z$ and $h(y) = \sqrt{y}$; if var $\propto$ mean$^2$, then $g(z) = z^2$ and $h(y) = \ln y$
    * with many groups, can estimate $g$: if sd $\propto$ mean$^\lambda$, then $\lambda$ is the slope in a plot of $\log s_i$ vs. $\log \bar{Y}_{i\cdot}$; compute $\log s_i$ and $\log \bar{Y}_{i\cdot}$ for each group; estimate $\lambda$ from the scatterplot; the transform is $y^{1 - \lambda}$ (or $\ln y$ if $\lambda = 1$)

Oneway ANOVA (> 2 samples): Model diagnostics, equal variance (cont'd)
• Remedies for nonconstant variance
  - weighted least squares
    * easiest to describe in the regression context
    * basic idea is to weight observations in each group according to $1 / s_i^2$
    * more later
  - nonparametrics: Kruskal-Wallis test
    * combine all groups
    * rank the observations
    * ANOVA on the ranks
    * $KW = SS_{between} (N - 1) / SS_{total}$
    * compare $KW$ to the $\chi^2_{r - 1}$ distribution
    * related to the usual F test on ranks
    * pairwise comparisons are possible
    * generalizes the two-sample rank sum procedure

Oneway ANOVA (> 2 samples): Model diagnostics, normality
• Diagnosis
  - normal probability plot of residuals (pooled or separate in each group)
  - statistical tests
  - see the two-sample notes for details
• Effects
  - inaccurate inferences if badly non-normal
  - ANOVA tests are sensitive to outliers

Oneway ANOVA (> 2 samples): Model diagnostics, normality (cont'd)
• Remedies for non-normality
  - transformation (see
comments under non-constant variance)
    * hope that the same transformation achieves both goals
    * achieving constant variance is more important
  - nonparametrics
    * Kruskal-Wallis test described above

Oneway ANOVA (> 2 samples): Study design
• Power calculation
  - assume a completely randomized design
  - assume $r$ populations, equal sample size $n$
  - select $\alpha = \Pr(\text{type I error})$
  - key to calculating power is the non-centrality parameter of the F test (measuring the degree to which the means differ): $\phi = \frac{1}{\sigma} \sqrt{\frac{n}{r} \sum_i (\mu_i - \mu_\cdot)^2}$
  - power table (B.11 in NKNW) gives power for given $\nu_1, \nu_2, \alpha, \phi$
  - note that $\nu_1 = r - 1$, $\nu_2 = N - r$
  - example: $r = 4$, $\mu_1 = 12$, $\mu_2 = 13$, $\mu_3 = 18$, $\mu_4 = 21$, $\sigma = 2.5$, $n = 5$
    * $\phi = \frac{1}{2.5} \sqrt{\frac{5}{4} (16 + 9 + 4 + 25)} = 3.29$
    * $\nu_1 = 3$, $\nu_2 = 16$, $\alpha = .01$
    * power $\approx 1.00$

Oneway ANOVA (> 2 samples): Study design
• Sample size calculation
  - can get the sample size by trial and error
    * for the example on the previous slide, $n = 2$ gives power .8 and $n = 3$ gives power .99
  - sample size tables exist
    * Table B12 uses $\Delta / \sigma$, where $\Delta = \max_i \mu_i - \min_i \mu_i$, as the key quantity in place of $\phi$
    * Nelson (Journal of Quality Technology, 1985) uses a $\phi$-like quantity

Pairing/blocking to reduce variance: Introduction
• Key to inference is comparing the observed difference to a measure of variability, e.g., $t = \frac{\bar{Y}_{i\cdot} - \bar{Y}_{j\cdot}}{s_p \sqrt{\frac{1}{n_i} + \frac{1}{n_j}}}$
• If $\sigma$ (and hence $s_p$) is large, then it is hard to declare differences significant
• $\sigma$ is a measure of the heterogeneity of the population
• Idea: create homogeneous groups and compare treatments within the homogeneous groups
• Examples
  - matched pairs for comparing two treatments
  - blocking in ANOVA

Pairing/blocking to reduce variance: Paired responses
• Example: agricultural experiment in different fields, with treatments A and B applied to half of each field
• Example: medical study; find pairs of people with the same gender and age, and randomly assign treatments within the pair
• Notation
  - $Y_{1k}$ = response to treatment 1 in the $k$th pair
  - $Y_{2k}$ = response to treatment 2 in the $k$th pair
  - $k = 1, \ldots, n$
  - responses are not independent (usually positively correlated)
  - $\mu_1$ = mean for the population under treatment 1
  - $\mu_2$ = mean for the population under treatment 2
  - define $D_k = Y_{1k} - Y_{2k}$ (difference)

Pairing/blocking to reduce variance: Paired
responses inference One sample inference using differences Note Md M1 2 quantity of interest Let D Di and 8d D2 1001 00 CI for M1 2 is D i tn 11 a28d test for H0 m 2 equivalent to Md 0 is one sample t test compare t Dsd to t distn With n 1 if 67 Pairingblocking to reduce variance Paired responses vs unpaired responses To keep things simple suppose a 0 Also suppose we know common 02 and the correlation of Ylk and ng call it p VarDk 03 2021 p Paired analysis 2 Dxafln Two indep sample analysis 2 Y1 Y2202n If p 0 then same test statistic but paired analysis has fewer df If p gt 0 then 03 lt 202 and increased precision will likely more than compensate for loss in df More detailed analysis in Snedecor and Cochran above is simplistic Paired analysis provides no bene t for estimating m or 2 alone 68 Pairingblocking to reduce variance Paired analysis model diagnostics 0 Model assumptions need pairs independent of each other differences have normal distribution o Independence usually guaranteed by design big problem if this fails need to improve model 69 Pairingblocking to reduce variance Paired analysis model diagnostics o Non normality diagnosis probability plot on differences statistical tests as before effects gtllt OK if sample size is large gtllt may produce invalid inference in small samples gtllt sensitive to outliers rernedies gtllt transformations may not work well because differences can be negative 70 Pairingblocking to reduce variance Paired analysis model diagnostics o Non normality remedies cont d nonparametric tests gtllt Wilcoxon signed rank test order absolute value of differences assign ranks test statistic T sum of ranks corresponding to positive differences T is approximately normal for large 71 mean nn 14 and variance nn 12n 124 not quite as good as t test for normal data but much better than t for other distn 71 Pairingblocking to reduce variance Paired analysis model diagnostics o Non normality remedies cont d nonparametric tests gtllt sign test 
ignore zero differences count of positive differences test hypothesis that proportion of positive differences is 05 use binomial probability distn ex suppose 7 of 8 differences are positive P value Pr7 or 8 out of 8 When p 05 070 easy but not as powerful as Wilcoxon 72 Pairingblocking to reduce variance Randomized complete block design Generalization of the paired response idea Assume we have one way ANOVA with J treatments new notation Group experimental units into I blocks of size J Within each block randomly assign J treatments to units Each block is essentially a repetition of experiment Remarks if block has too few units for full repetition then we can use incomplete block design if block has many units some multiple of J then we can apply each treatment more than once 73 Pairingblocking to reduce variance Randomized block design Model Yij u 7j6 j i 1 I indexes blocks j 1 J indexes treatments Tj are treatment effects With Z Tj 0 62 are block effects With z 0 all effects are assumed to be xed effects additive model same treatment effect in each block ANOVA table source of degrees of sums of variation freedom squares blocks I 1 J2 K 172 treatments J 1 121Yj 172 error I 1J 1 211 21Y j j Y2 total IJ 1 31 EgoM Y2 74 Pairingblocking to reduce variance Randomized block design cont d Inference usually no test for block effects because blocks are known to be different J EltMStreatments 02 Zj1 7 2 EltMSerr0r 02 test for treatment effect MStreatments F MSeMOT IS FJ 11 1J 1 assuming F test is signi cant inference for means pairwise comparisons contrasts proceeds as in one way ANOVA 75 Pairingblocking to reduce variance Blocks as random effects Often Wish to think of blocks as random effects Model now assumes N N 0 0 Comments made about random effects With one way design apply here Analysis is essentially unchanged except that intraclass correlation may now be of interest 76 Pairingblocking to reduce variance Ef ciency of blocking In randomized block design dfb M Sam To 
assess effectiveness of blocking we need to gure out What error variance might have been Without blocking Snedecor and Cochran give I 1MSblocks 1MS 7 7 07 IJ 1 A2 i Jeri as an unbiased estimate of error variance for completely randomized design proof in Cochran and Cox 1957 Then din77 is ef ciency of blocking big values mean big bene ts One complication is that there are different degrees of freedom associated With these estimates 7 Fisher measured information by multiplying variance estimates by df 3 df 1 to adjust for difference in df 77 Pairingblocking to reduce variance Randomized block design diagnostics Assumptions blocks should be independent homogeneous error variance normality block and treatment effects are additive no interaction o Independence usually arranged by design big problems if violated 78 Pairingblocking to reduce variance Randomized block design diagnostics o Constant variance normality diagnosis gtllt check by examining residuals eij YijY YjY gtllt normal probability plot effects gtllt similar to effects for one way design remedies gtllt transformations gtllt nonparametric test replace data by ranks Within blocks and analyze by ANOVA 79 Pairingblocking to reduce variance Randomized block design diagnostics o Additivity diagnosis Tukey s test for non additivity compute 7 2 212106 gtlt j 4 214 Y ENG Y compute adjusted SSeMOT as SSCT T OT J7a SSCT T OT SSW1 39 F39teSt Of SSnaMserrorma on land I 1J 1 1 df effects SSna more important than normality and constant variance dif cult to interpret treatment effects because effects are different in different blocks remedies try transformations to remove interaction 80 Pairingblocking to reduce variance Randomized block design study design 0 Power and sample size determination for test regarding treatments is as in one way ANOVA except 02 is likely smaller after accounting for blocks error df Will be smaller To determine number of blocks can pick I to get desired accuracy for speci ed treatment mean or 
contrast eg Varl7j 02 need estimate of 02 and can then determine I for needed accuracy 81
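
The randomized block ANOVA table and the efficiency-of-blocking estimate described in these notes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rcb_anova`, the dictionary keys, and the 3 x 3 data set are hypothetical and do not come from the notes, and the sketch assumes exactly one observation per block/treatment cell. It computes the sums of squares, the F statistic for treatments, and the Snedecor and Cochran estimate of what the error variance might have been without blocking.

```python
def rcb_anova(y):
    """ANOVA quantities for a randomized complete block design.

    y is a list of I rows (blocks), each holding J responses (treatments),
    with exactly one observation per block/treatment cell.
    """
    n_blocks = len(y)       # I
    n_treat = len(y[0])     # J
    grand = sum(sum(row) for row in y) / (n_blocks * n_treat)
    block_means = [sum(row) / n_treat for row in y]
    treat_means = [sum(row[j] for row in y) / n_blocks for j in range(n_treat)]

    # Sums of squares from the ANOVA table
    ss_blocks = n_treat * sum((m - grand) ** 2 for m in block_means)
    ss_treat = n_blocks * sum((m - grand) ** 2 for m in treat_means)
    ss_error = sum(
        (y[i][j] - block_means[i] - treat_means[j] + grand) ** 2
        for i in range(n_blocks)
        for j in range(n_treat)
    )
    ss_total = sum((v - grand) ** 2 for row in y for v in row)

    df_blocks = n_blocks - 1
    df_treat = n_treat - 1
    df_error = df_blocks * df_treat

    ms_blocks = ss_blocks / df_blocks
    ms_treat = ss_treat / df_treat
    ms_error = ss_error / df_error

    # Estimated error variance for a completely randomized design
    # (Snedecor and Cochran formula quoted in the notes)
    var_cr = (df_blocks * ms_blocks + n_blocks * df_treat * ms_error) / (
        n_blocks * n_treat - 1
    )

    return {
        "ss_blocks": ss_blocks,
        "ss_treat": ss_treat,
        "ss_error": ss_error,
        "ss_total": ss_total,
        "df": (df_blocks, df_treat, df_error),
        "f_treat": ms_treat / ms_error,   # refer to F with (J-1, (I-1)(J-1)) df
        "efficiency": var_cr / ms_error,  # big values mean blocking helped
    }


# Hypothetical data: 3 blocks (rows) by 3 treatments (columns)
example = [
    [4, 6, 8],
    [7, 10, 13],
    [10, 14, 18],
]
result = rcb_anova(example)
print(result)
```

For this made-up data set the block sum of squares dominates, so the efficiency ratio comes out well above 1. The sketch does not apply Fisher's df adjustment and does not compute a P-value, which would need an F distribution routine from a statistics library.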