# Statistical Methods STAT 51100

Purdue

These 107 pages of class notes were uploaded by Bailey Macejkovic on Saturday, September 19, 2015. The notes belong to STAT 51100 at Purdue University, taught by Michael Levine in Fall. Since their upload, they have received 86 views. For similar materials see /class/207937/stat-51100-purdue-university in Statistics at Purdue University.

Date Created: 09/19/15

Statistics 511: Statistical Methods, Purdue University, Dr. Levine, Fall 2006 (Aug 2006)

## Lecture 16: Tests about a Population Proportion (Devore, Section 8.3)

### Large-Sample Tests

- Let $p$ denote the proportion of individuals or objects in a population who possess a specified property; thus each object either possesses the desired property ($S$) or it does not ($F$).
- Consider a simple random sample $X_1, \ldots, X_n$. If the sample size $n$ is small relative to the population size, the number of successes in the sample, $X$, has an approximately binomial distribution. If $n$ itself is also large, both $X$ and the sample proportion $\hat p = X/n$ are approximately normally distributed.
- Large-sample tests concerning $p$ are a special case of the more general large-sample procedures for an arbitrary parameter $\theta$. We considered such a large-sample test before for the mean $\mu$ of an arbitrary distribution.
- Some basic properties of $\hat p$ are:
  1. The estimator is unbiased: $E(\hat p) = p$.
  2. It is approximately normal, and its standard deviation is $\sigma_{\hat p} = \sqrt{p(1-p)/n}$.
  3. Under the null hypothesis $p = p_0$, $\sigma_{\hat p}$ does not include any unknown parameters. This is not always the case: recall the large-sample test of the mean, where $\sigma_{\bar X} = \sigma/\sqrt{n}$ is in general unknown unless $\sigma^2$ is specified.
- Let us consider first an upper-tailed test, that is, a null hypothesis $H_0: p = p_0$ vs. an alternative $H_a: p > p_0$.
- Under the null hypothesis we have $E(\hat p) = p_0$ and $\sigma_{\hat p} = \sqrt{p_0(1-p_0)/n}$; therefore, for large $n$, the test statistic

  $$Z = \frac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}}$$

  has approximately a standard normal distribution.
- The rejection region is $z \ge z_\alpha$ for a test of approximately level $\alpha$.
- The lower-tailed test has the rejection region $z \le -z_\alpha$.
- The two-tailed test has the rejection region $|z| \ge z_{\alpha/2}$. The last expression is a concise way of saying that $z \ge$
$z_{\alpha/2}$ or $z \le -z_{\alpha/2}$.
- These tests are applicable whenever the normal approximation of the binomial distribution is reasonable: $np_0 \ge 10$ and $n(1-p_0) \ge 10$.

### Example

- Consider Example 8.11 from Devore. The null hypothesis here is that $p = 0.30$. The alternative is $p > 0.30$.
- Note that the rule of thumb is satisfied: $np_0 = 4115(0.3) > 10$ and $n(1-p_0) = 4115(0.7) > 10$.
- Thus we use the large-sample test with $Z = (\hat p - 0.3)/\sqrt{(0.3)(0.7)/n}$.
- For a significance level $\alpha = 0.1$ we use $z_\alpha = 1.28$. The sample proportion is $\hat p = 1276/4115 = 0.310$. Plugging this value into $Z$ we obtain $z = 1.40 > 1.28$. Thus the null hypothesis is rejected.

### Type II Error and Sample Size Determination

- The Type II error probability can be computed exactly as before. If $H_0$ is not true, the true proportion is $p = p' \ne p_0$. Under $H_a: p = p'$, $Z$ is still approximately normal; however,

  $$E(Z) = \frac{p' - p_0}{\sqrt{p_0(1-p_0)/n}}, \qquad V(Z) = \frac{p'(1-p')/n}{p_0(1-p_0)/n}.$$

- The formulas for the Type II error are very similar to what we saw before for the mean test. We give the upper-tailed test formula ($H_a: p > p_0$),

  $$\beta(p') = \Phi\left(\frac{p_0 - p' + z_\alpha\sqrt{p_0(1-p_0)/n}}{\sqrt{p'(1-p')/n}}\right),$$

  and the lower-tailed test formula ($H_a: p < p_0$),

  $$\beta(p') = 1 - \Phi\left(\frac{p_0 - p' - z_\alpha\sqrt{p_0(1-p_0)/n}}{\sqrt{p'(1-p')/n}}\right).$$

- Sample size formulas can also be easily derived. In the two-tailed case the formula is approximate, as before.

### Example

- Consider Example 8.12 from Devore. The null hypothesis is $H_0: p = 0.9$ vs. $H_a: p < 0.9$. How likely is it that a test of level 0.01 based on $n = 225$ packages detects a departure of 0.10 from the null value?
- For $p' = 0.8$ we have

  $$\beta(0.8) = 1 - \Phi\left(\frac{0.9 - 0.8 - 2.33\sqrt{(0.9)(0.1)/225}}{\sqrt{(0.8)(0.2)/225}}\right) = 1 - \Phi(2.00) = 0.0228.$$

- Thus the probability of Type II error under the alternative $p' = 0.8$ is about 2.3%.

### Small-Sample Tests

- These are test procedures for proportions when the sample
size $n$ is small. They are based directly on the binomial distribution rather than the normal approximation.
- Consider the alternative hypothesis $H_a: p > p_0$ and let $X$ be, yet again, the number of successes in a sample of size $n$. For a test of level $\alpha$ we find the rejection region $\{x \ge c\}$ from

  $$\alpha = P(X \ge c \text{ when } X \sim \mathrm{Bin}(n, p_0)) = 1 - P(X \le c - 1 \text{ when } X \sim \mathrm{Bin}(n, p_0)) = 1 - B(c-1; n, p_0).$$

- It is usually not possible to find an exact value of $c$; in this case the usual way out is to use the largest rejection region of the form $\{c, c+1, \ldots, n\}$ satisfying the bound on the Type I error.
- To compute the Type II error for an alternative $p' > p_0$, we first note that $X \sim \mathrm{Bin}(n, p')$ if the alternative is true. Then

  $$\beta(p') = P(X < c \text{ when } X \sim \mathrm{Bin}(n, p')) = B(c-1; n, p').$$

  Note that this is the result of a straightforward binomial probability calculation.

### Example

- A builder claims that heat pumps are installed in 70% of all homes being constructed today in the city of Richmond, VA. Would you agree with this claim if a random survey of new homes in this city shows that 8 out of 15 had heat pumps installed? Use a 0.10 level of significance.
- $H_0: p = 0.7$ vs. $H_a: p < 0.7$ with $\alpha = 0.10$.
- The test statistic is $X$, which under $H_0$ is $\mathrm{Bin}(15, 0.7)$.
- We have $x = 8$ and $np_0 = 15(0.7) = 10.5$. Because the alternative is lower-tailed, we must find the largest $c$ such that $P(X \le c) = B(c; 15, 0.7) \le 0.10$ for $X \sim \mathrm{Bin}(15, 0.7)$. Since $B(7; 15, 0.7) \approx 0.050$ and $B(8; 15, 0.7) \approx 0.131$, the rejection region is $\{0, 1, \ldots, 7\}$. The observed $x = 8$ is not in it, so $H_0$ is not rejected.

## Lecture 20: Two-Sample Test for Proportions and the Variance Test (Devore, Sections 9.4 and 9.5)

### Difference between Population Proportions

- Consider two different populations with proportions of individuals possessing a desired property being $p_1$ and $p_2$, respectively. We denote the number of individuals in each sample
possessing the desired property by $X$ and $Y$, respectively.
- If the respective sample sizes $m$ and $n$ are small relative to the population sizes, we can assume that $X \sim \mathrm{Bin}(m, p_1)$ and $Y \sim \mathrm{Bin}(n, p_2)$.
- The obvious estimator for $p_1 - p_2$ is the difference in sample proportions: if $\hat p_1 = X/m$ and $\hat p_2 = Y/n$, then $\hat p_1 - \hat p_2$ can be used to estimate $p_1 - p_2$.
- It is easy to show that $E(\hat p_1 - \hat p_2) = p_1 - p_2$ and

  $$V(\hat p_1 - \hat p_2) = \frac{p_1 q_1}{m} + \frac{p_2 q_2}{n}.$$

- The commonly used notation is $q_i = 1 - p_i$, $i = 1, 2$.
- When $m$ and $n$ are large enough, we can claim that

  $$\frac{\hat p_1 - \hat p_2 - (p_1 - p_2)}{\sqrt{p_1 q_1/m + p_2 q_2/n}}$$

  is approximately standard normal and use this fact to construct a $Z$ test.

### A Large-Sample Test Procedure

- We consider $H_0: p_1 - p_2 = 0$ vs. $H_a: p_1 - p_2 \ne 0$.
- Under $H_0$ denote $p_1 = p_2 = p$ and $q_1 = q_2 = q = 1 - p$. Then the variable

  $$Z = \frac{\hat p_1 - \hat p_2 - 0}{\sqrt{pq\left(\frac{1}{m} + \frac{1}{n}\right)}}$$

  has approximately a standard normal distribution under $H_0$.
- It cannot be used for testing. Why? Because $p$ and $q$ are not known.
- Therefore we estimate

  $$\hat p = \frac{X + Y}{m + n} = \frac{m}{m+n}\hat p_1 + \frac{n}{m+n}\hat p_2$$

  and use the statistic

  $$Z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p \hat q\left(\frac{1}{m} + \frac{1}{n}\right)}}, \qquad \hat q = 1 - \hat p,$$

  which is approximately standard normal when both $m$ and $n$ are large.

### Example

- Does pleading guilty in a criminal trial result in a more lenient sentencing outcome?
- The first group of $m = 191$ consists of defendants who pleaded guilty; $x = 101$ of them were sentenced to prison terms, with $\hat p_1 = 0.529$. The second group of $n = 64$ pleaded not guilty; $y = 56$ of them received prison terms, with $\hat p_2 = 0.875$.
- Thus we have $H_0: p_1 - p_2 = 0$ vs. $H_a: p_1 - p_2 \ne 0$. At level $\alpha = 0.01$, $H_0$ should be rejected if $z \ge 2.58$ or $z \le -2.58$. The combined estimate of the success proportion is $\hat p = (101 + 56)/(191 + 64) = 0.616$.
- The value of the test statistic is

  $$z = \frac{0.529 - 0.875}{\sqrt{(0.616)(0.384)\left(\frac{1}{191} + \frac{1}{64}\right)}} = -4.94,$$

  and therefore $H_0$ must be rejected. Note that the P-value is less than 0.0004.
- As a remark, this
outcome has been confirmed for many different types of crimes (burglary, robbery, etc.) as well as for defendants with different prior records (none, some but no prison, prison, etc.).

### Type II Error Probabilities

- Under the alternative hypothesis, $p_1 \ne p_2$, so a single common $p$ no longer applies. Define $\bar p = (m p_1 + n p_2)/(m+n)$ and $\bar q = (m q_1 + n q_2)/(m+n)$; then the standard deviation of $\hat p_1 - \hat p_2$ is $\sigma = \sqrt{p_1 q_1/m + p_2 q_2/n}$.
- The expressions are rather complicated; for example, for $H_a: p_1 - p_2 > 0$,

  $$\beta(p_1, p_2) = \Phi\left(\frac{z_\alpha\sqrt{\bar p \bar q\left(\frac{1}{m} + \frac{1}{n}\right)} - (p_1 - p_2)}{\sigma}\right).$$

### Sample Size Calculations

- For a specified alternative $p_1 - p_2 = d$, the common sample size needed to achieve $\beta(p_1, p_2) = \beta$ can be easily determined. As an example, for an upper-tailed test we have

  $$n = \frac{\left[z_\alpha\sqrt{(p_1 + p_2)(q_1 + q_2)/2} + z_\beta\sqrt{p_1 q_1 + p_2 q_2}\right]^2}{d^2}.$$

### Example

- Consider the 1954 Salk polio vaccine trials. The experiment was a double-blind one.
- $p_1$ and $p_2$ are the probabilities of getting paralytic polio in the control and treatment groups, respectively. We test $H_0: p_1 - p_2 = 0$ vs. $H_a: p_1 - p_2 > 0$. If the true value is $p_1 = 0.0003$, then $p_2 = 0.00015$ would be a success.
- We use a level $\alpha = 0.05$ test and want to find sample sizes such that $\beta = 0.1$ for the above values of $p_1$ and $p_2$.
- If the sample sizes are equal, then applying the sample size formula we have $n = m \approx 171{,}000$. In reality samples of about 200,000 were used, and $z = 6.43$.

### A Large-Sample Confidence Interval for $p_1 - p_2$

- It is easy to derive a simple $100(1-\alpha)\%$ CI for the difference of proportions as

  $$\hat p_1 - \hat p_2 \pm z_{\alpha/2}\sqrt{\frac{\hat p_1 \hat q_1}{m} + \frac{\hat p_2 \hat q_2}{n}}.$$

  As a remark, note that the estimated standard deviation for $\hat p_1 - \hat p_2$ is not equal to the one used under the null hypothesis.
- It is usually recommended in practice to use $\tilde p_i$ and the respective $\tilde q_i$ instead of the traditional $\hat p_i$ and $\hat q_i$ because of the problems with coverage.
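The pooled two-proportion $z$ statistic and the traditional large-sample confidence interval described above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function names `two_proportion_z` and `two_proportion_ci` are hypothetical, the data are the sentencing counts from the example above, and only the standard library is assumed.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def two_proportion_z(x, m, y, n):
    """Pooled large-sample z statistic for H0: p1 - p2 = 0."""
    p1_hat, p2_hat = x / m, y / n
    p_pool = (x + y) / (m + n)  # combined estimate of the common p
    se_null = sqrt(p_pool * (1 - p_pool) * (1 / m + 1 / n))
    return (p1_hat - p2_hat) / se_null


def two_proportion_ci(x, m, y, n, conf=0.99):
    """Traditional (unpooled) large-sample CI for p1 - p2."""
    p1_hat, p2_hat = x / m, y / n
    z_crit = STD_NORMAL.inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(p1_hat * (1 - p1_hat) / m + p2_hat * (1 - p2_hat) / n)
    diff = p1_hat - p2_hat
    return diff - z_crit * se, diff + z_crit * se


# Sentencing example: 101 of 191 (pleaded guilty) vs. 56 of 64 (pleaded not guilty).
z = two_proportion_z(101, 191, 56, 64)
p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))  # two-tailed P-value
ci_low, ci_high = two_proportion_ci(101, 191, 56, 64, conf=0.99)
print(f"z = {z:.2f}, P-value = {p_value:.2e}, 99% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

Note that the test uses the pooled standard error while the interval uses the unpooled one, which mirrors the remark that the two estimated standard deviations differ. Working with unrounded proportions gives a statistic slightly different from the hand calculation with three-decimal values.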
### Example

- Consider the effectiveness of the combined cancer treatment (radiation and chemotherapy) vs. just chemotherapy.

| | Chemotherapy | Chemotherapy and radiation |
|---|---|---|
| 15-year survival | 76 | 98 |
| Less than 15-year survival | 78 | 66 |

- Sample proportions are $\hat p_1 = 76/154 = 0.494$ and $\hat p_2 = 98/164 = 0.598$.
- The 99% confidence interval for the difference of proportions using the traditional interval is

  $$0.494 - 0.598 \pm 2.58\sqrt{\frac{(0.494)(0.506)}{154} + \frac{(0.598)(0.402)}{164}} = (-0.247, 0.039).$$

- This is somewhat inconclusive since 0 is inside the confidence interval. You can easily check that the improved version in this case produces a nearly identical CI.

### Inferences about Two Population Variances

- The $F$ distribution depends on two parameters, $\nu_1$ and $\nu_2$; they are called the number of numerator degrees of freedom and the number of denominator degrees of freedom. An $F$ random variable is also nonnegative.
- For two independent chi-squared variables $X_1$ and $X_2$ with $\nu_1$ and $\nu_2$ degrees of freedom, the ratio

  $$F = \frac{X_1/\nu_1}{X_2/\nu_2}$$

  has an $F$ distribution.

Figure 1: The $F$ distribution density curve; the shaded area $\alpha$ lies to the right of the critical value $F_{\alpha,\nu_1,\nu_2}$.

- The density is not symmetric; however, it can be shown that $F_{1-\alpha,\nu_1,\nu_2} = 1/F_{\alpha,\nu_2,\nu_1}$.
- The last property allows us to tabulate only upper or lower tail critical values.

### The Main Property

- Consider $X_1, \ldots, X_m$ taken from a normal distribution with variance $\sigma_1^2$, and $Y_1, \ldots, Y_n$ sampled from another normal distribution with variance $\sigma_2^2$ independently of the $X$'s. Then

  $$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$

  has an $F$ distribution with $\nu_1 = m - 1$ and $\nu_2 = n - 1$.
- Clearly, it would be an indication of $\sigma_1^2$ being rather different from $\sigma_2^2$ if the ratio of sample variances is very different from 1.

### The Test for Equality of Variances

- We test $H_0: \sigma_1^2 = \sigma_2^2$ vs. one of the three possible
alternatives.
- The test statistic value is $f = s_1^2/s_2^2$.
- For $H_a: \sigma_1^2 > \sigma_2^2$ the rejection region is $f \ge F_{\alpha,m-1,n-1}$. For the lower-tailed test it is $f \le F_{1-\alpha,m-1,n-1}$, and for the two-tailed test it is $f \ge F_{\alpha/2,m-1,n-1}$ or $f \le F_{1-\alpha/2,m-1,n-1}$.
- Note that $F$ tables are a little harder to use than $t$ tables because of the two parameters. The table we have in Devore's book gives only a very limited choice of four values for $\alpha$.

## Lecture 15: Tests about a Population Mean (Devore, Section 8.2)

### A Normal Population with Known $\sigma$

- This case is not common in practice. We will use it to illustrate basic principles of test procedure design.
- Let $X_1, \ldots, X_n$ be a sample of size $n$ from the normal population. The null value of the mean is usually denoted $\mu_0$, and we consider testing against each of the three possible alternatives: $\mu > \mu_0$, $\mu < \mu_0$, and $\mu \ne \mu_0$.
- The test statistic that we will use is

  $$Z = \frac{\bar X - \mu_0}{\sigma/\sqrt{n}}.$$

- It measures the distance of $\bar X$ from $\mu_0$ in standard deviation units.
- Consider $H_a: \mu > \mu_0$ as an alternative. The outcome that would allow us to reject the null hypothesis $H_0: \mu = \mu_0$ is $z \ge c$ for some $c > 0$.
- How do you select $c$? We need to control the probability of Type I error. For a test of level $\alpha$ we have

  $$\alpha = P(\text{Type I error}) = P(Z \ge c \mid Z \sim N(0, 1)).$$

- Therefore we need to choose $c = z_\alpha$. Such a test procedure is called upper-tailed.
- It is easy to understand that for $H_a: \mu < \mu_0$ we will have a rejection region of the form $z \le c$. For the test to have level $\alpha$ we need to choose $c = -z_\alpha$. Such a test is called a lower-tailed test.
- Now consider the case of $H_a: \mu \ne \mu_0$. The rejection region here consists of $z \ge c$ and $z \le -c$.
- For simplicity consider the case $\alpha = 0.05$. Then

  $$0.05 = P(Z \ge c \text{ or } Z \le -c \mid Z \sim N(0, 1)) = \Phi(-c) + 1 - \Phi(c) = 2[1 - \Phi(c)].$$

- Therefore we select $c$ such that $1 - \Phi(c) = P(Z \ge c) = 0.025$; it
is $z_{0.025} = 1.96$. This test is called a two-tailed test.

### Summary

Let $H_0: \mu = \mu_0$ and define the test statistic $Z = (\bar X - \mu_0)/(\sigma/\sqrt{n})$.

1. $H_a: \mu > \mu_0$ has the rejection region $z \ge z_\alpha$ and is called an upper-tailed test.
2. $H_a: \mu < \mu_0$ has the rejection region $z \le -z_\alpha$ and is called a lower-tailed test.
3. $H_a: \mu \ne \mu_0$ has the rejection region $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$ and is called a two-tailed test.

### Recommended Steps for Testing Hypotheses about a Parameter

1. Identify the parameter of interest and describe it in the context of the problem situation.
2. Determine the null value and state the null hypothesis.
3. State the alternative hypothesis.
4. Give the formula for the computed value of the test statistic.
5. State the rejection region for the selected significance level.
6. Compute any necessary sample quantities, substitute into the formula for the test statistic value, and compute that value.

- The formulation of hypotheses (steps 2 and 3) should be done before examining the data.

### Example

- Consider Ex. 8.6 in Devore.
- The parameter of interest is $\mu$, the true average activation temperature.
- $H_0: \mu = 130$, $H_a: \mu \ne 130$.
- The test statistic is

  $$z = \frac{\bar x - \mu_0}{\sigma/\sqrt{n}} = \frac{\bar x - 130}{1.5/\sqrt{n}}.$$

- The rejection region is $z \le -z_{\alpha/2}$ or $z \ge z_{\alpha/2}$. If $\alpha = 0.01$ we have $z \le -z_{0.005} = -2.58$ or $z \ge z_{0.005} = 2.58$.
- With $n = 9$ and $\bar x = 131.08$,

  $$z = \frac{131.08 - 130}{1.5/\sqrt{9}} = 2.16.$$

- Since 2.16 is outside the rejection region, we fail to reject $H_0$ at significance level 0.01.

### Type II Error

- As an example, consider an upper-tailed test with the rejection region $z \ge z_\alpha$.
- $H_0$ is not rejected when $\bar x < \mu_0 + z_\alpha \sigma/\sqrt{n}$.
- For a particular $\mu' > \mu_0$ the probability of Type II error is then

  $$\beta(\mu') = P\left(\bar X < \mu_0 + z_\alpha\frac{\sigma}{\sqrt{n}} \,\Big|\, \mu = \mu'\right) = P\left(\frac{\bar X - \mu'}{\sigma/\sqrt{n}} < z_\alpha + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}} \,\Big|\, \mu = \mu'\right) = \Phi\left(z_\alpha + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right).$$
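The $z$ test with known $\sigma$ and the upper-tailed Type II error formula derived above can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, the first two calls reuse the numbers from the Ex. 8.6 discussion, and the last call applies the upper-tailed formula to an assumed alternative $\mu' = 132$ purely to show how the formula is evaluated (the example itself is two-tailed).

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def z_statistic(xbar, mu0, sigma, n):
    """z = (xbar - mu0) / (sigma / sqrt(n)) for a normal population with known sigma."""
    return (xbar - mu0) / (sigma / sqrt(n))


def rejects_two_tailed(z, alpha):
    """True when z falls in the two-tailed rejection region |z| >= z_{alpha/2}."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return abs(z) >= z_crit


def beta_upper_tailed(mu0, mu_alt, sigma, n, alpha):
    """Type II error probability Phi(z_alpha + (mu0 - mu') / (sigma / sqrt(n)))."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(z_alpha + (mu0 - mu_alt) / (sigma / sqrt(n)))


# Numbers from the example: n = 9, xbar = 131.08, sigma = 1.5, mu0 = 130, alpha = 0.01.
z = z_statistic(xbar=131.08, mu0=130, sigma=1.5, n=9)
reject = rejects_two_tailed(z, alpha=0.01)

# Assumed upper-tailed variant with alternative mu' = 132 (illustration only).
beta = beta_upper_tailed(mu0=130, mu_alt=132, sigma=1.5, n=9, alpha=0.01)
print(f"z = {z:.2f}, reject H0: {reject}, beta(132) = {beta:.4f}")
```

`NormalDist.inv_cdf` supplies the critical values, so no table lookup is needed; the exact critical value is slightly smaller than the rounded 2.58 used on the slides.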
- Similar derivations can help us to derive Type II error probabilities for a lower-tailed test and a two-tailed test. The results can be summarized as follows:
  1. $H_a: \mu > \mu_0$ has the probability of Type II error

     $$\beta(\mu') = \Phi\left(z_\alpha + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right).$$

  2. $H_a: \mu < \mu_0$ has the probability of Type II error

     $$\beta(\mu') = 1 - \Phi\left(-z_\alpha + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right).$$

  3. $H_a: \mu \ne \mu_0$ has the probability of Type II error

     $$\beta(\mu') = \Phi\left(z_{\alpha/2} + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right) - \Phi\left(-z_{\alpha/2} + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right).$$

### Sample Size Determination

- Sometimes we want to bound the value of the Type II error for a specific value $\mu'$.
- Consider Ex. 8.6 again: fix $\alpha$ and specify $\beta$ for such an alternative value. For $\mu' = 132$ we may want to require $\beta(132) = 0.1$ in addition to $\alpha = 0.01$.
- For an upper-tailed test, the sample size required for that purpose is such that

  $$\Phi\left(z_\alpha + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right) = \beta.$$

- Solving for $n$ we obtain

  $$n = \left[\frac{\sigma(z_\alpha + z_\beta)}{\mu_0 - \mu'}\right]^2,$$

  and the same answer is true for a lower-tailed test.
- For a two-tailed test it is only possible to give an approximate solution. It is

  $$n \approx \left[\frac{\sigma(z_{\alpha/2} + z_\beta)}{\mu_0 - \mu'}\right]^2.$$

### Large-Sample Tests

- When the sample size is large, the $z$ tests described earlier are modified to yield valid test procedures without requiring either a normal population distribution or a known $\sigma$.
- Let us assume $n > 40$. Then the test statistic

  $$Z = \frac{\bar X - \mu_0}{S/\sqrt{n}}$$

  is approximately standard normal.
- The use of the same rejection regions as before results in a test procedure for which the significance level is approximately $\alpha$.

### A Normal Population Distribution

- When $n$ is small we can no longer invoke the CLT as a justification for the large-sample test.
- Remember that for a normally distributed random sample $X_1, \ldots, X_n$ the statistic

  $$T = \frac{\bar X - \mu}{S/\sqrt{n}}$$

  has a $t$ distribution with $n - 1$ df.
- Therefore we have the test with $H_0: \mu = \mu_0$ and a test statistic value $t = (\bar x - \mu_0)/(s/\sqrt{n})$.
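The sample-size formulas for the $z$ test given earlier in this lecture can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `required_n` and `beta_two_tailed` are hypothetical, the inputs are the Ex. 8.6 values quoted above ($\sigma = 1.5$, $\mu_0 = 130$, $\mu' = 132$, $\alpha = 0.01$, $\beta = 0.1$), and the result is rounded up to the next integer because a sample size must be whole.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def required_n(sigma, mu0, mu_alt, alpha, beta, two_tailed=False):
    """Smallest integer n with n >= [sigma (z_a + z_beta) / (mu0 - mu')]^2.

    Uses z_alpha for one-tailed tests and z_{alpha/2} for the (approximate)
    two-tailed formula.
    """
    tail = alpha / 2 if two_tailed else alpha
    z_a = STD_NORMAL.inv_cdf(1 - tail)
    z_b = STD_NORMAL.inv_cdf(1 - beta)
    return ceil((sigma * (z_a + z_b) / (mu0 - mu_alt)) ** 2)


def beta_two_tailed(mu0, mu_alt, sigma, n, alpha):
    """Type II error probability of the two-tailed z test at mu = mu'."""
    z_half = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = (mu0 - mu_alt) / (sigma / sqrt(n))
    return STD_NORMAL.cdf(z_half + shift) - STD_NORMAL.cdf(-z_half + shift)


n_two = required_n(1.5, 130, 132, alpha=0.01, beta=0.10, two_tailed=True)
n_one = required_n(1.5, 130, 132, alpha=0.01, beta=0.10)
achieved = beta_two_tailed(130, 132, 1.5, n_two, alpha=0.01)
print(f"two-tailed n = {n_two}, one-tailed n = {n_one}, achieved beta = {achieved:.4f}")
```

Plugging the rounded-up $n$ back into the two-tailed Type II error formula is a quick way to confirm that the approximate solution actually meets the requested bound.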
### Summary of the Three Possible t-Tests

- If $H_a: \mu > \mu_0$, the rejection region of the level $\alpha$ test is $t \ge t_{\alpha,n-1}$.
- If $H_a: \mu < \mu_0$, the rejection region of the level $\alpha$ test is $t \le -t_{\alpha,n-1}$.
- If $H_a: \mu \ne \mu_0$, the rejection region of the level $\alpha$ test is $t \ge t_{\alpha/2,n-1}$ or $t \le -t_{\alpha/2,n-1}$.

### Example

- The Edison Electric Institute publishes figures on the annual number of kilowatt hours expended by various home appliances.
- It is claimed that a vacuum cleaner expends an average of 46 kilowatt hours per year.
- Suppose a planned study includes a random sample of 12 homes, and it indicates that vacuum cleaners expend an average of 42 kilowatt hours per year with $s = 11.9$ kilowatt hours.
- Assuming population normality, design a 0.05 level test to see whether vacuum cleaners expend less than 46 kilowatt hours annually.
- $H_0: \mu = 46$ kilowatt hours and $H_a: \mu < 46$ kilowatt hours.
- Assuming $\alpha = 0.05$, we have the critical region $t < -1.796$, where $t = (\bar x - \mu_0)/(s/\sqrt{n})$ with 11 df.
- The value of the statistic is

  $$t = \frac{42 - 46}{11.9/\sqrt{12}} = -1.16.$$

- Since $t$ is not in the rejection region, we fail to reject $H_0$.

### $\beta$ Curve for the t Test

- It is much more difficult to compute the probability of the Type II error in this case than in the normal case.
- The reason is that it requires knowledge of the distribution of $T = (\bar X - \mu_0)/(S/\sqrt{n})$ under the alternative $H_a$. To do it precisely, we must compute $\beta(\mu') = P(T < t_{\alpha,n-1} \text{ when } \mu = \mu')$.
- There exist extensive tables of these probabilities for both one- and two-tailed tests.

Figure 1: A typical $\beta$ curve for the $t$ test (the curve for $n - 1$ df; $\beta$ when $\mu = \mu'$ is read off above the value of $d$ corresponding to the specified alternative $\mu'$).

### Calculating $\beta$

- First
we select $\mu'$ and the estimated value for the unknown $\sigma$. Then we find an estimated value of $d = |\mu_0 - \mu'|/\sigma$. Finally, the value of $\beta$ is the height of the $n - 1$ df curve above the value of $d$.
- If $n - 1$ is not a value for which the corresponding curve appears, visual interpolation is necessary.
- One can also calculate the sample size $n$ needed to keep the Type II error probability below $\beta$ for a specified $\alpha$:
  1. First we compute $d$.
  2. Then the point $(d, \beta)$ is located on the relevant set of graphs.
  3. The curve below and closest to the point gives $n - 1$ and thus $n$.
  4. Interpolation, of course, is often necessary.

## Lecture 18: Inferences Based on Two Samples (Devore, Sections 9.1 and 9.2)

### Z Tests and Confidence Intervals for a Difference Between Two Population Means

- An example of such a hypothesis would be $\mu_1 - \mu_2 = 0$ or $\sigma_1 > \sigma_2$. It may also be appropriate to estimate $\mu_1 - \mu_2$ and compute its $100(1-\alpha)\%$ confidence interval.
- Assumptions:
  1. $X_1, \ldots, X_m$ is a random sample from a population with mean $\mu_1$ and variance $\sigma_1^2$.
  2. $Y_1, \ldots, Y_n$ is a random sample from a population with mean $\mu_2$ and variance $\sigma_2^2$.
  3. The $X$ and $Y$ samples are independent of one another.
- The natural estimator of $\mu_1 - \mu_2$ is $\bar X - \bar Y$. To standardize this estimator we need to find $E(\bar X - \bar Y)$ and $V(\bar X - \bar Y)$.
- $E(\bar X - \bar Y) = \mu_1 - \mu_2$, so $\bar X - \bar Y$ is an unbiased estimator of $\mu_1 - \mu_2$. The proof is elementary: $E(\bar X - \bar Y) = E(\bar X) - E(\bar Y) = \mu_1 - \mu_2$.
- The standard deviation of $\bar X - \bar Y$ is
$\sigma_{\bar X - \bar Y} = \sqrt{\sigma_1^2/m + \sigma_2^2/n}$.
- The proof is also elementary: $V(\bar X - \bar Y) = V(\bar X) + V(\bar Y) = \sigma_1^2/m + \sigma_2^2/n$. The standard deviation is the square root of this expression.

### The Case of Normal Populations with Known Variances

- As before, this assumption is a simplification.
- Under this assumption

  $$Z = \frac{\bar X - \bar Y - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/m + \sigma_2^2/n}} \qquad (1)$$

  has a standard normal distribution.
- The null hypothesis $\mu_1 - \mu_2 = 0$ is a special case of the more general $\mu_1 - \mu_2 = \Delta_0$. Replacing $\mu_1 - \mu_2$ in (1) with $\Delta_0$ gives us a test statistic.
- The following summary considers all possible types of alternatives:
  1. $H_a: \mu_1 - \mu_2 > \Delta_0$ has the rejection region $z \ge z_\alpha$.
  2. $H_a: \mu_1 - \mu_2 < \Delta_0$ has the rejection region $z \le -z_\alpha$.
  3. $H_a: \mu_1 - \mu_2 \ne \Delta_0$ has the rejection region $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$.

### Example

- Consider Ex. 9.1 in Devore. Sample sizes are $m = 20$ and $n = 25$. Note that $m \ne n$; it is not important now but will be later. Note that the normality assumption is based on some exploratory data analysis.
- The hypotheses are $H_0: \mu_1 - \mu_2 = 0$ and $H_a: \mu_1 - \mu_2 \ne 0$.
- The test statistic is

  $$z = \frac{\bar x - \bar y}{\sqrt{\sigma_1^2/m + \sigma_2^2/n}}.$$

- For a level of significance $\alpha = 0.01$, $z_{\alpha/2} = z_{0.005} = 2.58$ and the rejection region is $z \le -2.58$ or $z \ge 2.58$.
- The computed value of the $z$ statistic is $-3.66$, which is well within the rejection region. The P-value for this test is $2[1 - \Phi(3.66)] \approx 0$, which means rejection at any reasonable level.

### Type II Error and the Choice of the Sample Size

- Consider the case of an upper-tailed alternative hypothesis $H_a: \mu_1 - \mu_2 > \Delta_0$.
- The rejection region is $\bar x - \bar y \ge \Delta_0 + z_\alpha \sigma_{\bar X - \bar Y}$. Therefore

  $$P(\text{Type II error}) = P(\bar X - \bar Y < \Delta_0 + z_\alpha \sigma_{\bar X - \bar Y} \text{ when } \mu_1 - \mu_2 = \Delta').$$

- Since $\bar X - \bar Y$ is normally distributed under the alternative $\mu_1 - \mu_2 = \Delta'$ with mean $\Delta'$ and standard deviation $\sigma = \sigma_{\bar X - \bar Y} = \sqrt{\sigma_1^2/m + \sigma_2^2/n}$, we have

  $$\beta(\Delta') = \Phi\left(z_\alpha - \frac{\Delta' - \Delta_0}{\sigma}\right).$$
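To make the two-sample $z$ statistic and the upper-tailed Type II error formula above concrete, here is a small sketch. The function names are hypothetical. The sample sizes $m = 20$ and $n = 25$ match the example above, but the standard deviations and sample means used below are illustrative assumptions, not values taken from these notes.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def sd_of_difference(sigma1, m, sigma2, n):
    """Standard deviation of Xbar - Ybar: sqrt(sigma1^2/m + sigma2^2/n)."""
    return sqrt(sigma1**2 / m + sigma2**2 / n)


def two_sample_z(xbar, ybar, sigma1, m, sigma2, n, delta0=0.0):
    """z statistic for H0: mu1 - mu2 = delta0 with known variances."""
    return (xbar - ybar - delta0) / sd_of_difference(sigma1, m, sigma2, n)


def beta_upper_tailed(delta_alt, delta0, sd_diff, alpha):
    """Type II error probability Phi(z_alpha - (delta' - delta0) / sigma)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(z_alpha - (delta_alt - delta0) / sd_diff)


# Assumed inputs: sigma1 = 4, sigma2 = 5, xbar = 34, ybar = 30 (illustrative only).
sd_diff = sd_of_difference(4.0, 20, 5.0, 25)
z = two_sample_z(34.0, 30.0, 4.0, 20, 5.0, 25)
beta = beta_upper_tailed(delta_alt=5.0, delta0=0.0, sd_diff=sd_diff, alpha=0.01)
print(f"sd = {sd_diff:.3f}, z = {z:.2f}, beta(5) = {beta:.4f}")
```

Separating `sd_of_difference` from the test statistic keeps the same standard deviation available for both the test and the Type II error calculation, which is how the formulas above reuse $\sigma$.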
Similar results can be easily obtained for the other two possible alternatives. In particular, if $H_a: \mu_1 - \mu_2 < \Delta_0$, we have

$$\beta(\Delta') = 1 - \Phi\left(-z_\alpha - \frac{\Delta' - \Delta_0}{\sigma}\right)$$

If $H_a: \mu_1 - \mu_2 \ne \Delta_0$, the probability of Type II Error is

$$\beta(\Delta') = \Phi\left(z_{\alpha/2} - \frac{\Delta' - \Delta_0}{\sigma}\right) - \Phi\left(-z_{\alpha/2} - \frac{\Delta' - \Delta_0}{\sigma}\right)$$

#### Example

Consider Example 9.3 from Devore. Suppose that the probability of detecting a difference of 5 between the two means should be 0.90. Can the 0.01 level test with $m = 20$ and $n = 25$ support this?

For a two-sided test we have

$$\beta(5) = \Phi\left(2.58 - \frac{5 - 0}{1.34}\right) - \Phi\left(-2.58 - \frac{5 - 0}{1.34}\right) = \Phi(-1.15) - \Phi(-6.31) \approx 0.1251$$

Because the rejection region is symmetric, we have $\beta(-5) = \beta(5)$, and therefore the probability of detecting a difference of 5 is $1 - \beta(5) = 0.8749$. We can conclude that slightly larger sample sizes are needed.

To determine a sample size that satisfies $P(\text{Type II Error when } \mu_1 - \mu_2 = \Delta') = \beta$, we need to solve (for a one-tailed test; a two-tailed test approximately uses $z_{\alpha/2}$ in place of $z_\alpha$)

$$\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n} = \frac{(\Delta' - \Delta_0)^2}{(z_\alpha + z_\beta)^2}$$

For two equal sample sizes this yields

$$m = n = \frac{(\sigma_1^2 + \sigma_2^2)(z_\alpha + z_\beta)^2}{(\Delta' - \Delta_0)^2}$$

#### Large-Sample Tests

In this case the assumption of normality for the data is unnecessary, and the variances $\sigma_1^2, \sigma_2^2$ need not be known. This is because for large $m$ and $n$ the variable

$$Z = \frac{\bar X - \bar Y - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}}$$

is approximately standard normal.

Then, if the null hypothesis is $\mu_1 - \mu_2 = \Delta_0$, the test statistic

$$Z = \frac{\bar X - \bar Y - \Delta_0}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}}$$

is approximately standard normal under the null hypothesis. This test is usually appropriate if both $m > 40$ and $n > 40$.

#### Example

A company claims that its light bulbs are superior to those of its main competitor. If a study showed that a sample of $m = 40$ of its bulbs has a mean lifetime of 647 hours of continuous use with a standard deviation of 27 hours, while a sample of $n = 40$ bulbs made by its
main competitor had a mean lifetime of 638 hours of continuous use with a standard deviation of 31 hours, does this substantiate the claim at the 0.05 level of significance?

$H_0: \mu_1 - \mu_2 = 0$ and $H_a: \mu_1 - \mu_2 > 0$. Reject $H_0$ if $z > 1.645$.

Calculations:

$$z = \frac{647 - 638}{\sqrt{\dfrac{27^2}{40} + \dfrac{31^2}{40}}} = 1.38$$

Decision: $H_0$ cannot be rejected at $\alpha = 0.05$; the P-value is 0.0838.

#### Confidence Intervals for $\mu_1 - \mu_2$

Since the test statistic $Z$ that we just described is exactly normal when both populations are normal and $\sigma_1^2$ and $\sigma_2^2$ are known, we have $P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha$.

The $100(1-\alpha)\%$ CI is easy to derive from this probability statement; it is

$$\bar x - \bar y \pm z_{\alpha/2}\, \sigma_{\bar X - \bar Y}$$

where $\sigma_{\bar X - \bar Y} = \sqrt{\sigma_1^2/m + \sigma_2^2/n}$.

If both $m$ and $n$ are large, the CLT implies that the normality assumption is not necessary, and substitution of $s_i^2$ for $\sigma_i^2$, $i = 1, 2$, will produce an approximately $100(1-\alpha)\%$ CI. More precisely, such an interval is

$$\bar x - \bar y \pm z_{\alpha/2} \sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$

Again, this result should be used only if both $m$ and $n$ exceed 40. Note that this CI has the standard form $\hat\theta \pm z_{\alpha/2}\, \hat\sigma_{\hat\theta}$.

#### Example

An experiment was conducted in which two types of engines, A and B, were compared. Gas mileage in miles per gallon was measured. 50 experiments were conducted using engine type A and 75 were done for engine type B. The gasoline used and other conditions were held constant. The average mileage for engine A was 36 mpg and the average for engine B was 42 mpg. Find an approximate 96% CI on $\mu_B - \mu_A$, where $\mu_A$ and $\mu_B$ are the population mean gas mileages for engines A and B, respectively. The sample standard deviations are 6 and 8 for engines A and B, respectively.

The point estimate of $\mu_B - \mu_A$ is $\bar x_B - \bar x_A = 42 - 36 = 6$. For $\alpha = 0.04$ we find the critical value $z_{0.02} = 2.05$. Thus the confidence
interval is $6 \pm 2.05\sqrt{\dfrac{64}{75} + \dfrac{36}{50}} = (3.43, 8.57)$.

#### The Two-Sample t-test

Assumptions: Both populations are normal, so that $X_1, \ldots, X_m$ is a random sample from a normal distribution and so is $Y_1, \ldots, Y_n$. The plausibility of these assumptions can be judged by constructing a normal probability plot of the $x_i$'s and another of the $y_i$'s.

When the population distributions are both normal, the standardized variable

$$T = \frac{\bar X - \bar Y - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}}$$

has approximately a $t$ distribution with $\nu$ df. $\nu$ can be estimated from data as

$$\nu = \frac{\left(\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}\right)^2}{\dfrac{(s_1^2/m)^2}{m-1} + \dfrac{(s_2^2/n)^2}{n-1}}$$

$\nu$ has to be rounded down to the nearest integer (why not up?).

The two-sample confidence interval for $\mu_1 - \mu_2$ with confidence level $100(1-\alpha)\%$ is

$$\bar x - \bar y \pm t_{\alpha/2,\nu} \sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$

A one-sided confidence bound can also be calculated as described earlier.

The two-sample $t$-test for testing $H_0: \mu_1 - \mu_2 = \Delta_0$ is conducted using the test statistic

$$t = \frac{\bar x - \bar y - \Delta_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}$$

| Alternative hypothesis | Rejection region for approximate level $\alpha$ test |
| --- | --- |
| $H_a: \mu_1 - \mu_2 > \Delta_0$ | $t \ge t_{\alpha,\nu}$ |
| $H_a: \mu_1 - \mu_2 < \Delta_0$ | $t \le -t_{\alpha,\nu}$ |
| $H_a: \mu_1 - \mu_2 \ne \Delta_0$ | either $t \ge t_{\alpha/2,\nu}$ or $t \le -t_{\alpha/2,\nu}$ |

A P-value can be computed exactly as we did before for the one-sample test.

#### Example

Consider Example 9.6 in Devore. The following table helps to illustrate it.

| Fabric type | Sample size | Sample mean | Sample standard deviation |
| --- | --- | --- | --- |
| Cotton | 10 | 51.71 | 0.79 |
| Triacetate | 10 | 136.14 | 3.59 |

We assume that the porosity distributions for both types of fabric are normal; then the two-sample $t$ test/CI can be used. Note that we do not assume anything about the variances of the two populations concerned.

The number of df is

$$\nu = \frac{\left(\dfrac{0.6241}{10} + \dfrac{12.8881}{10}\right)^2}{\dfrac{(0.6241/10)^2}{9} + \dfrac{(12.8881/10)^2}{9}} = 9.87$$

and we use $\nu = 9$.
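The degrees-of-freedom calculation for the fabric example can be checked with a few lines of Python. This is only a sketch: the function name `welch_df` is hypothetical, and the sample standard deviations are read here as 0.79 and 3.59 with 10 observations in each sample.

```python
from math import floor


def welch_df(s1, s2, m, n):
    """Estimated df for the two-sample t procedures (unequal variances)."""
    a = s1**2 / m  # estimated variance of the first sample mean
    b = s2**2 / n  # estimated variance of the second sample mean
    return (a + b) ** 2 / (a**2 / (m - 1) + b**2 / (n - 1))


# Fabric example: cotton (s = 0.79) and triacetate (s = 3.59), 10 specimens each.
nu = welch_df(s1=0.79, s2=3.59, m=10, n=10)
nu_used = floor(nu)  # round down, as the notes prescribe
```

Rounding down rather than up gives a slightly larger critical value, so the resulting interval or test errs on the conservative side.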
The resulting 95% CI (using $t_{0.025,9} = 2.262$) is

$$51.71 - 136.14 \pm 2.262\sqrt{\frac{0.6241}{10} + \frac{12.8881}{10}} = (-87.06, -81.80)$$

Conclusion: the entire interval lies below zero, so the data indicate that true average porosity is higher for triacetate than for cotton.

#### Pooled t test

A simpler alternative test is available when it is known that $\sigma_1^2 = \sigma_2^2 = \sigma^2$. In this case, standardizing $\bar X - \bar Y$, we have

$$Z = \frac{\bar X - \bar Y - (\mu_1 - \mu_2)}{\sqrt{\sigma^2\left(\dfrac{1}{m} + \dfrac{1}{n}\right)}}$$

which has a standard normal distribution.

Instead of the unknown $\sigma^2$ we use the weighted average

$$S_p^2 = \frac{m-1}{m+n-2}\, S_1^2 + \frac{n-1}{m+n-2}\, S_2^2$$

When $m = n$, both samples contribute equally to the common variance estimate; otherwise the larger sample receives more weight.

Substituting $S_p^2$ for $\sigma^2$ gives us a $t$ distribution with $m + n - 2$ degrees of freedom. This can serve as a basis for CIs and tests analogous to the ones we described in the previous section.

#### Remarks

Traditionally, this test has been recommended as the first to use when comparing two different means. It has a number of advantages over the two-sample $t$ test: it is a likelihood ratio test, it is an exact test, and it is easier to use.

However, this test has a major problem: it is not robust to violation of the equal-variance assumption. Even when $\sigma_1^2 = \sigma_2^2$, its gains in power are small when compared to the two-sample $t$ test. That is why today it is often recommended to use the two-sample $t$ test in most cases. This is especially true when the sample sizes are different.

It may seem to be a plausible idea that one could first test the hypothesis $H_0: \sigma_1^2 = \sigma_2^2$ and then choose the type of $t$ test based on the outcome. Unfortunately, the most common type of test used for this purpose (we will consider it at the very end of the course) is very sensitive to violation of the normality assumption and is often not very reliable as a result.

Yet another warning concerns normality of the data. If the distribution of the data is
strongly asymmetric, both of these tests will prove unreliable. The alternative is to use a special class of tests that do not use any distributional assumptions at all, so-called nonparametric tests.

#### Analysis of Paired Data

The data consist of $n$ independently selected pairs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ with $E(X_i) = \mu_1$ and $E(Y_i) = \mu_2$. The differences $D_i = X_i - Y_i$ are assumed to be normally distributed with mean value $\mu_D = \mu_1 - \mu_2$ and variance $\sigma_D^2$. The last requirement is usually a consequence of the $X$'s and $Y$'s being normally distributed themselves.

#### Example

Consider Ex. 9.8 from Devore. Six river locations were selected, and the concentration of zinc (in mg/L) was determined for both surface water and bottom water at each location. Presumably there is some connection between surface water and bottom water concentrations.

#### The Paired t test

The test considered is $H_0: \mu_D = \Delta_0$, where $D = X - Y$. The test statistic is

$$t = \frac{\bar d - \Delta_0}{s_D / \sqrt{n}}$$

where $\bar d$ and $s_D$ are the sample mean and standard deviation of the $d_i$'s.

Note that the old method of computing the variance of the difference does not work anymore, since $X$ and $Y$ are NOT independent.
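The paired $t$ statistic described above can be sketched in Python as follows. The function name `paired_t` is hypothetical and the data are made up purely for illustration; they are not the zinc measurements from the Devore example.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y, delta0=0.0):
    """Return (t, df) for H0: mu_D = delta0, where D = X - Y."""
    d = [xi - yi for xi, yi in zip(x, y)]  # within-pair differences
    n = len(d)
    d_bar = mean(d)   # sample mean of the differences
    s_d = stdev(d)    # sample sd of the differences (divisor n - 1)
    t = (d_bar - delta0) / (s_d / sqrt(n))
    return t, n - 1


# Made-up paired measurements (illustration only).
x = [5.0, 6.0, 7.0, 8.0]
y = [4.0, 4.0, 5.0, 5.0]
t_stat, df = paired_t(x, y)
```

Working with the differences sidesteps the dependence between $X$ and $Y$ within a pair: the statistic is an ordinary one-sample $t$ statistic computed on the $d_i$'s, with $n - 1$ degrees of freedom.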
