# 229 Class Note for STAT 51100 with Professor Levine at Purdue

These 34 pages of class notes were uploaded by an elite notetaker on Friday, February 6, 2015. The notes belong to a course at Purdue University taught in the fall. Since their upload, they have received 16 views.

Date Created: 02/06/15

Statistics 511: Statistical Methods, Purdue University, Dr. Levine, Fall 2006

Lecture 18: Inferences Based on Two Samples (Devore's Section 9.1-9.3), Aug 2006

## Z Tests and Confidence Intervals for a Difference Between Two Population Means

- An example of such a hypothesis would be $\mu_1 - \mu_2 = 0$ or $\sigma_1 > \sigma_2$. It may also be appropriate to estimate $\mu_1 - \mu_2$ and compute its $100(1-\alpha)\%$ confidence interval.
- Assumptions:
  1. $X_1, \dots, X_m$ is a random sample from a population with mean $\mu_1$ and variance $\sigma_1^2$.
  2. $Y_1, \dots, Y_n$ is a random sample from a population with mean $\mu_2$ and variance $\sigma_2^2$.
  3. The $X$ and $Y$ samples are independent of one another.
- The natural estimator of $\mu_1 - \mu_2$ is $\bar{X} - \bar{Y}$. To standardize this estimator, we need to find $E(\bar{X} - \bar{Y})$ and $V(\bar{X} - \bar{Y})$.
- $E(\bar{X} - \bar{Y}) = \mu_1 - \mu_2$, so $\bar{X} - \bar{Y}$ is an unbiased estimator of $\mu_1 - \mu_2$. The proof is elementary:

$$E(\bar{X} - \bar{Y}) = E(\bar{X}) - E(\bar{Y}) = \mu_1 - \mu_2$$

- The standard deviation of $\bar{X} - \bar{Y}$ is

$$\sigma_{\bar{X} - \bar{Y}} = \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$$

- The proof is also elementary. Because the samples are independent,

$$V(\bar{X} - \bar{Y}) = V(\bar{X}) + V(\bar{Y}) = \frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}$$

  The standard deviation is the square root of the above expression.

## The Case of Normal Populations with Known Variances

- As before, this assumption is a simplification.
- Under this assumption,

$$Z = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}} \qquad (1)$$

  has a standard normal distribution.
- The null hypothesis $\mu_1 - \mu_2 = 0$ is a special case of the more general $\mu_1 - \mu_2 = \Delta_0$. Replacing $\mu_1 - \mu_2$ in (1) with $\Delta_0$ gives us a test statistic.
- The following summary considers all possible types of alternatives:
  1. $H_a: \mu_1 - \mu_2 > \Delta_0$ has the rejection region $z \ge z_\alpha$.
  2. $H_a: \mu_1 - \mu_2 < \Delta_0$ has the rejection region $z \le -z_\alpha$.
  3. $H_a: \mu_1 - \mu_2 \ne \Delta_0$ has the rejection region $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$.

### Example

- Consider Ex. 9.1 in Devore. Sample sizes are $m = 20$ and $n = 25$. Note that $m \ne n$; this is not important now but will be later.
- Note that the normality suggestion is based on some exploratory data analysis.
- The hypotheses are $H_0: \mu_1 - \mu_2 = 0$ and $H_a: \mu_1 - \mu_2 \ne 0$.
- The test statistic is

$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}}$$

- For a level of significance $\alpha = 0.01$, $z_{\alpha/2} = z_{0.005} = 2.58$, and the rejection region is $z \le -2.58$ or $z \ge 2.58$.
- The computed value of the $z$ statistic is $-3.66$, which is well within the rejection region. The P-value for this rejection region is $2(1 - \Phi(3.66)) \approx 0$, which means rejection at any reasonable level.

## Type II Error and the Choice of the Sample Size

- Consider the case of an upper-tailed alternative hypothesis $H_a: \mu_1 - \mu_2 > \Delta_0$.
- The rejection region is $\bar{x} - \bar{y} \ge \Delta_0 + z_\alpha \sigma_{\bar{X} - \bar{Y}}$. Therefore

$$P(\text{Type II Error}) = P\left(\bar{X} - \bar{Y} < \Delta_0 + z_\alpha \sigma_{\bar{X} - \bar{Y}} \ \text{when} \ \mu_1 - \mu_2 = \Delta'\right)$$

- Since $\bar{X} - \bar{Y}$ is normally distributed under the alternative $\mu_1 - \mu_2 = \Delta'$ with mean $\Delta'$ and standard deviation $\sigma = \sigma_{\bar{X} - \bar{Y}} = \sqrt{\sigma_1^2/m + \sigma_2^2/n}$, we have

$$\beta(\Delta') = \Phi\left(z_\alpha - \frac{\Delta' - \Delta_0}{\sigma}\right)$$

- Similar results can be easily obtained for the other two possible alternatives. In particular, if $H_a: \mu_1 - \mu_2 < \Delta_0$, we have

$$\beta(\Delta') = 1 - \Phi\left(-z_\alpha - \frac{\Delta' - \Delta_0}{\sigma}\right)$$

- If $H_a: \mu_1 - \mu_2 \ne \Delta_0$, the probability of Type II Error is

$$\beta(\Delta') = \Phi\left(z_{\alpha/2} - \frac{\Delta' - \Delta_0}{\sigma}\right) - \Phi\left(-z_{\alpha/2} - \frac{\Delta' - \Delta_0}{\sigma}\right)$$

### Example

- Consider Example 9.3 from Devore. Suppose that the probability of detecting a difference of 5 between the two means should be 0.90. Can the 0.01 level test with $m = 20$ and $n = 25$ support this?
- For a two-sided two-sample test we have

$$\beta(5) = \Phi\left(2.58 - \frac{5 - 0}{\sigma}\right) - \Phi\left(-2.58 - \frac{5 - 0}{\sigma}\right) = 0.1251$$

- Because the rejection region is symmetric, we have $\beta(-5) = \beta(5)$, and therefore the probability of detecting a difference of 5 is $1 - \beta(5) = 0.8749$.
- We can conclude that slightly larger sample sizes are needed.
- To determine a sample size that satisfies $P(\text{Type II Error when } \mu_1 - \mu_2 = \Delta') = \beta$, we need to solve

$$\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n} = \frac{(\Delta' - \Delta_0)^2}{(z_\alpha + z_\beta)^2}$$

- For two equal sample sizes this yields

$$m = n = \frac{(\sigma_1^2 + \sigma_2^2)(z_\alpha + z_\beta)^2}{(\Delta' - \Delta_0)^2}$$

## Large-Sample Tests

- In this case the assumption of normality for the data is unnecessary, and the variances $\sigma_1^2$, $\sigma_2^2$ need not be known.
- This is because, for large $m$ and $n$, the variable

$$Z = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}}$$

  is approximately standard normal.
- Then, if the null hypothesis is $\mu_1 - \mu_2 = \Delta_0$, the test statistic

$$Z = \frac{\bar{X} - \bar{Y} - \Delta_0}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}}$$

  is approximately standard normal under the null hypothesis.
- This test is usually appropriate if both $m > 40$ and $n > 40$.

### Example

- A company claims that its light bulbs are superior to those of its main competitor. A study showed that a sample of $m = 40$ of its bulbs has a mean lifetime of 647 hours of continuous use with a standard deviation of 27 hours, while a sample of $n = 40$ bulbs made by its main competitor had a mean lifetime of 638 hours of continuous use with a standard deviation of 31 hours. Does this substantiate the claim at the 0.05 level of significance?
- $H_0: \mu_1 - \mu_2 = 0$ and $H_a: \mu_1 - \mu_2 > 0$.
- Reject $H_0$ if $Z > 1.645$.
- Calculations:

$$z = \frac{647 - 638}{\sqrt{\dfrac{27^2}{40} + \dfrac{31^2}{40}}} = 1.38$$

- Decision: $H_0$ cannot be rejected at $\alpha = 0.05$; the P-value is 0.0838.

## Confidence Intervals for $\mu_1 - \mu_2$

- Since the test statistic $Z$ that we just described is exactly normal when $\sigma_1^2$ and $\sigma_2^2$ are known,

$$P\left(-z_{\alpha/2} < \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}} < z_{\alpha/2}\right) = 1 - \alpha$$

- The $100(1-\alpha)\%$ CI is easy to derive from this probability statement; it is $\bar{x} - \bar{y} \pm z_{\alpha/2}\,\sigma_{\bar{X} - \bar{Y}}$, where $\sigma_{\bar{X} - \bar{Y}}$ is the square root expression above.
- If both $m$ and $n$ are large, the CLT implies that the normality assumption is not necessary, and substitution of $s_i^2$ for $\sigma_i^2$, $i = 1, 2$, will produce an approximately $100(1-\alpha)\%$ CI.
- More precisely, such an interval is

$$\bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$

- Again, this result should be used only if both $m$ and $n$ exceed 40.
- Note that this CI has the standard form $\hat{\theta} \pm z_{\alpha/2}\,\hat{\sigma}_{\hat{\theta}}$.

### Example

- An experiment was conducted in which two types of engines, A and B, were compared. Gas mileage in miles per gallon was measured. 50 experiments were conducted using engine type A and 75 were done for engine type B. The gasoline used and other conditions were held constant. The average mileage for engine A was 36 mpg and the average for engine B was 42 mpg. Find an approximate 96% CI on $\mu_B - \mu_A$, where $\mu_A$ and $\mu_B$ are the population mean gas mileages for engines A and B, respectively. The sample standard deviations are 6 and 8 for engines A and B, respectively.
- The point estimate of $\mu_B - \mu_A$ is $\bar{x}_B - \bar{x}_A = 42 - 36 = 6$. For $\alpha = 0.04$ we find the critical value $z_{\alpha/2} = z_{0.02} = 2.05$.
- Thus, the confidence interval is

$$6 \pm 2.05\sqrt{\frac{64}{75} + \frac{36}{50}} = (3.43,\ 8.57)$$

## The Two-Sample t Test

- Assumptions: both populations are normal, so that $X_1, \dots, X_m$ is a random sample from a normal distribution and so is $Y_1, \dots, Y_n$. The plausibility of these assumptions can be judged by constructing a normal probability plot of the $x_i$'s and another of the $y_i$'s.
- When the population distributions are both normal, the standardized variable

$$T = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}}$$

  has approximately a $t$ distribution with $\nu$ df.
- $\nu$ can be estimated from data as

$$\nu = \frac{\left(\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}\right)^2}{\dfrac{(s_1^2/m)^2}{m - 1} + \dfrac{(s_2^2/n)^2}{n - 1}}$$

- $\nu$ has to be rounded down to the nearest integer (why not up?).
- The two-sample confidence interval for $\mu_1 - \mu_2$ with confidence level $100(1-\alpha)\%$ is

$$\bar{x} - \bar{y} \pm t_{\alpha/2,\nu}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$

  A one-sided confidence bound can also be calculated as described earlier.
- The two-sample $t$ test for testing $H_0: \mu_1 - \mu_2 = \Delta_0$ is conducted using the test statistic

$$t = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}$$

| Alternative hypothesis | Rejection region for approximate level $\alpha$ test |
| --- | --- |
| $H_a: \mu_1 - \mu_2 > \Delta_0$ | $t \ge t_{\alpha,\nu}$ |
| $H_a: \mu_1 - \mu_2 < \Delta_0$ | $t \le -t_{\alpha,\nu}$ |
| $H_a: \mu_1 - \mu_2 \ne \Delta_0$ | either $t \ge t_{\alpha/2,\nu}$ or $t \le -t_{\alpha/2,\nu}$ |

- A P-value can be computed exactly as we did before for the one-sample test.

### Example

- Consider Example 9.6 in Devore. The following table helps to illustrate it.

| Fabric Type | Sample Size | Sample Mean | Sample Standard Deviation |
| --- | --- | --- | --- |
| Cotton | 10 | 51.71 | 0.79 |
| Triacetate | 10 | 136.14 | 3.59 |

- We assume that the porosity distributions for both types of fabric are normal; then the two-sample $t$ test/CI can be used. Note that we do not assume anything about the variances of the two populations concerned.
- The number of df is

$$\nu = \frac{\left(\dfrac{0.6241}{10} + \dfrac{12.8881}{10}\right)^2}{\dfrac{(0.6241/10)^2}{9} + \dfrac{(12.8881/10)^2}{9}} = 9.87$$

  and we use $\nu = 9$.
- The resulting CI is

$$51.71 - 136.14 \pm (2.262)\sqrt{\frac{0.6241}{10} + \frac{12.8881}{10}} = (-87.06,\ -81.80)$$

- Conclusion: the interval lies entirely below 0, which suggests that the mean porosity of triacetate exceeds that of cotton.

## Pooled t Test

- A simpler alternative test is available when it is known that $\sigma_1^2 = \sigma_2^2 = \sigma^2$.
- In this case, standardizing $\bar{X} - \bar{Y}$, we have

$$Z = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\sigma^2\left(\dfrac{1}{m} + \dfrac{1}{n}\right)}}$$

  which has a standard normal distribution.
- Instead of the unknown $\sigma^2$ we use the weighted average

$$S_p^2 = \frac{m - 1}{m + n - 2}\,S_1^2 + \frac{n - 1}{m + n - 2}\,S_2^2$$

  In this case each sample contributes to the common variance estimate in proportion to its degrees of freedom (equally when $m = n$).
- Substituting $S_p^2$ for $\sigma^2$ gives us a $t$ distribution with $m + n - 2$ degrees of freedom. This can serve as a basis for CIs and tests analogous to the ones we described in the previous section.

### Remarks

- Traditionally, this test has been recommended as the first to use when comparing two different means. It has a number of advantages over the two-sample $t$ test: it is a likelihood ratio test, it is an exact test, and it is easier to use.
- However, this test has a major problem: it is not robust to violation of the equality of variances assumption. When $\sigma_1^2 = \sigma_2^2$, its gains in power are small compared to the two-sample $t$ test. That is why today it is often recommended to use the two-sample $t$ test in most cases. This is especially true when the sample sizes are different.
- It may seem to be a plausible idea that one could first test the hypothesis $H_0: \sigma_1^2 = \sigma_2^2$ and then choose the type of $t$ test based on the outcome.
- Unfortunately, the most common type of test used for this purpose (we will consider it at the very end of the course) is very sensitive to violation of the normality assumption and is often not very reliable as a result.
- Yet another warning concerns normality of the data. If the distribution of the data is strongly asymmetric, both of these tests will prove unreliable. The alternative is to use a special class of tests that do not use any distributional assumptions at all, so-called nonparametric tests.

## Analysis of Paired Data

- The data consist of $n$ independently selected pairs $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$ with $E(X_i) = \mu_1$ and $E(Y_i) = \mu_2$. The differences $D_i = X_i - Y_i$ are assumed to be normally distributed with mean value $\mu_D = \mu_1 - \mu_2$ and variance $\sigma_D^2$. The last requirement is usually the consequence of the $X$'s and $Y$'s being normally distributed themselves.

### Example

- Consider Ex. 9.8 from Devore. Six river locations were selected, and the concentration of zinc (in mg/L) was determined for both surface water and bottom water at each location. Presumably there is some connection between surface water and bottom water concentrations.

## The Paired t Test

- The test considered is $H_0: \mu_D = \Delta_0$, where $D = X - Y$.
- The test statistic is

$$t = \frac{\bar{d} - \Delta_0}{s_D / \sqrt{n}}$$

  where $\bar{d}$ and $s_D$ are the sample mean and standard deviation of the $d_i$'s.
- Note that the old method of computing the variance of the difference does not work anymore, since $X$ and $Y$ are NOT independent.
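
The numerical examples in these notes (the light bulb test, the engine mileage interval, and the degrees of freedom for the fabric data) can be re-checked with a short script. The following is a minimal sketch, not part of the original lecture: the function names are invented for illustration, and it relies only on `statistics.NormalDist` from the Python standard library. Because the notes round $z$ and the critical values to two decimals, the script's unrounded results differ slightly in the last digit.

```python
from math import floor, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def se_diff(s1, m, s2, n):
    """Estimated standard deviation of (X-bar minus Y-bar)."""
    return sqrt(s1**2 / m + s2**2 / n)


def upper_tailed_z_test(xbar, s1, m, ybar, s2, n, delta0=0.0):
    """Large-sample test of H0: mu1 - mu2 = delta0 vs Ha: mu1 - mu2 > delta0.

    Returns the z statistic and the upper-tail P-value.
    """
    z = (xbar - ybar - delta0) / se_diff(s1, m, s2, n)
    return z, 1.0 - STD_NORMAL.cdf(z)


def z_interval(xbar, s1, m, ybar, s2, n, level):
    """Large-sample confidence interval for mu1 - mu2 at the given level."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - (1.0 - level) / 2.0)
    half_width = z_crit * se_diff(s1, m, s2, n)
    diff = xbar - ybar
    return diff - half_width, diff + half_width


def welch_df(s1, m, s2, n):
    """Estimated degrees of freedom (nu) for the two-sample t procedure."""
    a, b = s1**2 / m, s2**2 / n
    return (a + b) ** 2 / (a**2 / (m - 1) + b**2 / (n - 1))


# Light bulb example: m = n = 40, means 647 and 638, sds 27 and 31.
z, p_value = upper_tailed_z_test(647, 27, 40, 638, 31, 40)
print(f"light bulbs: z = {z:.2f}, P-value = {p_value:.4f}")

# Engine example: 96% interval for mu_B - mu_A (B listed first).
low, high = z_interval(42, 8, 75, 36, 6, 50, level=0.96)
print(f"engines: 96% CI = ({low:.2f}, {high:.2f})")

# Fabric example: nu is rounded down before looking up the t critical value.
nu = welch_df(0.79, 10, 3.59, 10)
print(f"fabric: nu = {nu:.2f}, use {floor(nu)} df")
```

The $t$ critical value itself (2.262 for 9 df in the fabric example) is taken from a table in the notes; the standard library has no $t$ quantile function, so the sketch stops at the degrees of freedom.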
