# INTRO TO STATISTICS 2 STA 3024

UF

GPA 3.83

This 31 page Class Notes was uploaded by Golden Bernhard on Friday September 18, 2015. The Class Notes belongs to STA 3024 at University of Florida taught by Staff in Fall. Since its upload, it has received 19 views. For similar materials see /class/206587/sta-3024-university-of-florida in Statistics at University of Florida.

Date Created: 09/18/15

## STA 3024 Introduction to Statistics 2, Chapter 2: Statistical Inference

In our last review, we looked at sampling distributions. However, when working with those sampling distributions we assumed that all the parameters were already known and given to us, which is a bit unrealistic (why?). In this chapter, we'll work under more realistic conditions: we assume that the population parameters are unknown constants, which we try to estimate with statistics computed from random samples. This is where statistical inference methods come in. Inference methods help us predict how close a sample statistic falls to the population parameter. We can make decisions and predictions about populations even if we only have data for relatively few subjects from that population. This chapter introduces two major types of statistical inference methods: confidence intervals and hypothesis tests.

### PART I: CONFIDENCE INTERVALS

The idea is this: true, we do not know what the population parameter is, but we can give a set (a range) of reasonable estimates of the unknown parameter. For example, we do not know the mean (average) amount of time American college students watch TV per week, yet we can say "It's somewhere around 2 to 5 hours." The interval (or time frame, in this case) of two to five hours is a confidence interval.

A confidence interval (CI) is an interval containing the most believable values for a parameter. The probability that this method produces an interval that contains the parameter is called the confidence level, usually denoted $1 - \alpha$. The significance level is $\alpha$, which we'll become more familiar with later in this chapter.

We may choose pretty much any confidence level; however, it's common sense to choose a high one (in the 90% range). On the other hand, a 100% confidence level can result in a useless CI. Take our TV example: it's not very informative to say "I am 100% certain that the average amount of time American college students watch TV per week is between 0 and 168 hours."
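To make the idea of a confidence level concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration and does not come from the notes: the hypothetical population (normal, with mean 3 hours and standard deviation 2 hours), the sample size, and the use of a simple known-$\sigma$ interval of the form $\bar{x} \pm z \cdot \sigma/\sqrt{n}$. It repeats the "study" many times and reports the fraction of intervals that contain the true mean.

```python
import random
from statistics import NormalDist, mean

def coverage(true_mean=3.0, sigma=2.0, n=50, conf_level=0.95,
             repeats=2000, seed=1):
    """Fraction of simulated intervals that contain the true mean.

    All defaults are made-up values for illustration only.
    """
    rng = random.Random(seed)
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z-score for the chosen level
    margin = z * sigma / n ** 0.5             # sigma is treated as known here
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = mean(sample)
        if xbar - margin <= true_mean <= xbar + margin:
            hits += 1
    return hits / repeats

if __name__ == "__main__":
    for level in (0.90, 0.95, 0.99):
        print(level, coverage(conf_level=level))
```

Each printed fraction should land close to its confidence level, and a higher confidence level never covers less often here, because the same simulated samples are reused with a wider margin.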
The three most common confidence levels are, from most to least common, 95%, 99%, and 90%. So what does it mean to say "We are 95% confident that the average amount of time American college students watch TV per week falls between 1.70 hours and 3.17 hours"? It means that if we repeated our study a googol number of times and calculated a 95% confidence interval each time, then very nearly 95% of those googols of confidence intervals would actually contain the true value of the parameter (the mean amount of time American college students watch TV per week).

Fine, but how can we calculate a 95% confidence interval, or any confidence interval for that matter, from each sample taken? Here is the standard procedure:

1. Check the assumptions.
2. Find the unbiased point estimate for the parameter.
3. Identify the confidence level $1 - \alpha$.
4. Find the margin of error corresponding to our confidence level.

The unbiased point estimate takes care of the accuracy, and the controlled margin of error takes care of the precision. The $1 - \alpha$ CI will then look like

$$\text{unbiased point estimate} \pm \text{margin of error}.$$

Let's apply all this to construct confidence intervals for population proportions and population means.

#### 1.1 Confidence Interval for a Population Proportion

Knowing the basic ideas, it's quite straightforward to find the CI in this case. The $1 - \alpha$ confidence interval for the population proportion $p$ follows the formula

$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \tag{1}$$

where

- $\hat{p}$ is the sample proportion;
- $n$ is the sample size. The larger the sample size, the smaller the margin of error (equivalently, the narrower the CI);
- the estimated standard error is the term $\sqrt{\hat{p}(1 - \hat{p})/n}$;
- the number $z_{\alpha/2}$ indicates how much the confidence level $1 - \alpha$ influences the width of the CI. The more confident we want to be (equivalently, the smaller $\alpha$ is), the wider the CI becomes (equivalently, the margin of error increases).

The three common z-scores are $z_{0.025} = 1.960$ (95% CI), $z_{0.005} = 2.576$ (99% CI), and $z_{0.050} = 1.645$ (90% CI).
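As a rough illustration of formula (1), the sketch below computes the large-sample interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ in Python. The function names `z_score` and `proportion_ci` and the counts (120 successes out of 400) are hypothetical, chosen only for this example; the z-scores come from the standard library's `NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_score(conf_level):
    """Return z_{alpha/2} for a confidence level 1 - alpha."""
    alpha = 1 - conf_level
    return NormalDist().inv_cdf(1 - alpha / 2)

def proportion_ci(successes, n, conf_level=0.95):
    """Large-sample CI for a proportion, following formula (1)."""
    p_hat = successes / n                     # sample proportion
    se = sqrt(p_hat * (1 - p_hat) / n)        # estimated standard error
    margin = z_score(conf_level) * se         # margin of error
    return p_hat - margin, p_hat + margin

if __name__ == "__main__":
    for level in (0.90, 0.95, 0.99):
        low, high = proportion_ci(120, 400, level)   # made-up counts
        print(f"{level:.0%}: z = {z_score(level):.3f}, "
              f"CI = ({low:.4f}, {high:.4f})")
```

The printed z-scores round to 1.645, 1.960, and 2.576, and the interval widens as the confidence level goes up. The sketch does not check the sample-size assumption, so it should only be trusted when the sample contains enough successes and failures.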
**Important note on the needed assumptions.** Formula (1) can ONLY be applied when the sampling distribution of the sample proportion is approximately normal, that is, when BOTH $n\hat{p}$ and $n(1 - \hat{p})$ are $\geq 15$. Only then can we use $z$. On top of that, the data must be obtained by randomization. The latter assumption is most likely given; however, the former must always be checked.

So what should we do when the first assumption is not met? Suppose a random sample does NOT have at least 15 successes and 15 failures. Formula (1) is still valid if we use it after adding 4 to the sample size $n$: 2 to the original number of successes and 2 to the original number of failures.

**Example: Do You Like Tofu?** We randomly sample five UF students to estimate the proportion of students who like tofu. All five students just love tofu and cannot get enough of it.

1. Find the sample proportion who like it.
2. Find and interpret the standard error.
3. Find and interpret a 95% CI using the large-sample formula. Is the interpretation sensible? I mean, do all UF students like tofu for real?
4. Use a more appropriate approach to find and interpret a 95% CI.

**Solution:**

**Suggested Problems:** Chapter 8: 12, 13, 15, 16, 17, 18, 21, 22, 23, 24

#### 1.2 Confidence Interval for the Difference between Two Population Proportions from Independent Random Samples

First of all, what do we mean by independent samples? Independent samples means that the observations in one sample are independent of the observations in the other sample. For example, the amount of money men spend on haircuts per year and the amount of money women spend on haircuts per year are independent: how men spend money does not influence how women spend money, and vice versa.

Since we are talking about independent samples, we should also mention the notion of dependent samples. Dependent samples result when the data are matched pairs: each subject in one sample is matched with a subject in the other sample. It is important to be
able to identify whether the two samples are independent or dependent.

**Example: Independent or Dependent Samples?**

- Test the effects of a new drug by comparing it with a placebo treatment. Sample 1 consists of 30 random patients who will receive the new drug treatment, and sample 2 consists of 28 random patients who will receive the placebo treatment.

  **Solution:**

- Test whether the use of cell phones impairs reaction times in a driving skill test. A random group of 79 people was selected. Reaction times are measured when subjects perform the test without using cell phones (observations for sample 1), then reaction times are measured again while subjects use cell phones (observations for sample 2).

  **Solution:**

- Test whether the "freshman 15" is a myth. In one study of 110 female first-year college students, the mean weight was 130.6 lbs at the beginning of the study and 131.4 lbs six months later.

  **Solution:**

- Test the effect of sunlight on cacti. About 12 Jakarta cacti were put under intense light and about 10 Jakarta cacti were put in the dark. Sufficient water and minerals are provided to all 22 cacti. After a week, we measure how healthy each cactus is.

  **Solution:**

In this section we only look at the method to construct the CI for the difference between two population proportions from independent random samples. Section 1.5 will deal with dependent samples.

Let sample 1 and sample 2 be two independent random samples. Let $\hat{p}_1$ and $\hat{p}_2$ be the sample proportions for sample 1 and sample 2, respectively. Similarly, let $n_1$ and $n_2$ be the sample sizes for sample 1 and sample 2, respectively. A $1 - \alpha$ CI for the difference $p_1 - p_2$ between two population proportions is

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \cdot se \tag{2}$$

where

$$se = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}.$$

**Important note on the needed assumptions.** To be able to apply formula (2), we need the following conditions:

1. A categorical response variable for two groups.
2. Large enough sample sizes $n_1$ and $n_2$ so that in each sample there are at least 10 "successes" and at least 10 "failures". That is, ALL four numbers
$n_1\hat{p}_1$, $n_1(1 - \hat{p}_1)$, $n_2\hat{p}_2$, and $n_2(1 - \hat{p}_2)$ must be $\geq 10$.

**How should we interpret a CI for a difference of proportions?** Only one of the following cases will happen:

- Case 1: Zero falls in the CI. In that case it's plausible (but not necessary) that the population proportions are equal.
- Case 2: The CI consists of ONLY positive numbers. If so, we can infer that $p_1 - p_2 > 0$, or equivalently $p_1 > p_2$.
- Case 3: The CI consists of ONLY negative numbers. If so, we can infer that $p_1 - p_2 < 0$, or equivalently $p_1 < p_2$.

Furthermore, the magnitude of the values in the CI tells us how large any true difference is. If all values in the CI are near 0, the true difference may be relatively small in practical terms.

**Example: Binge Drinking.** The PACE project at the University of Wisconsin in Madison deals with problems associated with high-risk drinking on college campuses. Based on random samples, the study states that the percentage of UW students who reported bingeing at least 3 times within the past 2 weeks was 31.2% in 1993 ($n = 159$) and 38.2% in 2005 ($n = 485$).

1. Estimate the difference between the proportions in 2005 and 1993, and interpret.
2. Find the standard error for this difference. Interpret it.
3. Construct and interpret a 95% CI to estimate the true change, explaining how your interpretation reflects whether the interval contains 0.
4. State and check the assumptions for the CI in 3 to be valid.

**Solution:**

**Suggested Problems:** Chapter 10: 1, 2, 4, 6

#### 1.3 Confidence Interval for a Population Mean

The $1 - \alpha$ confidence interval for the population mean $\mu$ follows the formula

$$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} \tag{3}$$

where

- $\bar{x}$ is the sample mean;
- $n$ is the sample size. The larger the sample size, the smaller the margin of error (equivalently, the narrower the CI);
- the estimated standard error is the term $s/\sqrt{n}$;
- similar to the number $z_{\alpha/2}$ in formula (1), the number $t_{\alpha/2}$ in formula (3) indicates how much the confidence level $1 - \alpha$ influences the width of the CI.

The t-score is available in Table B. To look up a certain t-score, not only do we need the significance level $\alpha$, but we
also need the degrees of freedom $df$, where $df = n - 1$.

**Important note on the needed assumptions.** To use formula (3) we need (a) the data to be obtained by randomization, and (b) the population from which the sample is taken to be approximately normally distributed. From the latter assumption it follows that the sample mean $\bar{x}$ is approximately normal with mean $\mu$ and standard error $\sigma/\sqrt{n}$.

**Why are we using $t_{\alpha/2}$ instead of $z_{\alpha/2}$?** Suppose we knew the exact standard error $\sigma/\sqrt{n}$ of the sample mean. Then we could certainly use the z-score in formula (3), since the population is assumed to be approximately normal. However, we usually do not know the parameter $\sigma$ (the population std). We estimate $\sigma$ by the sample std (the $s$ from Chapter 1, Section 2.3). Substituting the sample std $s$ for $\sigma$ to get $se = s/\sqrt{n}$ introduces extra error. This error can be sizable when $n$ is small. Thus we must replace the z-score by a slightly larger score, called the t-score, to compensate.

Question: Can you find the z-score for a normal distribution on the t table (Table B)?

Also keep in mind that the t confidence interval method does NOT work well when there are extreme outliers in the data.

**Example: Palm PDA on eBay.** The selling prices of the Palm M515 PDA for a one-week selling period in May 2003 (the buy-it-now option was not used) were

250, 249, 255, 200, 199, 240, 228, 255, 232, 246, 210, 178, 246, 240, 245, 225, 246, 225

1. What assumptions are needed to construct a 95% CI for $\mu$? Point out any assumptions that seem questionable (hint: draw a dot plot).
2. Find the sample mean $\bar{x}$.
3. Find and interpret the 95% CI, given that $s = 21.94$.
4. Now delete the outlier and find the new sample mean. Given that the new $s = 17.92$, find and interpret the new 95% CI for $\mu$. How does it compare to the 95% CI using all the data?

**Solution:**

**Suggested Problems:** Chapter 8: 27, 28, 29, 33, 36, 37, 38, 39, 41, 42, 43

#### 1.4 Confidence Interval for the Difference between Population Means from Independent Random Samples

For two independent random samples 1 and 2 with sample sizes $n_1$ and $n_2$ and stds $s_1$ and $s_2$, the $1 - \alpha$ CI for the difference $\mu_1 - \mu_2$ between the population
means is

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \cdot se \tag{4}$$

where

$$se = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$

**Important note on the needed assumption.** This method assumes an approximately normal distribution for the underlying population of each group.

**How can we find the t-score in this case?** Usually we have to rely on software to find the $df$ for us. However, if we are not using software, we can take $df$ to be the smaller of $n_1 - 1$ and $n_2 - 1$. This is safe, as the t-score will be larger than we actually need. If $s_1 = s_2$ and $n_1 = n_2$, the degrees of freedom equal $df = n_1 + n_2 - 2$.

Extra: The formula for the degrees of freedom is

$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$$

which is called the Welch-Satterthwaite formula. We will definitely not be using this formula in solving problems.

**How should we interpret a CI for a difference of means?** Similar to section 1.2, only one of the following cases will happen:

- Case 1: Zero falls in the CI. In that case it's plausible (but not necessary) that the population means are equal.
- Case 2: The CI consists of ONLY positive numbers. If so, we can infer that $\mu_1 - \mu_2 > 0$, or equivalently $\mu_1 > \mu_2$.
- Case 3: The CI consists of ONLY negative numbers. If so, we can infer that $\mu_1 - \mu_2 < 0$, or equivalently $\mu_1 < \mu_2$.

Furthermore, the magnitude of the values in the CI tells us how large any true difference is. If all values in the CI are near 0, the true difference may be relatively small in practical terms.

**Example: Palm PDA on eBay, cont.** The selling prices of the Palm M515 PDA for a one-week selling period in May 2003 where the buy-it-now option was not used were

250, 249, 255, 200, 199, 240, 228, 255, 232, 246, 210, 178, 246, 240, 245, 225, 246, 225

The selling prices of the Palm M515 PDA for a one-week selling period in May 2003 where the buy-it-now option was used were

235, 225, 225, 240, 250, 250, 210

This is a fairly typical software output (MINITAB is the program used):

Two-sample T for C1 vs C2

| | N | Mean | StDev | SE Mean |
|---|---|---|---|---|
| C1 | 7 | 233.6 | 14.6 | 5.5 |
| C2 | 18 | 231.6 | 21.9 | 5.2 |

- Difference = mu (C1) - mu (C2)
- Estimate for difference: 1.96
- 95% CI for difference: $(-14.09,\ 18.01)$
- T-Test of difference = 0 (vs not =): T-Value = 0.26, P-Value = 0.799, DF = 16

Say on the quiz I deliberately delete the line "95% CI for difference: $(-14.09,\ 18.01)$" and ask you to find and interpret a 95% CI for the difference.

**Solution:**

**Suggested Problems:** Chapter 10: 15, 16, 17, 18, 22a, 23

#### 1.5 Confidence Interval for the Difference between Population Means and Population Proportions from Dependent Random Samples

I put the methods of finding a CI for the difference between population means and between population proportions into one section because they are quite similar. Actually, the method for proportions is a corollary of the method for means. So let's look at the method for means first.

When dealing with dependent samples, we are actually dealing with matched pairs. A friendly reminder: the notion of matched pairs is that each subject in one sample is matched with a subject in the other sample. Thus we can define the difference for each pair to be a random variable, find the mean of that random variable, and find a confidence interval for that mean using the method introduced in section 1.3. This simplifies the analysis since it reduces a 2-sample problem to a 1-sample problem. I think it will be clearer with an example.

**Example: Does Exercise Help Blood Pressure?** Several recent studies have suggested that people who suffer from abnormally high blood pressure can benefit from regular exercise. A medical researcher planned a small experiment. She randomly samples three of her patients who have high blood pressure. She measures their systolic blood pressure initially and then again a month later, after they participate in her exercise program. The table shows the results.

Table 1: "Does Exercise Help Blood Pressure?" example

| Subject | Before | After | Difference |
|---|---|---|---|
| 1 | 150 | 130 | 20 |
| 2 | 165 | 140 | 25 |
| 3 | 135 | 120 | 15 |

The descriptive statistics for the "Difference" are:

| Variable | N | Mean | Median | TrMean | StDev | SE Mean |
|---|---|---|---|---|---|---|
| Difference | 3 | 20.00 | 20.00 | 20.00 | 5.00 | 2.89 |

1. Explain why the three "Before" and the three "After" observations are dependent samples.
2. Find the sample mean of the before scores, the sample mean of the after scores, and the sample mean of the differences (before minus after). How are they related?
3. Find a 95% CI for the difference between the population means of subjects before and after going through such a study. Interpret.

**Solution:**

To conduct inference about the difference $p_1 - p_2$ between the population proportions, we can use the fact that the sample difference $\hat{p}_1 - \hat{p}_2$ is the mean of the difference scores when we code the responses by 1 and 0 (why?). We can find a CI for $p_1 - p_2$ by finding a CI for the population mean of difference scores, similar to the technique above.

**Example: Obesity Now and in 20 Years.** Many medical studies have used a large sample of subjects from Framingham, MA who have been followed since 1948. A recent study gave the contingency table shown for weight at a baseline time and then 20 years later.

Table 2: Contingency table for the "Obesity Now and in 20 Years" example

| Baseline | Normal (20 years after baseline) | Overweight (20 years after baseline) |
|---|---|---|
| Normal | 695 | 368 |
| Overweight | 87 | 827 |

1. Identify the two samples and whether they are independent or dependent. Explain.
2. Find the sample proportion with normal weight (i) at baseline, (ii) 20 years later. Explain how each proportion can be found as a sample mean, and how the estimated difference of population proportions is a difference of sample means.
3. Given that the sample std is 0.170, find and interpret a 99% CI for the difference between the population proportions.

**Solution:**

**Suggested Problems:** Chapter 10: 49abc, 50a, 52, 53, 54, 55, 57ab, 59, 60abc

#### 1.6 Choosing the Sample Size for a Study

This is one of the major concerns in experimental design: how can we know how large our sample should be? Obviously, if we need our estimate to be more precise, so that the margin of error is small, then the sample size should be larger. So is there some kind of formula to determine the needed sample size $n$? Luckily, there is, and it can be derived easily from our CI formulas in sections 1.1 and 1.3.

**Sample size for estimating a population proportion.** Recall from section 1.1 that the margin of
error $m$ is calculated as follows:

$$m = z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \iff m^2 = z_{\alpha/2}^2 \, \frac{\hat{p}(1 - \hat{p})}{n} \iff n = \frac{z_{\alpha/2}^2 \, \hat{p}(1 - \hat{p})}{m^2}.$$

We may be able to find $\hat{p}$ from the given data. However, if $\hat{p}$ cannot be calculated, then we have to set $\hat{p} = 0.5$.

**Example: Vietnam Study.** A researcher planning a study in Vietnam is trying to estimate the population proportion having at least a high school education.

- Case 1: No information is available about $\hat{p}$. How large a sample size is needed to estimate $p$ to within 0.07 with 90% confidence?
- Case 2: Well, it turns out her peer did similar research, but years ago. A value $\hat{p} = 0.78$ is given. How large a sample size is needed to estimate $p$ to within 0.017 with 95% confidence?

**Solution:**

**Sample size for estimating a population mean.** Recall from section 1.3 that the margin of error $m$ is calculated as follows:

$$m = t_{\alpha/2}\frac{s}{\sqrt{n}} \iff n = \frac{t_{\alpha/2}^2 \, s^2}{m^2}.$$

However, if the exact population value $\sigma$ is given instead of the sample std $s$, then we use the z-score instead of the t-score, which results in the following formula:

$$n = \frac{z_{\alpha/2}^2 \, \sigma^2}{m^2}.$$

What if neither $\sigma$ nor $s$ is given? In that case, we'll have to estimate $\sigma$ roughly from the range of plausible values, $\sigma \approx \text{range}/6$ (why?).

**Example: Vietnam Study, cont.** Same scenario. The questions now change to:

- Case 1: No information is available about $\sigma$. How large a sample size is needed so that a 95% CI for the mean number of years of attained education has a margin of error equal to 1 year? Let's assume that it takes 18 years to finish an education starting from 1st grade.
- Case 2: A study already claims $\sigma = 2.5$. How large a sample size is needed to estimate $\mu$ to within 0.75 with 90% confidence?

**Solution:**

**Suggested Problems:** Chapter 8: 47, 48, 49, 50, 51, 52

### PART II: HYPOTHESIS TESTS

Sometimes, instead of trying to come up with a set of reasonable estimates for a population parameter (as in the CI method), we try to answer questions like "Well, we think the parameter should be equal to this or that, and we wonder whether the collected random samples support our suspicion or not." This is where hypothesis tests come in.

#### The Five Steps of a Hypothesis Test

A hypothesis test consists
of five steps. We will go through each step in detail.

**1. Assumptions.** Typically a hypothesis test makes assumptions about the nature of the data. We should check these assumptions as best we can before we start, in order to make sure that our results will be reliable.

**2. Hypotheses.** Each significance test (another term for hypothesis test) has two hypotheses about a population parameter: the null hypothesis and the alternative hypothesis. The null hypothesis, denoted by $H_0$, is a statement that the parameter takes a particular value. The alternative hypothesis, denoted by $H_1$, states that the parameter falls in some alternative range of values. The test works by pretending that $H_0$ is true and checking whether the observed data are reasonable under $H_0$. If not, we reject $H_0$ in favor of $H_1$. The burden of proof falls on the alternative hypothesis. Note: our book has a good courtroom analogy on page 410.

**3. Test statistic.** The evidence in the data for or against $H_0$ is summarized by the test statistic. Different hypothesis tests have different test statistics. A test statistic describes how far the sample statistic is from the value of the parameter claimed in our null hypothesis. If it's too far off, then we have to say the sample does not support $H_0$, and we go with the alternative. Conversely, if it's close enough, then the sample does not contradict $H_0$, and we stick with the null. Right below, we will learn how much is "too far off" and how much is "close enough".

**4. P-value.** To interpret a test statistic value, we use a probability summary of the evidence against the null, called the P-value. The P-value is the probability of obtaining a result at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. We actually use the notion of a P-value quite often. To make sense of the definition, here's an example I like: we flip a coin many times and observe the outcomes to see if the coin is fair. Well, say we flip the coin 1000 times and there
are 993 tails in total. Obviously no reasonable person would declare the coin fair. Why is that? Because if the coin were fair, the probability of seeing 993 or more tails out of 1000 flips would be extremely low. The smaller the P-value, the stronger the evidence against $H_0$.

**5. Making the conclusion.** Report and interpret the P-value. Based on the P-value, make a decision about $H_0$. Recall that the smaller the P-value, the stronger the evidence against the null $H_0$. However, this raises a question: how small does the P-value need to be for us to reject the null $H_0$? The answer is that we compare the P-value with the significance level $\alpha$, which is assigned at the start of an experiment. That is, if $p \leq \alpha$, then we reject $H_0$. When no significance level $\alpha$ is given, the ranges below might help:

- If $0 \leq p \leq 0.01$, there is very strong evidence against the null $H_0$.
- If $0.01 < p \leq 0.05$, there is strong evidence against the null $H_0$.
- If $0.05 < p \leq 0.10$, there is moderate evidence against the null $H_0$.
- If $0.10 < p$, there is weak evidence against the null $H_0$.

**Frequent misunderstandings of the P-value**

- The P-value is not the probability that the null hypothesis is true.
- The P-value is not the probability that a finding is "merely a fluke".
- The P-value is not the probability of falsely rejecting the null hypothesis.
- 1 minus the P-value is not the probability of the alternative hypothesis being true.

We will now apply all this theoretical background to problems. Following the structure of Part I (Confidence Intervals), we will run through the cases of hypothesis tests for population proportions and population means.

#### 2.1 Hypothesis Test for a Population Proportion

**1. Assumptions**

- Categorical variable
- Simple random sample
- Sample size large enough that the sampling distribution of the sample proportion is approximately normal. Similar to the assumption in section 1.1, BOTH $np_0$ and $n(1 - p_0)$ need to be $\geq 15$.

**2. Hypotheses**

- Null: $H_0: p = p_0$, where $p_0$ is the hypothesized value for $p$.
- Alternative: There are three possible
alternative hypotheses:
  - $H_1: p \neq p_0$, which results in a two-sided test;
  - $H_1: p < p_0$, which results in a one-sided test;
  - $H_1: p > p_0$, which results in a one-sided test.

**3. Test statistic**

$$z_0 = \frac{\hat{p} - p_0}{se_0}, \quad \text{where } se_0 = \sqrt{\frac{p_0(1 - p_0)}{n}}.$$

**4. P-value.** Use the z distribution (the standard normal distribution). Be cautious: the P-value for a one-sided test is different from the P-value for a two-sided test. Thus it's important to identify which test it is in order to find the correct P-value.

**5. Conclusion.** If we can find the P-value corresponding to our test statistic, then the conclusion based on the P-value is straightforward. Smaller P-values give stronger evidence against $H_0$. If a decision is needed, reject $H_0$ if the P-value is $\leq \alpha$, the significance level. However, most of the time the P-value can only be found using statistical software.

If we cannot find the P-value when solving problems by hand, we can still make a conclusion based solely on the test statistic. Given a significance level $\alpha$:

- For a one-sided test, reject the null $H_0$ if $z_0 > z_{\alpha}$ (when $H_1: p > p_0$) or if $z_0 < -z_{\alpha}$ (when $H_1: p < p_0$).
- For the two-sided test, reject the null $H_0$ if $|z_0| > z_{\alpha/2}$.

**Example: Which Cola?** 49 randomly sampled UF students made blinded evaluations of pairs of cola drinks. For the 49 comparisons of Coke and Pepsi, Coke was preferred 29 times. Is this strong evidence that a majority prefers one of the drinks? Below is the MINITAB printout:

| Variable | X | N | Sample p | 95.0% CI | P-Value |
|---|---|---|---|---|---|
| C1 | 29 | 49 | 0.5918 | (0.454, 0.729) | 0.1985 |

1. Check the assumptions.
2. Write out the hypotheses.
3. Find the test statistic $z_0$.
4. Interpret the P-value from the MINITAB output.
5. Given the significance level $\alpha = 0.05$, make a conclusion based on the P-value. Then, assuming that we cannot find the P-value, make a conclusion based on the test statistic. Are the two conclusions consistent with each other?
6. What does the 95% CI tell us that the test does not?

**Solution:**

**Suggested Problems:** Chapter 9: 14, 15, 17, 19, 20, 21

#### 2.2 Hypothesis Test for Comparing Two Population Proportions

**1. Assumptions**

- Categorical variable
- Simple
random sample
- Sample sizes $n_1$ and $n_2$ large enough that ALL of $n_1\hat{p}_1$, $n_1(1 - \hat{p}_1)$, $n_2\hat{p}_2$, and $n_2(1 - \hat{p}_2)$ are $\geq 5$. Here $n_1$ and $n_2$ are the sample sizes for sample 1 and sample 2, respectively, and $\hat{p}_1$ and $\hat{p}_2$ are the sample proportions for sample 1 and sample 2, respectively.

**2. Hypotheses**

- Null: $H_0: p_1 = p_2$, or equivalently $p_1 - p_2 = 0$.
- Alternative: There are three possible alternative hypotheses:
  - $H_1: p_1 \neq p_2$, which results in a two-sided test;
  - $H_1: p_1 < p_2$, which results in a one-sided test;
  - $H_1: p_1 > p_2$, which results in a one-sided test.

**3. Test statistic**

$$z_0 = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{se_0}, \quad \text{with } se_0 = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$$

where $\hat{p}$ is the pooled estimate.

**4. P-value.** Use the z distribution (the standard normal distribution). Be cautious: the P-value for a one-sided test is different from the P-value for a two-sided test. Thus it's important to identify which test it is in order to find the correct P-value.

**5. Conclusion.** If we can find the P-value corresponding to our test statistic, then the conclusion based on the P-value is straightforward. Smaller P-values give stronger evidence against $H_0$. If a decision is needed, reject $H_0$ if the P-value is $\leq \alpha$, the significance level. However, most of the time the P-value can only be found using statistical software.

If we cannot find the P-value when solving problems by hand, we can still make a conclusion based solely on the test statistic. Given a significance level $\alpha$:

- For a one-sided test, reject the null $H_0$ if $z_0 > z_{\alpha}$ (when $H_1: p_1 > p_2$) or if $z_0 < -z_{\alpha}$ (when $H_1: p_1 < p_2$).
- For the two-sided test, reject the null $H_0$ if $|z_0| > z_{\alpha/2}$.

**Example: Laundry Detergent.** A manufacturer of laundry detergent has introduced a new product that it claims to be more environmentally sound. Is there a link between advertisement and the purchase rate? An extensive survey gives the contingency table below (Table 3).

Table 3: Contingency table for the "Laundry Detergent" example

| Seen the ad | Tried the new product: Yes | Tried the new product: No | Total |
|---|---|---|---|
| Yes | 131 | 362 | 503 |
| No | 67 | 1017 | 1084 |
| Total | 198 | 1379 | 1587 |

1. Check the assumptions.
2. State the hypotheses.
3. Find each sample proportion $\hat{p}_1$ and $\hat{p}_2$, and
find the pooled proportion $\hat{p}$.
4. Find the test statistic.
5. Make a decision about $H_0$ using the significance level of 0.01.
6. Given the following MINITAB output, interpret the P-value:
   - Estimate for p(1) - p(2): 0.198629
   - 99% CI for p(1) - p(2): (0.144819, 0.252440)
   - Test for p(1) - p(2) = 0 (vs not = 0): Z = 9.51, P-Value = 0.000
7. What does the 99% CI tell us that the test does not?

**Solution:**

**Suggested Problems:** Chapter 10: 7, 8, 9, 10, 12

#### 2.3 Hypothesis Test for Comparing Two Population Proportions from Dependent Samples: The McNemar Test

Recall that for binary categorical variables we can treat a proportion as a mean if we code the two possible outcomes as 1 for success and 0 for failure. For matched-pair data there are 4 combinations of 0 and 1. That is:

1. The pair (1, 1) for, say, (yes, yes)
2. The pair (1, 0) for (yes, no)
3. The pair (0, 1) for (no, yes)
4. The pair (0, 0) for (no, no)

**Example: Obesity Now and in 20 Years.** Let's repeat this example from section 1.5. Many medical studies have used a large sample of subjects from Framingham, MA who have been followed since 1948. A recent study gave the contingency table shown for weight at a baseline time and then 20 years later (Table 4).

Table 4: Contingency table for the "Obesity Now and in 20 Years" example

| Baseline | Normal (20 years after baseline) | Overweight (20 years after baseline) | Total |
|---|---|---|---|
| Normal | 695 | 368 | 1063 |
| Overweight | 87 | 827 | 914 |
| Total | 782 | 1195 | 1977 |

In this example, the pair (Normal at Baseline, Normal 20 Years after Baseline) is defined to be the pair (1, 1), and there are 695 counts of such pairs in our study. Similarly, the pair (Normal at Baseline, Overweight 20 Years after Baseline) is defined to be the pair (1, 0), and there are 368 counts of such pairs. The pair (Overweight at Baseline, Normal 20 Years after Baseline) is defined to be the pair (0, 1), and there are 87 counts of such pairs. The pair (Overweight at Baseline, Overweight 20 Years after Baseline) is defined to be the pair (0, 0), and there are 827 counts of such pairs.

The McNemar test uses only the two pairs (1, 0) and (0, 1) to find its test statistic.

**1. Assumptions**

- Categorical variable
- Simple random sample
- The total
count for both pairs (1, 0) and (0, 1) is at least 30. For our obesity example above, the total count is $368 + 87 = 455 \geq 30$.

**2. Hypotheses**

- Null: $H_0: p_1 = p_2$, or equivalently $p_1 - p_2 = 0$.
- Alternative: There are three possible alternative hypotheses:
  - $H_1: p_1 \neq p_2$, which results in a two-sided test;
  - $H_1: p_1 < p_2$, which results in a one-sided test;
  - $H_1: p_1 > p_2$, which results in a one-sided test.

**3. Test statistic**

$$z_0 = \frac{(\text{count of } (1,0)) - (\text{count of } (0,1))}{\sqrt{(\text{count of } (1,0)) + (\text{count of } (0,1))}}$$

**4. P-value.** Use the z distribution (the standard normal distribution). Be cautious: the P-value for a one-sided test is different from the P-value for a two-sided test. Thus it's important to identify which test it is in order to find the correct P-value.

**5. Conclusion.** If we can find the P-value corresponding to our test statistic, then the conclusion based on the P-value is straightforward. Smaller P-values give stronger evidence against $H_0$. If a decision is needed, reject $H_0$ if the P-value is $\leq \alpha$, the significance level. However, most of the time the P-value can only be found using statistical software.

If we cannot find the P-value when solving problems by hand, we can still make a conclusion based solely on the test statistic. Given a significance level $\alpha$:

- For a one-sided test, reject the null $H_0$ if $z_0 > z_{\alpha}$ (when $H_1: p_1 > p_2$) or if $z_0 < -z_{\alpha}$ (when $H_1: p_1 < p_2$).
- For the two-sided test, reject the null $H_0$ if $|z_0| > z_{\alpha/2}$.

**Example: Obesity Now and in 20 Years, cont.** Here are the questions for the example:

1. State the hypotheses.
2. Find the test statistic.
3. Given the significance level $\alpha = 0.05$, make a conclusion based on the test statistic.
4. Based on your answer to 3, in what range do you expect the P-value to fall?

**Solution:**

**Suggested Problems:** Chapter 10: 57, 59, 60, 61

#### 2.4 Hypothesis Test for a Population Mean

**1. Assumptions**

- Quantitative variable
- Simple random sample
- Population distribution is approximately normal

**2. Hypotheses**

- Null: $H_0: \mu = \mu_0$, where $\mu_0$ is the hypothesized value for $\mu$.
- Alternative: There are three possible alternative hypotheses:
  - $H_1: \mu \neq \mu_0$,
which results in a two-sided test
     - $H_1: \mu < \mu_0$, which results in a one-sided test
     - $H_1: \mu > \mu_0$, which results in a one-sided test

3. Test statistic:

   $$t_0 = \frac{\bar{x} - \mu_0}{se_0}, \quad \text{where } se_0 = \frac{s}{\sqrt{n}}$$

4. P-value: Use the t distribution with $df = n - 1$. Be cautious: the P-value for a one-sided test is different from the P-value for a two-sided test. Thus it's important to be able to identify which test it is in order to find the correct P-value.

5. Conclusion: If we can find the corresponding P-value for our test statistic, then the conclusion based on the P-value is straightforward. Smaller P-values give stronger evidence against $H_0$. If a decision is needed, reject $H_0$ if P-value $\le \alpha$, the significance level. However, most of the time the P-value can only be found using statistical software. If we cannot find the P-value when solving problems by hand, we can still make a conclusion based solely on the test statistic. Given a significance level $\alpha$:
   - For a one-sided test, if $|t_0| > t_{\alpha}$ (and $t_0$ falls in the direction stated in $H_1$), then we can reject the null $H_0$.
   - For a two-sided test, if $|t_0| > t_{\alpha/2}$, then we can reject the null $H_0$.

From here on, when a problem asks you to carry out a hypothesis test, it means you need to spell out all of the needed steps: state the hypotheses, find the test statistic, interpret the P-value, and make a conclusion.

Example: Gatorade Bottling

A certain machine that fills Gatorade bottles at the bottling plant is supposed to dispense 20 ounces of Gatorade into each bottle. We want to test whether the average number of ounces of Gatorade per bottle is equal to 20 or different from 20. To do this, we will perform a test about one mean with $\alpha = 0.05$. We take a random sample of five Gatorade bottles and calculate $\bar{x} = 18.93$ and $s = 0.47$.

Solution:

Suggested Problems: Chapter 9: 27, 28, 29, 31, 32, 33, 34, 35, 37, 38

2.5 Hypothesis Test for Comparing Two Population Means

1. Assumptions
   - Quantitative variable
   - Simple random sample
   - Population distributions are approximately normal for each group

2. Hypotheses
   - Null: $H_0: \mu_1 = \mu_2$, or equivalently $\mu_1 - \mu_2 = 0$
   - Alternative: There are three possible alternative
hypotheses:
     - $H_1: \mu_1 \neq \mu_2$, which results in a two-sided test
     - $H_1: \mu_1 < \mu_2$, which results in a one-sided test
     - $H_1: \mu_1 > \mu_2$, which results in a one-sided test

3. Test statistic:

   $$t_0 = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{se_0}, \quad \text{with } se_0 = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

4. P-value: Use the t distribution, where the df is given by software. Be cautious: the P-value for a one-sided test is different from the P-value for a two-sided test. Thus it's important to be able to identify which test it is in order to find the correct P-value.

5. Conclusion: If we can find the corresponding P-value for our test statistic, then the conclusion based on the P-value is straightforward. Smaller P-values give stronger evidence against $H_0$. If a decision is needed, reject $H_0$ if P-value $\le \alpha$, the significance level. However, most of the time the P-value can only be found using statistical software. If we cannot find the P-value when solving problems by hand, we can still make a conclusion based solely on the test statistic. Given a significance level $\alpha$:
   - For a one-sided test, if $|t_0| > t_{\alpha}$ (and $t_0$ falls in the direction stated in $H_1$), then we can reject the null $H_0$.
   - For a two-sided test, if $|t_0| > t_{\alpha/2}$, then we can reject the null $H_0$.

Example: Palm PDA on eBay (repeated)

The selling prices of the Palm M515 PDA for a one-week selling period in May 2003 where the buy-it-now option was not used were:

250, 249, 255, 200, 199, 240, 228, 255, 232, 246, 210, 178, 246, 240, 245, 225, 246, 225

The selling prices of the Palm M515 PDA for a one-week selling period in May 2003 where the buy-it-now option was used were:

235, 225, 225, 240, 250, 250, 210

The MINITAB output is:

| Sample | N | Mean | StDev | SE Mean |
|---|---|---|---|---|
| C1 | 7 | 233.6 | 14.6 | 5.5 |
| C2 | 18 | 231.6 | 21.9 | 5.2 |

Difference = mu (C1) - mu (C2)
Estimate for difference: 1.96
95% CI for difference: (-14.09, 18.01)
T-Test of difference = 0 (vs not =): T-Value = 0.26, P-Value = 0.799, DF = 16

Test whether the mean price with the buy-it-now option is different from the mean price without the buy-it-now option, given that $\alpha = 0.05$.

Solution:

Suggested Problems: Chapter 10: 22b, 25, 28c

2.6 Hypothesis Test for Comparing Population Means for Dependent Samples

Similar to the technique used in section 1.5, we can define the
difference for each pair to be a random variable. We then obtain the mean of that random variable and carry out the hypothesis test for that mean using the method introduced in section 2.4. This definitely simplifies the analysis, since it reduces a two-sample problem to a one-sample problem.

Example: Does Exercise Help Blood Pressure? (repeated)

Several recent studies have suggested that people who suffer from abnormally high blood pressure can benefit from regular exercise. A medical researcher planned a small experiment. She randomly samples three of her patients who have high blood pressure. She measures their systolic blood pressure initially and then again a month later, after they participate in her exercise program.

Table 5: Example: Does Exercise Help Blood Pressure?

| Subject | Before | After | Difference |
|---|---|---|---|
| 1 | 150 | 130 | 20 |
| 2 | 165 | 140 | 25 |
| 3 | 135 | 120 | 15 |

The table shows the results. The descriptive statistics for the Difference are:

| Variable | N | Mean | StDev | SE Mean | 99% Lower Bound | T | P |
|---|---|---|---|---|---|---|---|
| C4 | 3 | 20.00 | 5.00 | 2.89 | -0.10 | 6.93 | 0.010 |

Test whether exercise helps blood pressure or not, using $\alpha = 0.01$.

Solution:

Suggested Problems: Chapter 10: 48, 49d, 50bcd

2.7 Decisions and Errors

When we conduct a hypothesis test, we of course don't know whether $H_0$ is actually true or false. (If we did, then why would we bother with a test?) We hope that our test will make correct decisions: we hope it will fail to reject $H_0$ when $H_0$ is actually true, and we hope it will reject $H_0$ when $H_0$ is actually false. However, we know that our test will sometimes make the wrong decision. We call these wrong decisions type I or type II errors. Although in reality we won't know when we've made a type I or type II error, we can still talk about them in terms of how likely they theoretically are to occur.

A Type I error occurs when we reject $H_0$ when it is actually true. When $H_0$ is true, the significance level $\alpha$ is the probability that a Type I error occurs.

When $H_0$ is false, a Type II error occurs when we fail to reject $H_0$.

As the probability of a type I error goes down,
the probability of a type II error goes up. The two probabilities are inversely related. Suppose we want to reduce our type I error. That means we'll reduce the significance level $\alpha$, which implies that we need a smaller P-value to reject $H_0$. Life becomes harder: it's harder to reject $H_0$. But then it also means that it's harder to reject $H_0$ even when $H_0$ is false (keep in mind that we never know for certain whether $H_0$ is true or false). Thus the smaller we make the probability of a type I error, the larger the probability of a type II error becomes. We commonly use $\alpha = 0.05$ because it often yields an acceptable compromise between type I and type II errors.

Suggested Problems: Chapter 10: 42, 43, 44, 45, 47, 49, 51

PART III CONFIDENCE INTERVALS VS HYPOTHESIS TESTS

Traditionally, many researchers use hypothesis tests as their chosen method of inference. However, most statisticians believe that hypothesis tests are used too frequently and that confidence intervals would often be better.

Agreement with Hypothesis Tests

First, a confidence interval for a parameter and a two-sided hypothesis test about that same parameter will always agree about which parameter values are reasonable, provided that the confidence level and significance level match in the appropriate way. A 95% confidence level matches an $\alpha = 0.05$ significance level, for example.

Advantage of Confidence Intervals

A hypothesis test about a parameter evaluates whether a certain value of that parameter is reasonable. But if it concludes that the value isn't reasonable, it gives us no information about a reasonable range for the parameter of interest. For example, if a hypothesis test rejects $H_0: \mu = 17$, it gives us no information about just how far from 17 we think $\mu$ might be. Meanwhile, a confidence interval provides this information by displaying all the reasonable values of the parameter.

Why We Still Use Hypothesis Tests

Again, an excellent question, Quan. Despite what we just mentioned, we will still perform plenty of hypothesis tests. The reason is simply that in many situations it
may be difficult to find a confidence interval that does what we need it to do.

Example: Three Population Means

Suppose we have three populations and we are interested in whether their means are all the same, that is, whether $\mu_1 = \mu_2 = \mu_3$. It's easy to construct confidence intervals to compare any two of the population means, but it's not immediately clear how to construct one confidence interval that provides information about all three.

Often a good strategy is to perform a hypothesis test first to deal with all relevant parameters at once, and to then follow up by constructing confidence intervals for individual parameters or differences between parameters.

3.1 Evaluating Hypothesis Tests

When we violate whatever assumptions are necessary for a particular hypothesis test, we'll often ask whether the test still functions as intended. For a hypothesis test, there are two main things that we want to make sure it does: preserve the probability of a type I error and minimize the probability of a type II error.

Type I errors

When we perform a hypothesis test, we choose a significance level $\alpha$. Under proper conditions, $\alpha$ is the probability of a type I error. But when assumptions are violated, this may no longer be the case. If violation of an assumption causes a test's probability of a type I error to be substantially greater than $\alpha$, then the test doesn't work well when that assumption is violated.

Type II errors

The ability of a hypothesis test to avoid type II errors is called its power. If violation of an assumption causes a test to make a lot more type II errors, that is, if it causes the test to lose power, then the test doesn't work well when that assumption is violated.

Note: Power $= 1 - P(\text{Type II error})$

3.2 Evaluating Confidence Intervals

We'll also discuss how well various types of confidence intervals work under violations of their assumptions. There are two main things a confidence interval should do in order to work well: maintain the specified confidence level and be as narrow as possible.

Confidence Level
Mathematically, the confidence level is supposed to be the percentage of the time that the confidence interval will contain its true parameter value in the long run. However, if the assumptions for a particular confidence interval are violated, then this may no longer be the case. For example, what we call a "95% confidence interval" for a certain parameter might actually contain the true value of that parameter substantially less than 95% of the time. If violation of an assumption causes this to happen, then the confidence interval doesn't work well when that assumption is violated.

Narrowness

The narrower a confidence interval is, the more precise it is. If violation of an assumption causes a certain confidence interval to tend to be substantially wider, then the confidence interval doesn't work well when that assumption is violated.

To review for this chapter, I suggest you look at the chapter problems at the end of chapters 8, 9, and 10, and also the problems in the "Part 3 Review Exercises" starting from page 537.

This is the end of chapter 2. Cheers,

Quan Tran
Summer 2009

STA 3024 Exam 3 Topics

Analysis of 2-way Contingency Tables
- Pearson's chi-square test
- Observed counts $n_{ij}$
- Expected cell counts $E(n_{ij})$
- Null and alternative hypotheses
- Test Statistic
- Rejection Region / P-value

Regression Models

Simple Linear Regression
- Model
- Least Squares Estimation
- Estimated error variance $s^2$
- Computer output
- Inference concerning slope (t-test, confidence interval)
- Confidence Interval for mean when $X = X^*$
- Prediction Interval for response when $X = X^*$
- Correlation Coefficient
- Coefficient of Determination
- Analysis of Variance in Regression

Multiple Linear Regression
- Model
- Partial Regression Coefficients
- Least Squares Estimation
- Analysis of Variance
- Coefficient of Determination
- Test of $H_0: \beta_1 = \dots = \beta_p = 0$ (F-test)
- Test of $H_0: \beta_i = 0$ (t-test)
- Test of $H_0: \beta_{g+1} = \dots = \beta_p = 0$ (Complete vs Reduced F-test)
- Models with Dummy Variables

Logistic Regression with 1 Predictor
- Model
- Test for association (chi-square)
- Fitted Values
- Odds Ratio and Confidence Interval

2-Factor Analysis of
Variance
- Additive Effects and Interaction
- Analysis of Variance
- F-test for interaction
- F-tests for Main Effects
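
As a rough illustration of the one-sample test statistic from section 2.4 of the Chapter 2 notes above, and of the paired-difference reduction from section 2.6, here is a minimal Python sketch. Python is our choice here, not the course's software, and the function names `one_sample_t` and `paired_t` are hypothetical. It plugs in the summary numbers from the Gatorade example and the Before/After values from the blood pressure table. The critical value 2.776 is assumed to be read from a t table for df = 4 and is not computed by the code.

```python
import math
import statistics


def one_sample_t(xbar, s, n, mu0):
    """Return (t0, df) for H0: mu = mu0, using se0 = s / sqrt(n)."""
    se0 = s / math.sqrt(n)
    return (xbar - mu0) / se0, n - 1


def paired_t(before, after, mu0=0.0):
    """Reduce paired data to differences, then reuse the one-sample test."""
    diffs = [b - a for b, a in zip(before, after)]
    return one_sample_t(
        statistics.mean(diffs), statistics.stdev(diffs), len(diffs), mu0
    )


# Gatorade example: n = 5, xbar = 18.93, s = 0.47, H0: mu = 20 (two-sided).
t_gatorade, df_gatorade = one_sample_t(18.93, 0.47, 5, 20.0)

# Assumption: 2.776 is the t-table value t_{alpha/2} for alpha = 0.05, df = 4.
T_CRIT_DF4 = 2.776
reject_gatorade = abs(t_gatorade) > T_CRIT_DF4

# Blood pressure example: each difference is Before - After.
t_bp, df_bp = paired_t([150, 165, 135], [130, 140, 120])

print(f"Gatorade: t0 = {t_gatorade:.2f}, df = {df_gatorade}, "
      f"reject H0: {reject_gatorade}")
print(f"Blood pressure: t0 = {t_bp:.2f}, df = {df_bp}")
```

The paired statistic can be checked against the T value reported in the descriptive statistics for the blood pressure example. The decision for that example is left to the reader, since it depends on the one-sided critical value or P-value at $\alpha = 0.01$.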
