# ELEM STATISTICS [C3T1G1] MATH 220

JMU


These 33 pages of class notes were uploaded by Eunice Schoen on Saturday, September 26, 2015. The class notes belong to MATH 220 at James Madison University, taught by Steven Garren in Fall.

Date Created: 09/26/15

Section 8.1: What Are the Steps for Performing a Significance Test? (January 8, 2009)

## 8 Statistical Inference: Significance Tests About Hypotheses

### 8.1 What Are the Steps for Performing a Significance Test?

Example: Legal setting. Test the claim that Ralph committed armed robbery.

- $H_0$: null hypothesis (status quo, conventional wisdom, old idea, accepted idea)
- $H_a$: alternative hypothesis (the challenge to the conventional wisdom, new idea, proposed idea)

If we wish to reject $H_0$ (i.e., reject the idea that Ralph is innocent of armed robbery) in favor of $H_a$ (i.e., in favor of the idea that Ralph is guilty of armed robbery), we need overwhelming evidence to support our claim, such as witnesses, videotapes, a confession, or DNA evidence. Otherwise, we fail to prove him guilty (i.e., "not guilty"). Innocent until proved guilty.

We reject the null hypothesis in favor of the alternative hypothesis, or we fail to reject the null hypothesis. Do NOT say "accept $H_0$" (which is equivalent to saying "proved innocent") as a substitute for "fail to reject $H_0$."

What is the goal in hypothesis testing? Regarding the goal of hypothesis testing, the researcher is analogous to whom in the legal setting?

Example: State the hypotheses for testing the claim that peanut oil causes colon cancer.

Example: State the hypotheses for testing the claim that peanut oil prevents colon cancer.

Statistical setting

Example: Suppose a particular politician's approval rating last month was 55%. You believe that this approval rating has decreased due to a scandal. State the appropriate hypotheses. Let $p$ be the unknown current population approval rating of this politician.

Example: Suppose the mean personal income of your community last year was \$41,000. You believe that mean personal income has increased due to improved infrastructure. State the appropriate hypotheses. Let $\mu$ be the unknown population mean personal
income this year.

Example: In a college's handbook, the mean SAT score is listed as 1100. You believe that the information is outdated. State the appropriate hypotheses. Let $\mu$ be the unknown population mean SAT score.

### 8.2 Significance Tests About Proportions

Let $p$ = unknown population proportion.

Let $\hat{p}$ = sample proportion.

We make inferences on $p$ using the point estimate $\hat{p}$. We use large samples and apply the Central Limit Theorem; i.e., $\hat{p}$ is approximately normal for large $n$.

One-sample $z$-test on a population proportion $p$

Example: Suppose that the National Safety Council believes that more than 20% of all automobile accidents involve pedestrians. Test this claim at significance level $\alpha = 0.05$. Suppose a simple random sample of $n = 200$ automobile accidents results in $X = 46$ involving pedestrians.

(a) Define your notation. Let $p$ be the unknown population proportion of automobile accidents which involve pedestrians. Let $\hat{p}$ be the sample proportion of automobile accidents which involve pedestrians.

(b) State the hypotheses.

(c) Check the rule of thumb under the null hypothesis.

(d) Determine our specific value of $\hat{p}$, the point estimate of $p$.

(e) What is the approximate distribution of $\hat{p}$ under $H_0$?

(f) Find the value of the standardized test statistic.

(g) Find the P-value.

Standard normal table, pp. A1–A2

| z | .00 | .01 | .02 | .03 | .04 | .05 | .06 | .07 | .08 | .09 |
|---|---|---|---|---|---|---|---|---|---|---|
| −1.1 | .1357 | .1335 | .1314 | .1292 | .1271 | .1251 | .1230 | .1210 | .1190 | .1170 |
| −1.0 | .1587 | .1562 | .1539 | .1515 | .1492 | .1469 | .1446 | .1423 | .1401 | .1379 |
| −0.9 | .1841 | .1814 | .1788 | .1762 | .1736 | .1711 | .1685 | .1660 | .1635 | .1611 |

| z | .00 | .01 | .02 | .03 | .04 | .05 | .06 | .07 | .08 | .09 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.9 | .8159 | .8186 | .8212 | .8238 | .8264 | .8289 | .8315 | .8340 | .8365 | .8389 |
| 1.0 | .8413 | .8438 | .8461 | .8485 | .8508 | .8531 | .8554 | .8577 | .8599 | .8621 |
| 1.1 | .8643 | .8665 | .8686 | .8708 | .8729 | .8749 | .8770 | .8790 | .8810 | .8830 |

The P-value is the probability of obtaining a value of the standardized test statistic at least as extreme as the
observed value, based on the assumption that $H_0$ is true. The smaller the P-value, the stronger the evidence against $H_0$. When the P-value is small, we say that the data are statistically significant. In this example, the P-value is 0.1446.

(h) State the conclusion in statistical terms and in regular English. Is the P-value small enough that we should reject $H_0$?

Rule: Reject $H_0$ in favor of $H_a$ if P-value $\le \alpha$; otherwise, fail to reject $H_0$.

When P-value = 0.1446, would $H_0$ be rejected if $\alpha = 0.05$? $\alpha = 0.1$? $\alpha = 0.2$? $\alpha = 0.15$? $\alpha = 0.1446$?

A mathematically rigorous definition: The P-value is the smallest value of $\alpha$ for which the null hypothesis would be rejected. The P-value is also called the observed significance level.

Do researchers typically prefer small or large P-values?

The P-value is NOT P($H_0$ is true). For example, if the DNA match is 0.01 for the defendant, does this imply that there is a 1% chance that the defendant is innocent?

P-value is P(Such strong evidence would exist against the defendant, given that the defendant is innocent).

P-value is NOT P(Defendant is innocent).

Example: You visit a foreign country on vacation and get thrown into jail for no apparent reason. The blood at some crime scene is type A, and you, who unluckily have type A blood, become the defendant. Suppose that 42% of all people have type A blood. Is there a 42% chance that you are innocent? Is there a 58% chance that you are guilty?

Example: Suppose that two summers ago, 60% of then-recent high school graduates enrolled in college. We are interested in whether or not the college enrollment rate has changed since two summers ago. Test the claim at significance level $\alpha = 0.1$. Suppose a simple random sample of 500 most recent high school graduates results in 275 enrolled in college.

(a) Define your notation. Let $p$ be the unknown population proportion of most recent high
school graduates who are enrolled in college. Let $\hat{p}$ be the sample proportion of most recent high school graduates who are enrolled in college.

(b) State the null and alternative hypotheses.

(c) Check the rule of thumb under the null hypothesis.

(d) Under $H_0$, what is the approximate distribution of $\hat{p}$, the point estimate of $p$?

(e) Find the value of the standardized test statistic.

(f) Find the P-value.

Standard normal table, pp. A1–A2

| z | .00 | .01 | .02 | .03 | .04 | .05 | .06 | .07 | .08 | .09 |
|---|---|---|---|---|---|---|---|---|---|---|
| −2.3 | .0107 | .0104 | .0102 | .0099 | .0096 | .0094 | .0091 | .0089 | .0087 | .0084 |
| −2.2 | .0139 | .0136 | .0132 | .0129 | .0125 | .0122 | .0119 | .0116 | .0113 | .0110 |
| −2.1 | .0179 | .0174 | .0170 | .0166 | .0162 | .0158 | .0154 | .0150 | .0146 | .0143 |

(g) State the conclusion in statistical terms and in regular English. We conclude that the population proportion of most recent high school graduates who are enrolled in college DIFFERS from 60%.

Remark: Commonly used values of $\alpha$ are 0.01, 0.05, and 0.1.

### 8.3 Significance Tests About Means

Let $\mu$ = unknown population mean.

Let $\bar{X}$ = sample mean.

We make inferences on $\mu$ using the point estimate $\bar{X}$. We use large samples and apply the Central Limit Theorem; i.e., $\bar{X}$ is approximately normal for large $n$. Alternatively, we start with an approximately normal population, in which case $\bar{X}$ is approximately normal for any $n$.

One-sample $t$-test on a population mean $\mu$

Recall: For independent (or nearly independent) observations and finite $\sigma$, if the original population is approximately normal OR $n$ is large, then

$$\frac{\bar{X} - \mu}{s/\sqrt{n}} \overset{\text{approx}}{\sim} t_{n-1}.$$

Example: The manufacturer of Chimney cigarettes lists the mean nicotine content as being 2.5 mg. Volcano Incorporated invented a new brand of cigarettes with the same great taste as Chimneys, but claims that this new brand has a lower mean nicotine content. To test the claim of Volcano Incorporated at significance level $\alpha = 0.1$, a researcher samples the nicotine content of 41 Volcano cigarettes and finds $\bar{X} = 2.47$ mg and $s = 0.09$ mg. Let $\mu$ = population mean nicotine content of Volcano cigarettes.

(a) State the null and alternative hypotheses.

(b) Find the value of the standardized test statistic.

(c) Find the P-value.

$t$ table, p. A3

| Confidence level | 80% | 90% | 95% | 98% | 99% | 99.8% |
|---|---|---|---|---|---|---|
| df (right-tail probability) | $t_{.100}$ | $t_{.050}$ | $t_{.025}$ | $t_{.010}$ | $t_{.005}$ | $t_{.001}$ |
| 30 | 1.310 | 1.697 | 2.042 | 2.457 | 2.750 | 3.385 |
| 40 | 1.303 | 1.684 | 2.021 | 2.423 | 2.704 | 3.307 |
| 50 | 1.299 | 1.676 | 2.009 | 2.403 | 2.678 | 3.261 |

(d) State the conclusion in statistical terms and in regular English. We conclude that the population mean nicotine content of Volcano cigarettes is less than 2.5 mg.

Example: Suppose the mean lifetime of mice is 26 months. Test at level 0.01 if a new strain of mice has mean lifetime different from 26 months. Seven mice are independently sampled, and their lifetimes in months are 20, 23, 13, 7, 17, 15, 10.

(a) Define your notation.

(b) Is the original population approximately normal, or is the sample size large?

(c) State the null and alternative hypotheses.

(d) Find the value of the standardized test statistic.

(e) Find the P-value.

$t$ table, p. A3

| Confidence level | 80% | 90% | 95% | 98% | 99% | 99.8% |
|---|---|---|---|---|---|---|
| df (right-tail probability) | $t_{.100}$ | $t_{.050}$ | $t_{.025}$ | $t_{.010}$ | $t_{.005}$ | $t_{.001}$ |
| 5 | 1.476 | 2.015 | 2.571 | 3.365 | 4.032 | 5.893 |
| 6 | 1.440 | 1.943 | 2.447 | 3.143 | 3.707 | 5.208 |
| 7 | 1.415 | 1.895 | 2.365 | 2.998 | 3.499 | 4.785 |

(f) State the conclusion in statistical terms and in regular English. We conclude that this new strain of mice has population mean lifetime different from 26 months.

(g) Now construct a 99% confidence interval on $\mu$, using the row for df = 6 and the 99% column of the $t$ table above.

Layman's interpretation: We are 99% confident that the population mean lifetime of this new strain of mice is between 7.20 months and 22.80 months.

Mathematically rigorous interpretation: If we repeat the
sampling procedure many times to construct many 99% confidence intervals on $\mu$, the population mean lifetime of this new strain of mice, then approximately 99% of these 99% confidence intervals will contain the true value of $\mu$.

(h) Is our 99% confidence interval consistent with the conclusion of our 2-sided hypothesis test of level $\alpha = 0.01$?

(i) Suppose we had tested $H_0: \mu = 20$ months against $H_a: \mu \ne 20$ months at level 0.01.

Remark: A strong connection exists between 2-sided hypothesis tests of level $\alpha$ and $(1 - \alpha)$-level confidence intervals.

Remark: The $t$ procedures are robust. Hence, even under some violations of the assumptions (i.e., $n$ is large or the original population is approximately normal), the $t$-test and the $t$ confidence interval often produce accurate results anyway.

Example: Suppose 20 observations are sampled from the following Uniform population.

### 8.4 Decisions and Types of Errors in Significance Tests

Recall: The P-value is the probability of obtaining a value of the standardized test statistic at least as extreme as the observed value, based on the assumption that $H_0$ is true.

Recall: We reject $H_0$ if and only if P-value $\le \alpha$.

What is the likelihood that we erroneously reject $H_0$ when in fact $H_0$ is true?

Definition: A Type I error occurs if we reject $H_0$ when $H_0$ is true.

What is P(Type I error); i.e., what is $P(\text{We reject } H_0 \mid H_0 \text{ is true})$?

Definition: A Type II error occurs if we fail to reject $H_0$ when $H_a$ is true.

Define $\beta$ = P(Type II error) = $P(\text{We fail to reject } H_0 \mid H_a \text{ is true})$.

Example: Consider the hypotheses

- $H_0$: Ralph is innocent of armed robbery.
- $H_a$: Ralph is guilty of armed robbery.

Describe the Type I error. Describe the Type II error. Describe $\alpha$ and $\beta$ in terms of probabilities and proportions.

What are the possible values of $\alpha$? What are the possible values of $\beta$? Do we want $\alpha$ to be large or small? Do we want $\beta$ to
be large or small? For the defendants-in-court example, how can $\alpha$ be made small (near zero)? What happens to $\beta$ as $\alpha$ gets small (near zero)? For the defendants-in-court example, how can $\beta$ be made small (near zero)? What happens to $\alpha$ as $\beta$ gets small (near zero)?

Example: In October 2002, just prior to the Persian Gulf War, Iraqi President Saddam Hussein released almost all Iraqi prisoners and detainees. What were Hussein's values of $\alpha$ and $\beta$?

Example: State the null and alternative hypotheses when a person is tested for a disease. Describe the Type I error in regular English and in medical terminology. Describe the Type II error in regular English and in medical terminology.

In the statistical setting, how can we minimize both $\alpha$ and $\beta$?

### 8.5 Limitations of Significance Tests

Statistical significance does not mean practical significance.

Definition: Data are statistically significant when the P-value is small (i.e., P-value $\le \alpha$), so the data suggest rejecting $H_0$ in favor of $H_a$.

Definition: Data are practically significant when the conclusion is of practical value.

Example: Revisit Volcano cigarettes.

- $\bar{X}$ = sample mean nicotine content = 2.47 mg
- $H_0: \mu = 2.5$ mg
- $H_a: \mu < 2.5$ mg
- P-value < 0.025 < 0.1 = $\alpha$

We concluded that the population mean nicotine content of Volcano cigarettes is less than 2.5 mg. Are the data statistically significant?

## 4 Gathering Data

### 4.1 Should We Experiment or Should We Merely Observe?

Example: Does cell phone use cause brain cancer?

(a) Conduct an observational study; i.e., take a survey among humans.

- What are the drawbacks to this observational study?

(b) Conduct an experiment with humans.

- What are the drawbacks to this experiment?

(c) Conduct an experiment with mice.

- What are the drawbacks to this experiment?

Example: University of Michigan,
May 19, 2003, news report by three social scientists: "Student drug testing not effective in reducing drug use" (wwwumichedunewsReleases2003May03r051903html). The study was from 1998–2001, based on 722 secondary schools (497 high schools and 225 middle schools). Students were asked if they used marijuana in the past 12 months, and a school administrator was asked about the drug testing policy. Overall, schools testing for drugs were virtually identical to schools not testing for drugs in terms of marijuana use among students. The authors implied that drug testing is a waste of money.

(a) What are the explanatory and response variables?

(b) Was this study observational or an experiment?

(c) Is the authors' implication valid?

Definition: Anecdotal evidence consists of self-selected data. What are some examples?

A Survey and a Census

Example: Suppose we are interested in the population proportion of American adults who support the President's foreign policy in Afghanistan.

Ideally, take a ______. What would be the disadvantages? What would be the advantages?

Instead, take a ______.

Definition: "A sample survey selects a sample of people from a population and interviews them to collect data." What would be the advantages? What would be the disadvantages?

### 4.2 What Are Good Ways and Poor Ways to Sample?

Definition: "A simple random sample of $n$ subjects from a population is one in which each possible sample of that size has the same chance of being selected."

Example: Suppose we want to sample 80 students out of 7000. Should we select the first 80 names on the list?

Data collection may consist of personal interview, telephone interview (perhaps via random digit dialing), or self-administered questionnaire.

The margin of error measures the precision of an estimator. For the estimator $\hat{p}$ (i.e., the sample proportion), the margin of error is roughly $\frac{1}{\sqrt{n}} \times 100\%$.

Example: Suppose that in a
simple random sample of 250 adults, the Republican is leading the Democrat by a vote of 135 to 115. Is it reasonable to conclude that the Republican really is winning?

Example: Suppose that in a simple random sample of 1000 adults, the Republican is leading the Democrat by a vote of 540 to 460. Is it reasonable to conclude that the Republican really is winning?

Definition: "The sampling frame is the list of subjects in the population from which the sample is taken."

Example: Sample only American senior citizens when estimating how American adults feel on Social Security issues. What are the population and sampling frame? Would the results of this poll be valid?

Types of Bias in Sample Surveys

1. "Sampling bias occurs from using nonrandom samples or having undercoverage."
2. "Nonresponse bias occurs when some sampled subjects cannot be reached or refuse to participate or fail to answer some questions."
3. "Response bias occurs when the subject gives an incorrect response (perhaps lying), or the question wording or the way the interviewer asks the questions is confusing or misleading."

Example: In 2000, for a JMU student research project, the researchers asked questions similar to the following:

- Do you smoke marijuana?
- Do you think JMU students smoke marijuana?

Example: The Literary Digest Poll (Know this example in detail, although you need not memorize the numbers.)

Franklin Roosevelt vs. Alfred Landon, Election of 1936

- Since 1916, the Literary Digest correctly picked the Presidents.
- The Digest mailed questionnaires to 10 million people, whose names were from country club membership lists, phone books, and automobile registrations.
- George Gallup, polling 50,000 people, predicted the Digest's results in advance.

Third-party candidates were excluded in the numbers below.

| | Roosevelt's percentage |
|---|---|
| The election result | 62 |
| Digest's prediction | 43 |
| Gallup's prediction of Digest | 44 |
| Gallup's prediction of election | 56 |

Example: Thomas Dewey vs. Harry Truman, Election of 1948 (Know this example in detail, although you need not memorize the numbers.)

| Candidates | Crossley | Gallup | Roper | Results |
|---|---|---|---|---|
| Harry Truman | 45 | 44 | 38 | 50 |
| Thomas Dewey | 50 | 50 | 53 | 45 |
| Strom Thurmond | 2 | 2 | 5 | 3 |
| Henry Wallace | 3 | 4 | 4 | 2 |

A Gallup Poll interviewer in St. Louis was required to interview 13 subjects, of whom

- 6 live in suburbs, 7 in central city
- 7 men, 6 women
- AND additional criteria based on age, race, monthly rent

Do we trust the results of volunteer samples, such as internet polls? Why or why not?

### 4.3 What Are Good Ways and Poor Ways to Experiment?

Definition: In an experiment, the subjects may be called experimental units.

The three principles of experimental design are

1. control
2. randomization
3. replication

Example: (Know this example in detail.) Diethylstilbestrol (DES) is an artificial hormone, and was believed to prevent miscarriages. The rate of miscarriages among pregnant women was known to be some fixed number for non-DES users. In five large studies, pregnant women volunteered to try DES, and the rate of miscarriages for these DES users was lower than for the non-DES users. In each of these five studies, the researchers concluded that DES lowers the rate of miscarriages. Were the conclusions of these researchers valid? Even in the late 1960s, doctors were prescribing the drug to 50,000 women each year. DES was banned for use on pregnant women in 1971.

Ideally, a double-blinded experiment is best. When is a double-blinded experiment not possible or not ethical?

Example: Salk Vaccine Field Trial (Know this example in detail, although you need not memorize the numbers.)

In 1916: polio epidemic in United States.

In 1950s: Jonas Salk had a promising vaccine,
which worked well in laboratory; i.e., the vaccine seemed safe and produced antibodies against polio.

What now? Test whether or not vaccine works.

(a) (hypothetical) Test vaccine on a small sample of children, e.g., 10 children. If successful on them, mass distribute the vaccine.

(b) (hypothetical) Offer vaccine to a large number of children. Typically, not everyone will accept the vaccine. We have two groups: treatment (those who accepted the vaccine) and control (those who declined the vaccine). What is the explanatory variable? What is the response variable? Would this study be considered a valid experiment?

(c) (real data) The National Foundation for Infantile Paralysis (NFIP) proposed vaccinating all grade 2 children if consent was given, and leaving grades 1 and 3 for control. Would this study be considered a valid experiment?

(d) (real data) Randomized control: Offer many children the ability to participate in the experiment, but do not tell them if they are given treatment or placebo. Also, do not tell doctors or nurses.

Statistics tell us that with large enough samples, randomized controlled experiments determine whether or not a treatment (e.g., drug, vaccine) works. If the treatment group has a higher success rate than the placebo group, we need to decide if this was due to chance or due to a successful treatment. Typically, we require overwhelming evidence that the treatment was successful before marketing the new vaccine.

Salk vaccine trial of 1954: Rate of polio per 100,000

Randomized controlled double-blinded experiment

| Group | Size | Rate |
|---|---|---|
| Treatment (high hygiene) | 200,000 | 28 |
| Control (high hygiene) | 200,000 | 71 |
| No consent (low hygiene) | 350,000 | 46 |

The NFIP study

| Group | Size | Rate |
|---|---|---|
| Grade 2 vaccine (high hygiene) | 225,000 | 25 |
| Grades 1 & 3 control (average hygiene) | 725,000 | 54 |
| Grade 2 no consent (low hygiene) | 125,000 | 44 |

### 4.4 What Are Other Ways to Conduct Experimental and Observational Studies?

Recall: An experiment has a treatment and a control, whereas an observational study consists of just polling.

Multifactor Experiments

Often we are interested in more than one explanatory variable (i.e., factor). Suppose we are testing two different treatments for cancer. What might be another factor of interest?

Experiments: Randomized or Matched

An experiment may use

(a) completely randomized design: All experimental units are randomly assigned to treatment or control.

(b) matched pairs: Each individual (or pairs of similar individuals) is given both the treatment and the control.

Give an example of a medical experiment using matched pairs. When is a medical experiment using matched pairs not possible? Which is better, and why: completely randomized design or matched pairs?

Types of Observational Studies: Sample Surveys

Definition: "A simple random sample of $n$ subjects from a population is one in which each possible sample of that size has the same chance of being selected" (from Section 4.2).

Definition: A stratified random sample divides the population into groups called strata, and then selects a simple random sample from each stratum.

Example: Suppose that a university is known to be 60% female and 40% male, and a survey is to be conducted related to the abortion issue. Enough funding or time is available to sample 100 students.

(a) How would a simple random sample be taken?

(b) How would a stratified random sample be taken?

(c) Which sample is better, and why?

Definition: Divide the population into a large number of clusters. Select a simple random sample of the clusters. All elements within each cluster are sampled, to form a cluster random sample.

Example: You have one week to estimate the average annual church donation of Baptist members in Rhode Island.

(a) How would a simple random sample be taken?
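
The three sampling designs defined in this section can be compared in a short Python sketch. It is only an illustration under stated assumptions: the roster of 10,000 students, the single-sex dorm floors used as clusters, the seed, and all function names are hypothetical and do not come from the notes. The stratified sampler uses proportional allocation, which is one common choice and not the only one.

```python
import random
from collections import defaultdict


def simple_random_sample(units, n, rng):
    """Every possible sample of size n has the same chance of selection."""
    return rng.sample(units, n)


def stratified_random_sample(units, stratum_of, n, rng):
    """Take a simple random sample inside each stratum.

    Assumption: proportional allocation, so each stratum contributes in
    proportion to its share of the population. With awkward proportions,
    rounding could make the total differ slightly from n.
    """
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(units))
        sample.extend(rng.sample(members, k))
    return sample


def cluster_random_sample(units, cluster_of, n_clusters, rng):
    """Take a simple random sample of clusters, then keep every unit in them."""
    clusters = defaultdict(list)
    for unit in units:
        clusters[cluster_of(unit)].append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]


def count_females(sample):
    return sum(1 for s in sample if s["sex"] == "F")


# Hypothetical roster: 10,000 students, 60% female, housed on 200
# single-sex dorm floors of 50 students each.
rng = random.Random(220)
students = [
    {"id": i, "sex": "F" if i < 6000 else "M", "floor": i // 50}
    for i in range(10_000)
]

srs = simple_random_sample(students, 100, rng)
strat = stratified_random_sample(students, lambda s: s["sex"], 100, rng)
clus = cluster_random_sample(students, lambda s: s["floor"], 2, rng)

print("females in SRS:       ", count_females(srs))
print("females in stratified:", count_females(strat))
print("females in cluster:   ", count_females(clus))
```

By construction, the stratified sample always contains exactly 60 females and 40 males, while the count in the simple random sample varies around 60 from draw to draw. With single-sex floors as clusters, the two-floor cluster sample can only contain 0, 50, or 100 females, which shows how a cluster sample can be cheap to collect yet unrepresentative when clusters are internally homogeneous.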

