# Introductory Statistics


This 608-page document was uploaded by Harsh Karthik on Sunday, March 9, 2014. It belongs to a course at the University of California - Los Angeles taught by a professor in Fall. Since its upload, it has received 816 views.


Date Created: 03/09/14

Copyright 2013 Pearson Education Inc. All rights reserved.

## Chapter 1 Introduction to Data

### Learning Objectives

- Distinguish between numerical and categorical variables.
- Find and use rates (including percentages), and understand when and why they are more useful than counts for describing and comparing groups.
- Understand when it is possible to infer a cause-and-effect relationship.
- Explain how confounding variables prevent us from inferring causation, and suggest confounding variables that are likely to occur in some situations.
- Be able to distinguish between observational studies and controlled experiments.

Statistics is the science of problem solving using data. A statistical investigative process involves collecting, organizing, summarizing, and analyzing information in order to draw conclusions.

### Data

- Data are observations that you or someone else records.
- Numeric data:
  - Pain/satisfaction scale from 1 to 10
  - Measurements: weight, height, distance, etc.
  - Number of customers during a business day
- Data can also be non-numeric:
  - List of song titles stored on an iPod
  - Ethnicities of each student on campus
  - Party preference in the upcoming election

### Variation

- Both represent the letter A, but how do they vary?
- Write the letter A three times on a piece of paper. How can you measure the variation?
- How does your writing vary from your neighbor's writing?
- There is no need for statistics if data contain no variation.

### More Than Just Numbers

8.32, 7.91, 9.64, 9.18, 10.33, 7.46

- As just numbers, this list is uninteresting. What can you say if this list represents:
  - Birth weights?
  - Price in dollars and cents for lunch?
  - Commuting distances?
  - Minutes to run a mile?

### Some Data Collection Instruments

- Polling
- Preferred
Customer Card
- Google Analytics
- Application for college admission
- Doctor's charts

### Sample vs. Population

- The population is the collection of all data values. It includes all outcomes that have ever occurred or could ever occur. Information about the population is usually the goal; however, obtaining all data values from the population is usually impossible.
- A sample is a subset of the population. A sample is used to get a partial understanding of the population.

### Variables

A variable is a characteristic of the individuals (sampling units) within a population. Some examples are:

- A person's gender
- The weight of a newborn puppy
- The concentration of CO2 in the atmosphere

The word "variable" comes from the fact that the measurements vary from one value to another.

### Example

How can statistical techniques be applied to solve the following problem? UCLA is interested in revising its recruiting policy. A study will be conducted to find out where the current UCLA students come from and their reasons for choosing UCLA.

### The Process of a Statistical Investigation

- Step 1: Identify a research objective.
- Step 2: Collect the information needed to answer the questions.
- Step 3: Organize and summarize the information.
- Step 4: Draw conclusions from the information.

#### Step 1: Identify a Research Objective

- The researcher must determine a question he or she wants answered; the question must be detailed.
- Identify the group to be studied. This group is called the population.
- An individual person or object that is a member of the population being studied is the sampling unit.

#### Step 2: Collect the information needed to answer the questions

In conducting research, we typically look at a subset of the population, called a sample.

#### Step 3: Organize and summarize the information

Descriptive statistics consists of organizing and summarizing the
information collected. It consists of charts, tables, and numerical summaries.

### Recruiting Example

#### Step 1: Identify the research objective

- Consider the recruiting example: to investigate where the current UCLA students come from and their reasons for choosing UCLA, so that the information can be used for revising the recruiting policy.
- Determine the target population for the study: all UCLA students.
- Determine the sampling unit for the study: one UCLA student.
- Determine the number of individuals to be chosen for the study: sample size n = 500, for example.

#### Step 2: Collect the information needed to answer the question

- Develop a survey with questions asking for demographic information and reasons for choosing UCLA.
- Select a sample of individuals for the study using a random sampling technique, such as assigning an ID number to each individual in the population and then randomly selecting n individuals (n = 500 in this example) as the random sample.
- Distribute the survey to the randomly selected individuals to collect the information needed.

#### Step 3: Organize and summarize the information

An example of a summary from sample data. The demographic information from the sample indicated:

- 90% of students are from within 200 miles of UCLA.
- 70% of students are from cities along the 405, south of UCLA.
- The three main reasons for selecting UCLA are:
  - It is at the right distance from home.
  - They come for the programs in teacher education.
  - It is less expensive than others.

#### Step 4: Draw conclusions from the data

An example of drawing conclusions from the summary, and a potential policy revision: based on the summary of the random sample of 500 students, we conclude that approximately 90% of all UCLA students live within 200 miles of UCLA and that most of them are from areas along the 405. UCLA may want to provide more information to schools within 200 to 300 miles of UCLA and make some targeted visits to schools. In addition, UCLA may want to make an effort to recruit more diverse students from the international community.

### Key Terms

1. Population (universe): all subjects of interest.
2. Sample: portion of
   population (random, unbiased, representative).
3. Population parameter: a summary measure about a population (unique, often unknown).
4. Sample statistic: a summary measure about a sample (varies from sample to sample).

### Another Example of a Sample and a Population

Birth data from North Carolina:

| Weight | Gender | Smoke |
|---|---|---|
| 7.69 | F | 0 |
| 0.88 | M | 1 |
| 6.00 | F | 0 |
| 7.19 | F | 0 |
| 8.06 | F | 0 |
| 7.94 | F | 0 |

- Population: all babies
- Sample: these six babies
- Variables: weight, gender, mother smoked

### Other Questions of Interest

- How were the data collected? Hospital reports to the government; a medical caregiver weighed the infant and surveyed the mother.
- What were the units of measurement? Pounds; F and M for girls and boys; 0 = nonsmoker, 1 = smoker.
- Where were the data collected? North Carolina.
- Why were the data collected? To learn about infant health as it relates to mothers' smoking.

### Two Types of Variables

- A quantitative (or numerical) variable describes quantities of the objects of interest. Data values are numbers.
  - Weight of an infant
  - Number of sexual partners
  - Time to run a mile
- A qualitative (or categorical) variable describes qualities of the objects of interest. Data values are usually words.
  - Skin color
  - Birth city
  - Last name

### Example: Numerical or Categorical?

| Age | Gender | Major | Units | Housing | GPA |
|---|---|---|---|---|---|
| 18 | Male | Psychology | 16 | Dorm | 3.6 |
| 21 | Male | Nursing | 15 | Parents | 3.1 |

- Numerical: Age, Units, GPA
- Categorical: Gender, Major, Housing

### Numerical or Categorical?

Why are you in college? Answers: 1 = personal growth, 2 = career opportunities, 3 = parental pressure, 4 = personal networking.

Results: 1, 4, 3, 2, 2, 1, 2, 3, 3, 1, 4, 2

- Coding categorical data with numbers: although the above data values are numbers, the variable is still categorical.
- Reason for coding: easier to input into a computer.

### Coding Yes/No Questions

- We often use 0 for No and 1 for Yes.
- Useful for data with only two possible values:
  - True or false
  - Black or white
  - Success or failure (Bernoulli)
  - Dead or alive
  - Head or tail (coin toss)

### Example: Distinguishing Between Qualitative and Quantitative Variables

Determine whether the following variables are qualitative or quantitative.

- (a) Flavors of ice cream
- (b) Speed of Roger Federer's serve
- (c) Number of times your Internet service went down in the last 30 days
- (d) Zip codes

### Types of Quantitative Variables

- A discrete variable is a quantitative variable that has either a finite number of possible values or a countable number of possible values. The term "countable" means the values result from counting, such as 0, 1, 2, 3, and so on.
- A continuous (interval) variable is a quantitative variable that is defined on an interval of real numbers and can be measured to any desired level of accuracy.

### Types of Qualitative Variables

- An ordinal variable is a qualitative variable that bears an ordering and can be ranked from the lowest level to the highest level. It can be easily coded into discrete numbers. Examples: rating of a professor or course, customer satisfaction level.
- A nominal variable is a qualitative variable that simply classifies data into categories without any specific order. Examples: ethnicity, first/last
  name, email, etc.

### Two-Way Tables

Gender and seat belt practices:

| | Men | Women |
|---|---|---|
| Not always | 2 | 3 |
| Always | 4 | 7 |

- Two-way tables show how many times each combination of categories occurs.
- The 2 in the above table tells us that there were two men from the sample who do not always wear seat belts.
- A frequency is the number of times the value is observed in a data set.

### Two-Way Tables and Frequencies

- How many women were surveyed? Women: 3 + 7 = 10
- How many from the survey always wear a seat belt? Always: 4 + 7 = 11
- How many women from the survey do not always wear a seat belt? Women AND Not Always: 3

### Two-Way Tables and Percentages

- What percent are men? $\text{Men} = \frac{2 + 4}{2 + 4 + 3 + 7} \times 100\% = 37.5\%$
- What percent always wear a seat belt? $\text{Always} = \frac{4 + 7}{2 + 4 + 3 + 7} \times 100\% = 68.75\%$

### Percentages to Counts

Of the 400 students who were surveyed, 65% were carrying their calculators. Of those carrying their calculators, 40% were men.

- How many of these 400 students were carrying their calculator? 400 × 0.65 = 260
- How many female students were carrying a calculator? 260 × 0.60 = 156

### Incomplete Data

Sports injuries:

| Sport | Number of Injuries |
|---|---|
| Baseball | 178,668 |
| Basketball | 615,546 |
| Football | 387,948 |

Answer the following true or false:

- There are more basketball injuries than football injuries. True: 615,546 > 387,948.
- Basketball is the most dangerous sport among the three listed sports. This cannot be inferred from the data, since we do not know the number of players in each sport.

Sports injuries, with the number of players:

| Sport | Injuries | Players |
|---|---|---|
| Baseball | 178,668 | 15,600,000 |
| Basketball | 615,546 | 28,500,000 |
| Football | 387,948 | 17,700,000 |

Answer the following true or false: basketball is the most dangerous sport.

- Baseball: $\frac{178668}{15600000} \times 100\% \approx 1.15\%$
- Basketball: $\frac{615546}{28500000} \times 100\% \approx 2.16\%$
- Football: $\frac{387948}{17700000} \times 100\% \approx 2.19\%$
- False: football is the most dangerous based on this data set.

### Organizing and Reporting Categorical Data

- Use a two-way table to display results of a two-question survey or two-outcome experiment.
- Use percents or rates rather than counts when comparing groups with different sizes.

### Misleading Statements

- "China's total CO2 emissions are higher than America's, so the Chinese are worse polluters."
  - Problem: since there are many more Chinese than Americans, the per capita emissions are higher for Americans.
- "The government receives more of its total revenue from middle-class Americans than from the rich, so the rich are under-taxed."
  - Problem: there are many more middle-class Americans. The rich are actually taxed at a higher rate.

### Observational Studies vs. Experiments

- An experiment consciously places participants into two or more groups and records the differences. It is used to prove a cause-and-effect relationship.
- An observational study uses groups that are already created and records the differences. In observational studies, researchers do not assign choices; they simply observe them.

### Types of Observational Studies

- Retrospective study: researchers use historical data for their analyses.
- Prospective study: researchers identify subjects of interest in advance and then collect data as events unfold.
- It is NOT possible for observational studies to demonstrate any causal relationship. Instead, we may observe associations.

### Controlled Experiments

- A controlled experiment is an experiment where each individual is assigned to either the control group or
  treatment groups.
- Replicate: the sample sizes should be large enough to account for other sources of variability.
- Randomize: assignment should be done randomly.

### Establishing Causality

- Establishing causality means showing that a difference in the response variable is brought about by some level of a treatment.
- Treatment groups contain the individuals who receive the treatment.
- The control group has the individuals who do not receive any treatment.
- Example: to see if red wine decreases the chance of heart disease, 800 red wine drinkers and 900 non-red-wine drinkers were observed.

### More on Control

- There may be other factors not included in the study that have an effect on the response variable.
- We control all other sources of variation to prevent them from changing and affecting the response variable.

### Anecdotal Evidence Is Not Science

- An anecdote is a story that a single individual tells about his or her own experience.
- An anecdote should never be used to make a statement about a group of individuals.
- If Jack drank peanut juice and noticed that he became much healthier, that does not mean that peanut juice makes people healthy.

### Randomization

- Ensures that samples look like the rest of the population.
- Allows us to nearly equalize the effects of unknown or uncontrollable sources of variation.
- If samples are not random or representative, we cannot apply statistical methods to draw conclusions from an experiment.

### An Example of an Experiment

Research question: does the use of fertilizer improve the tastiness of watermelons?

- Treatment: fertilizer
- Subject: watermelon

### An Example (Cont.)

45 identical
seeds are divided into three groups of 15 plants each.

- Control all other sources of variation (sun exposure, water, etc.).
- Group 1: control. Group 2: treatment 1 (1/2 dose). Group 3: treatment 2 (full dose).
- Compare tastiness.

### Association Is Not Causation

- Unless the individuals of the study are identical in every way except for the treatment, we cannot conclude that the treatment caused the outcome.
- People with grey hair are observed to have more wrinkles. This does not mean that grey hair causes wrinkles.
- A confounding variable is a characteristic other than the treatment that causes both outcomes. For example, old age causes both grey hair and wrinkles.

### Placebo Effect

- The placebo effect is the phenomenon of reacting after being told of receiving a treatment, even if no actual treatment was given.
- Example: after being told by a doctor that an empty capsule will treat the patient's fever, the patient's temperature decreases.

### Random Assignment and Bias

- Use a computer or other physical method that randomly divides the population into the control and treatment groups. This is called random assignment.
- Bias may occur when the assignments are not random and the results are influenced in a particular direction.
- Example: giving tutoring to the first 15 who register for a 30-student class to see if tutoring helps with exam scores. This is biased, since early registrants are likely to be better students.

### Blind and Double-Blind Studies

- A blind study is a controlled study where the participants do not know whether they are in the control group or the treatment group.
- A double-blind study is a blind study where the person administering treatment also does not know who is in the control group and who is in the treatment group.

### Double-Blind Study Example
- A researcher wants to determine whether a multivitamin pill shortens recovery time from the common cold. The researcher prepares 300 bottles of the multivitamins and 300 bottles of identical-looking placebo pills. The bottles are marked with serial numbers that only the researcher can decipher. The pharmacist does not know which are which but writes down the numbers in a log.

## Chapter 1 Case Study: Deadly Cell Phones

### Case Study

- Dr. Christopher Newman sued, claiming that cell phone use caused his brain tumor.
- The suit cited a TV show where a man's wife used a cell phone often and died of a brain tumor.
- The suit also cited a study of people who recovered from brain tumors.
- The study found a high percentage of cell phone users among the population afflicted with brain tumors.

### Anecdotal Evidence

- Citing a single person as evidence that cell phone use causes brain tumors is anecdotal evidence.
- One cannot conclude anything about the whole population from anecdotal evidence of a single person.
- This piece of the doctor's argument is completely invalid.

### Problems With the Study Cited

- Participants were not assigned to use or not use a cell phone.
- Looking at brain cancer survivors only excludes all who did not survive.
- A follow-up controlled study with mice did not agree with the original study.

### Conclusion of Lawsuit

- The judge ruled that the evidence was insufficient.
- Evidence did not establish a causal relationship between cell phone use and brain tumors.
- The judge threw out the case.

## Chapter 2 Picturing Variation with Graphs
## Chapter 3 Numerical Summaries of Center and Variation

### Visualizing Data

- Organize the data using the chart that most effectively summarizes the data visually.
- Make a picture.
- Comment on the distribution: shape, center, spread, etc. Search for patterns. Identify unusual features, if any.
- Draw conclusions and cautiously make predictions.

### Some Commonly Used Plots

1. Organize qualitative data using (a) a frequency and relative frequency table, (b) a bar chart, (c) a pie chart, and (d) a Pareto chart.
2. Organize quantitative data for:
   - discrete data, using (a) a frequency and relative frequency table, (b) a bar chart, (c) a pie chart, and (d) a Pareto chart;
   - continuous data, using (a) a histogram, (b) a stem-and-leaf plot, and (c) a time-series plot.

### Organizing Categorical Data

- For qualitative data, three measurements are available for the different levels of the variable:
  - the frequency, or raw count of observations in that category;
  - the relative frequency, i.e. the proportion: frequency / total number of observations;
  - the percentage.
- A pie chart is a circular graph showing how the measurements are distributed among the categories.
- A bar chart displays the distribution of observed categorical values in bars, with the height of each bar equal to its relative frequency.
- A bar chart in which bars are ordered from largest to smallest, together with a cumulative line, is called a Pareto chart.

### Pie Chart Example

[Figure: pie chart of "Your Gender" (female, male). The accompanying exercise asks for a pie chart, a bar chart, and a Pareto chart from a frequency table; the table is illegible in the source.]

### Pareto Chart Example (from Wikipedia)
[Figure: Pareto chart of late arrivals by reported cause; categories: traffic, child care, public transportation, weather, overslept, emergency.]

### Side-by-Side Bar Chart

[Figure: side-by-side bar chart by gender (male, female).]

### Segmented Bar Chart

[Figure: segmented bar charts of frequency by gender (female, male); the remaining labels are illegible in the source.]

### Organizing Quantitative Data

- For a discrete variable that takes only a handful of values, we may use a bar chart, pie chart, or Pareto chart, just as we did for qualitative variables.
- For a continuous variable, whose values are defined on intervals of the real number line, or one that can assume an infinite number of values, we may use:
  - a stem-and-leaf plot;
  - a histogram;
  - a time-series plot, if some measure of time is observed for each value.

### Dot Plots for Numerical Variables

- A dot plot is a chart that contains a dot for each data value.
- Benefits:
  - Shows the individual data values
  - Easy to spot outliers
  - Describes the distribution visually
- Drawbacks:
  - Not as common as bar and pie charts
  - Not great for data that has too many individual values

### Dot Plot Example

[Figure: dot plot.]

- Clearly shows an outlier just below 300.
- The rest of the data is generally uniformly spread out.

### Stem-and-Leaf Plot

- This plot presents a graphical display of the data using the actual numerical values of
  each data point.
- How to construct a stem-and-leaf plot:
  1. Divide each measurement into two parts: the stem and the leaf.
  2. List the stems in a column, with a vertical line to their right.
  3. For each measurement, record the leaf portion in the same row as its matching stem.
  4. Order the leaves from lowest to highest in each stem.
  5. Provide a leaf unit for your stem-and-leaf plot so that readers can recreate the list of actual measurements if needed.

### Example

The following table lists the prices (in dollars) of 19 different brands of walking shoes. Construct a stem-and-leaf plot to display the distribution of the data.

[Data table and stem-and-leaf solution are illegible in the source.]

### Histogram Example

[Figure: histograms of the same data drawn with different bin widths (vertical axis: frequency).]

- Different bin widths depict the same data differently.
- The smaller width shows more detail.
- Too small a width shows too much detail and will not clearly display the main features.

### Relative Frequency Histograms

- A relative frequency histogram is a histogram where the vertical axis (y-axis) represents the relative frequencies or percents rather than the frequencies.
- Compute the relative frequency by dividing the frequency by the sample size.
- The relative frequency histogram always has the same shape as the frequency histogram. The scale of the vertical axis is just changed.

### Relative Frequency Example

[Figure: relative frequency histogram of goals per game, women's soccer players, NCAA Division III, 2009.]

- Clearly shows that half of all women score on average between 0.7 and 0.8 goals per game.
- Shows there are a small number of exceptional players.

### How to Construct a Histogram

1. Choose the number of classes, usually between 5 and 15.
2. Calculate the approximate class width by dividing the difference between the largest and smallest values (range = largest - smallest) by the number of classes.
3. Round the approximate class width up to a convenient number.
4. Locate the class boundaries.
   - If discrete: assign one or more integers to a class.
   - If continuous: use the method of left inclusion, which includes the left class boundary point but not the right boundary point in the class.
   - Note: different methods may be used in different software. Some may use right inclusion. Some may add an additional decimal place for the class boundary.
5. Construct a statistical table containing the classes, their boundaries, and their relative frequencies.
6. Construct the histogram like a bar graph.

### Constructing a Histogram by Hand

The following table lists the prices (in dollars) of 19 different brands of walking shoes. Construct a relative frequency histogram to display the distribution of the data.

[Data table is illegible in the source.]

Solution by hand:

1. Determine the number of classes. [The number chosen in the example is illegible in the source.]
2. Compute the class width from the range and the number of classes. [Values are illegible in the source.]
3. Round the width up to a convenient number.
4. Use the left inclusion method to determine the class boundaries.
5. Construct a relative frequency table: record the count of observations in each class. This is the frequency; call it $f_i$. The relative frequency is $f_i / n$, where $n$ is the total number of points.
6. Draw a two-dimensional graph. The x-axis shows the class boundaries of the variable and the y-axis shows the relative frequency for each class; draw a rectangle with the relative frequency as the height for each
   class.

### Example: Draw a Histogram by Hand

[Figure: histogram with its frequency table (group, frequency, relative frequency); values are illegible in the source.]

### Differences Between Bar Charts and Histograms

- A histogram displays numerical data. A bar chart can display categorical data.
- The bar widths of a histogram are meaningful and must all be the same size. The bar widths for a bar chart are meaningless.
- The bars of a histogram usually touch each other. For a bar chart, there are always gaps between bars.
- There is only one choice (ascending by x) for the order of the bars in a histogram, while there are many choices of order for a bar chart.

### Three Aspects of a Distribution

- Shape
  - Symmetry
  - How many bumps, or modes?
  - Other distinguishing features
- Center
  - What is a typical value?
- Spread
  - Is the data all close together or spread out?

### How Many Modes?

- A unimodal distribution has one mound.
- A bimodal distribution has two mounds.
- A multimodal distribution has more than two mounds.

### Skewness (Unimodal Distribution)

[Figure: distribution shapes: symmetric, skewed to right, skewed to left.]

- Skewed to right: most values are small; only a few are much larger. The long tail is on the right side.
- Skewed to left: most values are large; only a few are much smaller. The long tail is on the left side.

### Where the Tail Is

- A distribution is skewed right if most of the data values are small and there is a tail of larger values to the right.
- A distribution is skewed left if most of the data values are large and there is a tail of smaller values to the left.

[Figure: skewed-right distribution.]
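
The skew descriptions above can be checked numerically with the relative frequency rule from the histogram slides (relative frequency = frequency / n, classes counted by left inclusion). The following is a minimal sketch under stated assumptions: the data values, the class width of 5, and the function name `relative_frequencies` are all hypothetical and do not come from the slides.

```python
def relative_frequencies(values, start, width, num_classes):
    """Relative frequency per class, using left inclusion: [lower, upper)."""
    counts = [0] * num_classes
    for v in values:
        index = int((v - start) // width)  # class whose left boundary is <= v
        if 0 <= index < num_classes:
            counts[index] += 1
    n = len(values)
    return [c / n for c in counts]

# Hypothetical data: most values are small, a few are much larger.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 11, 15, 19]
rel = relative_frequencies(values, start=0, width=5, num_classes=4)

# Crude text histogram: one '#' per 5% of the observations (rounded).
for i, r in enumerate(rel):
    lower = i * 5
    print(f"[{lower:>2}, {lower + 5:>2}) " + "#" * round(r * 20))
```

With this made-up data the first class holds most of the observations and the remaining classes trail off toward larger values, which is the skewed-right pattern described above. Note that the value 15 is counted in the class starting at 15, not the one ending at 15, because of left inclusion.
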
[Figure residue: horizontal axis labeled "Income"; next panel titled "Skewed Left Distribution".]

### Symmetric Distributions

- A distribution is symmetric if the left-hand side is roughly the mirror image of the right-hand side.

[Figure: symmetric distributions (vertical axis: frequency).]

### Normal Distributions

- A nearly normal distribution has the following properties:
  - Symmetric
  - Unimodal
  - Mound or bell shaped

### Outliers

- An outlier is a data value that is either much smaller or much larger than the rest of the data.
- Some traits of outliers:
  - Possible error in data collection.
  - No error. For example, the owner's salary could be an outlier if the rest of the employees are all low-wage workers.
- There is a numerical formula to determine whether or not a value is an outlier.

### Measure of Central Tendency: Mean Value (Average)

- The population mean (a population parameter) is computed using all individuals in a population: $\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum x}{N}$
- The sample mean (a sample statistic) is computed using sample data only: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$

### Recall: Population Parameters vs. Sample Statistics

For a numerical variable:

| | Population (parameter, often unknown) | Sample (statistic) |
|---|---|---|
| Size | $N$ | $n$ |
| Mean | $\mu$ | $\bar{x}$ |
| Variance | $\sigma^2$ | $s^2$ |
| Standard deviation | $\sigma$ | $s$ |

### Calculating the Sample Mean

Find the mean of the number of siblings for the 8 students questioned: 3, 2, 2, 1, 2, 3, 5, 2.

The sample size is $n = 8$:

$\bar{x} = \frac{\sum x}{n} = \frac{3 + 2 + 2 + 1 + 2 + 3 + 5 + 2}{8} = \frac{20}{8} = 2.5$

### The Mean as a Balancing Point for All Observed Values

If we place a finger at the mean, the histogram will balance perfectly.

[Figure: histogram of salary (millions of dollars), Major League Baseball 2010; vertical axis: frequency.]

### Skewness and the Trouble with the Mean

For a skewed distribution, the
mean gets pulled towards the tail.
The mean is also pulled towards outliers.
For a skewed distribution, or a distribution with only upper or only lower outliers, the mean does not represent a typical value.

Median: The Middle Number
- The value that lies exactly in the middle of the list of data when arranged in ascending or descending order.
- Splits the top half from the bottom half.
- Compared to the mean, the median is not sensitive to inserting or removing extreme values.
- Often a preferred measure of center.

How to Find the Median
1. Arrange the data in ascending order.
2. Determine the position of the middle. If the total number of observations is n, then this position is (n + 1)/2.
   a. If (n + 1)/2 is an integer, locate the data value at exactly the (n + 1)/2-th position. Here n is odd.
   b. If (n + 1)/2 is NOT an integer, the median is the average of the data values at the n/2-th position and the (n + 2)/2-th position. Here n is even.

Example: Compute the Median
Find the mean and median of the following pulse rates from a sample of individuals. [The data values are not recoverable from the source; the steps below are reconstructed from the legible fragments.]
- With n = 8 observations, the position (n + 1)/2 = 4.5 is not an integer, so the median is the average of the 4th and 5th values in ascending order: (72 + 73)/2 = 72.5.
- With one additional pulse rate (n = 9), the position is (9 + 1)/2 = 5, and the median is the 5th value in ascending order, 73.

Median vs. Mean
[Figure: histogram of income (thousands of dollars per year)]
The median income of $18,000 better represents the typical income than the much higher mean income.
The right tail greatly increases the mean but only slightly increases the median.
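
The median procedure above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `mean` and `median` are chosen here for clarity, and the data are the sibling counts from the sample-mean example earlier in this chapter.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Median found with the position rule (n + 1) / 2 on the sorted data."""
    ordered = sorted(values)  # step 1: arrange in ascending order
    n = len(ordered)
    if n % 2 == 1:
        # n odd: (n + 1) / 2 is an integer position (counting from 1)
        return ordered[(n + 1) // 2 - 1]
    # n even: average the values at positions n/2 and (n + 2)/2 (counting from 1)
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


siblings = [3, 2, 2, 1, 2, 3, 5, 2]  # sibling counts for the 8 students
print(mean(siblings))    # 2.5
print(median(siblings))  # 2.0
```

The sorted list is 1, 2, 2, 2, 2, 3, 3, 5, so the 4th and 5th values are both 2 and the median is 2. The single value 5 pulls the mean (2.5) above the median, which is the pattern described for right-skewed data.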
Center as Mean or Median: What Is a Typical Value?
- The center is not a typical value for bimodal or skewed distributions.
[Figure: two histograms, each captioned "Center not a typical value"]

Mode
- The most frequent observation of the variable that occurs in the data set.
- If there are two values that occur with the most frequency, we say that the data is bimodal.
- Example: Find the mode of the following pulse rate data: 80, 76, 65, 68, 73, 72, 80, 65, 98, 101.

Five-Number Summary (Boxplot)
- Minimum, Q1 (25th percentile), Q2 or median (50th percentile), Q3 (75th percentile), maximum
- Other related values:
  - Range = Max - Min
  - IQR (interquartile range) = Q3 - Q1

Finding the kth Percentile
Step 1: Arrange the data in ascending order.
Step 2: Compute an index $i$ using the following formula:
  $i = \frac{k}{100}(n + 1)$
  where $k$ is the percentile of the data value and $n$ is the number of individuals in the data set.
Step 3:
  (a) If $i$ is an integer, the kth percentile is the ith data value.
  (b) If $i$ is not an integer, the kth percentile is the average of the data values that lie above and below the ith position.

Example: Finding Quartiles
[Example based on serum HDL data from a sample of individuals; the data values and some labels are not legible in the source. The exercise asks for several percentiles, including the 50th (Q2, the median), the 75th (Q3), and the 25th (Q1).]

Steps for Constructing a Boxplot
Step 1: Determine the lower and upper fences:
  Lower fence = Q1 - 1.5(IQR)
  Upper fence = Q3 + 1.5(IQR)
Step 2: Draw vertical lines at Q1, M, and Q3. Enclose these vertical lines in a box.
Step 3: Label the lower and upper fences.
Step 4: Draw a line from Q1 to the smallest data value that is larger than the lower fence. Draw a line from Q3 to the largest data value that is smaller
than the upper fence.
Step 5: Any data values less than the lower fence or greater than the upper fence are outliers and are marked with an asterisk (*).

Example: Boxplot
[Figure: boxplot example]
Compute the lower and upper fences and draw a boxplot.
Outlier if > Q3 + 1.5 IQR or < Q1 - 1.5 IQR
- Example: HDL data, n = 15 (values partly illegible in the source): 23, 32, ..., 44, 45, 46, ..., 49, 53, 56, 56, ...
- Recall Q1 = 38, Q2 = 48, Q3 = 56, so IQR = 56 - 38 = 18.
- Q1 - 1.5 IQR = 38 - 27 = 11
- Q3 + 1.5 IQR = 56 + 27 = 83
- Therefore, there are no outliers in this data set.

The Effect of Outliers
a. Number of employees at several businesses on Main Street: 6, 7, 14, 18, 23, 25, 26
   Mean = 17, Median = 18
b. If the 26-employee business is turned into a Walmart: 6, 7, 14, 18, 23, 25, 334
   Mean = 61, Median = 18
Conclusion: The mean is strongly affected by outliers, while the median is not affected by outliers.

Comparing Median and Mean
The following are the quiz scores of 10 students in class A: 5, 5, 5, 5, 5, 7, 7, 7, 7, 7. Find the mean and median.
The following are the quiz scores of 10 students in class B: 5, 5, 5, 5, 5, 7, 7, 7, 7, 30. Find the mean and median.
Fact: The mean is sensitive to extreme data values. The median is resistant to extreme data values.

Affected by Outliers
- Affected by outliers: mean, standard deviation, range
- Not affected by outliers: median, interquartile range (IQR)

Relationship Between Shape and Boxplot
- If the median is near the center of the box and the whiskers are approximately equal in length, then the distribution is roughly symmetric.
- If the median is left of the center of the box and/or the right whisker is substantially longer than the left one, the distribution is right skewed.
- If the median is right
of the center of the box and/or the left whisker is substantially longer than the right one, the distribution is left skewed.

Illustration: Boxplot and Shape
[Figure: histograms paired with their boxplots]

What Boxplots Show and Don't Show
- Boxplots show:
  - Typical range of values
  - Possible outliers
  - Variation
- Boxplots don't show:
  - Modality
  - Mean
  - Anything for small data sets (especially fewer than 5 observations)

Comparison Between Mean, Median, and Mode for Unimodal Distributions
[Figure: three unimodal distributions with the positions of the mean, median, and mode marked]
- Left-skewed: Mean < Median
- Symmetric: Mean = Median
- Right-skewed: Mean > Median
Note: In real-world applications, the distribution of sample data can never be exactly symmetric; the shape can only be approximately symmetric. Therefore, if the mean is close to the median (not necessarily equal), we would say that the distribution is roughly symmetric.

Variability
- Variability describes how spread out the data values are.
[Figure: two histograms, one with high variability and one with low variability]

Example: Comparing Variability
Measures of dispersion measure the degree to which the data values spread. The more the data values spread, the larger the variation of the data values.
[Example: scores of students in two classes, one showing larger variation than the other; the data are not recoverable from the source]

Numeric Measures of Variability: Standard Deviation and Variance
The Sample
Variance is $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
The Sample Standard Deviation is $s = \sqrt{s^2}$
The Population Variance is $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
The Population Standard Deviation is $\sigma = \sqrt{\sigma^2}$

Standard Deviation
- It is a measure of the spread.
- It represents a typical distance of the observations from the mean.
- It has the same unit as the observed data values.
The larger the sd, the more spread out the data and the flatter the histogram.
The smaller the sd, the more clustered the data around the mean and the taller the peak of the histogram.

Put the Following in Order From Smallest Standard Deviation to Largest
[Figure: several histograms to be ranked by standard deviation]

Comparing Standard Deviation: Another Example
[Figure: histograms for Class A and Class B]
Which class has the largest standard deviation?

The 68-95-99.7 Empirical Rule
[Figure: bell curve marked at one, two, and three standard deviations from the mean]

Empirical Rule
The Empirical Rule: If a distribution is unimodal and symmetric, then
- Approximately 68% of the observations (roughly two-thirds) will be within one standard deviation of the mean.
- Approximately 95% of the observations will be within two standard deviations of the mean.
- Nearly all (99.7%) of the observations will be within three standard deviations of the mean.

Empirical Rule Example
The mean body weight for women between 18 and 25 years old is 134 lbs and the standard deviation is 26 lbs. Assume a mound-shaped distribution.
- 134 - 26 = 108 and 134 + 26 = 160. About 68% of women in this age group weigh between 108 and 160 lbs.
- 134 - 2(26) = 82 and 134 + 2(26) = 186. About 95% weigh between 82 and 186 lbs.
- Almost all weigh between 56 and 212 lbs.
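
The interval arithmetic in the Empirical Rule example can be sketched as a small helper. This is an illustrative sketch only: the function name `empirical_rule_intervals` is made up here, and the result is meaningful only under the slide's assumption of a unimodal, roughly symmetric distribution.

```python
def empirical_rule_intervals(mean, sd):
    """Ranges expected to hold about 68%, 95%, and 99.7% of the observations.

    Assumes a unimodal, roughly symmetric (mound-shaped) distribution.
    """
    return {
        "68%": (mean - sd, mean + sd),
        "95%": (mean - 2 * sd, mean + 2 * sd),
        "99.7%": (mean - 3 * sd, mean + 3 * sd),
    }


# Body weights from the example: mean 134 lbs, standard deviation 26 lbs
for share, (low, high) in empirical_rule_intervals(134, 26).items():
    print(f"about {share} between {low} and {high} lbs")
```

For the body-weight example this reproduces the ranges 108 to 160, 82 to 186, and 56 to 212 lbs.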
Empirical Rule Example
Daily cash register receipts at a local store follow a mound-shaped distribution with mean 9200 and standard deviation 1500. The day a new employee was hired, the store took in 4500. Should the manager be concerned?
9200 - 3(1500) = 4700
Yes, the manager should be concerned, since it is highly unlikely that such a low receipt total for the day would happen by random chance alone.

Identify Rare Events
Consider the ACT test: the average was 21 and the standard deviation was 4. The distribution of the ACT scores is mound shaped.
1. A student received a score of [value illegible in the source]. Is this an unusually high score?
2. If UCLA will admit students with a minimum ACT of one standard deviation below the mean, what is the minimum ACT for UCLA admission?
3. A student received an ACT of [value illegible in the source]. Is this an unusually high score?
Answers (reconstructed from a damaged passage):
1. 21 + 4 = 25 is one sd above the mean, so the score is not far from the mean and is NOT an unusually high score.
2. The score at one sd below the mean is 21 - 4 = 17.
3. The score is outside two sd from the mean (21 + 2(4) = 29). Only a small share of scores are higher than 29, so this is an unusually high score.

The Z-Score
- The z-score measures the number of standard deviations a specific value is above the mean.
- The z-score is unitless.
- If the absolute value of the z-score is larger than 2, the corresponding data value is unusually large or small.
- z-score = (value - mean) / sd

The Z-Score Formula
- Population z-score: $z = \frac{x - \mu}{\sigma}$
- Sample z-score: $z = \frac{x - \bar{x}}{s}$
- The mean price for a loaf of bread is $3.12 and the standard deviation is $0.89. Find the z-score for a loaf of bread that costs $2.00.
  $z = \frac{2.00 - 3.12}{0.89} \approx -1.26$

Summary of Describing a Distribution
- What is the shape?
  - Is it symmetric, skewed, or
neither?
  - Unimodal, bimodal, or multimodal?
  - Normal?
- Are there outliers?
- Where is the center? Is the center a typical value?
- Is there low or high variability?

Mean and Standard Deviation, or Median and IQR?
- Use the mean and sd when the distribution is mound shaped.
- Use the median and IQR when the distribution is skewed and/or when outliers are present.
- If the distribution is not unimodal, it may be better to split the data.

Trouble With Bimodal Distributions
[Figure: histogram of restaurant receipts per person]
There are two typical values.
Neither the mean nor the median describes the typical values.
The data should be separated out by lunch customers and dinner customers.

Separating Lunch and Dinner I
[Figure: histograms of lunch receipts and dinner receipts, each with its mean and median marked]
Displaying the data with two histograms allows a comparison between lunch and dinner.

Separating Lunch and Dinner II
[Figure: histograms of lunch receipts and dinner receipts, each with its mean and median marked]
The Lunch distribution is mound shaped and the Dinner distribution is skewed right.
Note: Do not compare the mean of one data set with the median of another. Use the medians for comparisons.
Lunch median is 8 and Dinner median is 22.

Comparing Distributions With Boxplots I
[Figure: side-by-side boxplots of maximum temperature (degrees Fahrenheit) for two cities]
Both cities have similar typical temperatures.
Both cities have fairly symmetric distributions.
Provo has a much greater variation in
temperatures than San Francisco.

Comparing Distributions With Boxplots II
The following boxplots represent the birth rate for women 15 to 44 years of age in 1990 and in a second year (not legible in the source) for each state.
[Figure: side-by-side boxplots of state birth rates for the two years]
What conclusion can you make?

Ways to Mislead with Graphs: Don't Do Any of These
- Have the frequency scale not begin at 0 to create the illusion of greater differences.
- Use symbols other than bars that hide or accentuate the real differences.
- Use unequal-width bars.

Y Scale Not Starting at 0
[Figure: two bar charts of the same data, the left one with a y scale that does not start at 0]
The left bar chart misleads by making the differences seem greater than they are.

Chapter 2 Case Study

Class Sizes: Private vs. Public
[Table: raw student-to-teacher ratios for private and public colleges; the values are not reliably recoverable from the source]
Using raw data is ineffective for this comparison.

Private Colleges: Student-to-Teacher Ratio
[Figure: histogram of student-to-teacher ratios for private colleges]
- Typical ratio between 10 and 11
- Skewed right
- Outlier of 54 student-to-teacher ratio
- Large variation

Public Colleges: Student-to-Teacher Ratio
[Figure: histogram of student-to-teacher ratios for public colleges]
- Typical ratio between 16 and 20
- Generally symmetric
- Outlier with a fractional student-to-teacher ratio
- Less variation

Comparing the Histograms
[Figure: histograms for public colleges and private colleges shown together]
It is much easier to describe the data when they are displayed using histograms compared to just the raw data table.

Chapter 3 Case Study
[Figure: histograms of the perceived risk of appliances, one for men and one for women]
- Skewed right for both men and women
- Unimodal for both men and women
- Women's typical value slightly higher than men's
- Five-number summary appropriate for both

Risk of Appliances: Summary Statistics
[Table: summary statistics by gender; only partly legible in the source]
- Men's median is 10. Women's median is higher at 15.
- The middle 50% of men varied by 20, while the variation was higher (25) for women.

[Figure: histograms of the perceived risk of X-rays, one for men and one for women]
- Relatively symmetric for both men and women
- Unimodal for both men and women
- Women's typical value close to men's
- Mean and standard deviation appropriate for both

Risk of X-rays: Summary Statistics

| | Mean | Standard Deviation |
|---|---|---|
| Men | 46.8 | 20 |
| Women | 47.8 | 20.8 |

- Men and women have similar mean and standard deviation risk perception for X-rays.
- About 68% of men perceive a risk between 26.8 and 66.8.
- About 68% of women perceive a risk between 27 and 68.6.
Guided Exercise 1

The mean rate of violent crime in the West was 406 per 100,000 people, and the standard deviation was 177. Assume the distribution is approximately unimodal and symmetric.
- Between which two values would you expect to find about 95% of the violent crime rates?
- Between which two values would you expect to find about 68% of the violent crime rates?
- If a western state had a violent crime rate of 584 crimes per 100,000 people, would you consider this unusual?
- Would 30 crimes per 100,000 people be unusual?

[Figure: distribution curve for the violent crime rates]

- By the Empirical Rule, about 95% of the data is within two standard deviations of the mean.
- This represents the green and blue areas together.
- The number 583 represents one standard deviation more than the mean: 406 + 177 = 583.

- 406 - 177 = 229
- 406 - 2(177) = 52
- 406 + 2(177) = 760

The mean rate of violent crime in the West was 406 per 100,000 people, and the standard deviation was 177. Assume the
distribution is approximately unimodal and symmetric.
- Between which two values would you expect to find about 95% of the violent crime rates?
  - 95% of the violent crime rates are between 52 and 760 crimes per 100,000 people.
- Between which two values would you expect to find about 68% of the violent crime rates?
  - 68% of the violent crime rates are between 229 and 583 crimes per 100,000 people.
- If a western state had a violent crime rate of 584 crimes per 100,000 people, would you consider this unusual?
  - No, since 584 is within 2 standard deviations of the mean.
- Would 30 crimes per 100,000 people be unusual?
  - Yes, because less than 5% of values occur so far from the mean.

Chapter 4: Regression Analysis
Exploring Associations between Variables

Bivariate Data
- Data in which two variables are measured on each individual.
- Often the purpose of studying them is to:
  I. Find an association between two categorical variables
  II. Analyze the correlation between two numerical variables (least-squares regression problem, making predictions)

Two Categorical Variables: Example I
- Behavioral Risk Factor Surveillance System (BRFSS) survey in 2000, sample size 20,000
- Example: gender and smoking habits

| | 0 | 1 |
|---|---|---|
| m | 4547 | 5022 |
| f | 6012 | 4419 |

Compare Percentages of Smokers (Similar Example in Lab 1)

| | 0 | 1 | total |
|---|---|---|---|
| m | 4547 | 5022 | 9569 |
| f | 6012 | 4419 | 10431 |
| total | 10559 | 9441 | 20000 |

- Out of males, 5022/9569 = 52.5% are smokers.
- Out of females, 4419/10431 = 42.4% are smokers.
- Conclusion: A male is more likely to smoke than a female in this data set.

Clicker Question
Which of the following segmented bar charts better illustrates the previous comparison?
[Figure: two segmented bar charts, with vertical axes running up to 10000]
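
The smoker percentages above are row percentages: each count is divided by the total for its own gender. A minimal sketch follows; the dictionary layout and the key names `non_smoker` and `smoker` are labels chosen here for columns 0 and 1, and are not from the slides.

```python
# Counts from the BRFSS table: column 0 (labeled non_smoker here) and column 1 (smoker).
counts = {
    "m": {"non_smoker": 4547, "smoker": 5022},
    "f": {"non_smoker": 6012, "smoker": 4419},
}


def smoker_percentage(row):
    """Percentage of smokers within one gender (a row percentage)."""
    total = row["non_smoker"] + row["smoker"]
    return 100 * row["smoker"] / total


for gender, row in counts.items():
    print(gender, round(smoker_percentage(row), 1))
# prints: m 52.5, then f 42.4
```

Dividing by each group's own total puts the two groups on the same scale even though their sizes differ (9569 males and 10431 females), which is why percentages are compared here instead of raw counts.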
Clicker Question
Which of the following segmented bar charts better illustrates the previous comparison?
Answer: Plot B shows that males have a higher tendency to be smokers in this data set, because the proportion of gray area out of the column total is larger than that of females.
[Figure: segmented bar chart, plot B]

Titanic Survivors: Example II

| Status | Measure | First | Second | Third | Crew | Total |
|---|---|---|---|---|---|---|
| Alive | Count | 203 | 118 | 178 | 212 | 711 |
| Alive | Row % | 28.6 | 16.6 | 25.0 | 29.8 | 100 |
| Alive | Col % | 62.5 | 41.4 | 25.2 | 24.0 | 32.3 |
| Alive | Cell % | 9.2 | 5.4 | 8.1 | 9.6 | 32.3 |
| Dead | Count | 122 | 167 | 528 | 673 | 1490 |
| Dead | Row % | 8.2 | 11.2 | 35.4 | 45.2 | 100 |
| Dead | Col % | 37.5 | 58.6 | 74.8 | 76.0 | 67.7 |
| Dead | Cell % | 5.6 | 7.6 | 24.0 | 30.6 | 67.7 |
| Total | Count | 325 | 285 | 706 | 885 | 2201 |
| Total | Row % | 14.8 | 12.9 | 32.1 | 40.2 | 100 |
| Total | Col % | 100 | 100 | 100 | 100 | 100 |
| Total | Cell % | 14.8 | 12.9 | 32.1 | 40.2 | 100 |

What Kind of Association Do You Observe? What Should You Ask?
[Figure: bar charts of Titanic counts by class (First, Second, Third, Crew) and status (Alive, Dead)]

Clicker Question: Which Class Had the Highest Survival Rate?
[Figure: segmented bar chart of survival by class]

Clicker Question: Which Class Had the 2nd Highest Survival Rate?
Answer:
[Figure: segmented bar chart of survival by class]
A: 62.5%, B: 41.4%, C: 25.2%, D: 24.0%

Two Numerical Variables
- Y: response variable (dependent variable)
- X: explanatory/predictor variable (independent variable)
- Graphical tool: scatter plot. Each individual in the data set is represented by a point in the scatter plot. We do not connect the dots.

Example: Is Hand Size a Good Predictor for Height?
- Y: Height
- X: Hand length
[Table: row number, gender, hand length, and height for 20 students; the values are not reliably recoverable from the source]
Example: Scatter Plot
[Figure: scatterplot of Height vs. Hand Length]
I. Is the relation positive?
II. How strong is the relationship?

Positive Trend
[Figure: scatterplot of miles driven against car age]
- Older cars tend to have more miles than newer cars.
- Newer cars tend to have fewer miles than older cars.
- There is a positive association between car age and miles the car has been driven.

Negative Trend
[Figure: scatterplot of births per woman against literacy rate]
- Countries with higher literacy rates tend to have fewer births per woman.
- Countries with lower literacy rates tend to have more births per woman.
- There is a negative association between literacy rate and births per woman.

Positive vs. Negative Trend
[Figure: two scatterplots, one rising and one falling]
- Positive: as x increases, y increases (NE-SW direction).
- Negative: as x increases, y decreases (NW-SE direction).

No Trend
- There is no trend between the speed and age of a marathon runner.
- Knowing the age of a marathon runner does not help predict the runner's speed.
- There is no association between a marathon runner's age and speed.

Strength of Association
- If for each value of x there is a small spread of y values, then there is a strong association between x and y.
- If for each value of x there is a large spread of y values, then there is a weak or no association between x and y.
- If there is a strong (weak) association between x and y, then x is a good (bad) predictor of y.

Strength of Association
[Figure: two scatterplots, one with a larger spread of y values labeled Weak Association and one with a smaller spread labeled Strong Association]

Linear Trends
[Figure: scatterplot of the number of searches for "Zombie" against the number of searches for "Vampire"]
- A trend is linear if there is a line such that the points in general do not stray far from the line.
- Linear trends are the easiest to work with.
- There is a positive linear association between the number of searches for Vampire and the number for Zombie.

Positive and Negative Linear Trend
[Figure: scatterplots showing positive and negative linear trends]

Other Shapes
[Figure: scatterplot with a nonlinear pattern; x-axis: Temperature (°F)]
- Nonlinear association can also occur, but this is covered in a more advanced statistics course.
- In terms of the strength of a linear relationship, which one is stronger? Why?

Nonlinear Trend
[Figure: grid of scatterplots with panel titles such as nonlinear, positive correlation, negative correlation, and no correlation; the exact panel labels are not recoverable from the source]

Summary of Analysis of the Scatter Plot
- Look to see if there is a trend or association.
- Determine the strength of the trend. Is the association strong or weak?
- Look at the shape of the trend. Is it linear? Is it nonlinear?

Writing Clear Descriptions Based on Association, NOT Causation
- Good
  - People who have higher salaries tend to travel farther on vacation.
  - A person who has a high salary is predicted to travel far on vacation.
- Bad
  - Because they have higher salaries, they travel farther.
  - A person with a high salary will travel farther on vacation.

Quantifying the Strength of Linear Relationship: Correlation Coefficient r
- The Pearson correlation coefficient is a number r that measures the strength of the linear association between two
variables.
- $-1 \le r \le 1$
- If r is close to 1, then there is a strong positive linear association.
- If r is close to -1, then there is a strong negative linear association.
- If r is close to 0, then there is a weak or no linear association.

Positive Correlation
[Figure: scatterplots with positive correlation]

Weak or No Correlation
[Figure: scatterplots with weak or no correlation]

Positive and Negative Correlation
[Figure: scatterplots with positive and negative correlation]

How to Calculate Correlation Coefficient
The linear correlation coefficient, or Pearson product moment correlation coefficient, is a measure of the strength and direction of the linear relation between two quantitative variables. We use the Greek letter $\rho$ (rho) to represent the population correlation coefficient and $r$ to represent the sample correlation coefficient. We present only the formula for the sample correlation coefficient.

Sample Linear Correlation Coefficient
$r = \frac{\sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)}{n - 1}$
where $\bar{x}$ is the sample mean of the explanatory variable, $s_x$ is the sample standard deviation of the explanatory variable, $\bar{y}$ is the sample mean of the response variable, $s_y$ is the sample standard deviation of the response variable, and $n$ is the number of individuals in the sample.
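
The sample correlation coefficient, computed as the sum of products of z-scores divided by n - 1, can be sketched in plain Python. This is an illustrative sketch under stated assumptions: the function names `sample_sd` and `correlation` are chosen here, and the five data points are hypothetical values made up for the example, not data from the slides.

```python
from math import sqrt


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    m = sum(values) / n
    return sqrt(sum((v - m) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """Sample correlation r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)


# Hypothetical data, made up for illustration only (not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 3))  # 0.775
```

Each product of z-scores is positive when a point lies above both means or below both means, and negative otherwise, so the sign of r reflects where most of the points fall.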
Trend and the Sign of Correlation
$r = \frac{\sum z_x z_y}{n - 1}$
[Figure: two scatterplots divided into quadrants]
- Most data points end up in Quadrants I and III. Overall, the sum of $z_x z_y$ is positive.
- Most data points end up in Quadrants II and IV. Overall, the sum of $z_x z_y$ is negative.

Interpreting Correlation
- The correlation between daily swim suits and ski jackets purchased in an apparel store is r = -0.96.
- There is a strong negative correlation between daily swim suits and ski jackets purchased.
- On days with strong swim suit sales, one predicts that ski jacket sales would be weak.
- This does not mean that people who buy swim suits are causing potential ski jacket buyers to not buy.

Zero Correlation vs. No Association
- A linear correlation coefficient of zero means that there is no LINEAR relationship between X and Y.
- A quadratically associated pair may have r very close to 0.
[Figure: two scatterplots, one nonlinear with no correlation and one with no correlation that is possibly not associated]

The Effect of Switching x and y
[Figure: scatterplots of life expectancy for women vs. men and for men vs. women]
- r for life expectancy, Women vs. Men: r = 0.977
- r for life expectancy, Men vs. Women: r = 0.977
- Switching x and y has no effect on r.

Correlation: Arithmetic and Units
- Multiplying all x's or all y's by a positive constant does not change r.
- Adding the same constant to all x's or all y's does not change r.
- Changing units, such as in to cm or °F to °C, does not change r.
- r is unitless.

Correlation: Linearity and Outliers
- Only use linear correlation to interpret the data when there is a linear relationship.
- An outlier could strongly influence the correlation.
[Figure: four scatterplots with panel notes indicating where r is useful, where r is not useful because the trend is not linear, and where an outlier is present]
All four have r = 0.817.

Least Squares Regression Line
[Figure: scatterplot of weight (pounds) against height (inches) with a fitted line]
- The regression line is the best-fit line for the data.
- The line minimizes the average squared vertical distances.
- It is only useful with data that follow a linear model.

Linear Function as Equation for a Line
Recall from mathematics the line $y = mx + b$:
- $m$ is the slope: the unit change of $y$ when $x$ increases by one unit.
- $b$ is the y-intercept: the $y$ value when $x$ is zero.
[Examples: graph given lines and determine their slopes and intercepts; determine the line that passes through two given points by first finding the slope and then substituting one point to solve for $b$. The specific equations and points are not recoverable from the source.]

Example: Finding the Equation of a Line
[Worked example using the point-slope form $y - y_1 = m(x - x_1)$; the numeric details are not recoverable from the source]

Example: Making Predictions
[Worked example; the numeric details are not recoverable from the source]

Example: Residuals
[Figure: scatterplot with a fitted line and one vertical residual marked]
We made the prediction: for x = 3, predicted y = 4.75.
Let $\hat{y}$ denote the predicted y value. The discrepancy between $y$ and $\hat{y}$ is called the residual. In this case, 5.2 - 4.75 = 0.45.
residual = observed y - predicted y

Residuals from Fitting a Line by Eye
[Figure: scatterplot of possum head length (mm) against total length (cm) with a line fitted by eye]
Which of the above three observed possum head lengths (y-values) yields the largest absolute
residual when predictions are made from the line using possum total length (x-values)?

#### Finding the Best-Fit Linear Model

- We would like to keep the overall residuals small.
- We need the model to be linear for its simplicity.
- We would like to give each data value the same weight.
- The solution is the LSRL (least-squares regression line), which minimizes the sum of squared residuals.

#### Why Not?

- The LSRL minimizes $\sum e_i^2$. But why not minimize $|e_1| + |e_2| + \cdots + |e_n|$?
- The LSRL is the most commonly used method.
- Computing the LSRL is much easier, both by hand and in most computer software.
- In many applications, a residual twice as large as another in absolute value is more than twice as bad. Squaring the residuals accounts for this.

#### Comparing Two Models

[Figure: scatterplot with two candidate lines. The solid line is the LSRL.]

#### Equation of LSRL

The equation of the least-squares regression line is given by

$$\hat{y} = b_0 + b_1 x$$

- $b_1$ is the slope of the least-squares regression line.
- $b_0$ is the intercept of the least-squares regression line.

[Data table: gender, hand length, hand width, and height for each person in the sample. The values are illegible in the source.]

[Figure: height plotted against hand length, with the computer result for r.]
#### Sufficient Statistics

[Table: sample size N, mean, and standard deviation for hand length and height. The values are illegible in the source.]

#### Computing the Equation for LSRL from Summary Statistics

Find the least-squares regression line from the summary statistics (N, mean, and standard deviation of hand length and of height, together with r):

$$b_1 = r\,\frac{s_y}{s_x} = 3.353, \qquad b_0 = \bar{y} - b_1\bar{x} = 43.27$$

The LSRL is: Predicted Height = 43.27 + 3.353 × HandLength.

#### Using the Regression Line

[Figure: scatterplot of height against hand length with the regression line.]

- Predict the height when a person's hand is 7.7 inches long.
- Predicted Height = 43.27 + 3.353(7.7) = 69.1 inches.

#### Interpreting the Slope

- The slope is the coefficient of x in the regression line equation.
- Reading the slope as rise over run: if x increases by 1, then y is predicted to change, on average, by the value of the slope.
- The slope is only meaningful if the data follow a linear model.

#### Interpreting the y-intercept

- The y-intercept is the predicted value of y when x is 0.
- Use the y-intercept to interpret the data only when:
  - it makes sense to have a value of 0 for x;
  - the calculated y-intercept value is meaningful;
  - the data include values equal to or close to 0.

#### Example: Interpreting the Slope and Intercept

- The slope is 3.353.
- If x is increased by 1, y is predicted to have an average increase of 3.353.
- For every additional inch in hand length, a person is expected to be, on average, an additional 3.353 inches taller.
- It is meaningless to interpret the intercept coefficient in this problem. Why?

#### Correlation is Not Causation

- A strong correlation is not evidence of a cause-and-effect relationship.
- Do not use the words "causes", "makes", "will", "because", etc. when drawing conclusions from a regression analysis.
- Do use the words "predict", "tends", and "on average".

#### Slope and Causation

Predicted Salary = 22,000 + 8,000 × (College Years)

- Wrong: Each year in college results in an additional salary increase
of 8,000.
- Wrong: A person with one more year of college education will earn an extra 8,000.
- Correct: On average, people with one more year of college education tend to earn an extra 8,000.

#### Properties of LSRL

$$b_1 = r\,\frac{s_y}{s_x}, \qquad b_0 = \bar{y} - b_1\bar{x}$$

- The correlation coefficient r and the least-squares slope coefficient always bear the same sign.
- The LSRL goes through the point $(\bar{x}, \bar{y})$.
- We are often more interested in the slope coefficient, since it gives the rate of change.
- Remember that the LSRL is a tool that can be applied to any data set, but that does not mean you should apply it.

#### Regression Means

- Regress toward the mean.
- Son's height -> Father's height

#### Nonlinear Data

- If you can't imagine a line, don't try to find one.
- If the association is not linear, don't attempt to find or interpret r or the equation of the least-squares regression line.

[Figure: scatterplot of age against a count variable whose name is illegible in the source.]

#### Summary: Conditions for Applying Linear Regression

- Linearity: the data should show a linear trend.
- Nearly normal residuals: generally, the residuals must be nearly normal.
- Constant variability: the variability of points around the least-squares regression line remains roughly constant.

#### Flawed Linear Models

- Generally, most residuals should be within two standard deviations of 0.

#### Beware of Outliers

- Outliers have a strong effect on both the correlation and the equation of the regression line.
- An outlier that strongly affects the regression line is called an influential point.
- When an influential point is present, perform the regression analysis both with and without it, and comment on the difference. Do not simply ignore or remove it.

#### Example of an Influential Point

[Figure: scatterplot with an influential point and fitted regression lines. The labels are illegible in the source.]
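The advice under "Beware of Outliers", to fit the line both with and without the suspect point, can be sketched as follows. This is a minimal illustration, not the course's own procedure: the data are hypothetical, `least_squares_line` is a helper written for this example, and it uses the summary-statistic formulas $b_1 = r\,s_y/s_x$ and $b_0 = \bar{y} - b_1\bar{x}$.

```python
import math

def least_squares_line(xs, ys):
    """Return (b0, b1, r) with b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    r = sum((x - x_bar) * (y - y_bar)
            for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1, r

# Hypothetical data: five points close to a line, plus one far-off point.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
suspect = (15.0, 5.0)

# Fit once without the suspect point and once with it.
without = least_squares_line(xs, ys)
with_point = least_squares_line(xs + [suspect[0]], ys + [suspect[1]])

for label, (b0, b1, r) in (("without", without), ("with", with_point)):
    print(f"{label:>7}: predicted y = {b0:.3f} + {b1:.3f} x   (r = {r:.3f})")
```

With these made-up numbers the slope falls from about 1.99 to about 0.07 once the far-off point is included, which is the kind of change that marks a point as influential and is worth commenting on.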
#### Which Outlier Is the Most Influential?

[Figure: scatterplots with candidate outliers.]

#### Regression of Aggregate Data

- Using aggregate data for regression means that each point represents the mean of all the y-values with a given x-value.
- When using aggregate data, be sure to include the word "mean" in all interpretations.

#### Aggregate Data

[Figure: two scatterplots of critical reading against math SAT scores, one not using aggregate data and one using state means.]

- There is a weak correlation between math SAT scores and critical reading SAT scores.
- There is a strong correlation between states' mean math SAT scores and states' mean critical reading SAT scores.

#### Don't Extrapolate

[Figure: scatterplot of GPA against SAT percentile.]

- Only use the regression line to predict y-values for x-values that are within or near the range of the data.
- For someone with an SAT score at the 90th percentile: Predicted GPA = 0.0019 + 0.0477(90) = 4.29.
- But college GPA only goes up to 4.0. The linear model was built from observations between the 26th and 72nd percentiles.

#### Strength of a Regression Fit

- $R^2$ measures how much of the variation in the response variable y can be explained by the explanatory variable x.
- It is called the coefficient of determination.
- It is used to help determine which explanatory variable would be best for making predictions about the response variable.
- $R^2$ is close to 100% when r is close to −1 or 1, and close to 0 when the linear relationship is weak.

#### Deviation Away from Mean As a Sum

The difference between the prediction $\hat{y}$ and the mean $\bar{y}$ is the additional information explained by the hand length.

- Total deviation: $y - \bar{y}$
- Explained deviation: $\hat{y} - \bar{y}$
- Unexplained deviation: $y - \hat{y}$

#### Interpreting Total Variation in y With Regression Model

Total
deviation = Unexplained deviation + Explained deviation:

$$y - \bar{y} = (y - \hat{y}) + (\hat{y} - \bar{y})$$

Total variation = Unexplained variation + Explained variation, which is also written as Total Sum of Squares = Error Sum of Squares + Regression Sum of Squares:

$$\sum (y - \bar{y})^2 = \sum (y - \hat{y})^2 + \sum (\hat{y} - \bar{y})^2$$

$$SS_{\text{Total}} = SS_{\text{Error}} + SS_{\text{Regression}}$$

Dividing by $SS_{\text{Total}}$ splits the variation into the proportion due to error and the proportion explained by the variable:

$$R^2 = \frac{SS_{\text{Regression}}}{SS_{\text{Total}}} = 1 - \frac{SS_{\text{Error}}}{SS_{\text{Total}}}$$

$R^2$ is called the coefficient of determination: the percent of the variation in the response variable that the predictor variable can explain.

#### Coefficient of Determination Example

- $r = 0.668$, so $R^2 = 0.668^2 \times 100\% = 44.6\%$.
- 44.6% of the variation in height can be explained by hand length. The other 55.4% cannot be explained by the length of a hand.

#### Practice: True or False

1. If the Pearson correlation coefficient $r < 0$, then $b_1 < 0$.
2. If $r < 0$, then $R^2 < 0$.
3. If $R^2 = 0.81$ and $b_1 = 0.135$, then $r = 0.9$.
4. If $R^2 = 0.81$ and $b_1 = 0.135$, then $r = 0.9$.
5. If $b_1 = 0$, then it is still possible that $r > 0$.

[Items 3 and 4 differ only in the signs of $b_1$ and $r$, which are lost in the source.]

[Worked example: from given sums for a small data set, compute the linear correlation coefficient, the slope $b_1 = r\,s_y/s_x$, the y-intercept $b_0 = \bar{y} - b_1\bar{x}$, and write out the equation of the least-squares regression line. The numbers are illegible in the source.]

### Chapter 4 Case Study

#### Scatterplot of City Government Income vs. Private Meter Income Without Brinks

[Figure: scatterplot.]

- Positive, weak linear association.
- Predicted Collection = 688,497 + 145.5 × City Income

#### Are Brinks Employees Stealing from Parking Meters?

- New York City contracted Brinks to collect parking meter
money. The city suspects that employees are keeping some of it.
- There is data on the monthly meter collections of honest (non-Brinks) collectors vs. the city's total income for that month.

#### Predicted vs. Actual Brinks Collection

[Figure: scatterplot with the regression line and one Brinks month marked.]

- Predicted Collection = 688,497 + 145.5 × City Income
- One month, City Income was 7,016 and Brinks collected 1,330,143.
- 688,497 + 145.5(7,016) = 1,709,325
- Discrepancy: 1,709,325 − 1,330,143 = 379,182

#### Comparing Brinks vs. Honest Employees

[Figure: comparison of collections under Brinks and under the honest employees.]

- Conclusion: income when Brinks is working is clearly lower than when the honest employees are working.

### Chapter 4 Guided Exercise 1

#### Does the Cost of a Flight Depend on [remainder of title illegible in the source]

- How much would it cost to fly 500 miles?
- Use a complete regression analysis.

[Table: flight fares and distances. The values are illegible in the source.]

#### Create a Scatterplot

[Figure: scatterplot of miles vs. cost.]

- Since the cost tends to increase as mileage increases, and since there is no apparent strong curvature, the linear model is appropriate.

#### The Regression Line

[Figure: fitted line plot of cost against miles, with simple linear regression output (dependent variable: cost; independent variable: miles). The numeric output is illegible in the source.]

- Interpret the slope: 0.08. For every additional mile, the price goes up by 0.08 on average.
- Interpret the y-intercept: 163. This is the
predicted price for a 0-mile flight. The y-intercept is meaningless here.

#### Answer the Question

- How much would it cost to fly 500 miles?
- Predicted Cost = 162.60 + 0.0796 × miles
- 162.60 + 0.0796(500) = 202.40
- A 500-mile flight is predicted to cost 202.40.

### Chapter 4 Guided Exercise 2

#### Test Scores: Slope

- The summary statistics for the midterm and final exam scores are:
  - Midterm: mean = 75, standard deviation = 10
  - Final: mean = 75, standard deviation = 10
  - r = 0.7, n = 20
- First find the slope: $b = r\,\dfrac{s_{\text{final}}}{s_{\text{midterm}}} = 0.7 \times \dfrac{10}{10} = 0.7$

#### Test Scores: y-intercept

- Then find the y-intercept a from the equation $a = \bar{y} - b\bar{x} = 75 - 0.7(75) = 22.5$

#### Test Scores: Regression Line

- Write out the equation: Predicted y = a + bx
- Predicted Final Score = 22.5 + 0.7 × Midterm Score
- Use the equation to predict the final score for a midterm score of 95: Predicted Final = 22.5 + 0.7(95) = 89
- This is less than 95 since the slope is less than 1.

### Chapter 5: Modeling Variation With Probability

#### Learning Objectives

- Understand that humans can't reliably create random numbers or sequences.
- Understand that a probability is a long-term relative frequency.
- Know the difference between empirical and theoretical probabilities and know how to calculate them.

#### Learning Objectives (Continued)

- Be able to determine whether two events are independent or associated, and understand the implications of making incorrect assumptions about independent events.
- Understand that the Law of Large Numbers allows us to use empirical probabilities to estimate and test theoretical probabilities.
- Know how to design a simulation to estimate empirical probabilities.

#### What Is Randomness?

Experiment -> Random outcome

- Experiment: flip a coin. Outcomes: head, tail.
- How random is this experiment when the coin is fair?

#### Physics and Randomness

Randomness means the total
absence of patterns or orderliness. One can take away the randomness, and hence the variation in possible outcomes, by figuring out repeated patterns and fixed behaviors in an experiment. We live in a world bound by the laws of physics.

#### Psychology and Randomness

The choices human beings make every day are not random. Example of cognitive bias:

CLICKER QUESTION: Which hand? A. Left B. Right

#### Choosing At Random

- If a string of numbers is chosen at random, then absolutely no predictable pattern occurs and no digit is more likely to appear than another.
- In general, outcomes occur at random if every outcome is just as likely to appear as any other outcome and not the slightest predictable pattern of outcomes occurs.

#### Understanding Randomness

There is no such thing as perfect randomness in nature. If the choices we make can never be truly random, how can we be unbiased in a controlled experiment?

Solution: pseudorandomness, that is, artificial random numbers created based on number theory.

#### Simulating Randomness: Rolling a Fair Die 10 Times

[Table: random number table. The digits are illegible in the source.]

1. Pick a line, say line 30, on the table to begin.
2. Select numbers in order, disregarding 0, 7, 8, and 9.
3. The random outcomes are 545346253.

Note: Random number tables can also be used to simulate biased outcomes.

#### Simulating Randomness: Make 20 Free Throws

[Table: random number table. The digits are illegible in the source.]

Assume a shooting percentage of 80%. Using line 31, if the digit is 0 to 7 we count it as a made shot; if the digit is 8 or 9 we count it as a missed shot. What is the empirical probability?

#### Limitations With Tables and Computers

- The random number table only has a finite list. If it is used many times, it will not be random at all.
- Computers involve a random seed, typically the time at which the command is sent, rounded to the nearest millisecond. Generated numbers
are called pseudo-random numbers.

#### Empirical Probabilities

- Probability measures the chance that a random event occurs.
- Events are statements about random outcomes.
- Empirical probabilities are short-run relative frequencies based on the observations from an experiment.
- A coin was tossed 50 times and landed on heads 22 times. The empirical probability is 22/50 = 0.44.
- Question: How good is this estimate? Would you be satisfied with 10 flips? 1,000? 1,000,000?

#### Theoretical Probabilities I

- Long-run frequency: flip a coin n times; $P(\text{Head}) = \dfrac{\#\text{ of heads}}{n}$ as $n \to \infty$.
- Population proportion: In 2006, the "real" (adjusted for inflation) median annual household income rose to 50,233.00 dollars according to the Census Bureau (Wikipedia).

#### Theoretical Probabilities II

| Population | Count |
|---|---|
| Total population | 37,253,956 |
| Male | 18,517,830 |
| Female | 18,736,126 |
| Under 18 | 9,295,040 |
| 18 & over | 27,958,916 |
| 20 to 24 | 2,755,949 |
| 25 to 34 | 5,317,577 |
| 35 to 49 | 7,572,529 |
| 50 to 64 | 5,599,545 |
| 65 & over | 4,246,514 |

| Race | Count |
|---|---|
| White | 21,453,934 |
| African American | 2,299,072 |
| Asian | 4,861,007 |
| American Indian and Alaska Native | 362,801 |
| Native Hawaiian and Pacific Islander | 144,386 |
| Other | 6,317,372 |
| Identified by two or more | 1,815,384 |

| Housing | Count |
|---|---|
| Total (housing units) | 13,680,081 |
| Occupied | 12,577,498 |
| Owner-occupied | 7,035,371 |
| Population in owner-occupied (number of individuals) | 20,742,929 |
| Renter-occupied | 5,542,127 |
| Population in renter-occupied (number of individuals) | 15,691,211 |
| Households with individuals under 18 | 4,713,016 |
| Vacant | 1,102,583 |
| Vacant for rent | 374,610 |
| Vacant for sale | 154,775 |

#### Theoretical Probabilities III

[Figure: illegible in the source.]

#### Using Theoretical and Empirical Probabilities

- Use theoretical probabilities when the description fits one of the three probability views and we can determine them mathematically: dice, cards, coins, genetics, etc.
- Use empirical probabilities when they cannot be determined mathematically. This is done by sampling: weather, politics, business success, etc.

#### Sample Space

- Recall: experiments produce random outcomes.
- Sample space Ω
(Omega): the set of all possible outcomes.
- Event A: a subset of Ω; a statement about outcomes.
- Example: roll a fair die once. Ω = {1, 2, 3, 4, 5, 6}; A = even number = {2, 4, 6}.

#### Sample Space for Rolling a Pair of Dice

[Figure: 6 × 6 grid of outcomes; row = 1st roll, column = 2nd roll.]

All 36 slots are equally likely.

#### Axioms of Probability

1. $0 \le P(A) \le 1$: there can't be a negative chance or more than a 100% chance of something occurring.
2. $P(\Omega) = 1$, which implies that $P(A') = 1 - P(A)$.
   - $A^c$ or $A'$ is the complement of A. It means "not A", or "A does not occur".
   - If there is a 25% chance of getting a head, then there is a 75% chance of getting a tail.

#### Probability for Equally Likely Events

- If all outcomes are equally likely, then

$$P(A) = \frac{\text{Number of outcomes in } A}{\text{Number of all possible outcomes}}$$

- Example: find the probability of picking an Ace from a 52-card deck.

$$P(\text{Ace}) = \frac{4 \text{ Aces}}{52 \text{ Cards}} = \frac{1}{13}$$

#### A Standard Deck of 52 Playing Cards

4 suits, 13 ranks.

[Figure: the 52 playing cards.]

#### Sum of Dice

- Roll 2 dice. Find P(Sum = 7).
- Outcomes with a sum of 7: 6
- Possible outcomes: 36
- P(Sum = 7) = 6/36 = 1/6

#### Intersection: Logical AND

- The word AND in joining two events means both must occur.
- Example: if you roll a die, find the probability that it is even and less than 5.
- Solution: the die rolls that are both even and less than 5 are 2 and 4, so P(Even AND < 5) = 2/6 = 1/3.

#### Union: Logical OR

- The word OR in joining two events means at least one of the events must occur.
- Example: find the probability of picking a Spade or a King from a 52-card deck.
- Solution: There are 13
spades in the deck. There are 3 kings that are not spades. Thus there are 16 cards that are a spade or a king: P(Spade OR King) = 16/52 = 4/13.

#### Venn Diagrams

[Figure: Venn diagram of people wearing a hat, glasses, or both.]

- A Venn diagram is a chart that organizes outcomes.
- P(Hat AND Glasses) = 2/6
- P(Not Hat) = 1 − P(Hat) = 1 − 3/6 = 1/2

#### Mutually Exclusive Events

- Two events are called mutually exclusive if they cannot both occur.
- If A and B are mutually exclusive, then P(A AND B) = 0.
- Example: a person is selected at random. Let A be the event that the person is a registered Democrat and let B be the event that the person is a registered Republican. Then A and B are mutually exclusive events.

#### Axioms of Probability (Continued)

3. $P(A \text{ OR } B) = P(A) + P(B) - P(A \text{ AND } B)$
   - Example events for two dice: A: Sum = 7; B: 1st roll is 5 or larger.
   - This is also known as the Inclusion-Exclusion Principle.
   - It is best illustrated using a pictorial proof.

[Figure: Venn-diagram proof of P(A OR B) = P(A) + P(B) − P(A AND B).]

#### Review: Probability Rules

1. $P(A') = 1 - P(A)$
2. $P(A \text{ OR } B) = P(A) + P(B) - P(A \text{ AND } B)$
3. Mutually exclusive:
   - a. $P(A \text{ OR } B) = P(A) + P(B)$
   - b. $P(A \text{ AND } B) = 0$

#### Tables and Probability

| | Car | Bike | Walk | Bus | Total |
|---|---|---|---|---|---|
| Male | 75 | 14 | 12 | 23 | 124 |
| Female | 90 | 7 | 25 | 32 | 154 |
| Total | 165 | 21 | 37 | 55 | 278 |

Find the probability that a randomly selected student from the table will:

- Take a bus: 55/278
- Be female or walk to class: 154/278 + 37/278 − 25/278 = 166/278
- Be male and take a bike: 14/278

#### Conditional Probability

- Conditional probability is the probability of an event occurring given some prior knowledge.
- Find the probability that a person will vote for a tax cut given that the person is Republican.
- Find the probability that a student who is a psychology major is also a vegetarian.

#### The Formula for a Conditional Probability

$$P(A \mid B) = \frac{P(A \text{ AND } B)}{P(B)}, \quad \text{if } P(B) > 0$$

- Use this formula when explicitly given the probabilities or percents. You do not need to use this formula when given a contingency table.
- B is the given (prior) event.
- Conditional probability always
appears as a proportion of the joint event, where both events occur, within the prior event.

#### Tables and Conditional Probability

| | Car | Bike | Walk | Bus | Total |
|---|---|---|---|---|---|
| Male | 75 | 14 | 12 | 23 | 124 |
| Female | 90 | 7 | 25 | 32 | 154 |
| Total | 165 | 21 | 37 | 55 | 278 |

- P(Bus | Female) = probability of riding the bus given that the person is female = 32/154
- P(Bus AND Female) = 32/278
- P(Female) = 154/278

#### Variations of the Formula

$$P(A \mid B) = \frac{P(A \text{ AND } B)}{P(B)}, \qquad P(B \mid A) = \frac{P(A \text{ AND } B)}{P(A)}$$

$$P(A \text{ AND } B) = P(A \mid B)\,P(B), \qquad P(B \text{ AND } A) = P(B \mid A)\,P(A)$$

Joint prob. = conditional prob. × marginal prob.

#### Conditional Probabilities

- By 2020, 2% of Americans will be senior citizens living in poverty. 17% of all Americans will be senior citizens in 2020. What percent of all senior citizens will be living in poverty?
- A -> Senior Citizen, B -> Living in Poverty
- P(A AND B) = 0.02, P(A) = 0.17
- P(B | A) = 0.02 / 0.17 ≈ 0.12
- In 2020, about 12% of all senior citizens will be living in poverty.

#### Independent Events

- Events A and B are called independent if one of the following conditions holds:
  - P(A | B) = P(A)
  - P(B | A) = P(B)
  - P(A AND B) = P(A) P(B)
- Intuitively, events are independent if knowledge of B does not change the probability of A occurring, and vice versa.
- In this case, conditional prob. = marginal prob.

#### Are They Independent?

- Suppose you draw a card from a standard 52-card deck and then roll a die. The events "draw a heart" and "roll an even number" are independent, because the results of choosing a card do not impact the results of the die toss.
- Suppose two 40-year-old women who live in the United States are randomly selected. The events "woman 1 survives the year" and "woman 2 survives the year" are independent.
- Suppose two 40-year-old women live in the same apartment complex. The events "woman 1 survives the year" and "woman 2 survives the year" are dependent.

#### Determining Independence

- Two 6-sided dice are rolled. Let A be the event that the dice sum to 7 and B be the event that the first die lands on
a 4. Are A and B independent?
- A = {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}
- B = {(4,1), (4,2), (4,3), (4,4), (4,5), (4,6)}
- P(A) = 6/36
- P(A | B) = 1/6
- P(A | B) = P(A)
- Yes, the events are independent.

#### Checking for Independence

- 55% of all students at the university are female and 30% of all students at the university graduate in four years. 13% of all students at the university are women who graduate in four years. Are gender and graduating in four years independent?
- F -> Female, G -> Graduate in 4 years
- P(F AND G) = 0.13
- P(F) × P(G) = 0.55 × 0.30 = 0.165
- Since 0.13 ≠ 0.165, gender and graduating in four years are not independent, so they are associated.

#### Fact: The Strict Equality Condition for Independence Makes It Hard

[Figure: pizza toppings (pepperoni, mushroom) illustrating statistical independence.]

Independence is mathematically important but highly unlikely to achieve in practice.

- Example 1a. A: I drove to school today. B: I took a bus to school today.
- Example 2a (roll 2 dice). C: The first die lands on 4. D: The sum is 12.

#### Independent vs. Mutually Exclusive

- Example 1b. A: I drove to school today. B: You drove to school today.
- Example 2b (roll 2 dice). C: The first die lands on 4. D: The sum is 7.

#### Mutually Exclusive Events Cannot be Independent

If nontrivial events A and B are independent, then it follows that P(A AND B) = P(A) P(B). But if they are mutually exclusive, then they must satisfy P(A AND B) = 0. How can that be possible if P(A) and P(B) are positive?

#### Independent Events Are Not Mutually Exclusive

If nontrivial events A and B are mutually exclusive, then it follows that P(A | B) = 0 = P(B | A). But if they are independent, then they must satisfy P(A | B) = P(A) and P(B | A) = P(B).

Most events in this world are dependent and not mutually exclusive.

#### Independence and Repeated Events

- 8% of all babies are born with a low birth weight. If 10 babies are born tonight at the hospital, find the probability that at least one is born with a low birth weight. Assume independence and let X denote the number of newborns with low birth weight.
- P(X ≥ 1) = 1 − P(X = 0)
- = 1 − (0.92 × 0.92 × 0.92 × ... × 0.92)
- = 1 −
$0.92^{10} \approx 0.57$
- There is about a 57% chance that at least one of the 10 babies will be born with a low birth weight.

#### Multiplication Rule for Independent Events

The probability that a randomly selected female of a given age will survive the year is reported by the National Vital Statistics Report. What is the probability that four randomly selected females of that age will all survive the year? [The age, the survival probability, and the report's volume and number are illegible in the source.]

P(all four survive) = P(1st survives and 2nd survives and 3rd survives and 4th survives) = P(1st survives) × P(2nd survives) × P(3rd survives) × P(4th survives)

#### Bayes Rule

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A \mid B)\,P(B) + P(A \mid B')\,P(B')}$$

Example: A recent highway safety study found that in 77% of all accidents the driver was wearing a seatbelt. Accident reports indicated that [percentage illegible in the source] of those drivers escaped serious injury (defined as hospitalization or death), but only 63% of the non-belted drivers were so fortunate. What is the probability that a driver who was seriously injured was not wearing a seatbelt?

Solution:

- A: the driver is seriously injured. A': not seriously injured.
- B: the driver was wearing a seatbelt. B': not wearing a seatbelt.
- We want to find $P(B' \mid A)$.
- What do we know? $P(B) = 0.77$, which implies $P(B') = 0.23$. The conditional probabilities of escaping injury give $P(A \mid B)$ and $P(A \mid B')$ by the complement rule. [The remaining values are illegible in the source.]

#### Partitioning the Sample Space

[Figure: the sample space split into B and B', with A overlapping both.]

#### Finding Probabilities in a Partitioned Sample Space

Following the partitioning of Ω (a randomly chosen person in the general public) into one of the two subsets B (seatbelt wearing) and B' (not seatbelt wearing), one can find the probability of A (accident) by

$$P(A) = P(A \text{ AND } B) + P(A \text{ AND } B') = P(A \mid B)\,P(B) + P(A \mid B')\,P(B')$$

This is called the Law of Total
Probability.

#### Statistical Inference Using Bayes Rule

Infection -> Symptoms -> Diagnosis

Notice:

1. The viral or bacterial infection is the cause.
2. Almost all lab and blood tests involve errors.
3. A positive reading indicates that the person is highly likely to be infected.

#### Example: Rare Disease

Let D denote the event that a rare disease is present. We usually know the accuracy of a test. This means that we are given $P(+ \mid D)$ and $P(- \mid D')$, or the probabilities of their complements:

- $P(+ \mid D')$: false positive
- $P(- \mid D)$: false negative

CLICKER QUESTION: Which scenario is worse?

#### Example: Rare Disease

What we would like to find out is $P(D \mid +)$.

Example: The prevalence rate of a certain rare form of self-inflicted virus is 1 in 300 in the population. Suppose that a blood test can detect the virus 97% of the time when it is present, but also gives a positive reading 2% of the time when the virus is not present. What is the probability that a person is infected if she just got a positive reading?

Here $P(D) = 1/300$.

#### Example: Building a Tree Diagram

Given $P(D)$, $P(+ \mid D)$, and $P(+ \mid D')$, we partition the sample space by the cause (D or D') first, followed by the test results. Use the complement property to find $P(D')$, $P(- \mid D)$, and $P(- \mid D')$.

[Figure: tree diagram with first branches D (1/300) and D' (299/300), each splitting into + and −.]

- $P(+ \text{ AND } D) = P(+ \mid D)\,P(D) = 0.97 \times 0.00333 = 0.00323$

#### Example: Finding Conditional Probability From a Tree

- $P(+ \text{ AND } D) = P(+ \mid D)\,P(D) = 0.97 \times 0.003333 = 0.00323$
- $P(+ \text{ AND } D') = P(+ \mid D')\,P(D') = 0.02 \times 0.99667 = 0.01993$
- $P(+) = P(+ \text{ AND } D) + P(+ \text{ AND } D') = 0.02317$
- $P(D \mid +) = \dfrac{P(D \text{ AND } +)}{P(+)} = 13.96\%$

#### Simulations of Multiple Trials

1. Identify physically what makes a single trial (e.g., roll four dice).
2. Decide how to simulate a single trial (computer, random number table, etc.).
3. Decide the response variable (e.g., the number of 5's rolled).
4. Decide on the event of interest (e.g., rolled exactly one 5).
5. Carry out the simulation.
6. Calculate the proportion of times the event of interest occurred. This is the empirical probability.

#### Simulations Resources: UCLA's SOCR

- http://socr.ucla.edu/htmls/SOCR_Experiments.html
- Choose Binomial Coin Experiment.
- n is the number of coin tosses per trial.
- p is the probability that it lands on heads.

[Figure: output of the SOCR Binomial Coin Experiment.]

#### Simulations Resources: UCLA's SOCR

- http://socr.ucla.edu/htmls/SOCR_Experiments.html
- Choose Card Experiment.
- n is the number of cards dealt.
- Y is the value, Z is the suit.

[Figure: output of the SOCR Card Experiment.]

#### The Law of Large Numbers (LLN)

- The Law of Large Numbers states that if an experiment with a random outcome is repeated a large number of times, the empirical probability of an event is likely to be close to the true probability.
- The larger the number of repetitions, the closer together these probabilities are likely to be.
- In terms of a numerical variable, it indicates that the sample mean tends to revert to the population mean as the sample size increases.

#### The Law of Large Numbers: Examples

- If you flip a fair coin one million times, it is likely to land on heads close to half the time.
- If you randomly survey 50,000 Americans, asking them if they know what the capital of Alabama is, the proportion from the survey who do know will be very close to the proportion of all Americans who know.

#### Warnings About the Law of Large Numbers

- If the theoretical probability is far from 0.5, it takes a very large number of trials for the empirical probability to be close.
- If you flip a fair coin five times and it lands on heads all five times, this does not mean that it will land on tails the next five times to compensate.
- For a numerical variable, the LLN is a theorem about mean-reversion behavior.

#### Some Practice Problems

A survey was conducted by the Gallup Organization [date and sample size illegible in the source] in which adult Americans were asked, "Which of the following statements comes
closest to your belief about God: you believe in God; you don't believe in God, but you do believe in a universal spirit or higher power; or you don't believe in either?" The results of the survey, by region of the country, are given in the table below.

[Table: belief by region (East, Midwest, South, West). The counts are illegible in the source.]

- (a) What is the probability that a randomly selected adult American who lives in the East believes in God?
- (b) What is the probability that a randomly selected adult American who believes in God lives in the East?

#### Some Practice Problems

[Problems 1 to 3: given probabilities involving events A and B and their intersection, find a further probability. The values are illegible in the source.]

4. The following is a summary of voters' party preference in three different age groups.

[Table: counts of Democratic and Republican preference by age group (20 to 34, a middle group, and 50 or older), with totals. The values are illegible in the source.]

- Find the probability of being age 35 or older and voting Republican.
- Find the probability of being age 34 or younger OR voting for a given party.
- Two further parts ask for conditional probabilities of age given voting preference and of voting preference given age. [Details illegible in the source.]

#### Some Practice Problems

Suppose that Bob can decide to go to work by one of three modes of transportation: car, bus, or commuter train. Because of high traffic, if he decides to go by car, there is a [percentage illegible] chance he will be late. If he goes by bus, which has special reserved lanes but is sometimes overcrowded, the probability of being late is only [percentage illegible]. The commuter train is almost never late, with a probability of only [percentage illegible], but is more expensive than the bus.

(a) Suppose that Bob is late one day, and his boss wishes to estimate the probability that he drove to work that day by car. Since he does not know which mode of transportation Bob usually uses, he gives a
prior probability of 1/3 to each of the three possibilities. What is the boss's estimate of the probability that Bob drove to work?

b) Suppose that a coworker of Bob's knows that he almost always takes the commuter train to work, never takes the bus, but sometimes, 10% of the time, takes the car. What is the coworker's probability that Bob drove to work that day, given that he was late?

Solution

a) We want to calculate $P(\text{car} \mid \text{late})$. By Bayes' Theorem, this is

$$P(\text{car} \mid \text{late}) = \frac{P(\text{late} \mid \text{car})\,P(\text{car})}{P(\text{late} \mid \text{car})\,P(\text{car}) + P(\text{late} \mid \text{bus})\,P(\text{bus}) + P(\text{late} \mid \text{train})\,P(\text{train})} = \frac{0.5 \cdot \tfrac{1}{3}}{0.5 \cdot \tfrac{1}{3} + 0.2 \cdot \tfrac{1}{3} + 0.01 \cdot \tfrac{1}{3}} \approx 0.7042$$

b) Repeat the identical calculations as above, but instead of the prior probabilities being 1/3, use $P(\text{bus}) = 0$, $P(\text{car}) = 0.1$, and $P(\text{train}) = 0.9$. Plugging in to the same equation with these three changes, we get

$$P(\text{car} \mid \text{late}) = \frac{0.5 \cdot 0.1}{0.5 \cdot 0.1 + 0.2 \cdot 0 + 0.01 \cdot 0.9} \approx 0.8475$$

Chapter 5 Case Study

What Did the Jury Decide?
- There are three possibilities:
  - This rare event occurred by random chance.
  - SIDS was not the true cause of death.
  - The events were not independent.
- Jury's verdict: Guilty
- What do you think?

Chapter 5 Guided Exercise 1

Finding a Probability

|              | Dem | Rep | Other | Total |
|--------------|-----|-----|-------|-------|
| Liberal      | 306 | 26  | 198   | 530   |
| Moderate     | 279 | 134 | 322   | 735   |
| Conservative | 104 | 309 | 180   | 593   |
| Total        | 689 | 469 | 700   | 1858  |

Find the probability that a person is Liberal OR a Democrat.
1. $P(\text{Liberal}) = 530/1858$
2. $P(\text{Dem}) = 689/1858$
3. Are Liberal and Dem mutually exclusive? No, since there are 306 who are both.
4. $P(\text{Liberal AND Dem}) = 306/1858$
5. Remember to subtract $P(\text{Lib AND Dem})$: $P(\text{Lib OR Dem}) = P(\text{Lib}) + P(\text{Dem}) - P(\text{Lib AND Dem})$
6. $= 530/1858 + 689/1858 - 306/1858$
7. $\approx 0.49$

Chapter 5 Guided Exercise 2

Are Gender and Which Thumb Is on Top of Naturally Clasped Hands Independent?

|       | M  | W  |
|-------|----|----|
| Right | 18 | 42 |
| Left  | 12 | 28 |

- First, expand the table to include the marginal totals.

Chapter 6: Modeling Random Events: The Normal and Binomial Models

Probability Models and Distributions
- A Probability Model is a description of how a statistician thinks data are produced.
  - Uniform
  - Bernoulli
  - Normal
  - Other
- A Probability Distribution, or Probability Distribution Function (pdf), is a table, graph, or formula that gives all the outcomes of an experiment and their probabilities.

Random Variable

[Figure: a random experiment produces outcomes in the sample space; the random variable X maps statements about outcomes (events A, B) to real numbers (discrete) or real-number intervals (continuous).]

A Random Variable is a real-valued function (a coding mechanism) that maps events, or subsets of $\Omega$, to numbers or intervals.

Discrete Probability Distributions
- The most common way to display a pdf for discrete data is with a table.
- The probability distribution table always has two columns (or rows).
  - The first, x, displays all the possible outcomes.
  - The second, P(x), displays the corresponding probabilities for these outcomes.

| x    | $x_1$    | $x_2$    | ... | $x_n$    |
|------|----------|----------|-----|----------|
| P(x) | $P(x_1)$ | $P(x_2)$ | ... | $P(x_n)$ |

Examples of Tabulated Probability Distribution Function
- Experiment: Toss a fair coin twice.
- Sample space: {HH, HT, TH, TT}, equally likely.
- Random variable: X = number of heads.
- Possible values for X: 0, 1, 2.
- Now construct the table for the probabilities of the events P(X = 0), P(X = 1), and P(X = 2).

| X = # of heads | 0   | 1   | 2   |
|----------------|-----|-----|-----|
| P(X)           | 1/4 | 1/2 | 1/4 |

Examples of Probability Distribution Graphs

[Figures: probability distribution graphs, including one for a statistics class (female vs. male) and one for Amazon.com ratings.]

Expected Value
- The expected value of a random variable X is its theoretical mean value, denoted by $\mu$ or $E(X)$.
- The formula is $\mu = E(X) = \sum x\,p(x)$.
- Think of expected value as a weighted average of all possible values of X, weighted by their probabilities.

Example: Finding Expected Value

| x | p(x) | x·p(x) |
|---|------|--------|
| 1 | 1/6  | 1/6    |
| 2 | 1/6  | 2/6    |
| 3 | 1/6  | 3/6    |
| 4 | 1/6  | 4/6    |
| 5 | 1/6  | 5/6    |
| 6 | 1/6  | 6/6    |

$E(X) = 1/6 + 2/6 + 3/6 + 4/6 + 5/6 + 6/6 = 21/6 = 3.5$

Theoretical Variance and Standard Deviation
- The variance of a random variable X, denoted by $\sigma^2$ or $\mathrm{Var}(X)$, can be calculated as $\mathrm{Var}(X) = \sum (x - \mu)^2\,p(x)$.
- The theoretical standard deviation of a r.v. X is the square root of the variance.

Example: Finding Variance

Compute the variance of the following probability distribution, where x is the number of DVDs a person rents from a video store during a single visit.

[Table: x = 0, 1, ..., 5 with probabilities and the $(x - \mu)^2 p(x)$ terms; the entries are not legible in the source. The mean is $\mu = 1.49$.]

Sample Question

An insurance company considers the number of accidents, X, in a large city. The following is the probability distribution of X:

| x    | 0    | 1    | 2    | 3    | 4    | 5    | 6    |
|------|------|------|------|------|------|------|------|
| P(x) | 0.05 | 0.25 | 0.30 | 0.20 | 0.10 | 0.08 | 0.02 |

[Parts Q1 and Q2 of this question are not legible in the source.]

In order to determine the premium that the insurance company
would ask:
1. What is the expected (average) number of accidents?
2. Is the number of accidents the same every day, or does it vary from day to day?
3. If it varies from day to day, then what are the variance and standard deviation of the number of accidents?
4. What does the distribution of the number of accidents look like?

$E(X) = 0(0.05) + 1(0.25) + 2(0.3) + 3(0.2) + 4(0.1) + 5(0.08) + 6(0.02) = 2.37$

Continuous Data and Probability Distribution Functions

[Figure: a density curve; the shaded area represents a probability.]

- Often represented as a curve.
- The area under the curve between two values of x represents the probability of x being between these two values.
- The total area under the curve must equal 1.
- The curve cannot lie below the x-axis.

Finding Probabilities for Continuous Uniform Distribution

[Figure: uniform density; x-axis: Waiting Time (minutes).]

- The curve above shows the probability distribution function for the time to wait for a bus that comes every 12 minutes. Find the probability that you will wait between 5 and 10 minutes.
- Shade in the area.
- Find the area of the rectangle.
- $P(5 < x < 10) = \text{Base} \times \text{Height} = 5 \times \tfrac{1}{12} \approx 0.4167$

The Normal Model

[Figure: distribution of blood pressure with a Normal curve.]

- The Normal Model is a good model if:
  - The distribution is unimodal.
  - The distribution is approximately symmetric.
  - The distribution is approximately bell shaped.
- The Normal Distribution is also called Gaussian.

Center and Spread of the Normal Distribution
- $\mu$ stands for the center, or mean, of a distribution.
- $\sigma$ stands for the standard deviation of a distribution.
- Note that the Greek letters $\mu$ and $\sigma$ are used for distributions, and $\bar{x}$ and $s$ are used for sample data.

Notation and Area

[Figure: Normal curve with the area to the left of x = 4 shaded.]

- $N(6, 2)$ means the normal distribution with mean $\mu = 6$ and standard deviation $\sigma = 2$.
- The area under the normal curve, above the x-axis, and to the left of $x = 4$ represents $P(x < 4)$.
- $P(x < 4) = P(x \le 4)$ for a continuous variable.
- The probability of a continuous r.v. taking exactly any single discrete value is
0.

Empirical Probabilities for Normal Model

[Figure: Normal curve marked at $\mu \pm 1\sigma$, $\mu \pm 2\sigma$, and $\mu \pm 3\sigma$.]

Limitation of the Empirical Normal Model

[Figure: relative frequency plot.]

Going From X to Z: Standardization
- z-score = (value − mean) / sd
- If a continuous random variable $X \sim N(\mu, \sigma)$, then $Z = \dfrac{X - \mu}{\sigma} \sim N(0, 1)$.
- The cumulative probability $P(Z \le z)$ is given by the Standard Normal Table.

Finding Probabilities With a Z-table

[Figure: standard normal curve; the cumulative probability for z is the area under the curve to the left of z.]

Table 2: Standard Normal Cumulative Probabilities

| z    | .00   | .01   | .02   | .03   | .04   | .05   | .06   | .07   | .08   | .09   |
|------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| −3.4 | .0003 | .0003 | .0003 | .0003 | .0003 | .0003 | .0003 | .0003 | .0003 | .0002 |
| −3.3 | .0005 | .0005 | .0005 | .0004 | .0004 | .0004 | .0004 | .0004 | .0004 | .0003 |
| −3.2 | .0007 | .0007 | .0006 | .0006 | .0006 | .0006 | .0006 | .0005 | .0005 | .0005 |
| −3.1 | .0010 | .0009 | .0009 | .0009 | .0008 | .0008 | .0008 | .0008 | .0007 | .0007 |
| −3.0 | .0013 | .0013 | .0013 | .0012 | .0012 | .0011 | .0011 | .0011 | .0010 | .0010 |
| −2.9 | .0019 | .0018 | .0018 | .0017 | .0016 | .0016 | .0015 | .0015 | .0014 | .0014 |
| −2.8 | .0026 | .0025 | .0024 | .0023 | .0023 | .0022 | .0021 | .0021 | .0020 | .0019 |

When Using a Z-table
- Round z-scores to two decimal places.
- Take note of the shaded region in the diagram. The four-digit probabilities represent areas of the shaded region.
- Always draw a picture and shade the region that you wish to find the probability for.
- A single real number has no probability mass, i.e., $P(Z \le z) = P(Z < z)$.

Example: Baby Seals
- Research has shown that the mean length of a newborn Pacific harbor seal is 29.5 in. and that $\sigma = 1.2$ in. Suppose that the lengths follow the Normal model. Find the probability that a randomly selected pup will be more than 32 in.
- Let X be the length, in inches, of a newborn Pacific harbor seal chosen at random.
- $P(X > 32) \approx 0.019$
- http://socr.ucla.edu/htmls/dist/Normal_Distribution.html

[Figure: SOCR Normal distribution calculator snapshot.]

- Step 1: Sketch a picture and shade the region.
- Step 2: Calculate the z-score: $x = 32 \Rightarrow z = (32 - 29.5)/1.2 = 2.08$.
- Step 3: Use the Z-table with caution. The table entry for $z = 2.08$ is the area to the left, which does not match the area we drew, so $P(Z > 2.08) = 1 - P(Z < 2.08) = 1 - .9812 = .0188$.

Standard Normal Cumulative Probabilities (continued)

| z   | .00   | .01   | .02   | .03   | .04   | .05   | .06   | .07   | .08   | .09   |
|-----|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 0.0 | .5000 | .5040 | .5080 | .5120 | .5160 | .5199 | .5239 | .5279 | .5319 | .5359 |
| 0.1 | .5398 | .5438 | .5478 | .5517 | .5557 | .5596 | .5636 | .5675 | .5714 | .5753 |
| 0.2 | .5793 | .5832 | .5871 | .5910 | .5948 | .5987 | .6026 | .6064 | .6103 | .6141 |
| ... |       |       |       |       |       |       |       |       |       |       |
| 1.9 | .9713 | .9719 | .9726 | .9732 | .9738 | .9744 | .9750 | .9756 | .9761 | .9767 |
| 2.0 | .9772 | .9778 | .9783 | .9788 | .9793 | .9798 | .9803 | .9808 | .9812 | .9817 |
| 2.1 | .9821 | .9826 | .9830 | .9834 | .9838 | .9842 | .9846 | .9850 | .9854 | .9857 |

Finding Probabilities in Between Two Values
- $P(a \le Z \le b) = P(a \le Z < b) = P(a < Z \le b) = P(a < Z < b)$
- Use $P(Z < b) - P(Z < a)$.
- Example: $P(1 < Z < 2) = P(Z < 2) - P(Z < 1) = .9772 - .8413 = .1359 \approx 13.6\%$

CLICKER QUESTION: Finding Normal Probabilities

Based on the numbers from the previous slide, $P(Z < 2) = .9772$ and $P(Z < 1) = .8413$. Which of the following statements about normal probability must be incorrect?
- A. $P(0 < Z < 2) = .4772$
- B. $P(-1 < Z < 1) = .6826$
- C. $P(Z > -1) = .1587$

Answer: C. It should be the same as $P(Z < 1) = .8413$.

Some Practice Problems
1. $P(Z < 0.75)$
2. $P(Z > 1.25)$
3. $P(0.8 < Z < 1.2)$
4. $X \sim N(100, 15)$: $P(90 < X < 120)$

Properties of Standard Normal Distribution
- Mean = Median = Mode = 0
- $P(Z > z) = P(Z < -z)$
- $P(Z > z) = 1 - P(Z < z)$
- $P(a < Z < b) = P(Z < b) - P(Z < a)$
- $P(|Z| < z) = P(-z < Z < z) = 1 - 2P(Z > z)$

Looking Up Z-scores

What are the z-scores for Q1 and Q3?

[Figure: relative frequency curve with 25% of the area below the lower quartile Q1 and 25% above the upper quartile Q3.]

Finding Z-score Given Probability
[Figure: standard normal curve; the cumulative probability for z is the area under the curve to the left of z.]

Problem: Find z such that $P(Z < z) = 25\%$ (.2500). Find the CLOSEST probability in Table 2.

| z    | .00   | .01   | .02   | .03   | .04   | .05   | .06   | .07   | .08   | .09   |
|------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| −0.7 | .2420 | .2389 | .2358 | .2327 | .2296 | .2266 | .2236 | .2206 | .2177 | .2148 |
| −0.6 | .2743 | .2709 | .2676 | .2643 | .2611 | .2578 | .2546 | .2514 | .2483 | .2451 |
| −0.5 | .3085 | .3050 | .3015 | .2981 | .2946 | .2912 | .2877 | .2843 | .2810 | .2776 |

The closest entry is .2514, which corresponds to $z = -0.67$.

Exercise: What Is the Z-score for the 95th Percentile?

Hint: We need a z value such that $P(Z < z) = .95$.

| z   | .00   | .01   | .02   | .03   | .04   | .05   | .06   | .07   | .08   | .09   |
|-----|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1.5 | .9332 | .9345 | .9357 | .9370 | .9382 | .9394 | .9406 | .9418 | .9429 | .9441 |
| 1.6 | .9452 | .9463 | .9474 | .9484 | .9495 | .9505 | .9515 | .9525 | .9535 | .9545 |
| 1.7 | .9554 | .9564 | .9573 | .9582 | .9591 | .9599 | .9608 | .9616 | .9625 | .9633 |

The value .95 falls halfway between the entries for 1.64 and 1.65, so $z \approx 1.645$.

Example: Finding Theoretical Percentiles

Suppose a prestigious private grade school only accepts kids who could score in the top 12% on their IQ test. Assume that the test scores are normally distributed with mean 100 and standard deviation 8. Find the cutoff IQ score for prospective students.
- Step 1: Draw a picture.
- Step 2: Locate the z-score.
- Step 3: Go from z to x.

| z   | .00   | .01   | .02   | .03   | .04   | .05   | .06   | .07   | .08   | .09   |
|-----|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1.0 | .8413 | .8438 | .8461 | .8485 | .8508 | .8531 | .8554 | .8577 | .8599 | .8621 |
| 1.1 | .8643 | .8665 | .8686 | .8708 | .8729 | .8749 | .8770 | .8790 | .8810 | .8830 |
| 1.2 | .8849 | .8869 | .8888 | .8907 | .8925 | .8944 | .8962 | .8980 | .8997 | .9015 |

The top 12% corresponds to $P(Z < z) = .88$, which lies between the entries for 1.17 and 1.18, so $z = 1.175$ and $x = \mu + z\sigma = 100 + 1.175(8) = 109.4$.

Bernoulli Random Variable
- A Bernoulli random variable can only take on two values, 0 and 1.
- We often use a Bernoulli r.v. to model success/failure, head/tail, hit/miss, yes/no types of binary observed values.
- There is one parameter for a Bernoulli variable: p = success rate, in the sense that $P(X = 1) = p$ and $P(X = 0) = 1 - p$.

Mean and Variance of Bernoulli
- If $X \sim \mathrm{Bern}(p)$, then its pdf looks like:

| x | P(x)    | x·P(x) | (x − p)²·P(x) |
|---|---------|--------|---------------|
| 0 | 1 − p   | 0      | p²(1 − p)     |
| 1 | p       | p      | (1 − p)²p     |

- $E(X) = p$ = success rate
- $\mathrm{Var}(X) = p^2(1 - p) + (1 - p)^2 p = p(1 - p)[p + (1 - p)] = p(1 - p)$ = success rate × failure rate = $pq$

Examples of Bernoulli RV
- Experiment: Toss a fair coin once. RV: Let X = 1 if a head comes up; X = 0 otherwise. Then $X \sim \mathrm{Bern}(p = 1/2)$.
- Experiment: Roll a fair die once. RV: Let X = 1 if 5 or 6 is rolled; X = 0 otherwise. Then $X \sim \mathrm{Bern}(p = 1/3)$.
- Experiment: Choose one number at random from a random number table. RV: Let X = 1 if the number is nonzero and is divisible by 4; otherwise X = 0. Then $X \sim \mathrm{Bern}(0.2)$.

Multiple Bernoulli Trials
- A fair coin is flipped 5 times. Assuming that each flip is independent, let X = # of heads.
- In terms of values of X, $\Omega = \{0, 1, 2, 3, 4, 5\}$.
- $P(X = 5)$ = Prob(all five tosses come up heads) = P(Head AND Head AND Head AND Head AND Head) = $P(\text{Head})^5 = (1/2)^5 = 1/32$
- Likewise, $P(X = 0) = 1/32$.

Exactly One Success

How about $P(X = 1)$ = Prob(exactly 1 head)? First off, exactly 1 head means that the other four tosses must be tails. But we have more than one way to get exactly 1 head, namely HTTTT, THTTT, TTHTT, TTTHT, and TTTTH. Each has identical probability $P(\text{Head}) \cdot P(\text{Tail})^4$.

The Probability for One Success

Since we are not specifying the position of the one and only head, we get $P(X = 1) = 5 \cdot (1/2)(1/2)^4 = 5/32 \approx 0.156$.

Now, how many ways can we observe exactly two heads among 5 such independent Bernoulli trials?

The Binomial Model
- Suppose that a Bernoulli trial with success rate p is repeated n times independently. Let X be the number of successes.
- $X \sim \mathrm{Bin}(n, p)$
- X can be any value in {0, 1, 2, ..., n − 1, n}.
- $P(X = k) = {}_nC_k\, p^k (1 - p)^{n - k}$
- ${}_nC_k = \dfrac{n!}{k!\,(n - k)!}$ is the number of ways we can place k successes among n positions.
- $n! = 1 \cdot 2 \cdot 3 \cdots (n - 2)(n - 1)\,n$ (n factorial)

Binomial(n, p) Model

As of 2011, Gallup
polling found that 31% of registered voters identified as Democrats, 27% as Republicans, and 40% as independents. Suppose we choose 5 registered voters at random. Let X be the number of people who would identify themselves as Democrats. Then $X \sim \mathrm{Bin}(n = 5, p = 0.31)$.
- $P(X = 0) = {}_5C_0\,(1 - p)^5$
- $P(X = 1) = {}_5C_1\,p(1 - p)^4$
- $P(X = 2) = {}_5C_2\,p^2(1 - p)^3$
- $P(X = 3) = {}_5C_3\,p^3(1 - p)^2$
- $P(X = 4) = {}_5C_4\,p^4(1 - p)$
- $P(X = 5) = {}_5C_5\,p^5$

Explaining Binomial Model
- Take a closer look at $P(X = 2)$, i.e., the probability that we sample exactly 2 Democrats.
- We are only interested in whether a person is D or D′ (not D).
- Since their positions are not specified, there are 10 such combinations for different orderings.
- This can be seen by simply listing all possible combinations of where we can place those 2 successes:
  - 12, 13, 14, 15, 23, 24, 25, 34, 35, 45
  - 24 means D′ D D′ D D′
- Each has probability $(.31)(.31)(.69)(.69)(.69)$.

Calculating nCk
- ${}_5C_2 = 10$ means that there are 10 ways to place two successes among 5 spots.
- ${}_5C_2 = \dfrac{5!}{2!\,(5 - 2)!} = \dfrac{1 \cdot 2 \cdot 3 \cdot 4 \cdot 5}{(1 \cdot 2)(1 \cdot 2 \cdot 3)} = \dfrac{4 \cdot 5}{1 \cdot 2} = 10$
- Likewise, ${}_{50}C_3$ can be calculated as $\dfrac{48 \cdot 49 \cdot 50}{1 \cdot 2 \cdot 3} = 19{,}600$.

When to Apply Binomial Model

The Binomial Model applies if:
1. There are a fixed number of trials.
2. Only two outcomes are possible for each trial (Yes or No, Success or Failure, Heads or Tails), i.e., each trial is Bernoulli.
3. The probability of success p is the same for each trial.
4. The trials are independent.

CLICKER QUESTION: Which Problem Can We Apply Binomial Model?
- A. Pick 5 cards from a deck, one at a time, without replacement. Let X = # of spades. (Not independent)
- B. Assuming Kobe can make 84% of his shots from the foul line, he attempts the next 100 shots from randomly chosen spots on the
court. Let Y = # of made baskets. (Not identical)
- C. You spend \$1 each on the next 5 draws of the Mega Millions lottery with your favorite combination of numbers. Let Z = # of winning tickets.

Words and Inequalities
- Since the Binomial model is for discrete random variables, be careful with the equality, as it will change the probability.
  - Exactly: $=$
  - Less Than: $<$
  - At Least: $\ge$
  - More Than: $>$
  - At Most: $\le$
- Notice that Less Than and At Least are complements, and More Than and At Most are complements.

Finding a Binomial Probability

http://www.socr.ucla.edu/htmls/SOCR_Distributions.html

14% of all clothing bought online is returned. If an online retailer sells 35 items of clothing, what is the probability that:
- At least 5 will be returned? $n = 35$, $p = .14$, $P(X \ge 5) \approx 0.55$
- Fewer than 7 will be returned? $n = 35$, $p = .14$, $P(X < 7) \approx 0.79$
- More than 6 will be returned? $n = 35$, $p = .14$, $P(X > 6) \approx 0.21$
- At most 4 will be returned? $n = 35$, $p = .14$, $P(X \le 4) \approx 0.45$

The Expected Value of Binomial RV
- If we roll a six-sided die 30 times, then we would expect to roll a six $30 \times 1/6 = 5$ times.
- For a Binomial Distribution, $\mu = np$ is called the mean or the expected value.
- A Binomial Distribution with n trials and probability of success p has standard deviation $\sigma = \sqrt{np(1 - p)}$.

Deriving Expected Value of Binomial
- Recall that a Binomial r.v. tracks the number of successes in multiple independent, identical Bernoulli trials.
- Let $X \sim \mathrm{Bin}(n, p)$. Since each individual success advances X by 1 unit, consider that $X = Y_1 + Y_2 + \cdots + Y_n$, where $Y_i$ is the Bernoulli r.v. indicating the result of the i-th trial.
- $Y_5 = 1$ means that the 5th trial results in a success; $Y_7 = 0$ means that the 7th is a failure.

Deriving Expected Value of Binomial (continued)
- Each $Y_i$ has mean $p$ and variance $p(1 - p)$.
- $E(X) = E(Y_1 + Y_2 + \cdots + Y_n) = E(Y_1) + E(Y_2) + \cdots + E(Y_n) = p + p + \cdots + p = np$ = # trials × success rate
- Since each trial is independent, $\mathrm{Var}(X) = \mathrm{Var}(Y_1 + \cdots + Y_n) = \mathrm{Var}(Y_1) + \cdots + \mathrm{Var}(Y_n) = p(1 - p) + \cdots + p(1 - p) = np(1 - p)$ = # trials × success rate × failure rate

The Standard Deviation
- On any particular day, there is a 6% chance of a fatal
accident in the city. Find the mean and standard deviation for the number of accidents in a 365-day year.
- $\mu = np = 365 \times 0.06 = 21.9$
- $\sigma = \sqrt{365(.06)(.94)} = \sqrt{20.59} \approx 4.54$
- We expect about 22 fatal accidents per year, give or take four or five accidents.

Visualizing the Binomial Distribution

[Figure: binomial probability histograms for several values of n and p.]

- If n is large and p is close to 0.5, then the binomial distribution is approximately normal.

Surveys and Independence
- The Binomial Model may be used if the respondents of a survey are selected with replacement.
- If n is large enough, we may sometimes use the normal model to approximate the Binomial model.

Chapter 6 Case Study

Too Heavy or Not?
- McDonald's claims that its ice cream cones weigh 3.18 ounces. However, one of the authors bought five cones and found that all five weighed more than that. Is this surprising?
- Assume the ice cream weights are normally distributed, or at least symmetric about the mean.
- $n = 5$, $p = 0.5$
- $P(x = 5) \approx 0.031$
- If the mean is 3.18 ounces, as McDonald's claims, then there is only a 3.1% chance that out of 5 randomly chosen cones, they will all weigh above 3.18 ounces.
- With such a small probability, suspicion is raised.

Chapter 6 Guided Exercise 1

SAT Scores
- According to data from the College Board, the mean quantitative SAT score for female college-bound high school seniors in 2009 was 500. SAT scores are approximately Normally distributed with a population standard deviation of 100. What percentage of the female college-bound high
school seniors had scores above 675?

N(500, 100)
- To find the z-score for 675, subtract the mean and divide by the standard deviation. Report the z-score.
- $z = (675 - 500)/100 = 1.75$

N(500, 100)
- Refer to the Normal curve. Explain why the SAT score of 500 is right below the z-score of 0. The dots on the axis mark the location of z-scores that are integers from −3 to 3.
- The mean is always at the center of the Normal curve. The mean SAT score is 500, and the mean z-score is 0 by definition.

N(500, 100)

[Figure: Normal density curve.]

- Carefully sketch a copy of the curve. Pencil in the SAT scores of 200, 300, 400, 600, and 700 in the correct places.
- Notice that the mean is 500 and the standard deviation is 100. The corresponding z-scores are −3, −2, −1, 1, and 2.

N(500, 100)
- Draw a vertical line through the curve above the 675 indicated on the graph. Put in the corresponding z-score.
- We want to find what percentage of students had scores above 675. Shade the area to the right of this boundary, because numbers to the right are larger.

N(500, 100)
- Use StatCrunch to find the area to the right of this z-score.
- $P(z > 1.75) \approx 0.04$

N(500, 100)
- Finally, write a sentence telling what you found.
- About 4% of the female college-bound high school seniors had scores above 675.

Chapter 6 Guided Exercise 2

SAT Scores N(500, 100)
- According to data from the College Board, the mean quantitative SAT score for female college-bound high school seniors in 2009 was 500. SAT scores are approximately Normally distributed with a population standard deviation of 100.
- A scholarship committee wants to give awards to college-bound women who score at the
96th percentile or above on the SAT. What score does an applicant need?
- Will the SAT test score be above the mean or below it? Explain.
- The 96th percentile is the score such that 96% of all scores are at or below this score. The 50th percentile is the mean, so the 96th percentile is above the mean.

SAT Scores N(500, 100)
- Label the curve with integer z-scores. The dots represent the position of integer z-scores from −3 to 3.

SAT Scores N(500, 100)
- Use StatCrunch to find the z-score that has area 0.96 to the left.
- $z \approx 1.75$

SAT Scores N(500, 100), z ≈ 1.75
- Add that z-score to the sketch and draw a vertical line above it through the curve. Shade the left side, because the area to the left is what is given.

SAT Scores N(500, 100), z ≈ 1.75
- Find the SAT score that corresponds to the z-score. The score should be z standard deviations above the mean, so
- $x = \mu + z\sigma = 500 + 1.75(100) = 675$

SAT Scores N(500, 100), z ≈ 1.75, x = 675
- Add the SAT score to the sketch where it says Answer.
- Finally, write a sentence stating what you found.
- The applicant needs a score of at least 675 to receive the scholarship.

Chapter 7: Survey Sampling and Inference

Survey Terminology
- The Population is the group of people or objects we wish to study.
- A Parameter is a numerical value that characterizes some aspect of the population.
- A Census is a survey in which every member of the population is measured.
- A Sample is a collection of people or objects taken from the population.
- A Statistic, also called an Estimator, is a number derived from the
data.

Example: A survey asked 1000 US college students if they preferred to study alone or with others. 420 preferred to study alone.
- The population is all US college students.
- The sample is the 1000 students who were surveyed.
- The parameter of interest is $p$, the proportion of all US college students who prefer to study alone.
- The statistic $\hat{p} = 0.42$ is the proportion of the 1000 surveyed students who prefer to study alone.
- Statistical inference: We estimate that 42% of all US college students prefer to study alone.

Statistical Inference
- Statistical inference is the art and science of drawing conclusions about a population on the basis of observing only a small subset of that population.
- Statistical inference always involves uncertainty, so an important component of this science is measuring our uncertainty.

Bias
- A survey method is Biased if it has a tendency to produce an untrue value.
- Sampling Bias results from taking a sample that is not representative of the population.
  - Convenience sampling and voluntary response sampling
- Measurement Bias comes from asking questions that do not produce a true answer.
  - Misleading questions, intrusive questions, insufficient choices for answers

Other Sources for Errors
- Nonresponse
- Incorrect data entry
- Respondents intentionally provide fake data

Some Ways to Avoid Bias
- What percentage of people who were asked to participate actually did so?
- Did the researchers choose people to participate in the survey, or did the people themselves choose to participate?
- Did the researcher leave out whole segments of the population who are likely to answer the question differently from the rest of the population?

Identify Possible Biases (Population: All Americans)
- A student asked all 2500 of her Facebook friends if they preferred Facebook to Twitter.
- A researcher stood outside a grocery store and asked 250 shoppers, "Do you eat out at a restaurant at least three times per week?"
- Gallup randomly selected 1000 phone numbers from
the yellow pages and then called to ask if they supported government funding of high-speed rail.

Simple Random Sampling
- The Sampling Frame is the collection of subjects from which samples will be drawn. Ideally, it is the population itself.
- Simple Random Sampling (SRS) involves randomly drawing people from the sampling frame without replacement.
- Each subject in the sampling frame has exactly the same chance of being chosen.

SRS: Pros, Cons, and an Example
- Can give arguably the best estimator for population parameters, with equal weight.
- May deliver uneven results: low response rate.
- Example: The UCLA Student Records office chooses 500 current students at random and sends them an email asking them to fill out a survey about a proposed tuition hike.
  - Population = Sampling frame = All current UCLA students
  - Result: 142 responded; 121 were freshmen or sophomores

Stratified Sampling

A Stratified Sample is one obtained by first separating the population into homogeneous, nonoverlapping groups called strata, and then obtaining a simple random sample from each stratum.

[Figure: population split into Male and Female strata, with a sample drawn from each.]

Accuracy and Precision
- a. Both accurate and precise
- b. Precise but not accurate
- c. Accurate but not precise
- d. Neither accurate nor precise

Accuracy and Precision, Bias and Standard Error
- Bias is a measure of accuracy.
  - If only basketball players are measured to estimate the proportion of Americans who are taller than 6 feet, then there is a bias toward a larger proportion.
- Standard Error is a measure of precision.
  - If the sample size is only three, the estimate of the proportion of tall people using the sample is likely to be far from the proportion of tall people in the US. The standard error will be large.

A Simulation: Small Sample Size
- Present: Mike, Rick, Sue, Mary, Rose
- Percent male: $p = 40\%$
- Take a few random samples of $n = 3$ people and record the percent male.
  - Rick, Sue, Rose: $\hat{p} = 1/3 \approx 33\%$
  - Mike, Rick, Rose: $\hat{p} = 2/3 \approx 67\%$
  - Sue, Mary, Rose: $\hat{p} = 0/3 = 0\%$
- There are a total of ${}_5C_3 = 10$
equally likely samples of size 3 available.

Distribution of the Sample Proportions

[Figure: distribution of the sample proportions over all 10 samples.]

- This distribution has mean 40%.
- This distribution has a standard deviation of about 21.08%.
- Notice the mean of the collection of all the sample proportions equals the population proportion of 40%.
- By taking one sample, we can never obtain the exact population proportion.

Larger Sample Size
- Consider a population of 1,000,000 people, 40% male. Randomly select $n = 300$ of them.
- If we look at all possible samples with $n = 300$, then the distribution of the sample proportions would have:
  - Mean = 40%
  - Standard Error ≈ 2.8%
- Notice that the mean sample proportion equals the population proportion.
- The standard error is 10 times smaller when the sample size is 100 times larger.

Sample Sizes, Mean, and Standard Error
- The mean of all sample proportions always equals the population proportion.
- The standard error is smaller for larger sample sizes.
- The size of the population has essentially no effect on the distribution of all sample proportions, as long as the population size is at least 10 times larger than the sample size.

Discovering the Distribution of Sample Proportion, i.e., Sampling Distribution
- Suppose that the underlying population proportion is $p$ for a certain population, such as Prob(Head) for a coin or the proportion of people voting for Proposition 30.
- $p$ is unique but unknown.
- The proportion $p$ is calculated by sum/size, e.g., # of heads / total number of flips, or # of votes / total population size.
- We may assume that everyone in the population holds a value that is either 1 or 0.

Modeling Distribution of Sample Proportion Using Binomial
- Now, if we sample from this population pool of 1's and 0's, we get a sample of size n containing a bunch of 1's and 0's.
- The sample proportion is then calculated by $X/n$, where X is the sum of 1's (yes votes) in the sample.
- As we have seen, the sample proportion is different every time. This is because we sample different
people every time.
- If we model everyone in our sample as a Bernoulli random variable with success rate $p$, then $X \sim \mathrm{Bin}(n, p)$.

Modeling Distribution of Sample Proportion Using Binomial (continued)
- Recall $E(X) = np$ and $\mathrm{Var}(X) = np(1 - p)$.
- We then have that $\hat{p} = X/n$ follows:
  - $E(\hat{p}) = np/n = p$
  - $\mathrm{Var}(\hat{p}) = np(1 - p)/n^2 = p(1 - p)/n$
- This result is consistent with the Law of Large Numbers: as we increase the sample size, the variability in our estimate for the proportion becomes smaller.
- The standard deviation of the sample proportion, the square root of (success rate × failure rate / sample size), is called the Standard Error.

Formulae for the Mean and the Standard Error

$$\mu_{\hat{p}} = p, \qquad SE = \sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$$

- The mean of the sampling distribution is equal to the population proportion.
- If the sample size is increased by a factor, then the standard error will be decreased by the square root of that factor.

Bias, Precision, Mean, and Standard Error
- For a SRS, the bias is 0.
  - Equivalent to the statement that the mean of all the sample proportions equals the population proportion.
- For a SRS, the precision is better for larger sample sizes.
  - Equivalent to the statement that the standard error is smaller for larger sample sizes.
- The precision and bias are independent of the population size, as long as the population size is at least 10 times larger than the sample size.

Example
- Only 65% of insured women get annual Pap tests. Find the mean and standard error for the sampling distribution with sample size 500.
- Mean: $\mu_{\hat{p}} = p = .65$
- Standard Error: $\sigma_{\hat{p}} = \sqrt{\dfrac{.65(1 - .65)}{500}} \approx .021$

Probability Distributions for Sample Proportions

[Figure: four probability histograms of sample proportions for different values of n and p.]

The Trouble With p
- The main application of sampling distributions is to address the bias and precision of a sample that is taken.
- The purpose of taking a sample is to estimate the population proportion: $p$ is unknown.
- The standard error
$SE = \sqrt{p(1-p)/n}$ cannot be calculated.
- Use $\hat{p}$ as an estimate for $p$. This gives an estimate for the standard error: $SE_{est} = \sqrt{\hat{p}(1-\hat{p})/n}$.

Requirements for the Central Limit Theorem for Sample Proportions
- Random and Independent: The sample is collected randomly and the trials are independent of each other.
- Large Sample (Success/Failure Condition): The sample has at least 10 successes, $np \ge 10$, and at least 10 failures, $n(1-p) = nq \ge 10$.
- Large Population (10% condition): If the sample is collected without replacement, then the population size is at least 10 times the sample size.

The Central Limit Theorem for Sample Proportions
- If the trials are random and independent and the sample and population sizes are large, then the sampling distribution of $\hat{p}$ is approximately normal and follows $\hat{p} \sim N\left(p, \sqrt{p(1-p)/n}\right)$.
- If you don't know $p$, $\hat{p}$ can be substituted to find the standard error.

The Central Limit Theorem
- 200 randomly selected American drivers were asked if they text while driving. 48 of them admitted that they did.
- The drivers were randomly selected.
- Successes: $48 \ge 10$. Failures: $152 \ge 10$.
- Population size (American drivers) is very large.
- Conclusion: the distribution is approximately normal.
  - Mean: $48/200 = 0.24$
  - $SE_{est} = \sqrt{0.24 \times 0.76/200} \approx 0.03$

Notes About the Requirements
- Since random sampling is usually impossible to do, other sampling techniques are often used instead.
- A large sample size is absolutely necessary, as the Central Limit Theorem applies in theory as $n \to \infty$.
- Typically the population of interest is very large, but one should still be aware of this requirement.

The Centers for Disease Control and Prevention reported that [...] of [...] women in the United States have a BMI of 25 or more, [...] associated with increased health risk. At a large college, [...] randomly selected female students reported their height and weight. Only 31 of these students had
BMI values greater than 25. Is the proportion of high-BMI students unusually small?
- Randomization condition: [...]
- 10% condition: the sample size $n$ must be no larger than 10% of the population.
- Success/failure condition: the sample size has to be big enough to expect at least 10 successes and 10 failures.
- [...]

Finding Probabilities With the Central Limit Theorem
- 78% of all laboratory mice can make it through a maze. If 600 randomly selected mice attempt the maze, what is the probability that more than 80% of them will make it through the maze?
- Note that all requirements are met:
  - Random sample
  - Successes: $np = 600 \times 0.78 = 468 \ge 10$
  - Failures: $n(1-p) = 600 \times 0.22 = 132 \ge 10$
  - Large population size: all mice in existence
- By the CLT, the distribution of all possible sample proportions (the sampling distribution) is approximately normal.
  - Mean: 0.78
  - $SE = \sqrt{0.78 \times 0.22/600} \approx 0.017$
  - Sampling distribution: $N(0.78, 0.017)$
- Using a normal calculator: $P(\hat{p} > 0.80) \approx 0.12$

Failure of the CLT
- About half a percent of all people in the world are living with HIV. You want to find the probability that out of 1000 randomly selected people at least 1% of them are living with HIV.
- $np = 1000 \times 0.005 = 5 < 10$
- The CLT does not apply.
- Do not use the normal distribution to calculate this probability.

Confidence Intervals: The Idea
- A Confidence Interval for a Population Proportion is an interval where the unknown population
proportion is likely to lie.
- Example: Suppose the CLT applies and one wants to estimate $p$. Let $\hat{p} = 0.24$ and $SE = 0.03$.
- Estimate: $\pm 1.96\,SE$ from the mean.
  - $0.24 - 1.96 \times 0.03 \approx 0.18$
  - $0.24 + 1.96 \times 0.03 \approx 0.30$
- We can be 95% confident that the population proportion is between 0.18 and 0.30.

Confidence Interval Interpretation
- The proportion of green M&M's is 0.16. You take several samples of 80 M&M's each and come up with the following 95% confidence intervals:
  (0.14, 0.18), (0.12, 0.17), (0.15, 0.19), (0.11, 0.15), (0.12, 0.17), (0.15, 0.20), (0.13, 0.17), (0.14, 0.19), (0.13, 0.18), (0.15, 0.19), (0.15, 0.20)
- All of the above confidence intervals except (0.11, 0.15) successfully contain the population proportion.

95% Confidence Interval Interpretations
- For every random sample that can be taken from a population, there corresponds a 95% confidence interval.
- That is, 95% of these confidence intervals, constructed independently with the same sample size, are expected to successfully contain the population proportion, and 5% will not.
- If we only construct a CI once, then we are 95% confident that our confidence interval captures the true population proportion.

Confidence Intervals: The Formula
In order to compute a confidence interval for $p$ we use
$$\hat{p} \pm z^* \, SE_{est}$$
where $z^*$ (z-star, or critical z) is determined by the confidence level and $SE_{est}$ is the estimated standard error. The combined term following the $\pm$ sign is called the Margin of Error. Since SE (not the estimated SE) requires the true value of $p$, we almost always use $SE_{est}$.

Computing Confidence Intervals
- Of 500 random people surveyed, 72 were smokers. Find the 95% confidence interval.
- $\hat{p} = 72/500 = 0.144$
- $SE_{est} = \sqrt{0.144(1-0.144)/500} \approx 0.0157$
- Margin of Error: $1.96 \times 0.0157 \approx 0.031$
- $0.144 - 0.031 = 0.113$
- $0.144 + 0.031 = 0.175$
- We are 95% confident that between 11.3% and 17.5% of all people are smokers.

Confidence vs. Margin of Error
[Figure: confidence level versus margin of error.]
- Increasing the level of confidence increases the margin of error.
- Decreasing the
level of confidence decreases the margin of error.

Computing 90% Confidence Intervals
- 176 of 200 patients randomly selected to receive treatment survived. Find the 90% confidence interval.
- $\hat{p} = 176/200 = 0.88$, $SE_{est} = \sqrt{0.88(1-0.88)/200} \approx 0.023$
- Margin of Error: $1.645 \times 0.023 \approx 0.038$
- $0.88 - 0.038 = 0.842$
- $0.88 + 0.038 = 0.918$
- We are 90% confident that between 84.2% and 91.8% of all people who receive the treatment survive.

Interpreting Confidence Intervals
- 300 randomly chosen voters were asked if they favored the bond initiative to fund a new college sports arena. 120 did support it. The 95% confidence interval is (0.34, 0.46).
- Since a bond initiative requires over 50% of the votes to pass, and 0.50 is above the confidence interval, it is unlikely that the bond initiative will pass.

Confidence Intervals: Summary
- Use a confidence interval to get plausible bounds on a population proportion.
- Do not use if $n\hat{p} < 10$ or $n(1-\hat{p}) < 10$.
- The confidence level of 95% is standard.
- A lower level, e.g. 90%, can be used if you need a smaller margin of error.
- A higher level, e.g. 99%, can be used at the expense of a higher margin of error.

The Sample Size For a Specific ME
- Too small a sample size will result in a larger margin of error than is wanted.
- Too large a sample size will result in unnecessary expense of time and resources.
- The sample size is determined by the formula.

The Sample Size Formula for a Confidence Interval for a Population Proportion
$$n = \frac{1}{4}\left(\frac{z^*}{ME}\right)^2$$
- $z^*$ is the z value corresponding to the chosen confidence level ($z^* = 1.96$ for a 95% level of confidence).
- $n$ is the sample size.
- $ME$ is the margin of error.

The Sample Size, Confidence Level, and Margin of Error
- Increasing the level of confidence increases the required sample size.
- Increasing the margin of error decreases the required sample size.
- If we want a smaller margin of error, we should either increase the sample size or decrease the level of confidence.

Finding the Required Sample Size
- If you want
to generate a 95% confidence interval for the proportion of people who are lactose intolerant and want to have a margin of error of no more than ±3%, how many randomly selected people do you need to survey?
- $z^* = 1.96$, $ME = 0.03$
- $n = \frac{1}{4}\left(\frac{1.96}{0.03}\right)^2 \approx 1067.11$
- You need to survey 1068 people.

Chapter 7 Case Study

The Study
- In 2006 the AMA gave the press release "Sex and intoxication among women more common on spring break according to AMA poll".
- "Eighty-three percent of the female, college-attending respondents agreed spring break trips involve more or heavier drinking than occurs on college campuses and 74 percent said spring break trips result in increased sexual activity."

The Design
- Posted a survey on the AMA website.
- 644 women chose to respond.
- Cited a margin of error of ±4%.
- Cited a 95% confidence interval.

The Issues
- Not a scientific study.
- Voluntary response bias.
- Should never cite a margin of error and confidence interval from a biased study.

The Conclusion
- The AMA changed its posting to state that the results were not based on a random sample.
- They removed all remarks about the margin of error from the paper.

Chapter 7 Guided Exercise

The Oregon Bar Exam
- According to the Oregon Bar Association, approximately 65% of the people who take the bar exam to practice law in Oregon pass the exam.
- Find the approximate probability that at least 67% of 200 randomly sampled people who take the Oregon bar exam will pass it.

The Oregon Bar Exam: Population Proportion
- The sample proportion is 0.67. What is the population proportion?
  - 0.65

The Oregon Bar Exam: Checking Assumptions
- Randomly sampled?
  - Yes
- Large enough sample size?
  - $np = 200(0.65) = 130$
  - $n(1-p) = 200(0.35) = 70$
  - Both are greater than 10, so yes.
- Population large enough?
  - Since more than $10 \times 200 = 2000$ people take the Oregon bar exam, the population is large enough.

Calculate the Standard Error
$$SE = \sqrt{\frac{0.65(1-0.65)}{200}} \approx 0.034$$

Calculate the z-Score
$$z = \frac{0.67 - 0.65}{0.034} \approx 0.59$$

The Normal Curve
- $P(z > 0.59)$ is the area to the right of 0.59 under the normal curve.
[Figure: standard normal curve with the area to the right of $z = 0.59$ shaded.]

The Final Solution
- Using StatCrunch, this area is about 0.28.
- There is about a 28% chance that at least 67% of 200 randomly sampled people who take the Oregon bar exam will pass it.

Chapter 8 Hypothesis Testing for Population Proportions

Learning Objectives
- Know how to test hypotheses concerning a population proportion and hypotheses concerning the comparison of two population proportions.
- Understand the meaning of the p-value and how it is used.
- Understand the meaning of the significance level and how it is used.
- Know the conditions required for calculating a p-value and significance level.
- Understand the connection between hypothesis testing and confidence intervals.

Hypothesis Testing: An Analogy
A court case:
- Claim: The prosecutor claims the person committed a crime.
- Purpose: To decide whether or not the person committed the crime as charged.
- Null Hypothesis: The person is assumed not guilty.
- Alternative Hypothesis: The person is guilty.
- Rule: Law.
- Data: Evidence chain presented by the prosecutor and the defender's attorney.

Hypothesis testing:
- Claim: The proportion $p$ of CA residents with health insurance is less than 70%.
- Purpose: To determine whether the evidence supports the claim.
- Null Hypothesis: $p = 0.70$
- Alternative Hypothesis: $p < 0.70$
- Rule: Central Limit Theorem. Assuming that the null hypothesis is true, check it with sample data.
- Data: Survey on health insurance coverage from a sample of CA residents.

Hypothesis Testing: An Analogy (continued)
A court case:
- Fact: No one except the real culprit knows what happened during the time of the incident.
- Decision: Based on the law.
  - If the evidence of committing a crime is beyond
reasonable doubt, then the person is guilty.

Hypothesis testing (analogy, continued):
- Fact: One cannot be 100% sure about the value of the true population proportion without conducting a census.
- Decision: If the test statistic (z-score) computed from a representative sample is unusually small, say $z < -2$, then we may conclude that the null hypothesis should be rejected, and we are inclined to the alternative hypothesis.

Hypothesis Testing: Another Analogy
Hypothesis testing:
- Claim: A coin is not fair.
- Purpose: To determine if $P(\text{Head}) = p = 0.5$.
- Null Hypothesis: $p = 0.5$
- Alternative Hypothesis: $p \ne 0.5$
- Rule: Central Limit Theorem. Assuming that the coin is fair, flip it enough times to see if evidence suggests otherwise.

Drug testing case:
- Claim: A pharmaceutical company claims to have found a better drug for treating Type II diabetes.
- Purpose: Test whether there is a significant improvement upon existing treatments.
- Null Hypothesis: The new drug does not work.
- Alternative Hypothesis: The new drug works better.
- Rule: Clinical trials in a randomized study.

Hypothesis Testing: Another Analogy (continued)
Drug testing case:
- Data: A random sample of Type II diabetes patients.
- Fact: We may not know the true effects of the drug on the entire population from this sample.
- Decision: A biostatistician may use appropriate hypothesis tests to determine whether the improvement is statistically significant.

Hypothesis testing:
- Data: Record from flipping the coin a good number of times.
- Fact: Finding the true $P(\text{Head})$ requires infinitely many flips.
- Decision: If the test statistic (z-score) is extreme, meaning that the null hypothesis is highly unlikely to be true, we may reject the null hypothesis in favor of the alternative hypothesis, i.e. the coin is unfair.

Hypothesis Testing
- Hypothesis tests are NOT proofs. We are unable to prove any fact about a population parameter from merely a single sample.
- We test the null hypothesis using sample data because it is usually impossible or impractical
to gain access to the entire population.
- If population data is available, there is no need for inferential statistics.
- There are only 2 possible conclusions: reject $H_0$ or fail to reject $H_0$.

Steps in Hypothesis Testing
- A statement is made regarding the nature of a population.
- Evidence (sample data) is collected in order to test the statement.
- The data is analyzed to assess the plausibility of the claim.

Hypothesis Testing for Fairness of a Coin
- How do you determine if someone is cheating when they toss a coin?
- If the toss is fair, $P(\text{Heads}) = p = 0.5$.
- Otherwise, $P(\text{Heads}) = p \ne 0.5$.
- If only 15 heads come up out of 40 tosses, is the person cheating?

Null and Alternative Hypotheses
- $H_0: p = 0.5$
  - "H-naught", or the Null Hypothesis.
  - The status quo: no cheating, no surprises, no change, no effect.
- $H_a: p \ne 0.5$
  - The Alternative Hypothesis.
  - This is what we hope or guess is true.
- Note that $p$ is the population proportion. Hypothesis testing is always for the population parameter, never the sample statistic.

Null and Alternative Hypotheses
- $H_0$: The null hypothesis is a statement to be tested. The null hypothesis is a status-quo hypothesis: a statement of no change, no effect, or no difference.
- The null hypothesis is assumed true from the beginning until evidence indicates otherwise.
- $H_a$: The alternative hypothesis is a statement we are trying to find evidence to support.
- Both null and alternative hypotheses are statements regarding the value of a population parameter.

Alternative Hypothesis: One-tailed vs. Two-tailed
- Equal versus not equal hypothesis (two-tailed)
  - $H_0$: parameter = null value
  - $H_a$: parameter ≠ null value
- Equal versus less than hypothesis (left-tailed)
  - $H_0$: parameter = null value
  - $H_a$: parameter < null value
- Equal versus greater than hypothesis (right-tailed)
  - $H_0$: parameter = null value
  - $H_a$: parameter > null value

Example: Forming Hypotheses
For each of the following claims, determine the null and alternative hypotheses. State whether
the test is two-tailed, left-tailed, or right-tailed.
(a) In [...], 62% of American adults regularly volunteered their time for charity work. A researcher believes that this percentage is different today.
(b) According to a study published in March [...], the mean length of a phone call on a cellular telephone was 3.25 minutes. A researcher believes that the mean length of a call has increased since then.
(c) Using an old manufacturing process, the standard deviation of the amount of wine put in a bottle was [...] ounces. With new equipment, the quality control manager believes the standard deviation has decreased.

Solution (a)
The hypothesis deals with a population proportion, $p$. If the percentage participating in charity work is no different than it was, it will be 0.62, so the null hypothesis is $H_0: p = 0.62$. Since the researcher believes that the percentage is different today, the alternative hypothesis is two-tailed: $H_a: p \ne 0.62$.

Solution (b)
The hypothesis deals with a population mean, $\mu$. If the mean call length on a cellular phone is no different than it was, it will be 3.25 minutes, so the null hypothesis is $H_0: \mu = 3.25$. Since the researcher believes that the mean call length has increased, the alternative hypothesis is $H_a: \mu > 3.25$, a right-tailed test.

Solution (c)
The hypothesis deals with a population standard deviation, $\sigma$. If the standard deviation with the new equipment has not changed, it will be [...] ounces, so the null hypothesis
is $H_0: \sigma = 0.23$. Since the quality control manager believes that the standard deviation has decreased, the alternative hypothesis is $H_a: \sigma < 0.23$, a left-tailed test.

The Meaning of the Test Statistic
- The test statistic tells us how unlikely it is that the sample proportion could have happened by random chance had the null hypothesis been true.
- If the null hypothesis is true, then the test statistic should be close to 0, that is, $\hat{p}$ close to $p_0$. Therefore, the farther the test statistic is from 0, the more the null hypothesis is discredited.

Level of Significance
- The Level of Significance, $\alpha$, is the probability of rejecting $H_0$ when $H_0$ is true.
- The value of $\alpha$ is chosen by the researcher before the sample data is collected.
- Examples of this error:
  - Concluding that the coin tossing is not fair when it is fair.
  - Saying that the defendant is guilty when he did not commit the crime.
  - Concluding that the person has psychic abilities when she was just guessing.

Typical Levels of Significance
- Since the level of significance is the probability of making an error, it should be small.
- By default we use $\alpha = 0.05$ (1 in 20).
  - $\alpha = 0.05$, or 5%, is the most typical.
  - $\alpha = 0.01$, or 1%, is used when making this error has very bad repercussions.
  - $\alpha = 0.10$, or 10%, is used when an error is less of an issue than making no conclusion when $H_a$ is true.

Step 0: Checking Conditions for Testing a One-Sample Population Proportion
To test hypotheses regarding the population proportion, we can use the steps that follow, provided that:
1. The sample is obtained by simple random sampling.
2. $np_0 \ge 10$ and $n(1-p_0) \ge 10$.
3. The sampled values are independent of each other (less than 10% of the population condition).

Hypothesis Testing for $p$
Step 1: Determine the null and alternative hypotheses regarding the value of the population proportion.

| | Two-Tailed | Left-Tailed | Right-Tailed |
|---|---|---|---|
| $H_0$ | $p = p_0$ | $p = p_0$ | $p = p_0$ |
| $H_a$ | $p \ne p_0$ | $p < p_0$ | $p > p_0$ |

Note: $p_0$ is the assumed value of the population proportion.

Step 2: Select a level of significance, $\alpha$, based on
the seriousness of making a Type I error.

Step 3: Compute the test statistic
$$z_0 = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$$
Note: We use $p_0$ in computing the standard error rather than $\hat{p}$ because, when we test a hypothesis, the null hypothesis is always assumed true.

The Test Statistic
- In conducting a hypothesis test, one takes a sample and finds $\hat{p}$.
- The CLT for sample proportions: $\hat{p} \sim N\left(p, \sqrt{p(1-p)/n}\right)$
- The Test Statistic is the z-score:
$$z = \frac{\text{Observed} - \text{Null}}{SE}$$

Option 1: Determine Critical Values, Draw Rejection Regions
Step 4: Use the Normal Table to determine the critical value.
[Figure: critical regions for two-tailed, left-tailed, and right-tailed tests.]

Option 1: Compare Test Statistic to the Critical Value
Step 5: Compare the critical value with the test statistic.
- Two-Tailed: if $z_0 < -z_{\alpha/2}$ or $z_0 > z_{\alpha/2}$, reject the null hypothesis.
- Left-Tailed: if $z_0 < -z_{\alpha}$, reject the null hypothesis.
- Right-Tailed: if $z_0 > z_{\alpha}$, reject the null hypothesis.

Deriving a Contradiction Using Test Statistic
- Why does it make sense to reject the null hypothesis if the sample proportion corresponds to an observed z-score that is more extreme than ±2?
[Figure: standard normal curve with the tail area shaded.]

Example: Hypothesis Testing for $p$
- In 1997, 46% of Americans said they did not trust the media "when it comes to reporting the news fully, accurately and fairly". In a 2007 poll of 1010 adults nationwide, 525 stated they did not trust the media. At the $\alpha = 0.05$ level of significance, is there evidence to support the claim that the percentage of Americans that do not trust the media to report fully and accurately has increased since 1997?

Step 0: Check Conditions
- We want
to know if $p > 0.46$, given $n = 1010$ and $x = 525$.
- It is reasonable to assume that the sample is randomly chosen.
- $np_0 = 1010(0.46) = 464.6$ and $n(1-p_0) = 1010(1-0.46) = 545.4$ are both greater than 10.
- Since the sample size is less than 10% of the population size, the assumption of independence is met.

Solution: Steps 1-3
- Let $p$ represent the population proportion of Americans who do not trust the media.
- Step 1: $H_0: p = 0.46$, $H_a: p > 0.46$
- Step 2: The level of significance for this one-tailed test is $\alpha = 0.05$.
- Step 3: The sample proportion is $\hat{p} = 525/1010 \approx 0.520$. The test statistic is then
$$z_0 = \frac{0.520 - 0.46}{\sqrt{\dfrac{0.46(1-0.46)}{1010}}} \approx 3.83$$

Solution: Step 4
- $\alpha = 0.05$ is the size of the Rejection Region (RR).
- Since this is a right-tailed test, we determine the critical value at the $\alpha = 0.05$ level of significance by looking up the cutoff value (z-score) for the top 5%, that is, the 95th percentile: $z_{0.05} = 1.645$.

Standard Normal Cumulative Probabilities (continued)

| z | .00 | .01 | .02 | .03 | .04 | .05 | .06 | .07 | .08 | .09 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | .5000 | .5040 | .5080 | .5120 | .5160 | .5199 | .5239 | .5279 | .5319 | .5359 |
| 0.1 | .5398 | .5438 | .5478 | .5517 | .5557 | .5596 | .5636 | .5675 | .5714 | .5753 |
| 1.6 | .9452 | .9463 | .9474 | .9484 | .9495 | .9505 | .9515 | .9525 | .9535 | .9545 |
| 1.7 | .9554 | .9564 | .9573 | .9582 | .9591 | .9599 | .9608 | .9616 | .9625 | .9633 |

Solution: Step 5
- Since our test statistic $z_0 = 3.83$ is more extreme (greater) than the critical value 1.645, it lies in the rejection region. We reject the null hypothesis in favor of the alternative hypothesis.
[Figure: standard normal curve with the rejection region and $z = 3.83$ marked.]

Option 2: The p-Value
- The p-value is a conditional probability. Assuming the null hypothesis is true, the p-value is the probability that, if the experiment were repeated, you would get a test statistic as extreme as or more extreme than the one you actually got.
- A small p-value suggests that a surprising outcome has occurred and discredits the null hypothesis.

p-Value Example
- A hypothesis test was conducted to see whether it rains on 10% of the days for which the weather report states a 10% chance of rain. Of 300 randomly selected days with a 10% chance of rain, 6% had rain. The p-value was found to be 0.02.
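The p-value quoted for the rain example can be checked numerically. The following is a minimal Python sketch, not part of the original slides. It assumes a two-tailed test of $H_0: p = 0.10$ and uses the one-proportion z statistic with the null value $p_0$ in the standard error. The names `normal_cdf` and `one_proportion_z_test` are illustrative choices, and the normal CDF is built from the standard library's `math.erf`.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def one_proportion_z_test(p_hat, p0, n, tail="two"):
    """Return (z, p_value) for a one-sample test of a proportion.

    The standard error uses the null value p0, because the null
    hypothesis is assumed true when the test statistic is computed.
    tail is "left", "right", or "two" (hypothetical labels).
    """
    se = sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / se
    if tail == "left":
        p_value = normal_cdf(z)
    elif tail == "right":
        p_value = 1.0 - normal_cdf(z)
    else:
        # Two-tailed: add the areas in both tails beyond |z|.
        p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_value


# Rain example: 6% of 300 days had rain, null value 10%.
z, p_value = one_proportion_z_test(p_hat=0.06, p0=0.10, n=300)
print(round(z, 2), round(p_value, 2))
```

With these inputs the statistic is about -2.31 and the two-tailed p-value is about 0.02, in line with the value stated in the example.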
Interpret this p-value.
[Figure: sampling distribution under $H_0$ with both tails shaded.]
- If another 300 days with a 10% chance of rain were randomly selected, then there is only a 2% chance that the number of days of rain would be less than 18 (6%) or more than 42 (14%).

Three Types of Hypothesis Tests
- Left-Tailed Hypothesis
  - $H_0: p = p_0$
  - $H_a: p < p_0$
- Right-Tailed Hypothesis
  - $H_0: p = p_0$
  - $H_a: p > p_0$
- Two-Tailed Hypothesis
  - $H_0: p = p_0$
  - $H_a: p \ne p_0$

Left-Tailed Hypothesis
- $H_0: p = p_0$, $H_a: p < p_0$
- The p-value represents the probability that, if $p = p_0$ and another random sample is taken with the same sample size, the new sample proportion will be less than the observed sample proportion.

Right-Tailed Hypothesis
- $H_0: p = p_0$, $H_a: p > p_0$
- The p-value represents the probability that, if $p = p_0$ and another random sample is taken with the same sample size, the new sample proportion will be greater than the observed sample proportion.

Two-Tailed Hypothesis
- $H_0: p = p_0$, $H_a: p \ne p_0$
- The p-value represents the probability that, if $p = p_0$ and another random sample is taken with the same sample size, the new sample proportion will be farther from $p_0$ than the observed sample proportion.

Summary of the p-Value
- Tells us how surprising the sample data is if the null hypothesis is true.
- A very small p-value (less than 0.05, for example) indicates that the results that were obtained would be very surprising.
- A larger p-value (greater than 0.05, for example) indicates that the results that were obtained would not be very surprising.

Option 2: Convert Test Statistic to P-Value
Step 4: Use the Normal Table to estimate the P-value.
- Two-Tailed: the sum of the areas in the two tails is the P-value.
- Left-Tailed: the area to the left of $z_0$ is the P-value.
- Right-Tailed: the area to the right of $z_0$ is the P-value.

Option 2: Compare P-Value to $\alpha$
Step 5:
- If p-value < $\alpha$, reject the null hypothesis.
- If p-value > $\alpha$, fail to reject the null hypothesis.

Alternative Solution: Step 4
- Since this is a right-tailed test, the
p-value is the area under the standard normal distribution to the right of the test statistic $z_0 = 3.83$. That is, the p-value equals
- $P(Z > 3.83) \approx 0$

Standard Normal Cumulative Probabilities (continued)

| z | .00 | .01 | .02 | .03 | .04 | .05 | .06 | .07 | .08 | .09 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | .5000 | .5040 | .5080 | .5120 | .5160 | .5199 | .5239 | .5279 | .5319 | .5359 |
| 3.3 | .9995 | .9995 | .9995 | .9996 | .9996 | .9996 | .9996 | .9996 | .9996 | .9997 |
| 3.4 | .9997 | .9997 | .9997 | .9997 | .9997 | .9997 | .9997 | .9997 | .9997 | .9998 |

Alternative Solution: Step 5
- Since the p-value is less than the level of significance, we reject the null hypothesis in favor of the alternative hypothesis.
- But the above statement does not answer the research question directly and is hard to understand for someone who has not taken a statistics course.

Solution: Final Step, Interpret the Result
- Step 6: There is sufficient evidence at the $\alpha = 0.05$ level of significance to conclude that the percentage of all Americans that do not trust the media to report fully and accurately has increased since 1997.

Hypothesis Testing Summary
1. Hypothesize
   - Come up with an idea or hypothesis.
   - Write down $H_0$ and $H_a$.
2. Prepare
   - Choose the level of significance $\alpha$.
   - Select the test statistic.
   - Check conditions for the sampling distribution.

Hypothesis Testing Summary (Continued)
3. Compute to Compare
   - Compute the test statistic.
   - Compute critical values or the p-value.
4. Interpret
   - If the test statistic falls in the rejection region, reject $H_0$; otherwise fail to reject $H_0$.
   - If p-value < $\alpha$, reject $H_0$. If p-value > $\alpha$, fail to reject $H_0$.
   - State the conclusion in the context of the study.

Hypothesize
Tigers:
- Each year 5% of the world tiger population is killed by poachers. Now that there is a campaign to educate people on the problem, has this percent gone down? 300 tigers were observed and 18 were killed by poachers.
- $H_0: p = 0.05$
- $H_a: p < 0.05$

Restaurants:
- 90% of all restaurants fail after one year. Is that number different for Chinese restaurants? Of the 108 new Chinese restaurants, 87 had failed after one year.
- $H_0: p = 0.90$
- $H_a: p \ne 0.90$

Prepare
- Each year 5% of the world tiger population is killed by poachers. Now that there is a campaign to educate people on the problem, has this percent gone down? 300 tigers were observed and 18 were killed by poachers.
- Use the standard $\alpha = 0.05$.
- $np_0 = 300(0.05) = 15 \ge 10$
- $n(1-p_0) = 300(0.95) = 285 \ge 10$
- Can use the z-statistic.
- Assume that the 300 tigers were randomly selected and that there are more than 3000 tigers in the world.

Prepare
- 90% of all restaurants fail after one year. Is that number different for Chinese restaurants? Of the 108 new Chinese restaurants, 87 had failed after one year.
- Use the standard $\alpha = 0.05$.
- $np_0 = 108(0.9) = 97.2 \ge 10$
- $n(1-p_0) = 108(0.1) = 10.8 \ge 10$
- Can use the z-statistic.
- Assume that the 108 Chinese restaurants were randomly selected and that there are more than 1080 Chinese restaurants in existence.

Compute to Compare
Tigers (left-tailed test):
- $z \approx 0.795$
- p-value $= P(Z < 0.795) \approx 0.79$
- p-value > 0.05: fail to reject $H_0$.
- If 5% of the tigers are killed by poaching and another sample is observed, there is a 79% chance of getting a sample proportion less than 6%. 79% is very high.

Compute to Compare
Restaurants (two-tailed test):
- $z \approx -3.27$
- p-value $= P(Z < -3.27) + P(Z > 3.27) \approx 0.001$
- p-value < 0.05: reject $H_0$.
- If 90% of all Chinese restaurants fail after one year and another sample is observed, there is a 0.1% chance of getting a sample proportion at least as far from 90% as the observed 81%. 0.1% is a very small chance.

Interpret
Tigers:
- $z \approx 0.795$
- p-value $\approx 0.79$, $\hat{p} = 0.06$
- Since p-value $\approx 0.79 > 0.05 = \alpha$, fail to reject $H_0$. There is not statistically significant evidence to support the claim that, since the campaign began, less than 5% of the world tiger population is killed by poachers.

Interpret
- 90% of all restaurants fail after one year. Is that number different for Chinese restaurants? Of the 108 new Chinese restaurants, 87 had failed after one year.
- $z \approx -3.27$, p-value $\approx 0.001$, $\hat{p} \approx 0.81$
- Since p-value $\approx 0.001 < 0.05 = \alpha$, reject $H_0$ in favor of $H_a$. There is statistically significant evidence to support the claim that the percent of Chinese restaurants that fail after one year is different from 90%.

Four Possible Outcomes from Hypothesis Testing
1. We reject $H_0$ when $H_a$ is true. The decision would be correct.
2. We fail to reject $H_0$ when $H_0$ is true. The decision would be correct.
3. We reject $H_0$ when $H_0$ is true. The decision would be incorrect. This type of error is called a Type I error.
4. We fail to reject $H_0$ when $H_a$ is true. The decision would be incorrect. This type of error is called a Type II error.

Exercise: Type I and Type II Errors
For each of the following claims, explain what it would mean to make a Type I error. What would it mean to make a Type II error?
(a) In 2003, 62% of American adults regularly volunteered their time for charity work. A researcher believes that this percentage is different today.
(b) According to a study published in March 2006, the mean length of a phone call on a cellular telephone was 3.25 minutes. A researcher believes that the mean length of a call has increased since then.

Solution to Exercises
According to a study published in March 2006, the mean length of a phone call on a cellular telephone was 3.25 minutes. A researcher believes that the mean length of a call has increased since then.
A Type I error occurs if the sample evidence leads the researcher to conclude that $\mu > 3.25$ when, in fact, the actual mean call length on a cellular phone is still 3.25 minutes. A
Type II error occurs if the researcher fails to reject the hypothesis that the mean length of a phone call on a cellular phone is 3.25 minutes when, in fact, it is longer than 3.25 minutes.

HT Greeks

| Decision \ Truth | $H_0$ true | $H_a$ true |
|---|---|---|
| Reject $H_0$ | Type I Error | Good |
| Fail to reject $H_0$ | Good | Type II Error |

- $\alpha$ = Pr(Type I error) = Pr(reject $H_0$ | $H_0$ is true)
- $\beta$ = Pr(Type II error) = Pr(fail to reject $H_0$ | $H_a$ is true)
- $1 - \alpha$ = Specificity = Pr(fail to reject $H_0$ | $H_0$ is true)
- $1 - \beta$ = Sensitivity = Power = Pr(reject $H_0$ | $H_a$ is true)

Application: Clinical Test
- In a blood testing setting:
  - $H_0$: The person does not have HIV.
  - $H_a$: The person has HIV.
- Decision:
  - Type I error: reject $H_0$ when $H_0$ is true (false positive).
  - Type II error: fail to reject $H_0$ when $H_a$ is true (false negative).

A Few Notes on Alpha and Beta
- Notice that the probability of a Type I error is exactly the same as the level of significance $\alpha$ and is therefore controlled by the researcher. This value is chosen before the sample data is collected.
- The probability of making a Type II error is called $\beta$.
- There is ALWAYS a risk of being wrong, measured by $\alpha$ and $\beta$.

Why Do We Use $\alpha = 0.05$?
- In 1935, Sir Ronald Fisher discussed in his book The Design of Experiments the amount of evidence needed to reject a null hypothesis.
- He said that it was situation dependent, but remarked somewhat casually that for many scientific applications 1 out of 20 might be a reasonable value.
- Since then, some people (even some entire disciplines) have treated the number 0.05 as sacrosanct.

Power
- The Power of a hypothesis test is the probability of rejecting the null hypothesis when the null hypothesis is false.
- Example: If the alternative hypothesis is that the person cheated, then the power is the probability that, if the person is cheating, we will conclude correctly that cheating occurred.
- Always strive for a large power.

Clicker: Hi/Lo Power
- Drug test for an Olympian
- $H_0$: The Olympic athlete did not use any banned substance.
- Tool: urine sample
- Decision
- Question: This is an example of a test with
  - A. High statistical power
  - B. Low statistical power

#### Relationship Between $\alpha$ and $\beta$

- As the probability of a Type I error increases, the probability of a Type II error decreases, and vice versa.

#### Power vs. Level of Significance

- Increasing the level of significance $\alpha$ increases the power, as it decreases $\beta$.
- If you want a better chance of rejecting $H_0$ when $H_0$ is false, you can increase the level of significance.
- The only way to have a small $\alpha$ and a small $\beta$ is to have a large sample size.

#### P-Value Revisited

- Recall that we reject $H_0$ when p-value < $\alpha$.
- Indeed, the smaller the p-value, the more confident we feel about our decision to reject the null hypothesis, but $H_0$ does not get any more false.
- Fisher: "The null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis."

#### Statistical Significance vs. Practical Significance

- The term statistical significance is used when the null hypothesis is rejected: there is only a small probability that, if $H_0$ is correct, results as extreme as those obtained would happen randomly.
- The term practical significance means that the results obtained are clearly far from the hypothesized value.

#### Statistical Significance and Practical Significance

- A poll to see if the incumbent will get more than 50% of the votes resulted in a sample proportion of 58%, with $\alpha = 0.05$ and p-value = 0.06.
  - This is practically significant but statistically insignificant.
- A study to see if there is a difference in pass rates for male vs. female statistics students resulted in sample proportions of 72.1% for men and 72.3% for women, with $\alpha = 0.05$ and p-value = 0.04.
  - This is practically insignificant but statistically significant.

#### Hypotheses Should Never Be Changed After the Data is Analyzed

- The null and alternative
hypotheses must be written down before the data is analyzed.
- Never adjust $H_0$ and $H_a$ to fit your results.
- If you change your mind based on the data, you must collect a new data set in order to support the adjusted hypotheses.

#### Using Proper Language

- If the p-value is less than the level of significance $\alpha$, then state that we reject $H_0$. Do not state that the alternative hypothesis is proven or true.
- If the p-value is greater than $\alpha$, then state that we fail to reject $H_0$ and that no conclusion can be made. Do not state that $H_0$ is accepted or true. It is possible that the power is too small and that $H_0$ is still false.

#### Comparing Two Populations

- Comparing two groups: men vs. women, blacks vs. whites, with treatment vs. without treatment, etc.
- Instead of one sample size $n$, there will be two sample sizes $n_1$ and $n_2$.
- $p_1$ and $p_2$ instead of a single population proportion $p$.
- $\hat{p}_1$ and $\hat{p}_2$ instead of one sample proportion $\hat{p}$.

#### Hypothesis Test for the Difference in Two Population Proportions

- The null hypothesis will be $H_0: p_1 = p_2$, or $p_1 - p_2 = 0$.
- Three possibilities for the alternative hypothesis:
  - $H_a: p_1 < p_2$, or $p_1 - p_2 < 0$ (left-tailed test)
  - $H_a: p_1 > p_2$, or $p_1 - p_2 > 0$ (right-tailed test)
  - $H_a: p_1 \neq p_2$, or $p_1 - p_2 \neq 0$ (two-tailed test)

#### Pooling Proportions

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

#### Calculating Test Statistic With Pooled Proportion

The best point estimate of $p$ is called the pooled estimate, denoted $\hat{p}$, where $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$. The test statistic is

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

#### Checking Conditions

- Large sample: check to make sure that all 4 terms are 10 or larger: $n_1\hat{p}$, $n_1(1 - \hat{p})$, $n_2\hat{p}$, $n_2(1 - \hat{p})$.
- Random sample, or at least close to random.
- The two samples are independent of each other.
- Independent within samples: the observations within each sample must be independent of one another. Check the 10% condition if necessary.

#### Caffeine Therapy For Premature Infants

- Apnea of prematurity occurs when premature babies have shallow breathing or stop breathing for more than 20 seconds.
- The treatment group received caffeine therapy while the
other group received a placebo.

#### Does caffeine therapy lower the rate of bad events?

Of the 937 infants given the therapy, 377 suffered from death or disability. The placebo group had 932 infants, and of these 431 suffered from death or disability.

**1. Hypothesize**

- $H_0: p_1 = p_2$, $H_a: p_1 < p_2$

**2. Prepare**

- Use $\alpha = 0.05$.
- z-statistic
- $937 \times 0.43$, $937 \times 0.57$, $932 \times 0.43$, $932 \times 0.57$ are all greater than or equal to 10. Assume that the infants were randomly and independently selected.

**3. Compute to Compare**

- $\hat{p} = \frac{377 + 431}{937 + 932} \approx 0.43$
- $SE = \sqrt{0.43 \times 0.57 \left(\frac{1}{937} + \frac{1}{932}\right)}$
- $z \approx -2.62$
- p-value = $P(Z < -2.62) \approx 0.004$

**4. Interpret**

- p-value ≈ 0.004 < 0.05 = $\alpha$
- Reject $H_0$. There is statistically significant evidence to conclude that a lower proportion of infants will suffer death or disability with this therapy than without it.

#### Is there a difference between the proportions of men and women who play guitar?

82 of the 400 men and 54 of the 300 women surveyed played guitar.

**1. Hypothesize**

- $H_0: p_1 = p_2$, $H_a: p_1 \neq p_2$

**2. Prepare**

- Use $\alpha = 0.05$.
- Large sample: $400 \times 0.19$, $400 \times 0.81$, $300 \times 0.19$, $300 \times 0.81$ are all greater than or equal to 10.
- Assume that the men and women were randomly and independently selected.

**3. Compute to Compare**

- $\hat{p} = \frac{82 + 54}{400 + 300} \approx 0.19$
- $z \approx 0.83$, p-value ≈ 0.41

**4. Interpret**

- p-value ≈ 0.41 > 0.05 = $\alpha$
- Fail to reject $H_0$. There is insufficient evidence to make a conclusion about there being a difference between the proportions of men and women who play guitar.

#### Clicker Questions

- CLICKER 1: Click A for female, B for male.
- CLICKER 2: If you just clicked A, click C if you are a north campus major; click D if you are a south campus major.
- CLICKER 3: If you are male, click C if you are a
north campus major; click D if you are a south campus major.

#### What to do if Conditions Fail

- The sample size is too small:
  - Redo the study with a larger sample size.
  - Use an advanced test, e.g. Fisher's Exact Test.
- The samples are not independent:
  - Take an advanced statistics class, or
  - Consult a statistician.
- Samples not random:
  - State the conclusion for the sample only.

### Chapter 8 Guided Exercise 1

#### Gun Control

- Historically, the percentage of US residents who support stricter gun control laws has been 52%. A recent Gallup Poll of 1011 people showed 49.5% in favor of stricter gun control laws. Assume the poll was given to a random sample of people.
- QUESTION: Test the claim that the proportion of those favoring stricter gun control has changed from 0.52. Perform a hypothesis test using a significance level of 0.05.

**1. Hypothesize**

- Test the claim that the proportion of those favoring stricter gun control has changed from 0.52.
- $H_0$: the population proportion that supports gun control is 0.52, i.e. $p = 0.52$
- $H_a$: $p \neq 0.52$

**2. Prepare**

- Choose the one-proportion z-test.
- Random sample: yes.
- Sample size:
  - $np_0 = 1011(0.52) \approx 526 \geq 10$
  - $n(1 - p_0) = 1011(0.48) \approx 485 \geq 10$
- Population size is more than 10 times 1011.

**3. Compute and Compare**

[StatCrunch output: one-sample proportion hypothesis test results for $H_0: p = 0.52$ vs. $H_A: p \neq 0.52$, with columns Proportion, Count, Total, Sample Prop., Std. Err., Z-Stat, and P-value. The numeric values are not legible in the source.]

**4. Interpret**

- Reject $H_0$ if the p-value is 0.05 or less, or do not reject $H_0$, and choose one of the following conclusions:
  1. The percentage is not significantly different from 52%. (A significant difference is one for which the p-value is less than or equal to 0.05.)
  2. The percentage is significantly different from 52%.
- Fail to reject $H_0$. There is insufficient evidence to make a conclusion about the percentages being different.
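The z-tests for proportions used in this chapter can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function names (`p_value`, `one_proportion_z`, `two_proportion_z`) are hypothetical and not part of the slides, and the gun-control test uses the reported sample proportion 0.495 directly, which is an assumption because the raw count is not given.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal: mean 0, standard deviation 1


def p_value(z, tail):
    """p-value for a z statistic; tail is 'left', 'right', or 'two'."""
    if tail == "left":
        return STD_NORMAL.cdf(z)
    if tail == "right":
        return 1 - STD_NORMAL.cdf(z)
    # Two-tailed: make the statistic negative, take the lower tail, double it.
    return 2 * STD_NORMAL.cdf(-abs(z))


def one_proportion_z(p_hat, n, p0, tail):
    """z statistic and p-value for H0: p = p0 (SE uses the null value p0)."""
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    return z, p_value(z, tail)


def two_proportion_z(x1, n1, x2, n2, tail):
    """Pooled z statistic and p-value for H0: p1 = p2."""
    pooled = (x1 + x2) / (n1 + n2)  # pooled estimate of the common p
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, p_value(z, tail)


# Gun control: 49.5% of 1011 in favor, H0: p = 0.52, two-tailed.
gun_z, gun_p = one_proportion_z(0.495, 1011, 0.52, "two")
# Caffeine therapy: 377 of 937 vs. 431 of 932, left-tailed.
caffeine_z, caffeine_p = two_proportion_z(377, 937, 431, 932, "left")
# Guitar: 82 of 400 men vs. 54 of 300 women, two-tailed.
guitar_z, guitar_p = two_proportion_z(82, 400, 54, 300, "two")

for name, z, p in [
    ("gun control", gun_z, gun_p),
    ("caffeine", caffeine_z, caffeine_p),
    ("guitar", guitar_z, guitar_p),
]:
    decision = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"{name}: z = {z:.2f}, p-value = {p:.3f} -> {decision}")
```

Under these assumptions the caffeine and guitar statistics agree with the values on the slides (z ≈ -2.62 and z ≈ 0.83), and the gun-control p-value comes out above 0.05, consistent with failing to reject $H_0$.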
### Chapter 9: Inferring Population Means

#### Learning Objectives

- Understand when the Central Limit Theorem for sample means applies and know how to use it to find approximate probabilities for sample means.
- Know how to test hypotheses concerning a population mean and concerning the comparison of two population means.
- Understand how to find, interpret, and use confidence intervals for a single population mean and for the difference of two population means.

#### Learning Objectives Continued

- Understand the meaning of the p-value and of significance levels.
- Understand how to use a confidence interval to carry out a two-tailed hypothesis test for a population mean or for a difference of two population means.

#### Sample vs. Population

| Quantity | Sample (statistic) | Population (parameter) |
|---|---|---|
| Mean | $\bar{x}$ | $\mu$ |
| Standard deviation | $s$ | $\sigma$ (sigma) |
| Proportion | $\hat{p}$ (p-hat) | $p$ |

- Mean and standard deviation if the survey question has a numerical variable.
- Proportion if the survey question is Yes/No.
- The confidence interval and hypothesis test always refer to the population, not the sample.

#### Standard Error

- The standard error is the standard deviation of the sampling distribution:

$$SE = \frac{\sigma}{\sqrt{n}}$$

#### Standard Error and Sample Size

- The standard error is smaller for larger sample sizes.
- Increasing the sample size by a factor of 4 decreases the standard error by a factor of 2.
- Increasing the sample size by a factor of 100 decreases the standard error by a factor of 10.

#### Example

The mean cost per item at a grocery store is \$2.75 and the standard deviation is \$1.26. A shopper randomly puts 36 items in her cart.

- Is 2.75 a parameter or a statistic?
  - Parameter
- Predict the average cost per item in the shopper's cart.
  - \$2.75
- Find the standard error for carts with 36 items.
  - $SE = \frac{1.26}{\sqrt{36}} = 0.21$

#### Conditions for the Central Limit Theorem for Sample Means

- Random sampling technique
- One or both of the following:
  - Population is Normally distributed
  - Sample size is large
- Population size is at least 10 times bigger than the sample size

#### What is a Large Enough Sample Size?

- If the population distribution is not too far from Normal, then the sample size can be small.
- For most population distributions, $n = 30$ or higher gives sufficient accuracy.
- If the population distribution is far from Normal, a larger sample size is needed.

#### Central Limit Theorem For Means

- Central Limit Theorem: if the conditions are met and the population has mean $\mu$ and standard deviation $\sigma$, then the sampling distribution of the sample mean will be approximately Normal. As $n \to \infty$,

$$\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$

#### Confidence Interval for One Population Mean: Normal Approximation

Since the sampling distribution of $\bar{x}$ is approximately Normal, with mean equal to the population mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$, the CI for $\mu$ should be of the form $\bar{x} \pm ME$. HOWEVER, $\sigma$ is often unknown.

#### The t-Distribution

- If $\sigma$ is unknown, use the sample standard deviation $s$ instead.
- $\frac{s}{\sqrt{n}}$ is an estimate for the standard error: $SE_{EST} = \frac{s}{\sqrt{n}}$

#### Confidence Interval for One Population Mean: Student's t-Distribution

We replace the unknown $\sigma$ (population standard deviation) by the known $s$ (sample standard deviation) and estimate the standard error by $SE_{EST} = \frac{s}{\sqrt{n}}$. Then the actual CI for the mean is

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}, \quad \text{where } df = n - 1$$

#### Student's t

William S. Gosset, an employee of the Guinness brewery in Dublin, Ireland, worked on the sampling distribution of the mean while working on the selection of the best yielding varieties of barley. Guinness had prohibited its employees from publishing papers in order to guard trade secrets, so he instead used the pseudonym "Student" for his publications. Thus his most famous achievement is now referred to as Student's t.

#### Facts About the t-Distribution

[Figure: t-distribution density curves compared with the Normal curve]

- Bell shaped
- Tails a little bigger than
Normal
- Given sample size $n$, there are $n - 1$ degrees of freedom.
- For large degrees of freedom, the distribution is almost Normal.

#### Example: Airfare

- A researcher is interested in estimating how much college students spend on airfare in a year. A random sample of 20 students yielded a mean of \$555 with a standard deviation of \$210. Assume airfare is distributed nearly Normal. Find a 95% confidence interval for the true population mean of the amount of money college students spend on airfare per year.
- 95% CI for $\mu$: $\bar{x} = 555$, $s = 210$, $n = 20$
- Using the t-table: $df = n - 1 = 19$, $t^* = 2.093$
- $555 \pm 2.093 \times \frac{210}{\sqrt{20}} \approx 555 \pm 98.28$, i.e. about (456.72, 653.28)

#### Table 4: t-Distribution Critical Values

| df | 80% ($t_{.100}$) | 90% ($t_{.050}$) | 95% ($t_{.025}$) | 98% ($t_{.010}$) | 99% ($t_{.005}$) | 99.8% ($t_{.001}$) |
|---|---|---|---|---|---|---|
| 1 | 3.078 | 6.314 | 12.706 | 31.821 | 63.656 | 318.289 |
| 2 | 1.886 | 2.920 | 4.303 | 6.965 | 9.925 | 22.328 |
| 3 | 1.638 | 2.353 | 3.182 | 4.541 | 5.841 | 10.214 |

Column headings give the confidence level and the matching right-tail probability. The remaining rows of the table (larger df) are not legible in the source.

#### Example

- 45 randomly selected college students worked on homework for an average of 9 hours per week. Their standard deviation was 2 hours. Find a 90% confidence interval for the population mean.
- $df = 44$, use $df = 40$: $t^* = 1.684$; $SE_{EST} = \frac{2}{\sqrt{45}} \approx 0.30$
- Lower bound: $9 - 1.684 \times 0.30 \approx 8.5$
- Upper bound: $9 + 1.684 \times 0.30 \approx 9.5$

#### CI Interpretations: (8.5, 9.5)

- 45 randomly selected college students worked on homework for an average of 9 hours per week. Their
standard deviation was 2 hours. Find a 90% confidence interval for the population mean.
- Interpretation of confidence interval: we are 90% confident that the population mean number of hours worked on homework for all college students is between 8.5 and 9.5 hours.

#### CI Interpretations: (8.5, 9.5)

- 45 randomly selected college students worked on homework for an average of 9 hours per week. Their standard deviation was 2 hours. Find a 90% confidence interval for the population mean.
- Interpretation of confidence level: if many groups of 45 randomly selected students were surveyed, each survey would result in a different confidence interval. About 90% of these confidence intervals will succeed in containing the actual population mean number of hours worked on homework, and 10% will not contain the true population mean.

#### Hypothesis Test for a Population Mean

The same four steps apply for a hypothesis test for a population mean:

1. Hypothesize: state $H_0$ and $H_a$.
2. Prepare: choose $\alpha$, check conditions and assumptions, and determine the test statistic to use.
3. Compute to Compare: compute the test statistic and the p-value, and compare the p-value with $\alpha$.
4. Interpret: reject or fail to reject $H_0$. Write down the conclusion in the context of the study.

#### Hypothesis Test Example by Formula

Ford claims that its 2012 Focus gets 40 mpg on the highway. Does your Focus mpg differ from 40 mpg? You chart your Focus over 35 randomly selected highway trips and find it got 39.5 mpg with a standard deviation of 1.4 mpg.

**1. Hypothesize**

- $H_0: \mu = 40$, $H_a: \mu \neq 40$

**2. Prepare**

- Choose $\alpha = 0.05$.
- Use the t-statistic (random and large sample).

**3. Compute to Compare**

- $df = 35 - 1 = 34$
- $t = \frac{39.5 - 40}{1.4 / \sqrt{35}} \approx -2.11$
- Use $df = 30$: $t^* = 2.042$, and $|t| > t^*$.

**4. Interpret**

- p-value < $\alpha$ = 0.05
- Reject $H_0$. There is statistically significant evidence to conclude that your
Focus does not get 40 mpg on average.

#### Independent vs. Dependent (Paired)

- Two samples are dependent, or paired, if each observation from one group is coupled with a particular observation from the other group:
  - Before and after
  - Identical twins
  - Husband and wife
  - Older sibling and younger sibling
- If there is no pairing, then the samples are independent.

#### Independent (Ind) or Dependent (Dep)?

- Do women perform better on average than men on their statistics final? 60 women and 40 men were surveyed. → Ind
- 40 people's blood pressure was measured before and after giving a public speech. Does blood pressure change on average? → Dep
- Is the average tip percent greater for dinner than lunch? 35 wait staff who worked both lunch and dinner looked at their receipts. → Dep
- Are Americans more stressed out on average compared to the French? 50 from each country were given a stress test. → Ind

#### Independent Samples: Standard Error and Margin of Error

$$SE_{EST} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

- Margin of error = $t^* \times SE_{EST}$
- Degrees of freedom is approximately the smaller of $n_1 - 1$ and $n_2 - 1$.
- Use a computer or calculator for better accuracy.

#### Requirements for Independent Samples

- Both samples are randomly taken and each observation is independent of any other.
- The two samples are independent of each other (not paired).
- Either both populations are Normally distributed or each sample size is greater than 30.

#### Example: Independent Samples

- 38 randomly selected engineering majors and 42 randomly selected psychology majors were observed to estimate the difference in how long it takes
to graduate: $\bar{x}_E = 5.1$, $s_E = 0.4$, $\bar{x}_P = 5.6$, $s_P = 0.5$. Find a 95% confidence interval for the difference.
- The two samples are independent, since there is no pairing between each engineering major and each psychology major.
- The students were selected randomly and independently, and the sample sizes are both greater than 30.

#### Software Result

[StatCrunch output: 95% confidence interval results for $\mu_1 - \mu_2$, the mean difference without pooled variances, where $\mu_1$ is the mean of population 1 and $\mu_2$ the mean of population 2. Columns: Difference, Sample Mean, Std. Err., DF, L. Limit, U. Limit. The sample mean difference is -0.5; the remaining values are not legible in the source.]

- We are 95% confident that the average time it takes to graduate is between 0.3 and 0.7 years longer for psychology majors than for engineering majors.

#### Result by Formula

- CI for $\mu_E - \mu_P$: point estimate ± ME

$$(5.1 - 5.6) \pm t^* \sqrt{\frac{0.4^2}{38} + \frac{0.5^2}{42}} = -0.5 \pm t^* (0.1)$$

- $df = \min(37, 41) = 37$, so use $df = 30$: $t^* = 2.042$
- We are 95% confident that the average time it takes to graduate is between 0.3 and 0.7 years longer for psychology majors than for engineering majors.

#### Hypothesis Test: Paired Samples

- Does eating chocolate improve memory? 9 people were given a memory test before and after eating chocolate. The data for the number of words recalled out of 50 are shown below. Assume Normality.
- Before: 24, 16, 33, 9, 42, 38, 27, 30, 41
- After: 26, 20, 29, 11, 42, 39, 25, 34, 44

**1. Hypothesize**

- $H_0: \mu_{diff} = 0$, $H_a: \mu_{diff} < 0$ (difference taken as Before minus After)

**2. Prepare**

- $\alpha = 0.05$
- t-statistic (Normality assumed, since the sample is small)

**3. Compute to Compare**

- StatCrunch: Stat > T
Statistics > Paired

Hypothesis test results: $\mu_1 - \mu_2$ is the mean of the paired difference between Before and After; $H_0: \mu_1 - \mu_2 = 0$, $H_A: \mu_1 - \mu_2 < 0$.

| Difference | Sample Diff. | Std. Err. | DF | T-Stat | P-value |
|---|---|---|---|---|---|
| Before - After | -1.11 | 0.90 | 8 | -1.23 | 0.13 |

#### Result by Formula

- Before: 24, 16, 33, 9, 42, 38, 27, 30, 41
- After: 26, 20, 29, 11, 42, 39, 25, 34, 44 ($\bar{y} = 30$, $s = 10.84$)
- Diff (Before minus After): -2, -4, 4, -2, 0, -1, 2, -4, -3, so $\bar{x}_{diff} = -1.11$ and $s_{diff} = 2.71$
- $H_0: \mu_{diff} = 0$, $H_a: \mu_{diff} < 0$
- Test statistic = (observed - null) / SE:

$$t = \frac{-1.11 - 0}{2.71 / \sqrt{9}} \approx -1.229$$

- $df = 9 - 1 = 8$, with one-tailed critical value 1.860 at $\alpha = 0.05$
- Since the observed t-value is not extreme enough, we fail to reject $H_0$.

#### Finding the p-Value from the Table

For a one-tailed test with $t = -1.229$ and $df = 8$, $|t|$ lies between the tabled values 1.108 and 1.397, so the p-value is between 0.10 and 0.15.

[Table: t-distribution critical values; not legible in the source]

**4. Interpret**

- p-value ≈ 0.13 > 0.05 = $\alpha$
- Fail to reject $H_0$.
- Conclusion: there is insufficient evidence to make a conclusion about the mean number of words increasing after eating chocolate.

#### Hypothesis Test: Independent Samples

- Do batteries last longer in colder climates than in warmer ones? The table shows some randomly selected battery lives in months.
- Florida: 19, 22, 25, 21, 18, 19, 27, 25, 28, 15
- Montreal: 37, 49, 22, 26, 47, 41, 38, 37

**1. Hypothesize**

- $H_0: \mu_F = \mu_M$
- $H_a: \mu_F < \mu_M$

Do batteries last longer in
colder climates than in warmer ones?

**2. Prepare**

- $\alpha = 0.05$
- Independent samples
- Assume Normal distributions

**3. Compute to Compare**

- StatCrunch: Stat > T Statistics > Two sample > with data

[StatCrunch output: hypothesis test results, where $\mu_1$ is the mean of Florida, $\mu_2$ the mean of Montreal, and $\mu_1 - \mu_2$ the mean difference, for $H_0: \mu_1 - \mu_2 = 0$ vs. $H_A: \mu_1 - \mu_2 < 0$ without pooled variances. Columns: Difference, Sample Mean, Std. Err., DF, T-Stat, P-value. The numeric values are not legible in the source.]

**4. Interpret**

- Florida: 19, 22, 25, 21, 18, 19, 27, 25, 28, 15
- Montreal: 37, 49, 22, 26, 47, 41, 38, 37
- p-value ≈ 0.0009 < 0.05 = $\alpha$
- Reject $H_0$.
- Conclusion: there is statistically significant evidence to support the claim that, on average, batteries last longer in Montreal than in Florida.

#### General Formulas

- Hypothesis test statistic:

$$\text{Test statistic} = \frac{\text{estimated value} - \text{null hypothesis value}}{SE}$$

- Confidence interval:

$$CI = \text{estimated value} \pm \text{multiplier} \times SE_{EST}$$

#### Finding the p-Value Given the Test Statistic

- Left-tailed hypothesis:
  - Find the probability that a value is less than the test statistic.
- Right-tailed hypothesis:
  - Find the probability that a value is greater than the test statistic.
- Two-tailed hypothesis:
  - Make the test statistic negative. Then find the probability that a value is less than the test statistic. Finally, multiply by 2.

#### Comparing CI and Hypothesis Tests

- It can be concluded at the 5% level that the value is not the mean, proportion, or difference if:
  - the value falls outside the 95% confidence interval, or
  - the p-value is less than 0.05.
- A 95% (or 90%, 99%) confidence interval is equivalent to a two-tailed test with $\alpha$ = 0.05 (0.10, 0.01) when it comes to rejecting or
failing to reject $H_0$.

#### Hypothesis Tests and CI Example

- Suppose that a hypothesis test of $H_0: \mu = 80$ vs. $H_a: \mu \neq 80$ was done for the average height of male college basketball players. If the p-value is 0.02, can the 95% confidence interval contain 80?
- No. Since the p-value < 0.05, $H_0$ is rejected, so 80 cannot be in the confidence interval.

#### Hypothesis Test or Confidence Interval: Which Should be Used?

- For one-tailed testing: hypothesis test.
- For two-tailed testing: either can be used.
- Confidence intervals usually give more information than hypothesis tests:
  - A CI gives a plausible range for the population value.
  - The hypothesis test addresses only the question of whether $H_0$ is false.

### Chapter 9 Case Study

#### Epilepsy Drugs and Giving Birth

- Four drugs are taken for epilepsy: carbamazepine, lamotrigine, phenytoin, and valproate.
- Three years after pregnant mothers took the medicine, their children were given an IQ test.
- The New England Journal of Medicine reported that taking valproate increased the risk of impaired cognitive development.

#### 95% Confidence Intervals

[Figure: 95% confidence intervals for mean child IQ under each of the four drugs]

- These give us a visual comparison.
- The valproate CI does not overlap with the lamotrigine CI.
- For better comparisons, use confidence intervals for the difference between means.

#### Confidence Intervals for Differences

[Figure: confidence intervals for the differences in mean IQ between valproate and each of the other drugs]

- None contain 0. A hypothesis test for a difference between the means will reject $H_0$.
- There is statistically significant evidence to conclude that the mean IQ for children born to mothers taking valproate is different than for any of the other drugs.

### Chapter 9 Guided Exercise 1

#### Is the Mean Body Temperature Really 98.6?

- A random sample of 10 independent healthy people showed body temperatures (in degrees Fahrenheit) as follows:
- 98.5, 98.2, 99.0, 96.3, 98.3,
98.7, 97.2, 99.1, 98.7, 97.2
- Use $\alpha = 0.05$.

**1. Hypothesize**

- $H_0: \mu = 98.6$
- $H_a: \mu \neq 98.6$

**2. Prepare**

[Figure: histogram of the sample temperatures]

- Not far from Normal
- Sample collected randomly
- Use the t-statistic

**3. Compute to Compare**

Hypothesis test results: $\mu$ is the mean of the variable; $H_0: \mu = 98.6$, $H_A: \mu \neq 98.6$.

| Variable | Sample Mean | Std. Err. | DF | T-Stat | P-value |
|---|---|---|---|---|---|
| Temperature | 98.12 | 0.29 | 9 | -1.65 | 0.13 |

- $t \approx -1.65$
- p-value ≈ 0.13
- p-value ≈ 0.13 > 0.05 = $\alpha$

**4. Interpret**

- A random sample of 10 independent healthy people showed body temperatures (in degrees Fahrenheit) as follows: 98.5, 98.2, 99.0, 96.3, 98.3, 98.7, 97.2, 99.1, 98.7, 97.2.
- p-value ≈ 0.13 > 0.05 = $\alpha$
- We cannot reject 98.6 as the population mean body temperature from these data at the 0.05 level.

### Chapter 9 Guided Exercise 2

[Software output: two-sample t-test and 95% confidence interval for the difference in mean number of televisions; not legible in the source]

- A two-sample t-test for the number of televisions owned in households of random samples of students at two different community colleges. Assume independence. One of the schools is in a wealthy community (MC), and the other (OC) is in a less wealthy community.

**1. Hypothesize**

- Let $\mu_{oc}$ be the population mean number of televisions owned by families of students in the less wealthy community (OC), and let $\mu_{mc}$ be the population mean number of televisions owned by families of students in the wealthy community (MC).
- $H_0: \mu_{oc} = \mu_{mc}$
- $H_a: \mu_{oc} \neq \mu_{mc}$

**2. Prepare**

- Choose an appropriate t-test. Because the sample sizes are 30, the Normality condition of the t-test is satisfied. State the other conditions, indicate whether they hold, and state the significance level that will be used.
- Use a t-test with two independent samples.
- The households were
chosen randomly and independently.
- The population of all households of each type is more than 10 times the sample sizes.

**3. Compute to Compare**

[Software output: two-sample t-test and 95% confidence interval for the difference in means; not legible in the source]

- $t \approx 0.95$
- p-value = 0.345

**4. Interpret**

- Since the p-value of 0.345 is very large, we fail to reject $H_0$.
- At the 5% significance level, we cannot reject the hypothesis that the mean number of televisions of all students in the wealthier community is the same as the mean number of televisions of all students in the less wealthy community.

### Chapter 9 Guided Exercise 3

#### Pulse Before and After Fright

[Table: Pulse Before and Pulse After for each woman in the sample; values not legible in the source]

- Test the hypothesis that the mean of college women's pulse rates is higher after a fright, using $\alpha = 0.05$.

**1. Hypothesize**

- $H_0: \mu_{before} = \mu_{after}$
- $H_a: \mu_{before} < \mu_{after}$

**2. Prepare**

- Choose a test. Should it be a paired t-test or a two-sample t-test? Why? Assume that the sample was random and that the distribution of differences is sufficiently Normal. Mention the level of significance.
- Paired t-test, since each woman is measured before and after.
- Level of significance: $\alpha = 0.05$

**3. Compute to Compare**

Hypothesis test results: $\mu_1 - \mu_2$ is the mean of the paired difference between Pulse After and Pulse Before; $H_0: \mu_1 - \mu_2 = 0$, $H_A: \mu_1 - \mu_2 > 0$, with $DF = 12$. The remaining numeric output is not legible in the source.

- $t \approx 4.9$
- p-value ≈ 0.002 < 0.05

[Summary statistics: n = 13 for both Pulse Before and Pulse After; remaining columns not legible in the source]

**4. Interpret**

- Reject or do not reject $H_0$. Then write a sentence that includes significant or
significantly in it. Report the sample mean pulse rate before the scream and the sample mean pulse rate after the scream.
- Reject $H_0$. There is statistically significant evidence to support the claim that the mean pulse rate is higher after a fright.
- $\bar{x}_{before} \approx 74.8$
- $\bar{x}_{after} \approx 83.7$
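The t statistics in this chapter can be reproduced with a short Python sketch. The pulse data for this exercise is not legible in the source, so the sketch instead uses the body temperature, chocolate, and battery data given earlier in the chapter. The function names are hypothetical, only the standard library is used, and the critical values are taken from a standard t-table rather than computed, so the decision is made by comparing $t$ with $t^*$ instead of by computing a p-value.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(data, mu0):
    """t statistic and degrees of freedom for H0: mu = mu0."""
    n = len(data)
    se_est = stdev(data) / sqrt(n)  # estimated standard error s / sqrt(n)
    return (mean(data) - mu0) / se_est, n - 1


def paired_t(first, second):
    """Paired t statistic for H0: mean difference = 0 (first minus second)."""
    diffs = [a - b for a, b in zip(first, second)]
    return one_sample_t(diffs, 0)


def two_sample_t(sample1, sample2):
    """Unpooled two-sample t statistic with the conservative df,
    the smaller of n1 - 1 and n2 - 1."""
    se_est = sqrt(
        stdev(sample1) ** 2 / len(sample1) + stdev(sample2) ** 2 / len(sample2)
    )
    t = (mean(sample1) - mean(sample2)) / se_est
    return t, min(len(sample1), len(sample2)) - 1


temps = [98.5, 98.2, 99.0, 96.3, 98.3, 98.7, 97.2, 99.1, 98.7, 97.2]
before = [24, 16, 33, 9, 42, 38, 27, 30, 41]
after = [26, 20, 29, 11, 42, 39, 25, 34, 44]
florida = [19, 22, 25, 21, 18, 19, 27, 25, 28, 15]
montreal = [37, 49, 22, 26, 47, 41, 38, 37]

temp_t, temp_df = one_sample_t(temps, 98.6)           # two-tailed test
choc_t, choc_df = paired_t(before, after)             # left-tailed test
battery_t, battery_df = two_sample_t(florida, montreal)  # left-tailed test

# Critical values from a standard t-table at alpha = 0.05:
# two-tailed with df = 9, one-tailed with df = 8, one-tailed with df = 7.
T_CRIT_TWO_TAILED_DF9 = 2.262
T_CRIT_ONE_TAILED_DF8 = 1.860
T_CRIT_ONE_TAILED_DF7 = 1.895

reject_temp = abs(temp_t) > T_CRIT_TWO_TAILED_DF9
reject_choc = choc_t < -T_CRIT_ONE_TAILED_DF8
reject_battery = battery_t < -T_CRIT_ONE_TAILED_DF7

print(f"body temperature: t = {temp_t:.2f}, df = {temp_df}, reject = {reject_temp}")
print(f"chocolate: t = {choc_t:.2f}, df = {choc_df}, reject = {reject_choc}")
print(f"batteries: t = {battery_t:.2f}, df = {battery_df}, reject = {reject_battery}")
```

The decisions agree with the conclusions reached in the chapter: fail to reject for body temperature and chocolate, reject for the batteries. Statistical software typically uses a more precise degrees-of-freedom formula for the two-sample case, so its p-values can differ slightly from a table lookup with the conservative df.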
