Statistical Methods STA 2023
This 41 page Class Notes was uploaded by Winnifred Macejkovic on Thursday October 29, 2015. The Class Notes belongs to STA 2023 at Valencia College taught by Patrick Murphy in Fall. Since its upload, it has received 12 views. For similar materials see /class/231202/sta-2023-valencia-college in Statistics at Valencia College.
Date Created: 10/29/15
NOTES: t-distributions

Hypothesis tests and confidence intervals involving the mean $\mu$ with unknown $\sigma$

Suppose that hypothesis tests and confidence intervals involving $\mu$ are provided with a known population standard deviation $\sigma$. This would enable us to calculate the standard deviation of the sample mean distribution, since $SD(\bar{x}) = \sigma/\sqrt{n}$, where $n$ is the sample size. We now assume that $\sigma$ is not known, as is most generally the case in real life, and therefore replace $SD(\bar{x})$ with the Standard Error of the Mean. In other words, we use $SE(\bar{x}) = s_x/\sqrt{n}$, where $s_x$ is the standard deviation of the sample. This, however, causes a problem: the SE fluctuates with the sample size $n$ in such a way as to create a different (not normal) distribution for each sample size $n$. Each of these distributions is called a t-distribution.

Facts about t-distributions:
- Bell-shaped curve, mean 0, symmetric about 0.
- Each one is different from the standard normal curve.
- As $n$ gets larger, the t-distributions get closer to the standard normal curve. When $n \ge 30$, the t-distributions are very close to the standard normal curve.
- Degrees of freedom $= n - 1$. The degrees of freedom (df) indicate which t-distribution you will use.

For hypothesis tests we will use a t-test statistic (t-score) instead of a z-score. The equation for the t-score is $t = \dfrac{\bar{x} - \mu_0}{SE(\bar{x})}$, where $\mu_0$ is the assumed population mean from the null hypothesis $H_0$. You will also need to use tcdf instead of normalcdf on your calculator (2ND VARS, tcdf). Inputs are tcdf(LB, UB, df).

For confidence intervals we will use $t^*$ instead of $z^*$. The $t^*$ value depends on the t-distribution you are using and so depends on the degrees of freedom $n - 1$. $t^*$ cannot be obtained from your graphing calculator, so it must be looked up in a table in your textbook. The formula for a confidence interval for means with unknown $\sigma$ is $\bar{x} \pm t^* \dfrac{s_x}{\sqrt{n}}$.

Requirements for using t-distributions:
1. SRS: The sample mean must come from a simple random sample.
2. Sufficiently large sample size:
a) CASE $n < 15$: The data should be
very close to a Normal model. Do not use t-methods if there is strong skewness or outliers.
b) CASE $15 \le n < 40$: t-methods should work as long as the data are unimodal and reasonably symmetric (make a histogram). t-methods should not be used in the presence of outliers or strong skewness.
c) CASE $40 \le n$: t-methods can be used even in the presence of strong skewness or a few outliers. In this case t-methods are called robust.

Matched Pairs (Dependent Samples): t-distributions, confidence intervals and hypothesis tests

Matched pairs arise when two samples are taken from two populations such that the samples are dependent on each other and can be matched up. This is not a two-sample situation; rather, we treat the differences of the two samples as one sample. We therefore apply t-procedures to the one population of differences.

1. Imagine that you want to compare two golf club brands, Titleist and Calloway. Suppose that we want to show that Titleist hits farther than Calloway. We do this by comparing 10 random pairs of golf clubs, where each pair consists of one Titleist club and one Calloway club of the same type. Each pair is hit by a different randomly chosen professional golfer. This makes the pairs dependent, since each pair is matched up through the one golfer hitting those two clubs. Measurements in the number of yards hit are recorded below.

[Table of yardages and differences not recovered]

Is this sample of differences sufficient evidence to claim that Titleist hits farther than Calloway? Choose $\alpha = 0.05$.

ANSWER: First note that the first population consists of the number of yards a Titleist club is hit and the second population consists of the number of yards a Calloway club is hit. Therefore denote
$\mu_T$ = the average number of yards a Titleist club hits,
$\mu_C$ = the average number of yards a Calloway club hits.
We will assume $H_0: \mu_T = \mu_C$ and we want to show $H_a: \mu_T > \mu_C$. It is important to note that we always subtracted Calloway from Titleist in our sample. If we subtract off $\mu_C$, then it follows that we will assume $H_0: \mu_T - \mu_C = 0$ and we want
to show $H_a: \mu_T - \mu_C > 0$. Define the average of the population of differences as $\mu_d = \mu_T - \mu_C$. Then we will assume $H_0: \mu_d = 0$ and we want to show $H_a: \mu_d > 0$.

Steps conducted for the golf club hypothesis test:
1. Let $\mu_d = \mu_T - \mu_C$. $H_0: \mu_d = 0$, $H_a: \mu_d > 0$.
2. $\alpha = 0.05$
3. Statistics taken from the difference column ONLY: $\bar{x} = 1.5$, $s_x \approx 5.74$, $n = 10$.
4. $t = \dfrac{\bar{x} - 0}{s_x/\sqrt{n}} = \dfrac{1.5 - 0}{5.74/\sqrt{10}} \approx 0.8264$. P-value $= P(t \ge 0.8264) =$ tcdf(0.8264, 1E99, 9) $\approx 0.215$. Also, STAT > TESTS > 2: T-Test yields the same results.
5. P-value $> \alpha$ ($0.215 > 0.05$): do NOT reject $H_0$.
6. Our sample is NOT significant evidence that the average number of yards hit by a Titleist club is greater than the average number of yards hit by a Calloway club.

2. Some researchers studying vitamin C in a corn soy blend (CSB) were also interested in a similar commodity called wheat soy blend (WSB). Both of these commodities are mixed with other ingredients and cooked. Loss of vitamin C as a result of this process was another concern of the researchers. One preparation used in Haiti, called gruel (or "bouillie" in Creole), can be made from WSB, salt, sugar, milk, banana, and other optional items to improve the taste. Samples of gruel prepared in Haitian households were collected. The vitamin C content (in milligrams per 100 grams of blend, dry basis) was measured before and after cooking. Here are the results.

[Table of before and after measurements not recovered]

Carry out a significance test for these data at the 5% level to see if vitamin C is lost after cooking.

Steps conducted for the vitamin C and cooking hypothesis test:
1. Let $\mu_d = \mu_B - \mu_A$. $H_0: \mu_d = 0$, $H_a: \mu_d > 0$.
2. $\alpha = 0.05$
3. Statistics taken for the differences only (BEFORE minus AFTER): $\bar{x} = 55$, $s_x \approx 3.937$, $n = 5$.
4. $t = \dfrac{\bar{x} - 0}{s_x/\sqrt{n}} = \dfrac{55 - 0}{3.937/\sqrt{5}} \approx 31.24$. P-value $= P(t \ge 31.24) =$ tcdf(31.24, 1E99, 4) $\approx 3.13\text{E-}6$. Also, STAT > TESTS > 2: T-Test yields the same results.
5. P-value $\le \alpha$ ($3.13\text{E-}6 \le 0.05$): reject $H_0$.
6. There is sufficient evidence that vitamin C is lost after cooking, on average.

Difference between two means (Independent Samples): hypothesis tests and confidence intervals

1. Do piano lessons improve the spatial-temporal reasoning of preschool children? Below is the score on a spatial reasoning test of 34 preschool children
after six months of piano lessons.

[Data values for the piano lesson group not recovered]

Below is the score made on the same test by 44 preschool children in a control group.

[Data values for the control group not recovered]

Data from F. H. Rauscher et al., "Music training causes long-term enhancement of preschool children's spatial-temporal reasoning," Neurological Research 19 (1997), pp. 2-8.

a) Display the data with histograms and boxplots and summarize the distributions in a paragraph. You are interested in:
- Does each sample seem to come from a normal distribution?
ANSWER: The histogram of the piano lesson group is symmetric and unimodal. The histogram of the control group is unimodal and only slightly skewed (changing XSCL to exactly 2 reveals a symmetric shape).
- Does either data set have outliers?
ANSWER: The boxplot of the piano lesson group shows no outliers. The boxplot of the control group reveals two outliers, one at each end.

b) Define:
Population 1: The spatial-temporal reasoning scores of all preschool children that have taken six months of piano lessons.
Population 2: The spatial-temporal reasoning scores of all preschool children that have NOT taken six months of piano lessons.
$\mu_1$ = the average spatial-temporal reasoning score of all preschool children that have taken six months of piano lessons.
$\mu_2$ = the average spatial-temporal reasoning score of all preschool children that have NOT taken six months of piano lessons.

c) Make a table with the sample size, sample mean, and sample standard deviation for each of the two groups.
ANSWER:
               Sample Size    Sample Mean            Sample Standard Deviation
Population 1   $n_1 = 34$     $\bar{x}_1 = 3.617$    $s_1 = 3.0552$
Population 2   $n_2 = 44$     $\bar{x}_2 = 0.386$    $s_2 = 2.4229$

d) Are all of the assumptions satisfied so that we may use two-sample t-methods?
ANSWER: Yes. We will assume that both of the samples were drawn randomly. Since $n_1 = 34 > 15$ and the first sample has a normal shape (according to its histogram) and no outliers (according to its boxplot), the conditions for the first sample are met. Even though there might be some slight skewness in the second sample (according to its histogram) and there are two outliers
(according to its boxplot), having $n_2 = 44 > 40$ shows us that the conditions for the second sample are met. In conclusion, we note that the two populations described in part b are independent of each other. Now all of the assumptions are satisfied so that we may use two-sample t-methods.

e) Write out the six steps in a test of hypothesis to prove that piano lessons improve spatial reasoning in preschool children. Test at the 5% level.
ANSWER:
1. $H_0: \mu_1 = \mu_2$, $H_a: \mu_1 > \mu_2$
2. $\alpha = 0.05$
3.
               Sample Size    Sample Mean            Sample Standard Deviation
Population 1   $n_1 = 34$     $\bar{x}_1 = 3.617$    $s_1 = 3.0552$
Population 2   $n_2 = 44$     $\bar{x}_2 = 0.386$    $s_2 = 2.4229$
4. Degrees of freedom:
$df = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2} = \dfrac{\left(\dfrac{3.0552^2}{34} + \dfrac{2.4229^2}{44}\right)^2}{\dfrac{1}{34 - 1}\left(\dfrac{3.0552^2}{34}\right)^2 + \dfrac{1}{44 - 1}\left(\dfrac{2.4229^2}{44}\right)^2} \approx 61.688$
Test statistic:
$t = \dfrac{3.617 - 0.386}{\sqrt{\dfrac{3.0552^2}{34} + \dfrac{2.4229^2}{44}}} \approx 5.059$
P-value $=$ tcdf(5.059, 1E99, 61.688) $\approx 2.025\text{E-}6$
5. P-value $\le \alpha$ ($2.025\text{E-}6 \le 0.05$): reject $H_0$.
6. There is significant evidence that the average spatial-temporal reasoning score of all preschool children that have taken six months of piano lessons is higher than the average spatial-temporal reasoning score of all preschool children that have NOT taken six months of piano lessons. In other words, there is significant evidence that piano lessons improve spatial reasoning in preschool children.

2. Refer to the previous exercise. Give a 95% confidence interval that describes the comparison between the children who took piano lessons and the controls. Interpret the confidence interval.
ANSWER: We use df = 60 from the t table in your textbook, so that we get $t^* = 2.000$:
$(3.617 - 0.386) \pm 2.000\sqrt{\dfrac{3.0552^2}{34} + \dfrac{2.4229^2}{44}} = (1.95, 4.51)$
The confidence interval takes $\bar{x}_1 - \bar{x}_2$ and uses it to estimate $\mu_1 - \mu_2$. So we are 95% confident that the difference between the average test scores of our two populations of preschool children is between 1.95 points and 4.51 points. Another way of saying this is that we are 95% confident that the average score for preschool children with piano lessons is 1.95 to 4.51 points higher than that of preschool children without piano lessons.

3. The Chapin Social Insight Test is a psychological test designed to measure how accurately the
subject appraises other people. The possible scores on the test range from 0 to 41. During the development of the Chapin test, it was given to several different groups of people. Here are the results for male and female college students majoring in the liberal arts:

Group   Sex      $n$    $\bar{x}$   $s_x$
1       Male     133    25.34       5.05
2       Female   162    24.94       5.44

Do these data support the contention that female and male students differ in average social insight? Test at the 5% level.
ANSWER:
1. $H_0: \mu_1 = \mu_2$, $H_a: \mu_1 \ne \mu_2$
2. $\alpha = 0.05$
3. See table above.
4. Test statistic is $t = 0.654$; P-value $= 0.514$.
5. P-value $> \alpha$ ($0.514 > 0.05$): do NOT reject $H_0$.
6. There is not significant evidence of a gender difference in mean social insight score.

4. Music and memory. Is it a good idea to listen to music when studying for a big test? In a study conducted by some Statistics students, 62 people were randomly assigned to listen to rap music, music by Mozart, or no music while attempting to memorize objects pictured on a page. They were then asked to list all the objects they could remember. Here are summary statistics for each group:

[Summary statistics table not recovered]

Create a 90% confidence interval for the mean difference in memory score between students who study to Mozart and those who listen to no music at all. Interpret your interval.
ANSWER: Let the Mozart scores be population 1 and no music be population 2. The confidence interval is $(-5.351, -0.189)$. The confidence interval takes $\bar{x}_1 - \bar{x}_2$ and uses it to estimate $\mu_1 - \mu_2$. So we are 90% confident that the difference between the mean number of objects remembered by those who listen to Mozart and the mean number of objects remembered by those who listen to no music is between $-5.351$ and $-0.189$ objects. In other words, we are 90% confident that the mean number of objects remembered by those who listen to Mozart is between 0.189 and 5.351 objects lower than the mean of those who listened to no music.

Normal Distribution Activity (in class): Solution, Practice Demo

Suppose that 33 percent of women believe in the existence of aliens. If 100 women are selected at random, what is the probability that more than 45 percent of them will say that they believe in aliens?
SET UP
Role 1: "100 women selected", "45 percent of them"
Role 2: $\hat{p} \sim N(\mu_{\hat{p}}, SD(\hat{p}))$
Role 3: $\mu_{\hat{p}} = 0.33$, $SD(\hat{p}) = \sqrt{\dfrac{0.33(1 - 0.33)}{100}} \approx 0.047$
Role 4: $\hat{p}$ axis: 0.19, 0.24, 0.28, 0.33, 0.38, 0.42, 0.47
SOLUTION: normalcdf(0.45, 1E99, 0.33, 0.047) $\approx 0.00535$

Normal Distribution Activity (in class): Solution 1

Suppose family incomes in a town are normally distributed with a mean of $1200 and a standard deviation of $600 per month. What is the probability that a family has an income between $1400 and $2250?
SET UP
Role 1: No sample of size greater than one was taken. "Family incomes" are the population.
Role 2: $X \sim N(\mu, \sigma)$
Role 3: $\mu = 1200$, $\sigma = 600$
Role 4: $X$ axis: -600, 0, 600, 1200, 1800, 2400, 3000
SOLUTION: normalcdf(1400, 2250, 1200, 600) $\approx 0.3294$

Normal Distribution Activity (in class): Solution 2

An opinion poll asks, "Are you afraid to go outside at night within a mile of your home because of crime?" Suppose that the proportion of all adults who would say "Yes" to this question is 0.4. Assume that the poll obtained 20 answers randomly. What percent of such polls with 20 responses have 10 or more say "Yes"?
SET UP
Role 1: "20 responses", "10 or more say Yes", "proportion of all adults"
Role 2: $\hat{p} \sim N(\mu_{\hat{p}}, SD(\hat{p}))$
Role 3: $\mu_{\hat{p}} = 0.4$, $SD(\hat{p}) = \sqrt{\dfrac{0.4(1 - 0.4)}{20}} \approx 0.110$
Role 4: $\hat{p}$ axis: 0.07, 0.18, 0.29, 0.4, 0.51, 0.62, 0.73
SOLUTION: Having 10 or more out of a sample of 20 say "Yes" is equivalent to having $\hat{p} \ge \frac{10}{20}$; in other words, $\hat{p} \ge 0.5$. normalcdf(0.5, 1E99, 0.4, 0.110) $\approx 0.180655$ (computed with the unrounded SD of about 0.1095), or about 18.1% of such polls.

Normal Distribution Activity (in class): Solution 3

Find the area under the curve between the z-scores of $-2$ and 1.
SET UP
Role 1: No sample of size greater than one was taken. "z-scores"
Role 2: $Z \sim N(0, 1)$
Role 3: $\mu = 0$, $\sigma = 1$
Role 4: $z$ axis: -3, -2, -1, 0, 1, 2, 3
SOLUTION: normalcdf(-2, 1, 0, 1) $\approx 0.81859$

Normal Distribution Activity (in class): Solution 4

Adult nose length is normally distributed with mean 45 mm and standard deviation 6 mm. Find the probability that the sample mean nose length is between 44 mm and 46 mm for random samples of 36 adults.
SET UP
Role 1: "samples of 36 adults", "sample mean nose length"
Role 2: $\bar{x} \sim N(\mu_{\bar{x}}, SD(\bar{x}))$
Role 3: $\mu_{\bar{x}} = 45$, $SD(\bar{x}) = \dfrac{6}{\sqrt{36}} = 1$
Role 4: $\bar{x}$ axis: 42, 43, 44, 45, 46, 47, 48
SOLUTION: By the Empirical Rule, we see that the answer is about 68%, because 44 mm and 46 mm are exactly one standard deviation each way on the sample mean nose length distribution. More precisely, we have normalcdf(44, 46, 45, 1) $\approx 0.68269$, or 68.269%.

Normal Distribution Activity (in class): Solution 5

The weight of a particular brand of cookies has a normal distribution with a mean weight of 32 ounces and a standard deviation of 0.3 ounces. When we look at the mean weight of 20 packages, 68% of them will be between what two values?
SET UP
Role 1: "mean weight of 20 packages"
Role 2: $\bar{x} \sim N(\mu_{\bar{x}}, SD(\bar{x}))$
Role 3: $\mu_{\bar{x}} = 32$, $SD(\bar{x}) = \dfrac{0.3}{\sqrt{20}} \approx 0.067$
Role 4: $\bar{x}$ axis: 31.80, 31.87, 31.93, 32, 32.07, 32.13, 32.20
SOLUTION: Similar to the previous problem, where we used the Empirical Rule, we see that the answer is: when we look at the mean weight of 20 packages, about 68% of them will be between 31.93 ounces and 32.07 ounces.

Normal Distribution Activity (in class): Solution 6

A restaurateur anticipates serving about 180 people on a Friday evening and believes that about 20% of the patrons will order the chef's steak special. How many of those meals should he plan on serving in order to be pretty sure of having enough steaks on hand to meet customer demand? Justify your answer, including an explanation of what "pretty sure" means to you.
SET UP
Role 1: "serving about 180 people", "20% of the patrons"
Role 2: $\hat{p} \sim N(\mu_{\hat{p}}, SD(\hat{p}))$
Role 3: $\mu_{\hat{p}} = 0.2$, $SD(\hat{p}) = \sqrt{\dfrac{0.2(1 - 0.2)}{180}} \approx 0.030$
Role 4: $\hat{p}$ axis: 0.11, 0.14, 0.17, 0.2, 0.23, 0.26, 0.29
SOLUTION: Here the population is all patrons that eat at that particular restaurant on Friday nights. The 180 people on this Friday evening are a sample (although not an SRS). The proportion of them who order the steak special is a $\hat{p}$ value. What could this value be? According to the Empirical Rule, we know that about 99.7% of all $\hat{p}$ values occur between 0.11 and 0.29. It is highly unlikely that $\hat{p}$ is greater than 0.29, since this happens only about 0.15% of the time (half of the remaining 0.3%). Therefore we would expect that the proportion of the 180 patrons that order the chef's steak would be no more than 0.29. Since 29% of 180 people is 52.2 people, we conclude that the restaurateur should plan on serving 53 of those meals. That way the restaurateur can be "pretty sure" that orders of the chef's steak can be filled on about 99.85% of Friday evenings.

Normal Distribution Activity (in class): Solution 7

In this example we will be interested in the heights of northern European males. We take such a person and reduce him to a single number via the usual operations for measuring someone's height. Then we model the height of northern European males as a normal population with mu = 150 cm and sigma = 30 cm. If we sample one northern European male, what's the probability that his height will fall outside of 140 and 170? In other words, what are the chances that he'll be either below 140 or above 170 in height? That's what we mean by the word "outside".
SET UP
Role 1: "sample one northern European". "Heights of northern European males" are the population.
Role 2: $X \sim N(\mu, \sigma)$
Role 3: $\mu = 150$, $\sigma = 30$
Role 4: $X$ axis: 60, 90, 120, 150, 180, 210, 240
SOLUTION: normalcdf(-1E99, 140, 150, 30) $\approx 0.36944$ and normalcdf(170, 1E99, 150, 30) $\approx 0.25249$. The probability that his height will fall outside 140 cm and 170 cm is $0.36944 + 0.25249 = 0.62193$, or about 62.2% of the time.

Normal Distribution Activity (in class): Solution 8

Find the proportion of observations from the Standard Normal Distribution which are below 2.45.
SET UP
Role 1: No sample of size greater than one was taken. "Standard Normal Distribution"
Role 2: $Z \sim N(0, 1)$
Role 3: $\mu = 0$, $\sigma = 1$
Role 4: $z$ axis: -3, -2, -1, 0, 1, 2, 3
SOLUTION: The word "proportion" in this exercise can be misleading, since it refers to the percentage (in decimal form) of z-scores less than 2.45. Here the proportion value is equivalent to the shaded area of the curve: normalcdf(-1E99, 2.45, 0, 1) $\approx 0.99286$.

Statistics: Individuals and Variables

Statistics is the collecting, organizing, and interpreting of information (data).

Individuals are the objects described by a set of data.
- Population is all individuals of interest.
- Inferential Statistics: Assume or infer something about the population based on data.
- Sample is a subset of the population.
- Descriptive Statistics: Describing your sample when interpreting your data.

Variables are characteristics of individuals.

When your data is provided as a list of information, you can identify the individuals of the story problem by organizing the data in a spreadsheet format. In a spreadsheet or table format:
- rows are the individuals
- columns are the variables

[Example table not fully recovered; its column headings included Salary, Hours Worked per Week, Vaccinated for H1N1, and Gender]

TYPES OF VARIABLES

In a spreadsheet, consider the entries of the variable to determine which type the variable is.

1. Quantitative: Data is described using numbers such that the values of the numbers are used in calculations, for example calculating the average test score. They can also have units of measure attached to them, such as miles/hour, gallons, inches, seconds, degrees, etc.
a) GRAPHS
for organizing include Dotplot, Histogram, Stemplot, Boxplot, etc.

2. Categorical: Data described usually with words, but can be numbers if the values are not used in calculations, such as with a Social Security number, Telephone Number, Zip Code, ISBN, Driver's License, etc.
a) GRAPHS for organizing include Bar Graph, Pie Chart, etc.

We want to know the Distribution of a variable: what values does the variable have, and how often do these values occur?

For a quantitative variable:
minX is the smallest number in a list of data.
maxX is the largest number in the list of data.
Spread is a descriptive phrase: the spread is from minX to maxX.
Range is a number: Range = maxX - minX.
Frequency is the number of individuals having a given value. The total number of individuals in a sample is denoted as $n$.
Relative Frequency is the percent of individuals having a given value.

Mean/Median NOTES

$n$ = number of individuals in the data set.
The symbol $\Sigma$ represents sum. It is the capital letter S in the Greek alphabet and is pronounced "sigma".

CENTER

MEAN: the average of a data set. The sample mean is denoted by $\bar{x}$ and is pronounced "x-bar".
$\bar{x} = \dfrac{\sum x}{n}$

MEDIAN: the 50th percentile of a data set. This says that 50% of the individuals have data values less than or equal to the median.
CASE $n$ is odd: The median is the middle number in the ordered list.
CASE $n$ is even: The median is the average of the two middle numbers in the ordered list.

The mean is affected by outliers and strong skewness. The median is more resistant to (less affected by) outliers and skewness. However, the mean provides more information than the median because it uses the value of every data point. We prefer to use the mean as the measure of center except when we have outliers or definite skewness. When that happens, we use the median as the measure of center.

FIVE-NUMBER SUMMARY: minX, Q1, Median, Q3, maxX

VARIANCE: Loosely speaking, the variance is the average squared distance of the data values from the mean.
$VAR = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
The formula squares the distances of the data from the mean so that we are adding positive numbers together in the sum;
otherwise we get $\sum (x - \bar{x}) = 0$. We divide by $n - 1$ (this is the degrees of freedom, as we shall see later on in inferential statistics) because if you have $n - 1$ of the differences, then the last one (the $n$th one) is just the additive inverse of their sum, since $\sum (x - \bar{x}) = 0$.

STANDARD DEVIATION: the square root of the variance. The sample standard deviation is denoted by $s_x$.
$s_x = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Since we are summing up squares in the variance, we take the square root of the VAR so that the units of measure (meters, gallons, etc.) are not squared. This gives us the standard deviation of the data about the mean.

To get the values of $\bar{x}$, $s_x$, and the Five-Number Summary on your calculator, press STAT, go to the CALC menu, and press 1 for 1-VAR STATS. State the list you are using and then press ENTER.

Normal Distribution Notes

Population: all measurements or observations of interest.
Sample: a subset of the population.

Notation:
             Mean        Standard Deviation   Proportion
Sample       $\bar{x}$   $s_x$                $\hat{p}$
Population   $\mu$       $\sigma$             $p$

$\mu$ is pronounced "mu"; $\sigma$ is pronounced "sigma".

Parameter: a number that describes a population. Examples are $\mu$, $\sigma$, and $p$.
Statistic: a number that describes a sample. Examples are $\bar{x}$, $s_x$, $\hat{p}$, and $n$.

A normal curve visually describes a normal distribution. We draw a mathematical model (normal curve) to represent a normal population distribution. The curve is then used as an approximation to real-life normal distributions and is accurate enough for practical purposes.

Properties of a normal distribution:
- Is symmetric, so mean = median = center.
- Has one major peak and no outliers.
- The mathematical model has the x-axis as a horizontal asymptote.
- Area under the normal curve equals 100%. This corresponds to 100% of the data falling below the curve.

For a normal distribution, the area under a piece of the curve corresponds to the percentage of data along the x-axis under that curve. For example, if the area under a piece of the normal curve is 0.34, then the percentage of data under the curve would be 34%.

If $X$ represents a quantitative variable that is
normally distributed, then we may write $X \sim N(\mu, \sigma)$.
Example: $X \sim N(17, 3)$ means that the variable $X$ has a normal distribution with mean $\mu = 17$ and standard deviation $\sigma = 3$.

EMPIRICAL RULE (THE 68-95-99.7 RULE): Applies only to normal distributions.
- About 68% of the data are within one standard deviation from the mean. In other words, about 68% of the data fall in the interval $\mu \pm 1\sigma$.
- About 95% of the data are within two standard deviations from the mean. In other words, about 95% of the data fall in the interval $\mu \pm 2\sigma$.
- About 99.7% of the data are within three standard deviations from the mean. In other words, about 99.7% of the data fall in the interval $\mu \pm 3\sigma$.

[Figure: normal curve with the $X$ axis marked at $\mu - 3\sigma$, $\mu - 2\sigma$, $\mu - \sigma$, $\mu$, $\mu + \sigma$, $\mu + 2\sigma$, $\mu + 3\sigma$]

NORMALCDF

To calculate the percentage of data under the normal curve between a given lower bound (LB) and upper bound (UB), use normalcdf on your calculator (2ND VARS) and enter the following:
normalcdf(LB, UB, mean of the distribution, standard deviation of the distribution)
If the mean and standard deviation are not entered, then the calculator will assume that they are 0 and 1, respectively.
An extreme lower bound (like negative infinity) is -1E99 on the calculator.
An extreme upper bound (like positive infinity) is 1E99 on the calculator.
To get E on the calculator, press 2ND COMMA. The comma button is above the 7 button.

INFO IN PARAGRAPH: LB, UB, mean, standard deviation
WANT TO FIND (any of the following):
- Percentage of data (in decimal form) between LB and UB
- Proportion of data between LB and UB
- Probability of choosing one data value between LB and UB. In notation: $P(LB < X < UB)$
- Area under the curve between LB and UB

Commute times

1. Assume that commute times of students to school from home are normally distributed with a mean of 17 minutes and a standard deviation of 3 minutes.
a) What percent of students take between 11 and 23 minutes to commute to school?
b) What is the probability that a randomly chosen student commutes between 17 and 23 minutes?
c) Pat Murphy takes about 20 to 23 minutes to get to school. What is the area under the curve between 20 and 23 minutes?
d) What proportion of students take less than 14 minutes to get to school?

IQ scores

2. Suppose that a data set of IQ scores is normally distributed with a mean of 110 and a standard deviation of 10. In symbols, $X \sim N(110, 10)$.
a) What percentage of IQ scores are between 100 and 120?
b) What percentage of IQ scores are between 100 and 113?
c) What percentage of IQ scores are lower than 95?
d) What is the probability of picking an IQ score between 90 and 110?
e) Calculate $P(112 < X < 115)$.
f) What is the probability of an IQ score being above 130?
g) What proportion of IQ scores are greater than 110?
h) What is the area under the normal curve between IQ scores of 103 and 123?

INVNORM

To calculate the unknown data value $X$ given the area to the left of $X$ (in decimal form), use InvNorm on your calculator (2ND VARS) and enter the following:
InvNorm(Area to the left, mean of the distribution, standard deviation of the distribution)
If the mean and standard deviation are not entered, then the calculator will assume that they are 0 and 1, respectively.
Use InvNorm when you are asked to find an unknown number along the horizontal axis that borders a given area.

WANT TO FIND: the unknown data value, $X$ = InvNorm(Area to the left, mean, standard deviation)
INFO IN PARAGRAPH (any of the following):
- Percentage of data (in decimal form) to the left of unknown data $X$
- Proportion of data to the left of unknown data $X$
- Probability of choosing one data value to the left of unknown data $X$
- Area under the curve to the left of unknown data $X$

Commute times (continued)
e) Exactly 10% of students take longer than Ricardo. What is Ricardo's commute time?
f) How long does it take you to commute to school if your commute time is the 75th percentile of all other students?
g) What commute times are in the lower 20% of all commute times?

IQ scores (continued)
i) What is the lowest IQ score you can have and still be in the top 15%?
j) What IQ score do you have to have in order to be in the highest 1% of all IQ scores?
k) What is the lowest IQ score that is
above 60% of all the others?
l) What is the value of Q1 of all the IQ scores?

Standardized scores are called z-scores. Every data value $X$ can be standardized into a z-score, called $z$ (also $z_X$).
Conversion formula: $z = \dfrac{X - \mu}{\sigma}$. In general,
z-score = (data value - mean of the distribution) / (standard deviation of the distribution)

The z-score for an $X$ data value is how many standard deviations that $X$ value is away from the mean. Z-scores can be thought of as falling on the Standard Normal Curve (the normal curve with mean 0 and standard deviation 1). In symbols, $Z \sim N(0, 1)$.

The area the data value borders on its normal curve is the same as the area its z-score borders on the Standard Normal Curve.
If $z < 0$, then $X < \mu$.
If $z > 0$, then $X > \mu$.
If $z = 0$, then $X = \mu$.

IQ scores (continued)
m) What is the z-score for an IQ score of 95?
n) What is the z-score for an IQ score of 127?
o) What IQ score corresponds to a z-score of $-2.6$?
p) What are the top 5% of z-scores?
q) Is the z-score for Q3 negative, positive, or zero?
r) What is the z-score for an IQ score that is 3.1 standard deviations below the mean?

A Simple Random Sample used to obtain $\bar{x}$ provides an unbiased estimator of $\mu$. In other words, the mean of the sampling distribution of the $\bar{x}$ values is $\mu$. In notation: $\mu_{\bar{x}} = \mu$.
Also, the standard deviation of the sampling distribution of the $\bar{x}$ values is given by $SD(\bar{x}) = \dfrac{\sigma}{\sqrt{n}}$, where $\sigma$ is the standard deviation for $x$ and $n$ is the sample size.
For each fixed sample size $n$, there is a corresponding sample mean ($\bar{x}$) distribution.

Conditions/Requirements: When is the sampling distribution of $\bar{x}$ normal? There are two ways this can happen.

If you have
- random samples (SRS)
- $x$ has a normal distribution
then $\bar{x}$ has a normal distribution.

CENTRAL LIMIT THEOREM: If you have
- random samples (SRS)
- sample size sufficiently large ($n \ge 30$)
- finite population standard deviation ($\sigma$ is a number)
then the sample means ($\bar{x}$ values) have an approximately normal distribution. The greater the sample size, the more closely the corresponding $\bar{x}$ distribution resembles a normal distribution.

EXERCISES

Vehicle speeds at a certain highway location
are believed to have approximately a normal distribution with mean 60 mph and standard deviation 6 mph. The speeds for a randomly selected sample of 36 vehicles will be recorded.

a) Explain why the sampling distribution for sample means has a normal distribution. Give numerical values for the mean and standard deviation of the sampling distribution of possible sample means for randomly selected samples of 36 from the population of vehicle speeds.
ANSWER: Since we have an SRS (randomly selected sample of 36 vehicles) and $x$ (vehicle speeds) has an approximately normal distribution, the sample mean $\bar{x}$ has an approximately normal distribution. Another way we can verify that $\bar{x}$ has an approximately normal distribution is by the Central Limit Theorem:
- SRS (randomly selected sample of 36 vehicles)
- $n = 36 \ge 30$
- $\sigma = 6$ is finite
Finally, we have $\mu_{\bar{x}} = 60$ and $SD(\bar{x}) = \dfrac{\sigma}{\sqrt{n}} = \dfrac{6}{\sqrt{36}} = 1$.

b) Use the Empirical Rule to find values that fill in the blanks in the following sentence: For a random sample of 36 vehicles, there is a 95% chance that the mean vehicle speed in the sample will be between ___ and ___ mph.
ANSWER: There is a 95% chance that the mean vehicle speed in the sample will be between 58 mph and 62 mph.
$\bar{x}$ axis: 57, 58, 59, 60, 61, 62, 63

c) Sample speeds for a random sample of 36 vehicles are measured at this location, and the sample mean is 57.2 mph. What is the z-score which corresponds with $\bar{x} = 57.2$?
ANSWER: $z = \dfrac{57.2 - 60}{1} = \dfrac{-2.8}{1} = -2.8$

The scores of students on the American College Testing (ACT) composite college entrance examination have a normal distribution with mean 18.6 and standard deviation 5.9.

a) What is the probability that a single student randomly chosen from all those taking the test scores 21 or higher?
ANSWER: $X$ axis: 0.9, 6.8, 12.7, 18.6, 24.5, 30.4, 36.3. normalcdf(21, 1E99, 18.6, 5.9) $\approx 0.342$

b) What are the mean and standard deviation of the distribution of the mean score for 50 students? Why is the sample mean distribution normal?
ANSWER: We have $\mu_{\bar{x}} = 18.6$ and $SD(\bar{x}) = \dfrac{5.9}{\sqrt{50}} \approx 0.8344$. Since we have an SRS (50 students picked randomly) and $x$ (ACT scores) has a normal distribution, the sample mean has a normal distribution. Another way to know this is by noting the following:
- SRS (a random sample of 50 students)
- $n = 50 \ge 30$
- $\sigma = 5.9$ is finite
so $\bar{x}$ has a normal distribution by the Central Limit Theorem.

c) What is the probability that the mean score of these 50 students is 21 or higher?
ANSWER: $\bar{x}$ axis: 16.10, 16.93, 17.77, 18.6, 19.43, 20.27, 21.10. normalcdf(21, 1E99, 18.6, 0.8344) $\approx 0.002$

Normal Distribution Activity (in class): Practice Demo

Suppose that 33 percent of women believe in the existence of aliens. If 100 women are selected at random, what is the probability that more than 45 percent of them will say that they believe in aliens?
SET UP
Role 1: "100 women selected", "45 percent of them"
Role 2: $\hat{p} \sim N(\mu_{\hat{p}}, SD(\hat{p}))$
Role 3: $\mu_{\hat{p}} = 0.33$, $SD(\hat{p}) = \sqrt{\dfrac{0.33(1 - 0.33)}{100}} \approx 0.047$
Role 4: $\hat{p}$ axis: 0.19, 0.24, 0.28, 0.33, 0.38, 0.42, 0.47

Normal Distribution Activity (in class)

1. Suppose family incomes in a town are normally distributed with a mean of $1200 and a standard deviation of $600 per month. What is the probability that a family has an income between $1400 and $2250?
SET UP
Role 1: No sample of size greater than one was taken. "Family incomes" are the population.
Role 2: $X \sim N(\mu, \sigma)$
Role 3: $\mu = 1200$, $\sigma = 600$
Role 4: $X$ axis: -600, 0, 600, 1200, 1800, 2400, 3000

Normal Distribution Activity (in class)

2. An opinion poll asks, "Are you afraid to go outside at night within a mile of your home because of crime?" Suppose that the proportion of all adults who would say "Yes" to this question is 0.4. Assume that the poll obtained 20 answers randomly. What percent of such polls with 20 responses have 10 or more say "Yes"?
SET UP
Role 1: "20 responses", "10 or more say Yes", "proportion of all adults"
Role 2: $\hat{p} \sim N(\mu_{\hat{p}}, SD(\hat{p}))$
Role 3: $\mu_{\hat{p}} = 0.4$, $SD(\hat{p}) = \sqrt{\dfrac{0.4(1 - 0.4)}{20}} \approx 0.110$
Role 4: $\hat{p}$ axis: 0.07, 0.18, 0.29, 0.4, 0.51, 0.62, 0.73

Normal Distribution Activity (in class)

3. Find the area under the curve between the z-scores of $-2$ and 1.
SET UP
Role 1: No sample of size greater than one was taken. "z-scores"
Role 2: $Z \sim N(0, 1)$
Role 3: $\mu = 0$, $\sigma = 1$
Role 4:

Normal Distribution Activity (in class)

4. Adult nose length is normally distributed with mean 45 mm and standard deviation 6 mm. Find the probability that the sample mean nose length is between 44 mm and 46 mm for random samples of 36 adults.
SET UP
Role 1: "samples of 36 adults", "sample mean nose length"
Role 2: $\bar{x} \sim N(\mu_{\bar{x}}, SD(\bar{x}))$
Role 3: $\mu_{\bar{x}} = 45$, $SD(\bar{x}) = \dfrac{6}{\sqrt{36}} = 1$
Role 4: $\bar{x}$ axis: 42, 43, 44, 45, 46, 47, 48

Normal Distribution Activity (in class)

5. The weight of a particular brand of cookies has a normal
distribution with a mean weight of 32 ounces and a standard deviation of 0.3 ounces. When we look at the mean weight of 20 packages, 68% of them will be between what two values?
SET UP
Role 1: "mean weight of 20 packages"
Role 2: $\bar{x}$, $\mu_{\bar{x}}$, $SD(\bar{x})$
Role 3: $\mu_{\bar{x}} = 32$, $SD(\bar{x}) = \frac{0.3}{\sqrt{20}} \approx 0.067$
Role 4: normal curve for $\bar{x}$ with axis values 31.80, 31.87, 31.93, 32, 32.07, 32.13, 32.20

6. A restaurateur anticipates serving about 180 people on a Friday evening and believes that about 20% of the patrons will order the chef's steak special. How many of those meals should he plan on serving in order to be pretty sure of having enough steaks on hand to meet customer demand? Justify your answer, including an explanation of what "pretty sure" means to you.
SET UP
Role 1: "serving about 180 people"; "20% of the patrons"
Role 2: $\hat{p}$, $\mu_{\hat{p}}$, $SD(\hat{p})$
Role 3: $\mu_{\hat{p}} = 0.2$, $SD(\hat{p}) = \sqrt{\frac{0.2(1-0.2)}{180}} \approx 0.030$
Role 4: normal curve for $\hat{p}$ with axis values 0.11, 0.14, 0.17, 0.2, 0.23, 0.26, 0.29

7. In this example we will be interested in the heights of northern European males. We take such a person and reduce him to a single number via the usual operations for measuring someone's height. Then we model the height of northern European males as a normal population with mu = 150 cm and sigma = 30 cm. If we sample one northern European male, what's the probability that his height will fall outside of 140 and 170? In other words, what are the chances that he'll be either below 140 or above 170 in height? That's what we mean by the word "outside".
SET UP
Role 1: "sample one northern European". Heights of northern European males are the population.
Role 2: $X$, $\mu$, $\sigma$
Role 3: $\mu = 150$, $\sigma = 30$
Role 4: normal curve for $X$ with axis values 90, 120, 150, 180, 210, 240

8. Find the proportion of observations from the Standard Normal Distribution which are below 2.45.
SET UP
Role 1: No sample of size greater than one was taken. "Standard Normal Distribution"
Role 2: $z$, $\mu$, $\sigma$
Role 3: $\mu = 0$, $\sigma = 1$
Role 4: standard normal curve

NOTES: Confidence intervals from sample proportions

Suppose that we are estimating an unknown population proportion $p$. We do this by first finding a sample proportion
$\hat{p}$ and then calculating its confidence interval. In theory, the formula for the confidence interval would be $\hat{p} \pm z^* \cdot SD(\hat{p})$, or $\hat{p} \pm z^* \sqrt{\frac{p(1-p)}{n}}$, where $z^*$ depends on the Confidence Level. But this formula involves the parameter $p$ that we are trying to estimate. We therefore use the Standard Error for $\hat{p}$ (in other words, replace $p$ with $\hat{p}$ in the standard deviation formula) instead of the Standard Deviation for $\hat{p}$. The formula for the confidence interval then becomes

$\hat{p} \pm z^* \cdot SE(\hat{p})$, that is, $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

$z^*$ can be found with the following, where $C$ is the given Confidence Level inputted in decimal form: $z^* = \text{InvNorm}(0.5 + C/2)$.

Let $L$ denote the left endpoint of the confidence interval and let $R$ denote the right endpoint of the confidence interval. Therefore the confidence interval written in interval notation would be $(L, R)$.

The margin of error $m$ is the distance that our confidence interval branches out in either direction from $\hat{p}$. Think of this as L <--m-- $\hat{p}$ --m--> R. Thus the confidence interval formula can be thought of as $\hat{p} \pm m$. It follows that the margin of error for these confidence intervals is $m = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$. In other words, $m = z^* \cdot SE(\hat{p})$. Also, $m = \frac{R - L}{2}$.

INTERPRETATION
We are ___% confident that $p$ is between $L$ and $R$.

Suppose that the confidence level is 95%. Then there is a 95% probability that choosing any confidence interval will give us $p$ in it, because $p$ is in 95% of the confidence intervals that we can create. Once we fix $\hat{p}$ and the confidence interval it generates, $p$ is either in that particular confidence interval or it is not. Once $\hat{p}$ and its confidence interval are fixed, we say that we are 95% confident that $p$ is in that particular confidence interval. That is, in repeated trials we expect 95% of the confidence intervals in the long run to contain $p$.

CONDITIONS/REQUIREMENTS
Before we are allowed to use the confidence interval formula, certain conditions must be met:
- The sample proportion $\hat{p}$ must be obtained from a Simple Random Sample (SRS).
- The number of successes in the sample is at least 5, preferably at least 10, i.e. $n\hat{p} \ge 10$.
- The number of failures in the sample is at least 5,
preferably at least 10, i.e. $n(1-\hat{p}) \ge 10$.
- The population size is at least ten times the sample size $n$.

As part of a quality improvement program, your mail-order company is studying the process of filling customer orders. According to company standards, an order is shipped on time if it is sent within 3 working days of the time it is received. You select an SRS of 100 of the 5000 orders received in the past month for an audit. The audit reveals that 86 of these orders were shipped on time. Find a 95% confidence interval for the true proportion of the month's orders that were shipped on time. Also interpret the confidence interval.
ANSWER: $n = 100$, $\hat{p} = 0.86$; $0.86 \pm 1.960 \sqrt{\frac{0.86(1-0.86)}{100}}$
INTERPRETATION 1: I am 95% confident that the proportion of all orders last month that were shipped on time is between 0.792 and 0.928.
INTERPRETATION 2: I am 95% confident that between 79.2% and 92.8% of all last month's orders were shipped on time.

A student organization wants to start a nightclub for students under the age of 21. To assess support for this proposal, they will select an SRS of students and ask each respondent if he or she would patronize this type of establishment. In the time available, they were able to obtain 356 responses, of which 70% said yes. Calculate and interpret a 90% confidence interval for this situation. What is the margin of error?
ANSWER: $n = 356$, $\hat{p} = 0.70$; $0.70 \pm 1.645 \sqrt{\frac{0.70(1-0.70)}{356}}$
INTERPRETATION 1: I am 90% confident that the proportion of all students under the age of 21 that would patronize this type of nightclub is between 0.66 and 0.74.
INTERPRETATION 2: I am 90% confident that between 66% and 74% of all students under the age of 21 would patronize this type of nightclub.
MARGIN OF ERROR
WAY 1: $m = 1.645 \sqrt{\frac{0.70(1-0.70)}{356}} \approx 0.03995$
WAY 2: $m = \frac{0.74 - 0.66}{2} = 0.04$

Hypothesis Tests

Steps for a formal test of hypothesis
1. State the null hypothesis $H_0$ (assume this until step 5) and the alternate hypothesis $H_a$ (want to show this, hopefully in step 6).
2. State the level of significance $\alpha$ (comfort level of what you would call a rare event, usually 1% or 5%).
3. Give results from
a sample. Generally this is $n$ and $\hat{p}$ for proportions; $n$, $\bar{x}$, and $s_x$ for means.
4. Use the results to calculate the test statistic.
   z-score for proportions: $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$
   t-score for means: $t = \frac{\bar{x} - \mu_0}{s_x / \sqrt{n}}$
   Then use the test statistic to compute the P-value (the probability of getting the results from the sample, based on assuming $H_0$ is true).
5. Reject or do not reject $H_0$. There are two outcomes for a hypothesis test, as revealed by the table below.

   P-value $\le \alpha$                          | P-value $> \alpha$
   Sample is rare                                | Sample is NOT rare
   Reject $H_0$                                  | Do NOT reject $H_0$
   There is significant evidence to support $H_a$ | There is NOT significant evidence to support $H_a$

6. Conclusion: state in real-life terms with no math notation.

Conditions/Requirements for conducting a hypothesis test about a population proportion
1. The sample proportion $\hat{p}$ must be obtained from a random sample.
2. $np_0 \ge 10$, where $p_0$ is the assumed population proportion from $H_0$.
3. $n(1-p_0) \ge 10$, where $p_0$ is the assumed population proportion from $H_0$.
4. The population size is at least ten times the sample size $n$.

The conditions/requirements for conducting a hypothesis test about a population mean are the same as the conditions/requirements for using t-distributions (see t-distributions handout).

NOTES: sample proportions (sampling distributions)

A Simple Random Sample used to obtain $\hat{p}$ provides an unbiased estimator of $p$. In other words, the mean of the sampling distribution of the $\hat{p}$ numbers is $p$. In notation, $\mu_{\hat{p}} = p$. Also, the standard deviation of the sampling distribution of the $\hat{p}$ numbers is given by $SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$, where $n$ is the sample size.

Conditions/Requirements
Sample proportions ($\hat{p}$ numbers) have a normal sampling distribution, with the mean and standard deviation formulas above, provided certain conditions are met:
1. The sample proportion $\hat{p}$ must be obtained from a Simple Random Sample (SRS).
2. $np \ge 10$. Note that some books say that it is sufficient for $np \ge 5$.
3. $n(1-p) \ge 10$. Note that some books say that it is sufficient for $n(1-p) \ge 5$.
4. The population size is at least ten times the sample size $n$.

EXERCISES
1. Suppose medical
researchers think that 0.70 is the proportion of all teenagers with high blood pressure whose blood pressure would decrease if they took calcium supplements. To test this theory, the researchers plan a clinical trial (experiment) in which 200 random teenagers with high blood pressure will take regular calcium supplements.
a. Assume 0.70 actually is the population proportion that would experience a decrease in blood pressure. What are the numerical values of the mean and standard deviation of the sampling distribution of the sample proportions for samples of 200 teenagers?
b. Use the results of part a to calculate an interval that will contain the sample proportion for about 99.7% of all samples of 200 teenagers.
c. In the clinical trial, 120 of the 200 teenagers taking calcium supplements experienced a decrease in blood pressure. What is the value of $\hat{p}$ for this sample? Is this value a parameter or a statistic?
d. What is the probability of having 120 or fewer experience a decrease of blood pressure out of a sample of 200 teenagers taking calcium supplements?
e. We have used the properties of a normal distribution. Show that the conditions/requirements are satisfied.

2. DO YOU DRINK THE CEREAL MILK? A USA Today poll asked a random sample of 1012 US adults what they do with the milk in the bowl after they have eaten the cereal. Of the respondents, 67% said that they drink it. Suppose that 70% of US adults actually drink the cereal milk.
a. Explain why you may use the formulas for normal distributions of sample proportions.
b. Find the probability of obtaining a sample of 1012 adults in which 67% or more say they drink the cereal milk.
c. What proportion would say that they drink the cereal milk if it is in the lower 20% of all samples of 1012 US adults?
d. What is the z-score for a sample of 1012 US adults in which 75% of them said that they drink the cereal milk?

NOTES: sample means (sampling distributions)

A Simple Random Sample used to obtain $\bar{x}$ provides an unbiased estimator of $\mu$. In other words, the mean of the
sampling distribution of the $\bar{x}$ values is $\mu$. In notation, $\mu_{\bar{x}} = \mu$. Also, the standard deviation of the sampling distribution of the $\bar{x}$ values is given by $SD(\bar{x}) = \frac{\sigma}{\sqrt{n}}$, where $\sigma$ is the standard deviation for $x$ and $n$ is the sample size. For each fixed sample size $n$ there is a corresponding sample mean ($\bar{x}$) distribution.

Conditions/Requirements
When is the sampling distribution of $\bar{x}$ normal? There are two ways this can happen.

If you have
- random samples (SRS)
- $x$ has a normal distribution
then $\bar{x}$ has a normal distribution.

CENTRAL LIMIT THEOREM
If you have
- random samples (SRS)
- sample size sufficiently large ($n \ge 30$)
- finite population standard deviation ($\sigma$ is a number)
then the sample means ($\bar{x}$ values) have an approximately normal distribution. The greater the sample size, the more closely the corresponding $\bar{x}$ distribution resembles a normal distribution.

EXERCISES
1. Vehicle speeds at a certain highway location are believed to have approximately a normal distribution with mean 60 mph and standard deviation 6 mph. The speeds for a randomly selected sample of 36 vehicles will be recorded.
a. Explain why the sampling distribution for sample means has a normal distribution.
b. Give numerical values for the mean and standard deviation of the sampling distribution of possible sample means for randomly selected samples of 36 from the population of vehicle speeds.
c. Use the Empirical Rule to find values that fill in the blanks in the following sentence: For a random sample of 36 vehicles, there is a 95% chance that mean vehicle speed in the sample will be between ___ and ___ mph.
d. Sample speeds for a random sample of 36 vehicles are measured at this location, and the sample mean is 57.2 mph. What is the z-score which corresponds with 57.2?

2. The scores of individual students on the American College Testing (ACT) composite college entrance examination have a normal distribution with mean 18.6 and standard deviation 5.9.
a. What is the probability that a single student randomly chosen from all those taking the test scores 21 or higher?
b. Now take a
random sample of 50 students who took the test. What are the mean and standard deviation of the distribution of the average score for 50 students? Why is the sample mean distribution normal?
c. What is the probability that the mean score of these 50 students is 21 or higher?
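
The set-up for the ACT exercise can be checked numerically. The following is only a rough sketch, not an official solution: it assumes the mean and standard deviation are 18.6 and 5.9 (the decimal points are missing in some copies of these notes), and `normal_cdf` is a helper written for this illustration, standing in for normalcdf on the calculator.

```python
import math


def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


# Assumed values: ACT scores are Normal with mean 18.6 and SD 5.9.
mu, sigma = 18.6, 5.9
n = 50
cutoff = 21.0

# One randomly chosen student: use the population distribution directly.
p_single = 1.0 - normal_cdf(cutoff, mu, sigma)

# Sample mean of n students: same mean, standard deviation sigma / sqrt(n).
sd_xbar = sigma / math.sqrt(n)

# Probability that the mean score of the n students is at least the cutoff.
p_mean = 1.0 - normal_cdf(cutoff, mu, sd_xbar)

print(f"P(X >= 21)     = {p_single:.4f}")
print(f"SD(x-bar)      = {sd_xbar:.4f}")
print(f"P(x-bar >= 21) = {p_mean:.4f}")
```

The second probability comes out far smaller than the first because the sample mean varies much less than a single score does, which is the point of dividing by the square root of n.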