# STATISTICAL METHOD STA 291

UK


These 70 pages of class notes were uploaded by Helga Torp Sr. on Friday, October 23, 2015. The notes belong to STA 291 at the University of Kentucky, taught by Staff in Fall.

Date Created: 10/23/15

## STA 291 Lecture 16

8 Continuous Probability Distributions

- 8.1 Probability Density Functions
- 8.2 Normal Distributions

There are many different shapes of continuous probability distributions. We focus on one type: the Normal distribution, also known as the Gaussian distribution or the bell-shaped distribution.

### Normal distributions/densities

Again, this is a whole family of distributions, indexed by mean and SD (location and scale).

### Different Normal Distributions

[Figure: density curves of different normal distributions]

### The Normal Probability Distribution

- The normal distribution is perfectly symmetric and bell-shaped.
- It is characterized by two parameters: mean $\mu$ and standard deviation $\sigma$ (when printing, this becomes s).
- The 68-95-99.7 rule applies to the normal distribution. That is, the probability concentrated within 1 standard deviation of the mean is always 0.68, etc.

It is very common. If you noticed, on the web page we used in the last lecture to plot the binomial probabilities, when n gets large you get a plot that is approximately normally shaped.

### Standard Normal Distribution

The standard normal distribution is the normal distribution with mean $\mu = 0$ and standard deviation $\sigma = 1$.

### Nonstandard normal distribution

Either the mean $\mu \neq 0$, or the SD $\sigma \neq 1$, or both. In real life, normal distributions are often nonstandard.

### Examples of normal random variables

- Public demand for gas/water/electricity in a city
- Amount of rainfall in a season
- Weight/height of a randomly selected adult female
- Soup sold in a restaurant in a day
- Stock index value tomorrow

### Example of nonnormal probability distributions

- Income of a randomly selected family (skewed, only positive)
- Price of a randomly selected house (skewed, only positive)
- Number of accidents in a week (discrete)
- Waiting time for a traffic light, which has a
discrete value at 0, takes only positive values, and is no more than 3 min, etc.

Table 3 is for the standard normal. Convert nonstandard to standard.

- Denote by $X$ the nonstandard normal.
- Denote by $Z$ the standard normal.

### Standard Normal Distribution

When values from an arbitrary normal distribution are converted to z-scores, they have a standard normal distribution. The conversion is done by subtracting the mean $\mu$ and then dividing by the standard deviation $\sigma$.

### Example

Find the probability that the height of a randomly selected adult female is between 161 cm and 170 cm. Recall $\mu = 165$, $\sigma = 8$.

$$\frac{161 - 165}{8} = -0.5, \qquad \frac{170 - 165}{8} = 0.625$$

Therefore, the probability is the same as the probability that a standard normal random variable $Z$ falls between $-0.5$ and $0.625$:

$$P(161 < X < 170) = P(-0.5 < Z < 0.625)$$

### Areas Under the Standard Normal Curve

This table presents the area between the mean and the z-score. When $z = 1.96$, the shaded area is 0.4750.

[Table: areas under the standard normal curve, indexed by z]

### z-Scores

- The z-score for a value $x$ of a random variable is the number of standard deviations that $x$ is above $\mu$.
- If $x$ is below $\mu$, then the z-score is negative.
- The z-score is used to compare values from different normal distributions.

### Calculating z-Scores

You need to know $x$, $\mu$, and $\sigma$ to calculate $z$:

$$z = \frac{x - \mu}{\sigma}$$

The applet does the conversion automatically (recommended). The
table (Table 3) gives the probability $P(0 < Z < a)$.

### Tail Probabilities

SAT scores: mean = 500, SD = 100.

- The SAT score 700 has a z-score of $z = 2$.
- The probability that a score is beyond 700 is the tail probability of $Z$ beyond 2.

### z-Scores

The z-score can be used to compare values from different normal distributions.

- SAT: $\mu = 500$, $\sigma = 100$
- ACT: $\mu = 18$, $\sigma = 6$

Which is better, 650 on the SAT or 26 on the ACT?

$$z_{\text{SAT}} = \frac{x - \mu}{\sigma} = \frac{650 - 500}{100} = 1.5, \qquad z_{\text{ACT}} = \frac{x - \mu}{\sigma} = \frac{26 - 18}{6} = 1.333$$

### Corresponding tail probabilities

What percentage of all test scores are higher than these SAT or ACT scores?

### Typical Questions

1. Probability (right-hand, left-hand, two-sided, middle)
2. z-score
3. Observation (raw score)

To find a probability, use the applet or Table 3. In transforming between 2 and 3, you need the mean and standard deviation.

### Finding z-Values for Percentiles

- For a normal distribution, how many standard deviations from the mean is the 90th percentile?
- What is the value of $z$ such that 0.90 probability is less than $\mu + z\sigma$?
- If 0.9 probability is less than $\mu + z\sigma$, then there is 0.4 probability between $\mu$ and $\mu + z\sigma$ (because there is 0.5 probability less than $\mu$).
- $z = 1.28$
- The 90th percentile of a normal distribution is 1.28 standard deviations above the mean.

### Quartiles of Normal Distributions

- Median: $z = 0$ (0 standard deviations above the mean)
- Upper quartile: $z = 0.67$ (0.67 standard deviations above the mean)
- Lower quartile: $z = -0.67$ (0.67 standard deviations below the mean)

In fact, for any normal probability distribution, the 90th percentile is always 1.28 SD above the mean; the 95th percentile is ____ SD above the mean.

### Finding z-Values for Two-Tail Probabilities

- What is the z-value such that the probability is 0.1 that a normally distributed random variable falls more than $z$ standard deviations above or below the mean?
- Symmetry: we need to find the z-value such that the right-tail probability is 0.05 (more than $z$ standard deviations above the mean).
- $z = 1.65$
- 10% probability for
a normally distributed random variable is outside 1.65 standard deviations from the mean, and 90% is within 1.65 standard deviations from the mean.

### Online Tool

Normal Density Curve. Use it to:

- verify graphically the empirical rule
- find probabilities
- find percentiles and z-values for one- and two-tailed probabilities

### Prelude to chap. 9

Even if incomes are not normally distributed, the average income of many randomly selected families is approximately normally distributed. Averaging does the magic of making things normal (transform to normal).

One more homework online.

### Attendance Survey Question 16

On a 4 x 6 index card, please write down your name and section number.

Today's question: Which is better, 650 on the SAT or 26 on the ACT?

Which of the following statements is false?

- (a) A parameter is a descriptive measurement about a population.
- (b) A statistic is a descriptive measurement about a sample.
- (c) The term average is another name for the arithmetic mean.
- (d) None of these choices.

The midterm test for a statistics course has a time limit of 1 hour. However, like most statistics exams, this one was quite easy. To assess how easy, the professor recorded the amount of time taken by a sample of nine students to hand in their test papers. The times, to the nearest minute, are 33, 29, 45, 60, 42, 19, 52, 38, 36. Compute the mean, median, and mode.

How often do you use the internet?

| Response | Frequency |
|---|---|
| Every day | 1027 |
| A few times a week | 543 |
| Once a week | 271 |
| Less than once a week | 175 |
| Never | 89 |

Find the mean, median, and mode (if possible) for the data given.

The histogram below is based on the number of doctors per 100,000.

[Figure: histogram of the number of doctors per 100,000]

Based on this histogram, which is larger, the average or the median?

- (a) Median is larger
- (b) Impossible to tell from the information given
- (c) They are equal
- (d) Average is larger

In a histogram, the proportion of the total area which must be
to the left of the median is

- (a) exactly 0.50
- (b) less than 0.50 if the distribution is skewed to the left
- (c) more than 0.50 if the distribution is skewed to the right
- (d) between 0.25 and 0.60 if the distribution is symmetric and unimodal

In a positively skewed distribution,

- (a) the median equals the mean
- (b) the median is less than the mean
- (c) the median is larger than the mean
- (d) the mean, median, and mode are equal

Examine the three samples listed below. Without performing any calculations, indicate which sample has the largest amount of variation and which sample has the smallest amount of variation. Explain how you produced your answer.

- (a) 17 29 12 16 11
- (b) 22 18 23 20 17
- (c) 24 37 6 39 29

Calculate the variance for each part and compare with your answer.

A friend calculates a variance and reports that it is -25.0. How do you know that he has made a serious calculation error?

Calculate the first, second, and third quartiles of the following samples, as well as the range and IQR.

1. 5 8 2 9 5 3 7 4 2 7 4 10 4 3 5
2. 105 147 153 177 159 122 100 141 139 185 139 151 147

A set of data whose histogram is bell-shaped yields a mean of 50 and a standard deviation of 4. Approximately what proportion of observations

- are between 46 and 54?
- are between 42 and 58?
- are less than 58?
- are greater than 54?

Consider the data set 10, 15, 14, 12, 9, 17, 25, 15. What is the IQR? Is this data symmetric, left-skewed, or right-skewed (compare mean and median)?

Convert the stem-and-leaf plot to a box plot.

[Figure: stem-and-leaf plot]

Here are summary statistics for the average daily temperatures for a number of US cities in January and July for 1996.

| | January | July |
|---|---|---|
| Minimum | 12 | 63 |
| Lower Quartile | 27 | 72 |
| Median | 31 | 74 |
| Upper Quartile | 40 | 77 |
| Maximum | 67 | 85 |

What is the IQR for July? For January?

Here are summary statistics for the average daily temperatures for a number of US cities in January and July for 1996.

| | January | July |
|---|---|---|
| Minimum | 12 | 63 |
| Lower Quartile | 27 | 72 |
| Median | 31 | 74 |
| Upper Quartile | 40 | 77 |
| Maximum | 67 | 85 |

Which of the following is true?

- July is 70 or above about 75% of the time.
- January and July are both about 50.
- January is 70 or below about 25% of the time.
- There is not enough information to answer the question.
- A randomly chosen January temperature will be lower than a randomly chosen July temperature about 50% of the time.

A basketball player has the following points for seven games: 20, 25, 32, 18, 19, 22, and 30.

- Compute the standard deviation.
- The coefficient of variation is 0.232. Does this imply that there is a strong linear association between the game played and the points scored?

In perfectly symmetrical distributions, which of the following statements is false?

- The distance from Q1 to Q2 equals the distance from Q2 to Q3.
- The distance from the smallest observation to Q1 is the same as the distance from Q3 to the largest observation.
- The distance from the smallest observation to Q2 is the same as the distance from Q2 to the largest observation.
- The distance from Q1 to Q3 is half of the distance from the smallest to the largest observation.

A politician who is running for the office of mayor of a city with 25,000 registered voters commissions a survey. In the survey, 48% of the 200 registered voters interviewed say they plan to vote for her.

- What is the population of interest?
- What is the sample?
- Is the value 48% a parameter or a statistic? Explain.

A manufacturer of computer chips claims that less than 10% of his products are defective. When 1000 chips were drawn from a large production, 75 were found to be defective.

- What is the population of interest?
- What is the sample?
- What is the parameter?
- What is the statistic?
- Does the value 10% refer to the parameter or to the statistic?
- Is the value 75 a parameter or a statistic?

The owner of a large fleet of taxis is trying to estimate his costs for the next year's operations. One major cost is fuel purchases. To estimate fuel purchases, the owner needs to know the total distance his taxis will travel next year, the cost of a gallon of fuel, and the fuel mileage of his taxis. The owner has been provided with the first two
figures (distance estimate and cost of a gallon of fuel). However, because of the high cost of gasoline, the owner has recently converted his taxis to operate on propane. He measured and recorded the propane mileage (in miles per gallon) for 50 taxis.

- What is the population of interest?
- What is the parameter the owner needs?
- What is the sample?
- What is the statistic?
- Describe briefly how the statistic will produce the kind of information the owner wants.

For each of the following examples of data, determine the type.

- The weekly closing price of the stock of Amazon.com
- The month of highest vacancy rate at a La Quinta motel
- The size of soft drink (small, medium, or large) ordered by a sample of McDonald's customers
- The number of Toyotas imported monthly by the United States over the last 5 years
- The marks achieved by the students in a statistics course final exam marked out of 100

The number of babies born in a day is an example of a

- Continuous variable
- Discrete variable

A mechanic asks new clients the make of car they have. The response is an example of a

- Quantitative variable
- Qualitative variable

The placement office at a university regularly surveys the graduates 1 year after graduation and asks for the following information. For each, determine the type of data.

- What is your occupation?
- What is your income?
- What degree did you obtain?
- What is the amount of your student loan?
- How would you rate the quality of instruction (excellent, very good, good, fair, poor)?

The incidence of heart disease in a sample of smokers is compared to the incidence of heart disease in a sample of nonsmokers. What type of data collection method was used?

- An observational study
- Sample survey
- Census
- Randomized experiment

A soft drink manufacturer has been supplying its cola drink in bottles to grocery stores and in cans to small convenience stores. The company is analyzing sales of this cola drink to determine which type of packaging is preferred by consumers. Is this study observational or experimental? Outline a better method
for determining whether a store will be supplied with cola in bottles or in cans, so that future sales data will be more helpful in assessing the preferred type of packaging.

- Sampling errors are due to differences in the sample and the population.
- Nonsampling errors are any errors that are not sampling errors: wording of questions, data entry errors, selection bias, nonresponse errors, anything else.

The fact that two polls about preference in the presidential race could have different results (32% support Whatsiss in poll A and 29% support Whatsiss in poll B), or that the results don't match the true population proportion, is an example of sampling error.

A talk show call-in poll is an example of

- Cluster sampling
- Simple random sampling
- Volunteer sampling

The operations manager of a large plant with four departments wants to estimate the person-hours lost per month due to accidents. Describe a sampling plan that would be suitable for estimating the plantwide loss and for comparing departments.

The operations manager can select stratified random samples, where the strata are the four departments. An SRS can be conducted in each department.

A statistics practitioner wants to estimate the mean age of children in his city. Unfortunately, he does not have a complete list of households. Describe a sampling plan that would be suitable for his purposes.

Use cluster sampling, letting each city block represent a cluster.

A researcher selects a sample from a list of all patients at one of five large hospitals in the following manner: a patient is chosen from the first 25 on the list, then every 25th patient from that point forward is selected. This is an example of a

- simple random sample
- stratified sample
- cluster sample
- systematic sample

Which is of more concern, sampling error or nonsampling error?

Nonsampling error is more serious because, unlike sampling error, it cannot be diminished by taking a larger sample.

A random sample of individuals in a county has been selected for a survey. You have been hired
to conduct the survey and decide to use random digit dialing. The first individual you contact tells you that she has no free time and you should call someone else. What should you do, and why?

- Call someone else; the next number is in the same county, so it does not matter which individual is included.
- Replace this individual with another one chosen randomly from the county; one randomly chosen individual is as good as another.
- Ignore this individual and reduce the sample size by one; people who do not have a lot of free time probably will not respond truthfully anyway.
- Try to get the person to respond anyway; if you substitute someone else, the sample will be biased in favor of people with more time.

Is it possible for a sample to yield better results than a census? Explain.

Yes. A census will likely contain significantly more nonsampling errors than a carefully conducted sample survey.

Which of the following is the best description of selection bias?

- It is a tendency for answers to survey questions to be wrong in some systematic way.
- It is a systematic tendency in a survey to favor the inclusion of elementary units with particular characteristics, while excluding other such units with other characteristics.
- It is a systematic tendency for elementary units with particular characteristics not to contribute data in a survey, while other such units with other characteristics do.
- It is a tendency to ask questions that lead respondents inexorably to give particular, predictable answers.

## STA 291 Lecture 28

### Final exam

8:30 PM, Monday, April 28. Location: Memorial Hall (not Memorial Coliseum).

### Makeup final exam

- Tuesday, 10:30 am to
- Come to the 8th floor (849 POT) for directions to the room. I will be there, or look for instructions posted on the door.

### Last online homework

Last online homework assignment: due Saturday, April 26 at 11 PM. The answer key will be posted April 27.
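The z-score calculations from Lecture 16 (the height example, the SAT versus ACT comparison, and the percentile questions) can be checked with a short script. This is only a sketch under stated assumptions: it uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of the course applet and Table 3, and `z_score` is a hypothetical helper name, not something defined in the course.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Hypothetical helper: number of standard deviations x lies above the mean."""
    return (x - mu) / sigma


# Standard normal distribution (mean 0, SD 1); stands in for Table 3 / the applet.
Z = NormalDist(mu=0.0, sigma=1.0)

# Height example: X is normal with mean 165 cm and SD 8 cm.
z_low = z_score(161, 165, 8)    # -0.5
z_high = z_score(170, 165, 8)   # 0.625
p_height = Z.cdf(z_high) - Z.cdf(z_low)  # P(161 < X < 170)

# SAT (mean 500, SD 100) versus ACT (mean 18, SD 6).
z_sat = z_score(650, 500, 100)  # 1.5
z_act = z_score(26, 18, 6)
tail_sat = 1 - Z.cdf(z_sat)     # share of scores above 650 on the SAT
tail_act = 1 - Z.cdf(z_act)     # share of scores above 26 on the ACT

# z-values for percentiles and for a right-tail probability of 0.05.
z_90 = Z.inv_cdf(0.90)          # 90th percentile
z_q3 = Z.inv_cdf(0.75)          # upper quartile
z_right_05 = Z.inv_cdf(0.95)    # right-tail probability 0.05

print(f"P(161 < X < 170) = {p_height:.4f}")
print(f"z_SAT = {z_sat:.3f}, tail = {tail_sat:.4f}")
print(f"z_ACT = {z_act:.3f}, tail = {tail_act:.4f}")
print(f"z for 90th percentile = {z_90:.3f}, upper quartile = {z_q3:.3f}")
print(f"z for right tail 0.05 = {z_right_05:.3f}")
```

Software returns unrounded values, so the right-tail cutoff prints as roughly 1.645 where the table-based slides use 1.65. A larger z-score means a smaller right-tail probability, which is why the comparison of 650 on the SAT with 26 on the ACT reduces to comparing the two z-scores.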
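Two of the review questions (the basketball points and the eight-value data set) involve arithmetic that is easy to verify by machine. The sketch below uses Python's standard `statistics` module and rests on two assumptions that the notes do not settle: the standard deviation is taken as the sample SD (dividing by n - 1), and quartiles follow Python's default "exclusive" method, which may differ from the convention used in the course.

```python
import statistics

# Basketball points for seven games, as listed in the review question.
points = [20, 25, 32, 18, 19, 22, 30]
mean_points = statistics.mean(points)
sd_points = statistics.stdev(points)   # sample SD, divides by n - 1 (assumption)
cv_points = sd_points / mean_points    # coefficient of variation = SD / mean

# Data set from the IQR and skewness question.
data = [10, 15, 14, 12, 9, 17, 25, 15]
# Quartiles via Python's default 'exclusive' method (assumption: other
# textbook conventions can give slightly different Q1 and Q3).
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
mean_data = statistics.mean(data)

print(f"points: mean = {mean_points:.3f}, SD = {sd_points:.3f}, CV = {cv_points:.3f}")
print(f"data: Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {iqr}")
print(f"data: mean = {mean_data}, median = {q2}")
```

With the sample SD, the coefficient of variation rounds to the 0.232 quoted in the question, which suggests that the sample formula is the one intended. For the second data set, the mean exceeds the median, the usual signal of right skew.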

