Introduction to Statistics I
Introduction to Statistics I STAT 1000
This 34 page Class Notes was uploaded by Blair Williamson on Thursday September 17, 2015. The Class Notes belongs to STAT 1000 at University of Connecticut taught by Staff in Fall. Since its upload, it has received 26 views. For similar materials see /class/205905/stat-1000-university-of-connecticut in Statistics at University of Connecticut.
Date Created: 09/17/15
2. Numerical Descriptive Measures of Central Tendency and Variability

Measures of Central Tendency

Usually we focus our attention on two aspects of measures of central location:
• a measure of the central data point (the average);
• a measure of the dispersion of the data about the average.

The central data point reflects the locations of all the actual data points.

Arithmetic mean

This is the most popular and useful measure of central location.

Mean = (sum of measurements) / (number of measurements)

Sample mean: \(\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\)
Population mean: \(\mu = \frac{\sum_{i=1}^{N} x_i}{N}\)

Here \(n\) is the sample size and \(N\) is the population size.

Example 1. The mean of the sample of six measurements 7, 3, 9, -2, 4, 6 is given by

\[\bar{x} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6}{6} = \frac{7 + 3 + 9 - 2 + 4 + 6}{6} = 4.5\]

Example 2. When many of the measurements have the same value, the measurements can be summarized in a frequency table. Suppose the numbers of children in a sample of 16 employees were recorded as follows:

Number of children     0   1   2   3
Number of employees    3   4   7   2     (total 16)

\[\bar{x} = \frac{\sum_{i=1}^{16} x_i}{16} = \frac{x_1 + x_2 + \dots + x_{16}}{16} = \frac{3(0) + 4(1) + 7(2) + 2(3)}{16} = 1.5\]

The median

The median of a set of measurements is the value that falls in the middle when the measurements are arranged in order of magnitude.

Example 3. Seven employee salaries (in $1000s) were recorded: 28, 60, 26, 32, 30, 26, 29. Find the median salary.

Odd number of observations. First sort the salaries, then locate the value in the middle:
26, 26, 28, 29, 30, 32, 60
The median is 29.

Suppose one employee's salary of $31,000 was added to the group recorded before. Find the median salary.

Even number of observations. First sort the salaries, then locate the values in the middle:
26, 26, 28, 29, 30, 31, 32, 60
There are two values in the middle. In this case the median is \((29 + 30)/2 = 29.5\).

The mode

• The mode of a set of measurements is the value that occurs most frequently.
• A set of data may have one mode (or modal class), or two or more modes.

Example 4. The manager of a men's store observes the waist size (in inches) of trousers sold yesterday: 31, 34, 36, 33, 28, 34, 30, 34, 32, 40. The mode of this data set is 34 in.

Relationship among Mean, Median, and Mode

If a distribution is symmetrical, the mean, median, and mode coincide.
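The worked examples for the mean, median, and mode above can be checked with a short Python sketch. It uses only the standard `statistics` module. The variable names are illustrative, and the value -2 in the first sample is an assumption: it is the reading of Example 1 that reproduces the stated mean of 4.5.

```python
from statistics import mean, median, multimode

# Example 1: six measurements (the -2 is assumed so that the mean is 4.5)
sample = [7, 3, 9, -2, 4, 6]
sample_mean = mean(sample)

# Example 2: mean from a frequency table (value -> number of employees)
children = [0, 1, 2, 3]
employees = [3, 4, 7, 2]
weighted_mean = sum(c * f for c, f in zip(children, employees)) / sum(employees)

# Example 3: median with an odd and then an even number of observations
salaries = [28, 60, 26, 32, 30, 26, 29]
median_odd = median(salaries)            # middle value of the sorted list
median_even = median(salaries + [31])    # average of the two middle values

# Example 4: mode(s) of the waist sizes; multimode returns every most-frequent value
waists = [31, 34, 36, 33, 28, 34, 30, 34, 32, 40]
modes = multimode(waists)

print(sample_mean, weighted_mean, median_odd, median_even, modes)
```

`multimode` is used rather than `mode` because, as noted above, a data set may have two or more modes.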
If a distribution is nonsymmetrical and skewed to the left or to the right, the three measures differ. A positively skewed distribution (skewed to the right) typically gives mode < median < mean. A negatively skewed distribution (skewed to the left) typically gives mean < median < mode.

The geometric mean

This is a measure of the average growth rate. Let \(R_i\) denote the rate of return in period \(i\) (\(i = 1, 2, \dots, n\)). The geometric mean of the returns \(R_1, R_2, \dots, R_n\) is the constant \(R_g\) that produces the same terminal wealth at the end of period \(n\) as do the actual returns for the \(n\) periods, i.e.

\[R_g = \sqrt[n]{(1 + R_1)(1 + R_2) \cdots (1 + R_n)} - 1\]

Example 5. A firm's sales were $1,000,000 three years ago. Sales have grown annually by 20%, 10%, and -5%. Find the geometric mean rate of growth in sales.

Solution. Since \(R_g\) is the geometric mean,

\[(1 + R_g)^3 = (1 + 0.2)(1 + 0.1)(1 - 0.05) = 1.2540\]

Thus \(R_g = \sqrt[3]{(1 + 0.2)(1 + 0.1)(1 - 0.05)} - 1 = 0.0784\), or 7.84%.

Measures of variability

Measures of central location fail to tell the whole story about the distribution. A question of interest still remains unanswered:
• How spread out are the measurements about the average value?

The range

• The range of a set of measurements is the difference between the largest and smallest measurements.
• Its major advantage is the ease with which it can be computed.
• Its major shortcoming is its failure to provide information on the dispersion of the values between the two end points.

The variance

This measure of dispersion reflects the values of all the measurements.

The variance of a population of \(N\) measurements \(x_1, x_2, \dots, x_N\) having a mean \(\mu\) is defined as

\[\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}\]

The variance of a sample of \(n\) measurements \(x_1, x_2, \dots, x_n\) having a mean \(\bar{x}\) is defined as

\[s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}\]

The standard deviation of a set of measurements is the square root of the variance of the measurements.

Sample standard deviation: \(s = \sqrt{s^2}\)
Population standard deviation: \(\sigma = \sqrt{\sigma^2}\)

The standard deviation can be used to
• compare the variability of several distributions;
• make a statement about the general shape of a distribution.

The empirical rule

If a
sample of measurements has a bell-shaped distribution, the interval
• \((\bar{x} - s,\ \bar{x} + s)\) contains approximately 68% of the measurements;
• \((\bar{x} - 2s,\ \bar{x} + 2s)\) contains approximately 95% of the measurements;
• \((\bar{x} - 3s,\ \bar{x} + 3s)\) contains practically all the measurements.

Range approximation: \(s \approx \text{Range}/4\).

Chebyshev Theorem

Given any set of observations and a number \(k \ge 1\), the fraction of these observations that lie within \(k\) standard deviations of their mean is at least \(1 - 1/k^2\).

Exercises: 2.41, 2.42, 2.57, 2.72

References
1. Chase and Bown, General Statistics
2. Hildebrand and Ott, Statistical Thinking for Managers
3. Keller and Warrack, Statistics for Management and Economics
4. McClave, Benson, and Sincich, A First Course in Business Statistics

6. Discrete Random Variables and Probability Distributions

Random Variables and Probability Distributions

• A random variable is a function that assigns a numerical value to each sample point in a sample space.
• A random variable reflects the aspect of a random experiment that is of interest to us.
• There are two types of random variables:
  1. Discrete random variable
  2. Continuous random variable

A random variable is discrete if it can assume only a countable number of values. A random variable is continuous if it can assume an uncountable number of values.

Discrete Probability Distribution

• A table, formula, or graph that lists all possible values a discrete random variable can assume, together with associated probabilities, is called a discrete probability distribution.
• To calculate \(P(X = x)\), the probability that the random variable \(X\) assumes the value \(x\), add the probabilities of all the simple events for which \(X\) is equal to \(x\).

Example 1. Find the probability distribution of the random variable \(X\) describing the number of heads that turn up when a coin is flipped twice.

Solution. The possible values are 0, 1, and 2.
\(P(X = 0) = P(TT) = 1/4\)
\(P(X = 1) = P(TH) + P(HT) = 1/2\)
\(P(X = 2) = P(HH) = 1/4\)

Requirements of a discrete probability distribution

If a random variable can take values \(x_i\), then the following must be true:
1. \(0 \le p(x_i) \le 1\) for all \(x_i\)
2. \(\sum_{\text{all } x_i} p(x_i) = 1\)

The
probability distribution can be used to calculate probabilities of different events.

Probabilities as relative frequencies

In practice, probabilities are often estimated from relative frequencies.

Example 2. The numbers of cars a dealer sold daily were recorded over the last 100 days. The data were summarized as follows:

Daily sales    Frequency
0              5
1              15
2              35
3              25
4              20
Total          100

1. Estimate the probability distribution.
2. State the probability of selling more than 2 cars a day.

Solution.
1. The estimated probability distribution is the table of relative frequencies:

x       0     1     2     3     4
p(x)    .05   .15   .35   .25   .20

2. \(P(\text{more than 2 cars a day}) = P(X > 2) = P(X = 3) + P(X = 4) = .25 + .20 = .45\)

Joint Distribution

Consider two discrete random variables:
• \(X\), which takes values \(x_1, x_2, \dots, x_n\)
• \(Y\), which takes values \(y_1, y_2, \dots, y_m\)

If we need to see the relationship between these two random variables, the distributions of \(X\) and \(Y\) separately are not going to provide the whole story. For this we need the joint distribution table:

              X:  x_1    x_2    ...   x_i    ...   x_n     P(Y = y)
Y:  y_1           p_11   p_12   ...   p_1i   ...   p_1n    p_1•
    y_2           p_21   p_22   ...   p_2i   ...   p_2n    p_2•
    ...
    y_j           p_j1   p_j2   ...   p_ji   ...   p_jn    p_j•
    ...
    y_m           p_m1   p_m2   ...   p_mi   ...   p_mn    p_m•
    P(X = x)      p_•1   p_•2   ...   p_•i   ...   p_•n    1

In this table:
• joint probability: \(p_{ji} = P(Y = y_j \text{ and } X = x_i)\)
• marginal probability of \(X\): \(p_{\bullet i} = p_{1i} + p_{2i} + \dots + p_{mi} = P(X = x_i)\)
• marginal probability of \(Y\): \(p_{j \bullet} = p_{j1} + p_{j2} + \dots + p_{jn} = P(Y = y_j)\)

Example 2a. A fair coin is flipped three times. Let \(X\) be the number of heads. \(Y\) is equal to 1 if the first flip is a head and 0 if it is a tail. Give the joint distribution of \(X\) and \(Y\).

Solution.

              X:  0      1      2      3       P(Y = y)
Y:  0             1/8    2/8    1/8    0       1/2
    1             0      1/8    2/8    1/8     1/2
    P(X = x)      1/8    3/8    3/8    1/8     1

Expected Value and Variance

The expected value

Given a discrete random variable \(X\) with values \(x_i\) that occur with probabilities \(p(x_i)\), the expected value of \(X\) is

\[E(X) = \sum_{\text{all } x_i} x_i\, p(x_i)\]

The expected value of a random variable \(X\) is the weighted average of the possible values it can assume, where the weights are the corresponding probabilities of each \(x_i\).

Laws of Expected Value
• \(E(c) = c\)
• \(E(cX) = c\,E(X)\)
• \(E(X + Y) = E(X) + E(Y)\)
• \(E(XY) = E(X)\,E(Y)\) if \(X\) and \(Y\) are independent random variables

Note: \(X\) and \(Y\) are said to be independent if for any possible values \(x_i\) and \(y_j\) of \(X\) and \(Y\), respectively, we have \(P(X = x_i, Y = y_j) = P(X = x_i)\,P(Y = y_j)\), i.e., in terms of the joint
distribution, \(p_{ji} = p_{\bullet i}\, p_{j \bullet}\).

Variance

Let \(X\) be a discrete random variable with possible values \(x_i\) that occur with probabilities \(p(x_i)\), and let \(E(X) = \mu\). The variance of \(X\) is defined to be

\[\mathrm{Var}(X) = \sigma^2 = E(X - \mu)^2 = \sum_{\text{all } x_i} (x_i - \mu)^2\, p(x_i)\]

The variance is the weighted average of the squared deviations of the values of \(X\) from their mean \(\mu\), where the weights are the corresponding probabilities of each \(x_i\).

Properties of the variance
• \(\mathrm{Var}(\text{const}) = 0\)
• \(\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)\)
• \(\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{COV}(X, Y)\), where \(\mathrm{COV}(X, Y) = E[(X - EX)(Y - EY)]\)
• If \(X\) and \(Y\) are independent, then \(\mathrm{COV}(X, Y) = 0\) and \(\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)\)
• If \(\mathrm{Var}(X) = 0\), then \(X = \text{const}\)

Standard deviation

The standard deviation of a random variable \(X\), denoted \(\sigma\), is the positive square root of the variance of \(X\).

Example 3. The total number of cars to be sold next week is described by the following probability distribution:

x       0     1     2     3     4
p(x)    .05   .15   .35   .25   .20

Determine the expected value and standard deviation of the random variable \(X\), the number of cars sold.

Solution.
\(E(X) = 0 \times .05 + 1 \times .15 + 2 \times .35 + 3 \times .25 + 4 \times .20 = 2.4\)
\(\mathrm{Var}(X) = (0 - 2.4)^2 \times .05 + (1 - 2.4)^2 \times .15 + (2 - 2.4)^2 \times .35 + (3 - 2.4)^2 \times .25 + (4 - 2.4)^2 \times .20 = 1.24\)
\(\sigma = \sqrt{\mathrm{Var}(X)} = \sqrt{1.24} = 1.114\)

The expected value of \(f(X)\) is \(E[f(X)] = \sum_{\text{all } x} f(x)\, p(x)\).

The Binomial Distribution

Bernoulli trial

A Bernoulli trial can result in only one of two outcomes. Typical cases where the Bernoulli trial applies:
• A coin flipped results in heads or tails.
• An election candidate wins or loses.
• An employee is male or female.
• A car uses 87 octane gasoline or another gasoline.

Binomial experiment

1. There are \(n\) Bernoulli trials (\(n\) is finite and fixed).
2. Each trial can result in a success or a failure.
3. The probability \(p\) of success is the same for all the trials.
4. All the trials of the experiment are independent.

Binomial Random Variable

The binomial random variable counts the number of successes in \(n\) trials of the binomial experiment. By definition, this is a discrete random variable.

Calculating the Binomial Probability

In general, the binomial probability is calculated by

\[P(X = x) = p(x) = C_x^n\, p^x (1 - p)^{n - x}\]

where \(C_x^n = \frac{n!}{x!\,(n - x)!}\) is the number of different ways of choosing \(x\) objects from a total of \(n\)
objects. Here \(n! = 1 \times 2 \times 3 \times \dots \times n\); by convention, \(0! = 1\).

Example 4. Suppose that we have a group of 4 people, say A, B, C, and D. How many different pairs can we select from this group?

Solution. The answer is 4 choose 2:

\[C_2^4 = \frac{4!}{2! \times 2!} = \frac{1 \times 2 \times 3 \times 4}{(1 \times 2)(1 \times 2)} = 6\]

Indeed, we have six pairs: AB, AC, AD, BC, BD, and CD.

Mean and variance of the binomial random variable

\(E(X) = np\)
\(\mathrm{Var}(X) = np(1 - p)\)

Poisson Distribution

The Poisson experiment typically fits cases of rare events that occur over a fixed amount of time or within a specified region. Typical cases:
• The number of errors a typist makes per page.
• The number of customers entering a service station per hour.
• The number of telephone calls received by a switchboard per hour.

Poisson Experiment

Properties of the Poisson experiment:
• The number of successes (events) that occur in a certain time interval is independent of the number of successes that occur in another nonoverlapping time interval.
• The average number of successes in a certain time interval is the same for all time intervals of the same size and is proportional to the length of the interval.
• The probability that two or more successes will occur in an interval approaches zero as the interval becomes smaller.

The Poisson Random Variable

The Poisson variable indicates the number of successes that occur during a given time interval or in a specific region in a Poisson experiment.

Probability Distribution of the Poisson Random Variable

\[P(X = x) = p(x) = \frac{e^{-\mu} \mu^x}{x!}, \quad x = 0, 1, 2, \dots\]

\(E(X) = \mathrm{Var}(X) = \mu\)

Poisson Approximation of the Binomial

• When \(n\) is very large, a binomial probability table may not be available. If \(p\) is very small (\(p < .05\)), we can approximate the binomial probabilities using the Poisson distribution.
• More specifically, we have the following approximation, for the binomial distribution with parameters \(n\) and \(p\) and the Poisson distribution with \(\mu = np\):

\[P_{\text{Binomial}(p, n)}(X = x) \approx P_{\text{Poisson}(\mu = np)}(X = x)\]

Exercises: p. 203: 4.27-4.29; p. 216: 44451

References
1. Chase and Bown, General Statistics
2. Hildebrand and Ott, Statistical Thinking for Managers
3. Keller and Warrack, Statistics for
Management and Economics
4. McClave, Benson, and Sincich, A First Course in Business Statistics
5. Moore, The Basic Practice of Statistics

Exercises

1. Ten thousand Instant Money lottery tickets were sold. One ticket has a face value of $1000, 5 tickets have a face value of $500 each, 20 tickets are worth $100 each, 500 are worth $1 each, and the rest are losers. Let \(X\) = face value of a ticket that you buy. Find the probability distribution for \(X\).

2. A dentist has determined that the number of patients \(X\) treated in an hour is described by the probability distribution given here. Find (a) the mean, (b) the variance, and (c) the standard deviation.

x           1      2       3      4
P(X = x)    2/15   10/15   2/15   1/15

3. An altered die has one dot on one face, two dots on three faces, and three dots on two faces. The die is to be tossed once. Let \(X\) be the number of dots on the upturned face. Find the mean and variance of \(X\).

4. A card is to be selected from an ordinary deck of 52 cards. Suppose that a casino will pay $10 if you select an ace. If you fail to select an ace, you are required to pay the casino $1.
(a) If you play this game once, how much money does the casino expect to win?
(b) If you play the game 26 times, how much money does the casino expect to win?

A bus company is interested in two potential contracts, one for an express and the other for local stops. The probabilities that the bids will be accepted are .70 and .50, with costs of $500 and $750, respectively. The estimated total incomes are $6000 and $10,000, respectively. If the company is allowed only one bid, which bid should it enter?

In the game of craps, a player rolls two dice. If the first roll results in a sum of 7 or 11, the player wins. If the first roll results in a 2, 3, or 12, the player loses. If the sum on the first roll is 4, 5, 6, 8, 9, or 10, the player keeps rolling until he throws a 7 or the original value. If the outcome is a 7, the player loses. If it is the original value, the player wins. The probability that a player will win is .493. Suppose that a player pays $5 to a casino if he
loses and is paid $4 for a win. What is the expected loss for the player if he plays (a) one game, (b) ten games?

Consider Megabucks again, a lottery game conducted by the Massachusetts State Lottery Commission. Megabucks consists of selecting six numbers from the 42 numbers 1, 2, 3, 4, ..., 41, 42 (no repetitions, and order does not matter). The commission selects by chance the six winning numbers. You pay $1 to play. You get a free ticket if three of your numbers match three of the winning numbers, $75 for matching four numbers, and $1500 for matching five numbers. Assume you get $1,000,000 if you match all six numbers. The sample space \(S\) consists of all possible six-number combinations that can be selected. It can be shown that \(n(S) = 5{,}245{,}786\). The number of elements in \(S\) that match 0, 1, 2, 3, 4, 5, or 6 winning numbers is given here. What is your expected loss?

Number of matches    Number of possibilities
0                    1,947,792
1                    2,261,952
2                    883,575
3                    142,800
4                    9,450
5                    216
6                    1

A high school class decides to raise some money by conducting a raffle. The students plan to sell 2000 tickets at $1 apiece. They will give one prize of $100, two prizes of $50, and three prizes of $25. If you plan to purchase one ticket, what are your expected net winnings? Hint: The probability of getting the $100 ticket is 1/2000, of getting a $50 ticket is 2/2000, and of getting a $25 ticket is 3/2000.

Forty percent of the student body at a large university are in favor of a ban on drinking in the dormitories. Suppose 15 students are to be randomly selected. Find the probability that
(a) seven favor the ban;
(b) fewer than 4 favor the ban;
(c) more than 2 favor the ban.

Sixty percent of the voters in a large town are opposed to a proposed development. If 20 voters are selected at random, find the probability that
(a) ten are opposed to the proposed development;
(b) more than 13 are opposed to the proposed development;
(c) fewer than 10 are opposed to the proposed development.

Sixty percent of the Framingham Heart Study participants have a total serum cholesterol level below 238. An HMO
statistician is interested in interviewing 15 people. If 15 people are randomly selected, find the probability that
(a) eight people will have a total serum cholesterol level below 238;
(b) five people or more have a total serum cholesterol level below 238.

It has been reported that 30% of the population of women who had given birth in the last year and had less than a high school education were in the labor force. In a random sample of 25 from this population, find the probability that
(a) ten are in the labor force;
(b) the sample will contain 4, 5, 6, or 7 women in the labor force.

A retailer decides that he will reject a large shipment of light bulbs if there is more than one defective bulb in a sample of size 10. If the defective rate is 10%, what is the probability that the retailer will reject the shipment?

It is estimated that about 10% of Caucasians aged 45-74 have diabetes. In a random sample of 19 from this population, find the probability that
(a) three will have diabetes;
(b) fewer than 2 will have diabetes.

A person has a 5% chance of winning a free ticket in a state lottery. If she plays the game 25 times, what is the probability she will win one or more free tickets?

A screening examination is required of all applicants for a technical writing position. The examination consists of 16 questions. Each question has five choices, consisting of the correct answer and four incorrect answers. A curious applicant wonders about some probabilities if she were to randomly guess at each question.
(a) What is the probability of getting 3 correct by guessing?
(b) What is the probability of getting 2 or more correct by guessing?
(c) If 50 applicants took the exam and each guessed randomly at all the questions, what would you guess the mean number of correct answers to be?

It has been reported that about 25% of all resumes contain a major fabrication. Eighteen applicants for an actuarial position submitted resumes. Find the probability that five resumes will contain a major fabrication.

Thirty-eight percent of people
have blood type A. In a random sample of 20 people, find the probability that
(a) one will have type A;
(b) two or three will have type A;
(c) one or more will not have type A.
Suppose 100 samples of size 20 were selected and the number of people in each sample with type A was recorded.
(d) What should be the approximate mean number of people with type A?

Thirty-six percent of more than 348,000 college students reported being frequently bored in class during their last year in high school. In a random sample of 17 from this population, find the probability that 3 students would say they were frequently bored in class.

Ten percent of people have blood type B. In a random sample of 20 people, find the probability that
(a) three will have type B;
(b) more than 5 will have type B;
(c) fewer than 2 will have type B.
Suppose 100 samples of size 20 were selected and the number of people in each sample with type B was recorded.
(d) What should be the approximate mean number of people with type B?

3. Measures of Relative Standing and Box Plots

Percentile

The \(100m\)-th percentile (\(0 < m < 1\)) of a set of measurements is the value for which
• at least \(100m\)% of the measurements are less than or equal to that value;
• at least \(100(1 - m)\)% of all the measurements are greater than or equal to that value.

Commonly used percentiles:
• First (lower) quartile, \(Q_1\) = 25th percentile
• Second (middle) quartile, \(Q_2\) = 50th percentile
• Third quartile, \(Q_3\) = 75th percentile

Algorithm for finding the \(100m\)-th percentile

Assume that we have \(n\) observations in our data set.
1. Sort your data: \(X_1, X_2, \dots, X_n \ \to\ X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}\).
2. Compute the rough location of the percentile (the locator) in the sorted sequence of observations: \(l = m \times n\).
3. If \(l\) is a whole number, then \(P_m = \frac{X_{(l)} + X_{(l+1)}}{2}\). If it is not, then round \(l\) up to \(\lceil l \rceil\) and \(P_m = X_{(\lceil l \rceil)}\).

Box Plots

Interquartile range: \(IQR = Q_3 - Q_1\)
Inner fences: \(Q_1 - 1.5\,IQR\) and \(Q_3 + 1.5\,IQR\)
Outer fences: \(Q_1 - 3\,IQR\) and \(Q_3 + 3\,IQR\)

A box plot is a pictorial display that provides the main descriptive measures of the measurement set:
• \(L\): the largest measurement inside the inner fences
• \(Q_3\): the upper quartile
• \(Q_2\): the median
• \(Q_1\): the lower
quartile
• \(S\): the smallest measurement inside the inner fences

A potential outlier is a value located at a distance of more than \(1.5\,IQR\) from the box. An outlier is a value located at a distance of more than \(3\,IQR\) from the box.

Example 1. Suppose that the return on investment for 21 companies in a certain industry for a certain year is

-24.6, -2.6, 2.4, 2.7, 3.8, 5.6, 5.9, 6.7, 7.0, 7.2, 7.5, 8.0, 8.2, 8.5, 8.6, 8.8, 9.0, 9.2, 9.7, 10.0, 20.5

Make a boxplot of these data.

Solution. \(n = 21\), so the median is the eleventh score, 7.5. The 25th percentile is 5.6. The 75th percentile is 8.8. Thus \(IQR = 8.8 - 5.6 = 3.2\). The fences are:
• lower outer fence: \(5.6 - 3(3.2) = -4.0\)
• lower inner fence: \(5.6 - 1.5(3.2) = 0.8\)
• upper inner fence: \(8.8 + 1.5(3.2) = 13.6\)
• upper outer fence: \(8.8 + 3(3.2) = 18.4\)

The fence test identifies two outliers, -24.6 and 20.5, and one potential outlier, -2.6. The smallest and largest nonoutliers are 2.4 and 10.0.

[Figure: box plot of the return-on-investment data]

z scores

The sample z score for measurement \(x\) is

\[z = \frac{x - \bar{x}}{s}\]

If the absolute value of the sample z score is greater than 3, the corresponding measurement is an outlier.

Scatter Diagrams

Often we are interested in the relationships between two quantitative variables. Typical patterns:
• Positive linear relationship
• No relationship
• Negative linear relationship
• Negative nonlinear relationship
• Nonlinear (concave) relationship

[Figure: scatter diagrams of Y against X illustrating each pattern]

Measures of Association

Two numerical measures are presented for the description of the linear relationship between two variables depicted in the scatter diagram:
• Covariance: is there any pattern to the way two variables move together?
• Correlation coefficient: how strong is the linear relationship between two variables?

Covariance

Population covariance: \(\mathrm{COV}(X, Y) = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}\)
Sample covariance: \(\mathrm{cov}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}\)

Here \(\mu_x\) (\(\mu_y\)) is the population mean of the variable \(X\) (\(Y\)), \(\bar{x}\) (\(\bar{y}\)) is the corresponding sample mean, \(N\) is the population size, and \(n\) is the sample size.

• If the two variables move in the same direction (both increase or both
decrease), the covariance is a large positive number.
• If the two variables move in two opposite directions (one increases when the other one decreases), the covariance is a large negative number.
• If the two variables are unrelated, the covariance will be close to zero.

The coefficient of correlation

Population coefficient of correlation: \(\rho = \frac{\mathrm{COV}(X, Y)}{\sigma_x \sigma_y}\)
Sample coefficient of correlation: \(r = \frac{\mathrm{cov}(X, Y)}{s_x s_y}\)

This coefficient answers the question: how strong is the association between \(X\) and \(Y\)?
• Close to 1: strong positive linear relationship
• Close to 0: no linear relationship
• Close to -1: strong negative linear relationship

References
1. Chase and Bown, General Statistics
2. Hildebrand and Ott, Statistical Thinking for Managers
3. Keller and Warrack, Statistics for Management and Economics
4. McClave, Benson, and Sincich, A First Course in Business Statistics

10. Introduction to Estimation. Confidence Interval for a Population Mean

Statistical inference is the process by which we acquire information about populations from samples. There are two procedures for making inferences:
• Estimation
• Hypothesis testing

Concepts of Estimation

The objective of estimation is to determine the value of a population parameter on the basis of a sample statistic. There are two types of estimators:
• Point estimator
• Interval estimator

Point Estimator

A point estimator draws inference about a population by estimating the value of an unknown parameter using a single value or point.

An unbiased estimator of a population parameter is an estimator whose expected value is equal to that parameter.

An unbiased estimator is said to be consistent if the difference between the estimator and the parameter grows smaller as the sample size grows larger.

If there are two unbiased estimators of a parameter, the one whose variance is smaller is said to be relatively efficient.

Interval Estimator

An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.

Estimating the Population Mean When the
Population Standard Deviation is Known

How is an interval estimator produced from a sampling distribution?

To estimate \(\mu\), a sample of size \(n\) is drawn from the population and its mean \(\bar{X}\) is calculated. Under certain conditions, \(\bar{X}\) is normally distributed (or approximately normally distributed), thus

\[Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}\]

is a standard normally distributed random variable. Using this fact, one can show that

\[P\left(\mu - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha\]

and

\[P\left(\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha\]

Interval Estimator of \(\mu\)

\[\left(\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \ \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right)\]

The probability \(1 - \alpha\) is called the confidence level.
\(\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\) is called the lower confidence limit (LCL).
\(\bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\) is called the upper confidence limit (UCL).

We often represent the interval estimator as \(\bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\).

Three commonly used confidence levels:

1 - α     α      z_{α/2}
.90       .10    1.645
.95       .05    1.96
.99       .01    2.576

Interpreting the interval estimate
• The population mean \(\mu\) is a parameter, not a random variable.
• Note that LCL and UCL are random variables.
• Thus, it is correct to state that there is a \(1 - \alpha\) chance that LCL will be less than \(\mu\) and UCL will be greater than \(\mu\).

Selecting the Sample Size
• We can control the width of the interval estimate by changing the sample size.
• Thus, we determine the interval width first and derive the required sample size.
• The phrase "estimate the mean to within \(W\) units" translates to an interval estimate of the form \(\bar{X} \pm W\).
• The required sample size to produce an interval estimator \(\bar{X} \pm W\) with \(1 - \alpha\) confidence is

\[n = \left(\frac{z_{\alpha/2}\, \sigma}{W}\right)^2\]

• If the standard deviation is unknown, we can use a so-called historical value of \(\sigma\) based on past experience, or the value given by the range approximation.

Example 1. A research project for an insurance company wishes to investigate the mean value of the personal property held by urban apartment renters. A previous study suggested that the population standard deviation should be roughly $10,000. A 95% confidence interval with a width of $1000 (a plus or minus of $500) is desired. How large a sample must be taken to obtain such a confidence interval?

Solution. The required sample size is

\[n = \left(\frac{z_{\alpha/2}\, \sigma}{W}\right)^2 = \left(\frac{1.96 \times 10000}{500}\right)^2 \approx 1537\]

Estimating the Population Mean When the Population Standard Deviation is Unknown

1. When \(\sigma\) is unknown (the real-world situation) and the sample size \(n\) is large (more than 30), the confidence interval is given by \(\bar{X} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}\), where \(s\) is the sample standard deviation. We do not need any additional assumption, because the Central Limit Theorem guarantees the approximate normality of the sample mean.
2. If the sample size is small (less than 30), the confidence interval is \(\bar{X} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}\), where \(t_{\alpha/2,\, n-1}\) is based on \(n - 1\) degrees of freedom. The random sample is assumed to be taken from a normal population.

Example 2. The caffeine content (in milligrams) of a random sample of 50 cups of black coffee dispensed by a new machine is measured. The mean and standard deviation are 100 milligrams and 7.1 milligrams, respectively. Construct a 98% CI for the true mean caffeine content per cup dispensed by the machine.

Solution. The sample size is \(n = 50 > 30\), and the confidence level is \(100(1 - \alpha)\% = 98\%\); therefore \(\alpha = .02\) and \(z_{.01} = 2.33\), with \(\bar{X} = 100\) and \(s = 7.1\). Thus the 98% CI is given by

\[\bar{X} \pm z_{\alpha/2} \frac{s}{\sqrt{n}} = 100 \pm 2.33 \times \frac{7.1}{\sqrt{50}} = 100 \pm 2.34\]

Example 3. A furniture mover calculates the actual weight of a shipment as a proportion of the estimated weight for a sample of 25 recent jobs. The sample mean is 1.13 and the sample standard deviation is .16. Calculate a 95% CI for the population mean.

Solution. The sample size is \(n = 25 \le 30\), and the confidence level is \(100(1 - \alpha)\% = 95\%\); therefore \(\alpha = .05\) and \(t_{.025,\, 24} = 2.064\), with \(\bar{X} = 1.13\) and \(s = .16\). Thus the 95% CI is given by

\[\bar{X} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}} = 1.13 \pm 2.064 \times \frac{.16}{\sqrt{25}} = 1.13 \pm .066\]

We have to check the normality assumption.

Exercises: p. 299: 5.10; p. 300: 5.15; p. 301: 5.19; p. 310: 5.27-5.28; p. 326: 5.65-5.66

References
1. Chase and Bown, General Statistics
2. Hildebrand and Ott, Statistical Thinking for Managers
3. Keller and Warrack, Statistics for Management and Economics
4. McClave, Benson, and Sincich, A First Course in Business Statistics

Exercises

1. A physician wanted to estimate the mean length of time \(\mu\) that a patient had to wait to see him after arriving at the office. A random sample of 50 patients showed a mean waiting time of 23.4
minutes and a standard deviation of 7.1 minutes. Find a 95% confidence interval for \(\mu\).

2. The owner of a small computer company wished to estimate the mean download rate \(\mu\) for the company's update to one of its programs. Forty-five downloads gave a mean rate of 31 and a standard deviation of 16 kilobits per second. Find a 95% confidence interval for \(\mu\).

3. Barbara wanted to estimate the mean connection time \(\mu\) to the Internet. She connected 38 times, with a mean of 42 and a standard deviation of 52 minutes. Find a 90% confidence interval for \(\mu\).

4. A study was done to estimate the mean annual growth \(\mu\) in a population of Conus pennaceus trees in Hawaii. For those with an initial size of 2.41-2.60 centimeters, a sample of size 33 yielded a mean annual growth of .72 centimeter and a standard deviation of .31 centimeter. Find a 90% confidence interval for the population mean \(\mu\) of annual growth of those trees with an initial size of 2.41-2.60 centimeters.

5. Sixty pieces of a plastic are randomly selected, and the breaking strength of each piece is recorded in pounds per square inch. Suppose that \(\bar{X} = 26\) and \(s = 15\) pounds per square inch. Find a 99% confidence interval for the mean breaking strength \(\mu\). If you were to obtain 200 99% confidence intervals for \(\mu\), about how many can be expected to contain \(\mu\)?

6. A city assessor wished to estimate the mean income per household. The previous mean income was $25,300. A random sample of 40 households in the city showed a mean income of $29,400 and a standard deviation of $6,325.
(a) Find a 95% confidence interval for \(\mu\), the population mean income per household in the city.
(b) Based on your answer in part (a), would the assessor conclude that the mean income had increased over the previous estimate of $25,300?

7. An electrical company tested a new type of oil to be used in its transformers. Thirty-five readings of dielectric strength were obtained. (Dielectric strength is the potential, in kilovolts per centimeter of thickness, necessary to cause a disruptive discharge of electricity through an insulator.) The
results of the test gave \(\bar{X} = 77\) kV, \(s = 8\) kV.
(a) Find a 95% confidence interval for the mean dielectric strength of the oil.
(b) The old transformer oil had a mean dielectric strength of 75 kV. Would you conclude that the new oil has a higher mean dielectric strength on the basis of your answer in part (a)?

8. Noise level tests were done on 40 new light rail vehicles (LRVs, the new name for trolley cars). The results of the test gave a sample mean of 65 decibels and a sample standard deviation of 6 decibels.
(a) Find a 90% confidence interval for the mean decibel level \(\mu\) for this type of transit vehicle.
(b) Based on your answer in part (a), would you conclude that the new LRVs are quieter, on the average, than older-type trolley cars that had a mean decibel level of 80?

9. An educator wishes to estimate the mean number of hours \(\mu\) that 10-year-old children in a city watch television per day. How large a sample is needed if the educator wants to estimate \(\mu\) to within .5 hour with 90% confidence? Use \(\sigma = 1.75\).

10. How many households in a large town should be randomly sampled to estimate the mean number of dollars spent per household per week on food supplies to within $3 with 80% confidence? Assume a standard deviation of $15.

11. Consider a population with unknown mean \(\mu\) and population standard deviation \(\sigma = 20\).
(a) How large a sample size is needed to estimate \(\mu\) to within four units with 90% confidence?
(b) Suppose that you wanted to estimate \(\mu\) to within four units with 95% confidence. Without calculating, would the sample size required be larger or smaller than that found in part (a)?
(c) Suppose that you wanted to estimate \(\mu\) to within two units with 90% confidence. Without calculating, would the sample size required be larger or smaller than that found in part (a)?

12. A production manager noticed that the mean time to complete a job was 160 minutes. The manager made some changes in the production process in an attempt to reduce the mean time to finish the job. A stem-and-leaf plot of a sample of 11 times is as follows:

13 | 9
14 | 2 5
15 | 0 1 3 5 6
16 | 2 4
17 | 0
   Note: 14|5 = 145 minutes. The sample mean and standard deviation are 153.36 and 9.47, respectively. Construct a 95% confidence interval for the mean time.
13. A manufacturer claimed that her company's product would not require repair for more than 18 months on the average. A sample of 12 customers who had purchased her product provided the following information on how many months elapsed before repair was needed on their purchases:
   16.5 17.0 17.5 18.0 18.5 18.5 18.5 19.0 19.0 19.5 20.0 20.5
   a. Construct a stem-and-leaf plot (let 18|5 = 18.5).
   b. The sample mean and standard deviation times are 18.542 and 1.177, respectively. Construct a 95% confidence interval for $\mu$. Does the CI support the belief that the mean repair time is more than 18 months?
14. A psychologist wanted to estimate the mean self-esteem level $\mu$ of his patients. Fourteen patients were given a test designed to measure self-esteem. The sample mean and standard deviation were 253 and 53, respectively. Assume the population is approximately normal.
   a. Construct a 98% confidence interval for $\mu$.
   b. Using the confidence interval, could you conclude that $\mu$ is smaller than the norm of 285?

9 Limit Theorems. Sampling Distribution

The Law of Large Numbers
Let $X_1, X_2, X_3, \dots$ be a sequence of independent random variables, each with the same distribution and the mean $\mu$. Then for every $\varepsilon > 0$,
$$P\left(\left|\frac{X_1 + \dots + X_n}{n} - \mu\right| > \varepsilon\right) \to 0 \quad \text{as } n \to \infty.$$

The Central Limit Theorem
Let $X_1, X_2, X_3, \dots$ be a sequence of independent random variables, each with the same distribution, the mean $\mu$, and the variance $\sigma^2$. Then for any numbers $a < b$,
$$P\left(a < \frac{X_1 + \dots + X_n - n\mu}{\sigma\sqrt{n}} < b\right) \to \frac{1}{\sqrt{2\pi}} \int_a^b e^{-y^2/2}\,dy \quad \text{as } n \to \infty.$$

The Sampling Distribution of the Sample Mean
• In real life, calculating parameters of populations is often prohibitive because populations are very large.
• Rather than investigating the entire population, we take a small sample, calculate a statistic related to the parameter of interest, and then make an inference.
• The sampling distribution of the statistic is the tool that tells us how close the statistic is to the parameter.

Properties of the sample mean
Sample mean: $\bar{X} = \dfrac{X_1 + \dots + X_n}{n}$
1. $\mu_{\bar{X}} = \mu_X$
2. $\sigma^2_{\bar{X}} = \dfrac{\sigma^2_X}{n}$
3. If $X$ is normal, $\bar{X}$ is normal. If $X$ is non-normal, then $\bar{X}$ is approximately normally distributed for sufficiently large $n$.

Sampling Distribution of a Proportion
• The parameter of interest for qualitative data is the proportion of times a particular outcome (success) occurs.
• To estimate the population proportion $p$ we use the sample proportion $\hat{p} = \dfrac{X}{n}$, where $X$ is the number of successes in the sample and $n$ is the sample size.
• The sampling distribution of $X = n\hat{p}$ is the binomial $b(p, n)$.
• In the case of large $n$ we prefer to use the normal approximation to the binomial distribution to make inferences about $\hat{p}$.

Normal approximation to the Binomial
The normal approximation to the binomial works best when
• the number of experiments (sample size) is large, and
• the probability of success $p$ is close to 0.5.
For the approximation to provide good results, we need $np(1-p) > 5$. If $np(1-p) > 5$, then
$$\frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$$
has approximately the standard normal distribution.

Continuity Correction Factor
Let $X$ have the binomial distribution $b(p, n)$ and let $Y$ be a normal random variable that approximates the binomial random variable $X$, i.e. $Y \sim N(np, np(1-p))$. When $n$ is small (less than 30) we will use the following approximations:
• $P(X = k) \approx P(k - 0.5 < Y < k + 0.5)$
• $P(X \le k) \approx P(Y < k + 0.5)$
• $P(X \ge k) \approx P(Y > k - 0.5)$

Sampling Distribution of the difference between two means
What is the sampling distribution of the difference between two sample means when independent random samples are drawn from two normal populations?
Let $X_1, X_2, \dots, X_{n_1}$ be normally distributed with the mean $\mu_X$ and the variance $\sigma^2_X$, and let $Y_1, Y_2, \dots, Y_{n_2}$ be normally distributed with the mean $\mu_Y$ and the variance $\sigma^2_Y$. We assume that these two sequences of random variables are independent. Then the difference between the two sample means, $\bar{X} - \bar{Y}$, is normally distributed with the mean $\mu_X - \mu_Y$ and the variance $\dfrac{\sigma^2_X}{n_1} + \dfrac{\sigma^2_Y}{n_2}$.

Exercises: p. 230: 4.88, 4.90, 4.92; p. 249: 4.116, 4.118, 4.120
Homework: 4.87, 4.89, 4.91, 4.117, 4.119

References
1. Chase and Bown, General Statistics
2. Hildebrand and Ott, Statistical Thinking for Managers
3.
Keller and Warrack, Statistics for Management and Economics
4. McClave, Benson, and Sincich, A First Course in Business Statistics

Exercises
1. It was estimated that 15.4% of the United States population in 1995 had no health insurance. Find the probability that fewer than 70 of 400 randomly selected people will have no health insurance.
2. Using past data, an airline believes that 8% of the people who make reservations for a certain flight will not appear. The seating capacity for the flight is 300; the airline sells 315 tickets. What is the probability that everyone who shows up has a seat on the flight?
3. The heights $X$ of players in a division of high school football teams are approximately normal with mean 71 inches and standard deviation 2.5 inches. Consider the distribution of sample means with sample size $n = 100$.
   a. Find $\mu_{\bar{X}}$.
   b. Find $\sigma_{\bar{X}}$.
   c. What percentage of sample means are larger than 70.5?
   d. What percentage of heights are more than 70.5?
4. The treatment time $X$ of patients with an eye disease is approximately normal with mean 70 minutes and standard deviation 9 minutes. In parts a and b, find the proportion of treatment times:
   a. Less than 79 minutes
   b. Between 58 and 82 minutes
   For parts c and d, assume that a sample of 36 treatment times is selected. Find the following:
   c. $P(67 < \bar{X} < 73)$
   d. $P(\bar{X} > 73)$
6. A dairy claims that the mean amount in its milk containers is 128 ounces. Let $X$ be the number of ounces of milk per container and assume that $X$ is normally distributed with standard deviation 1 ounce. If the claim is true, what percentage of containers will have
   a. Less than 126 ounces?
   b. More than 129 ounces?
   c. Between 127.5 and 130.5 ounces?
   A random sample of 25 containers gave a sample mean of 127.4 ounces.
   d. Find $P(\bar{X} < 127.4)$.
   e. Using part d, do you think that there is evidence that the true mean is less than 128 ounces? Why?

Example: Simulation of the following probability distribution
1. First we need to enter the probability distribution into the Worksheet window.
2. In the dialog window, type the sample size 100 in "Generate rows of data" and c3 in the "Store in
column(s)" window, select X as Values and P(X) as Probabilities. Click OK.
3. It creates column C3 with 100 data points that follow the given probability distribution.

How to test normality of the data
1. Choose Stat > Basic Statistics > Normality Test.
2. In Variable enter c3. In Title enter Normality Test. Click OK.
3. It creates the following picture:
   [Figure: Normality Test plot for C3]
4. A p-value that is less than 0.05 indicates non-normality of the data.
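
The two Minitab procedures above (simulating 100 values from a discrete probability distribution, then checking them for normality) can also be sketched in Python. This is only an illustration under stated assumptions: the worksheet distribution is not reproduced in these notes, so VALUES and PROBS below are made-up placeholders, and the names simulate and jarque_bera are hypothetical helpers. The Jarque-Bera statistic is swapped in as a simple normality check that needs only the standard library; it is not claimed to be the test that Minitab's Normality Test dialog reports, and its chi-square p-value is a large-sample approximation.

```python
import math
import random

# Placeholder distribution (assumption): the original worksheet values
# are not available, so these numbers are for illustration only.
VALUES = [0, 1, 2, 3, 4]
PROBS = [0.40, 0.30, 0.15, 0.10, 0.05]


def simulate(values, probs, n, seed=0):
    """Draw n independent observations from a discrete distribution."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return rng.choices(values, weights=probs, k=n)


def jarque_bera(data):
    """Return (JB statistic, approximate p-value).

    JB = n/6 * (S^2 + K^2/4), where S is the sample skewness and K is
    the sample excess kurtosis. Under normality JB is approximately
    chi-square with 2 degrees of freedom, and the upper-tail
    probability of that distribution is exp(-JB/2).
    Assumes the data are not all equal (nonzero variance).
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # 2nd central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # 3rd central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # 4th central moment
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)


# Step 1: simulate 100 data points (the analogue of column C3).
sample = simulate(VALUES, PROBS, n=100)

# Step 2: normality check; a p-value below 0.05 points to non-normality.
jb, p_value = jarque_bera(sample)
print(f"JB = {jb:.3f}, approximate p-value = {p_value:.4f}")
if p_value < 0.05:
    print("Evidence of non-normality at the 0.05 level")
else:
    print("No evidence against normality at the 0.05 level")
```

The decision rule is the same as in step 4 above: a small p-value is read as evidence that the data are not normal. Replacing VALUES and PROBS with the actual worksheet columns would mirror the Minitab example more closely.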