# INTRO TO STATISTICS (QL)(SSS) STAT 1040

Utah State University

This 17-page set of class notes was uploaded by Geovanny Lakin on Wednesday, October 28, 2015. The notes belong to STAT 1040 at Utah State University, taught by Staff in Fall.

Date Created: 10/28/15

## Ch 5: The Normal Approximation to Data

The normal curve is the most important curve in statistics. It is so important that it is even honored on a German bank note. Many data sets approximate the normal curve closely, and many statistical procedures use the normal curve.

The equation of the normal curve is

$$y = \frac{100\%}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad e = 2.71828\ldots$$

but we won't use the equation. The graph is symmetric, has a total area of 100% between the curve and the horizontal (x) axis, and the curve is always above the x-axis, though it gets very close.

Many histograms fit the normal curve closely if they are drawn in standard units. Standard units say how many SDs above or below the average a value is.

Q: In the HANES sample, women aged 18-74 had an average height of 63.5" and an SD of 2.5". If a woman is 66" tall, she is 66 - 63.5 = 2.5" above average, which is 1 SD above average. Therefore 66" = 1 standard unit.

- What is a height of 56" in standard units?
- 67.5"?
- 63.5"?
- What height is 6 standard units?

If a histogram follows the normal curve, the area under the histogram is approximately the same as the area under the normal curve. Recall from Chapter 4: about 68% of values are often within 1 SD of the average, about 95% are within 2 SDs, and about 99.7% are within 3 SDs. This is based on the normal curve.

The table on page A-105 gives percentages within certain numbers of standard units (z) of the average. The modified table at the end of this set of handouts does not contain the Height column, which we don't use. We can calculate the area for any range of values by using this table. Sketches often help in using this table. Note: different books contain different tables.

What is the area under the normal curve:

- Between -0.5 and 0.5?
- Between 0 and 1.5?
- Between -2 and 0?
- Between -2 and 1.5?
- Between 1.5 and 2?
- More than 1.5?
- Less than 0.5?
- Between 0.5 and 1.5?

### The Normal Approximation

We use the normal approximation to estimate areas for histograms which closely follow the normal curve.

For women in HANES (avg = 63.5", SD = 2.5"):

- What percentage were between 62" and 68.5"?
- What percentage were less than 67.5"?

How to proceed:

1. Rewrite the question, putting all values in standard units.
2. Answer the question as though it were asked about the normal curve.

### Percentiles

If data is not normal, then the average and SD are not as good as summary statistics.

Q: For the 1987 income data, avg = $44,500 and SD = $32,000. By the normal approximation, $0 is at -1.4 standard units. How did we get this value of -1.4? The area to the left of -1.4 under the normal curve is about 8%. Thus, by the normal approximation, about 8% of families should have a negative income.

With non-normal data, we can summarize more accurately than with the average and SD by using percentiles. The pth percentile is the value such that p% of the values are below and (100 - p)% of the values are above. We commonly use percentiles including 1, 10, 25, 50, 75, 90, and 99.

- The 50th percentile is called the median.
- The 25th percentile is called the first quartile.
- The 75th percentile is called the third quartile.

Percentiles of 1992 US family income:

| Percentile | Income ($) |
|-----------:|-----------:|
| 1 | 1,300 |
| 10 | 10,200 |
| 25 | 20,100 |
| 50 | 36,800 |
| 75 | 58,100 |
| 90 | 85,000 |
| 99 | 151,800 |

### Interquartile Range

The IQR is

IQR = 75th percentile - 25th percentile

The IQR is used as a measure of spread when the SD is too heavily influenced by one or two extreme tails.

Q: What is the IQR of the 1992 US family income data?

### Calculating Percentiles

When a histogram follows the normal curve, we can use a normal table to estimate the percentiles of the data.

1. Work backwards in the table to go from area to z (if the percentile is greater than 50) or -z (if it is less than 50).
2. Convert z or -z to the original units.

Q: What is the 99th percentile of women's heights in the HANES study?

Q: What is the first quartile of the women's heights?

### Change of Scale

Suppose we wish to work with our data in new units, such as changing meters to feet or degrees Fahrenheit to degrees Celsius. Generally this will involve multiplying every value by the same constant and perhaps adding another constant to every value. How will this change the average and SD?

- Adding the same constant to every value on a list adds the same constant to the average. The SD doesn't change.
- Multiplying every value on a list by the same constant will multiply the average by the same constant and the SD by the absolute value of the constant.
- Neither will affect the standard units.

Q: To convert degrees Fahrenheit to degrees Celsius, we use the formula

$$C = \frac{5}{9}(F - 32) = \frac{5}{9}F - \frac{160}{9}$$

Given Fahrenheit temperatures 32, 50, 59, 68, and 86, with average 59 and SD 18:

1. Convert the temperatures to Celsius and find the average and the SD.
2. How could we calculate the Celsius average and SD without converting all of the values?

## Ch 21: The Accuracy of Percentages

So far we assumed we knew what was in the box, at least the average and SD, e.g.:

- Games of chance
- Sampling from a known population (known average and SD)

In practice, we usually don't know the population, i.e. the box. That's why we sample!

Q: We want to estimate the percentage of voters in a large population who support the Republican candidate and obtain an SE of this percentage. How?

- Estimate the population percentage using the sample percentage.
- Estimate the SE by pretending that the population is just like the sample, and calculate the SE.

This is called the bootstrap method.

Suppose we sample 1600 voters from this population and find that 56% of the sample support the Republican candidate.

1. Estimate the percentage of voters in the whole population who support the Republican candidate.
2. Find the standard error of this percentage.

### Confidence Intervals

The normal approximation tells us that there is a 95% chance that a sample sum or percentage will be within 2 SEs of the EV. Therefore, if we consider the range

sample % ± 2 SE

there is about a 95% chance that this will include the EV (the population percentage). We call this range a confidence interval (CI). For this interval to be valid, the sample size should be large and the sample percentage should not be too close to 0% or 100%.

Q: What is the 95% confidence interval for the percentage of voters who support the Republican candidate?

Q: Lemons are premium grade, table grade, or juice grade. A farmer takes a random sample of 500 lemons from a large crop and finds that 75 are juice grade. Find a 95% confidence interval for the percentage of juice grade lemons in the crop.
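
The voter poll from earlier in this chapter (a sample of 1600 with 56% in favor) can be worked through in a few lines of Python. This is only a sketch: the function name `percentage_ci` is made up for illustration, and it assumes a simple random sample from a population much larger than the sample, with the SD of the box estimated from the sample as described above.

```python
import math


def percentage_ci(sample_pct, n, num_se=2.0):
    """Return (SE, low, high), all in percentage points.

    Hypothetical helper: assumes a simple random sample of size n
    drawn from a population much larger than the sample.
    """
    p = sample_pct / 100.0
    # SD of the 0-1 box, estimated by pretending the box looks like the sample
    sd_box = math.sqrt(p * (1.0 - p))
    # SE for the sample percentage = (SD of box / sqrt(n)) * 100%
    se_pct = sd_box / math.sqrt(n) * 100.0
    return se_pct, sample_pct - num_se * se_pct, sample_pct + num_se * se_pct


se, low, high = percentage_ci(56.0, 1600)
print(f"SE = {se:.2f} percentage points")    # SE = 1.24 percentage points
print(f"95% CI: {low:.1f}% to {high:.1f}%")  # 95% CI: 53.5% to 58.5%
```

The default `num_se=2.0` matches the "sample % ± 2 SE" rule in these notes, not the 1.96 multiplier that some other texts use.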

