# Stat Chapter 3 Notes Stat 206

USC

These 13 pages of class notes were uploaded by Brandon Gearhart on Monday, October 3, 2016. The notes belong to Stat 206 at the University of South Carolina, taught by Angela Ferguson in Fall 2016.


Date Created: 10/03/16

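STAT 206: Chapter 3 (Numerical Descriptive Measures)

Ideas in Chapter 3:

- Central Tendency
- Variation
- Shape

## 3.1 Central Tendency

There are 3 different ways to consider the “center” of the distribution:

- “Balancing point” (mean/average)
- Value that divides the upper half from the lower half of the data (median)
- Value(s) that occur most often (mode)

Let’s call the variable X. $\sum x_i$ means “add up all the values of $x$.”

### MEAN: arithmetic average

“Balancing point” (MEAN/average)

[Figure: the mean of a density curve is the point at which it would balance. Source: Moore, Notz, *Statistics: Concepts and Controversies*, 8th ed., p. 292]

Let’s use our variable X with:

- $x$ (or $x_i$) = value for one record ($x_1, x_2, \ldots, x_n$)
- $n$ = number of values
- $\bar{X}$ (“x-bar”) is the mean of the variable X

Sample mean is

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Example: what is the (typical) MEAN time it takes you to get ready in the morning? Measure the time between when you get up and when you leave your home (rounded to the nearest minute) for 10 days.

| Day | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Time | 39 | 29 | 43 | 52 | 39 | 44 | 40 | 31 | 44 | 35 |

$$\bar{X} = \frac{396}{10} = 39.6 \text{ minutes}$$

Example: the same measurements, but with an extreme value on day 3.

| Day | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Time | 39 | 29 | 103 | 52 | 39 | 44 | 40 | 31 | 44 | 35 |

$$\bar{X} = \frac{456}{10} = 45.6 \text{ minutes}$$

Notice: a single extreme value (103 in place of 43) pulls the mean up from 39.6 to 45.6.

### MEDIAN: middle value

- MEDIAN is NOT affected by extreme values
- MUST RANK IN ORDER

$$\text{MEDIAN} = \frac{n+1}{2} \text{ ranked value}$$

- If n is ODD, the median is the middle ranked value
- If n is EVEN, the median is the average of the two middle ranked values

Example: what is the (typical) MEDIAN time it takes you to get ready in the morning? Use the same 10 days of measurements.

| Day | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Time | 39 | 29 | 43 | 52 | 39 | 44 | 40 | 31 | 44 | 35 |

| Ranks | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Values | 29 | 31 | 35 | 39 | 39 | 40 | 43 | 44 | 44 | 52 |

With $n = 10$, the median is the $(10+1)/2 = 5.5$ ranked value, the average of the 5th and 6th ranked values:

$$\text{Median} = \frac{39 + 40}{2} = 39.5$$

Consider the extreme value example:

| Ranks | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Values | 29 | 31 | 35 | 39 | 39 | 40 | 44 | 44 | 52 | 103 |

The median is still $(39 + 40)/2 = 39.5$.

Example 2: Calories for 7 breakfast cereals. Compute the median.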
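Cereal calories, ranked in order:

| Ranks | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Values | 80 | 100 | 100 | 110 | 190 | 200 | 240 |

With $n = 7$, the median is the $(7+1)/2 = 4$th ranked value: 110.

### MODE

MODE is the value (or values) that occur most often.

- Extreme values do NOT affect the MODE
- There may be one mode, two modes (bi-modal), three modes (tri-modal), etc., OR there may be NO mode if all values are unique

EXAMPLE: times to get ready in the morning (again)

Values: 29 31 35 39 39 40 43 44 44 52

Here 39 and 44 each occur twice, so the data are bi-modal.

### GEOMETRIC MEAN

When you want to measure the rate of change of a variable over time, use the GEOMETRIC MEAN (instead of the arithmetic mean).

Geometric Mean is

$$\bar{X}_G = (x_1 \times x_2 \times \cdots \times x_n)^{1/n}$$

The GEOMETRIC MEAN rate of return measures the average percentage return of an investment per time period:

$$R_G = [(1+R_1) \times (1+R_2) \times \cdots \times (1+R_n)]^{1/n} - 1$$

where $R_i$ is the rate of return in period $i$.

EXAMPLE: The percentage change in the Russell 2000 Index of the stock prices of 2,000 small companies was -5.5% in 2011 and 14.6% in 2012. Compute the GEOMETRIC rate of return.

$$R_G = [(1 + (-0.055)) \times (1 + 0.146)]^{1/2} - 1 = (0.945 \times 1.146)^{1/2} - 1 \approx 0.0407$$

The geometric mean rate of return in the Russell 2000 Index for the two years is about 4.07% per year.

## 3.2 Variation and Shape

RANGE = largest value - smallest value

EXAMPLE: times to get ready in the morning (and again)

Values: 29 31 35 39 39 40 43 44 44 52

$$\text{Range} = 52 - 29 = 23 \text{ minutes}$$

### VARIANCE, STANDARD DEVIATION

VARIANCE is

$$s^2 = \frac{\sum (x - \bar{X})^2}{n - 1}$$

STANDARD DEVIATION is

$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{X})^2}{n - 1}}$$

Can be thought of as roughly the typical distance of the values from the mean.

REMEMBER! Neither the variance nor the standard deviation can ever be NEGATIVE.

Example:

- Scores for CLASS A: 30, 65, 70, 76, 93, 99
- Scores for CLASS B: 68, 72, 73, 73, 74, 77

What is the difference? Find the standard deviation for each class.

Class A

| $x$ | $x - \bar{x}$ | $(x - \bar{x})^2$ |
|---|---|---|
| 30 | -42.17 | 1778.31 |
| 65 | -7.17 | 51.41 |
| 70 | -2.17 | 4.69 |
| 76 | 3.83 | 14.69 |
| 93 | 20.83 | 434.03 |
| 99 | 26.83 | 720.03 |

Standard deviation, or s, controls the spread. That is, the larger the value of s, the more spread out or “variable” the data are.

[Figure: three distributions with SD = 0.3, SD = 0.7, and SD = 2.0]

Coefficient of Variation is

$$CV = \left(\frac{s}{\bar{X}}\right) \times 100\%$$

EXAMPLE: times to get ready in the morning (and again)

Values: 29 31 35 39 39 40 43 44 44 52

$\bar{X} = 39.6$ and $s = 6.77$

$$CV = \left(\frac{6.77}{39.6}\right) \times 100\% \approx 17.10\%$$

Coefficient of Variation measures the scatter in the data relative to the mean.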
That is, CV is expressed as a percentage since it is a relative measure (standard deviation divided by the mean, times 100).

- Useful when comparing two or more sets of data that are measured in different units (see EXAMPLE 3.7, p. 112)

Z score is equal to the difference between a value and the mean, divided by the standard deviation:

$$Z = \frac{x - \bar{X}}{s}$$

- Z measures how many standard deviations a value lies from the mean
- If positive, ABOVE the mean
- If negative, BELOW the mean
- Z helps identify outliers: in general, Z < -3.00 or Z > 3.00 indicates an outlier value

EXAMPLE: times to get ready in the morning (and again). $\bar{X} = 39.6$ and $s = 6.77$. What is the Z-score for 39 minutes to get ready?

$$Z = \frac{39 - 39.6}{6.77} \approx -0.09$$

So 39 minutes is slightly below the mean and is not an outlier.

Shape of a variable: pattern of distribution

- SKEWNESS: extent to which data values are not symmetrical around the mean (compare mean vs. median)
- KURTOSIS:

## 3.3 Exploring Numerical Data

Let’s consider Measures of Position: PERCENTILES, QUARTILES, and 5-NUMBER SUMMARY

- PERCENTILE: the pth percentile is a value such that p percent of the observations fall below (or at) that value
- QUARTILES: special cases of percentiles
  - Q1 = observation at the 25th percentile (median of lower half of data set)
  - Q2 = observation at the 50th percentile (the median)
  - Q3 = observation at the 75th percentile (median of upper half of data set)
- 5-NUMBER SUMMARY includes: smallest value, Q1, median (Q2), Q3, largest value

Boxplots

- Which restaurant has higher calories overall?
- Which restaurant has the least variability in calories?

## 3.4 Numerical Descriptive Measures for a Population

Sections 3.1 and 3.2 discuss statistics for a SAMPLE. When data are collected for an entire population, analyze population PARAMETERS. Below, $N$ is the number of values in the population.

POPULATION mean ($\mu$) is

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

POPULATION variance ($\sigma^2$) is

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

POPULATION standard deviation ($\sigma$) is

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

### Empirical Rule for normal distributions

Remembering that in many/most(?) data sets, a large portion of the values tend to cluster somewhere near the mean. For normal (bell-shaped, symmetric) distributions, we are able to use the Empirical Rule:

- Within 1 std dev of the mean (gray area) ~ 68%
- Within 2 std dev of the mean (gray + yellow) ~ 95%
- Within 3 std dev of the mean (gray + yellow + orange) ~ 99.7%

Example: The Health and Nutrition Examination Study of 1976-1980 (HANES) studied the heights of adults (aged 18-24) and found that the heights follow a normal distribution with the following:

| | Mean ($\mu$) | Standard deviation ($\sigma$) |
|---|---|---|
| Women | 65.0 inches | 2.5 inches |
| Men | 70.0 inches | 2.8 inches |

Find the proportion of men with heights between 67.2 inches and 72.8 inches.

Since $67.2 = 70.0 - 2.8$ and $72.8 = 70.0 + 2.8$, this interval is within 1 standard deviation of the mean, so it holds about 68% of men.

### Chebyshev Rule

- Can’t use the Empirical Rule for heavily skewed data sets
- Chebyshev rule states that for any data set, regardless of shape, the percentage of values found within k standard deviations of the mean must be at least:

$$\left(1 - \frac{1}{k^2}\right) \times 100\%$$

% of Data Values Around the Mean

| Interval | Chebyshev (any distribution) | Empirical Rule (normal) |
|---|---|---|
| $(\mu - \sigma, \mu + \sigma)$ | at least 0% | ~ 68% |
| $(\mu - 2\sigma, \mu + 2\sigma)$ | at least 75% | ~ 95% |
| $(\mu - 3\sigma, \mu + 3\sigma)$ | at least 88.89% | ~ 99.7% |

Example: A population of 2-liter bottles of cola is known to have a mean fill-weight of 2.06 liters and a standard deviation of 0.02 liter. However, the shape of the population is unknown, and you cannot assume that it is bell-shaped. Describe the distribution of fill-weights.

By the Chebyshev rule, at least 75% of the bottles contain between 2.02 and 2.10 liters, and at least 88.89% contain between 2.00 and 2.12 liters.

## 3.6 Descriptive Statistics: Pitfalls and Ethical Issues

- Massive amounts of data are available online
- Should you report the mean or the median?
- Should you report the 5-number summary or the variance and standard deviation?
- It is unethical if pertinent findings are deliberately NOT reported because they are detrimental to a particular position
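The get-ready example from Sections 3.1 and 3.2 can be checked numerically. The following is a minimal sketch, assuming Python 3.8+ and its standard `statistics` module; the variable names (`times`, `times_extreme`, and so on) are illustrative and do not come from the notes.

```python
import statistics

# Get-ready times in minutes for 10 days (Section 3.1 example).
times = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]

# Central tendency
mean = statistics.mean(times)          # arithmetic average, sum / n
median = statistics.median(times)      # n is even, so average of the two middle ranked values
modes = statistics.multimode(times)    # every value tied for most frequent

# Variation
value_range = max(times) - min(times)  # largest minus smallest
variance = statistics.variance(times)  # sample variance, divides by n - 1
std_dev = statistics.stdev(times)      # square root of the sample variance
cv = std_dev / mean * 100              # coefficient of variation, in percent

# Z score for a 39-minute morning
z_39 = (39 - mean) / std_dev

# Same data with the extreme value 103 on day 3
times_extreme = [39, 29, 103, 52, 39, 44, 40, 31, 44, 35]
mean_extreme = statistics.mean(times_extreme)
median_extreme = statistics.median(times_extreme)

print(f"mean={mean:.1f} median={median} modes={modes}")
print(f"range={value_range} s={std_dev:.2f} CV={cv:.2f}% Z(39)={z_39:.2f}")
print(f"with 103: mean={mean_extreme:.1f} median={median_extreme}")
```

Comparing `mean_extreme` with `median_extreme` shows the point made in the notes: the extreme value moves the mean but leaves the median unchanged.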

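The geometric mean rate of return from Section 3.1 can be sketched the same way. This assumes Python 3.8+ (for `math.prod`) and returns expressed as decimals; the function name `geometric_mean_return` is hypothetical, chosen only for this illustration.

```python
import math

def geometric_mean_return(returns):
    """Geometric mean rate of return for per-period returns given as decimals."""
    # Multiply the growth factors (1 + R1) x (1 + R2) x ... x (1 + Rn)
    growth = math.prod(1 + r for r in returns)
    # Take the n-th root, then subtract 1 to get back to a rate
    return growth ** (1 / len(returns)) - 1

# Russell 2000 Index: -5.5% in 2011 and 14.6% in 2012
russell_2000 = [-0.055, 0.146]

r_g = geometric_mean_return(russell_2000)
arithmetic = sum(russell_2000) / len(russell_2000)

print(f"geometric mean return:  {r_g:.2%}")
print(f"arithmetic mean return: {arithmetic:.2%}")
```

The arithmetic mean of the two returns comes out higher than the geometric mean, which is why the notes recommend the geometric mean for rates of change over time.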

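The Chebyshev and Empirical Rule examples from Section 3.4 reduce to computing intervals of the form mean ± k standard deviations. The sketch below is one way to lay that out, assuming Python 3; the helper names `chebyshev_min_percent` and `k_sd_interval` and the `EMPIRICAL_PERCENT` lookup are assumptions made for illustration.

```python
def chebyshev_min_percent(k):
    """Minimum percent of values within k std devs of the mean, for any shape."""
    return (1 - 1 / k**2) * 100

def k_sd_interval(mean, sd, k):
    """Interval (mean - k*sd, mean + k*sd)."""
    return (mean - k * sd, mean + k * sd)

# Approximate Empirical Rule percentages; valid only for normal distributions.
EMPIRICAL_PERCENT = {1: 68.0, 2: 95.0, 3: 99.7}

# Cola bottles: shape unknown, so only the Chebyshev bound applies.
cola_mean, cola_sd = 2.06, 0.02
for k in (1, 2, 3):
    low, high = k_sd_interval(cola_mean, cola_sd, k)
    bound = chebyshev_min_percent(k)
    print(f"k={k}: ({low:.2f}, {high:.2f}) liters, at least {bound:.2f}%")

# Men's heights (HANES): normal, so the Empirical Rule applies.
men_low, men_high = k_sd_interval(70.0, 2.8, 1)
print(f"({men_low:.1f}, {men_high:.1f}) inches, about {EMPIRICAL_PERCENT[1]}%")
```

The Chebyshev bound is a guaranteed minimum, so it is always weaker than the Empirical Rule percentage for the same k.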