Intro to Statistics 1034. Ch 3 Notes
This 8-page set of class notes was uploaded by Alyssa Notetaker on Tuesday, February 9, 2016. The notes belong to Stat 1034 at the University of Cincinnati, taught by Sarah Myers in Spring 2016.
Date Created: 02/09/16
Chapter 3 Averages and Variation
Material Extracted From Textbook (Brase, Charles Henry, and Corrinne Pellillo Brase. Understandable Statistics: Concepts and Methods. 11th ed. N.p.: Cengage Learning, n.d. Print.)

3.1 Measures of Central Tendency: Mode, Median, and Mean

Mode: The value that occurs most frequently.
Note: If the data set has no single value that occurs more frequently than the others, then that data set has no mode. If two values tie for the highest frequency, the data set is bimodal.

Median: The central value of an ordered distribution.
How to Find the Median
1. Order the data from smallest to largest.
2. For an odd number of data values in the distribution: Median = Middle data value
3. For an even number of data values in the distribution: Median = (Sum of the middle two values) / 2

Mean: An average that uses the exact value of each entry.
How to Find the Mean
Mean = (Sum of all entries) / (Number of entries)

What Do Averages Tell Us?
● The mode tells us the single data value that occurs most frequently in the data set. The value of the mode is completely determined by the data value that occurs most frequently. If no data value occurs more frequently than all the other data values, there is no mode. The specific values of the less frequently occurring data do not change the mode.
● The median tells us the middle value of a data set that has been arranged in order from smallest to largest. The median is affected only by the relative position of the data values. For instance, if a data value above the median (or above the middle two values of a data set with an even number of data) is changed to another value above the median, the median itself does not change.
  ○ A disadvantage of the median is that it is not sensitive to the specific size of a data value.
● The mean tells us the value obtained by adding up all the data and dividing by the number of data. As such, the mean can change if just one data value changes. On the other hand, if the data values change but the sum of the data remains the same, the mean will not change.
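The procedures for the mode, median, and mean in Section 3.1 can be sketched in Python. This is only an illustration: the function names and the sample data are made up for this example and do not come from the textbook. The "no mode" rule is implemented under the assumption that a data set has no mode when every distinct value occurs equally often.

```python
from collections import Counter


def modes(data):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(data)
    highest = max(counts.values())
    # Assumption: if every distinct value ties, no value stands out, so no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


def median(data):
    """Order the data, then take the middle value or average the middle two."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mean(data):
    """Add up all the data and divide by the number of data values."""
    return sum(data) / len(data)


# Hypothetical sample data, chosen only for illustration.
scores = [2, 3, 3, 5, 7, 10]
print(modes(scores))   # [3]
print(median(scores))  # 4.0
print(mean(scores))    # 5.0

# Change one value above the median: the median stays put, the mean moves.
changed = [2, 3, 3, 5, 7, 100]
print(median(changed))  # 4.0
print(mean(changed))    # 20.0
```

The last two lines show the point made in the bullets: replacing 10 with 100 leaves the median at 4.0 but pulls the mean from 5.0 up to 20.0.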
Resistant Measure: One that is not influenced by extremely high or low data values. The mean is not a resistant measure of center because we can make the mean as large as we want by changing the size of only one data value. The median is more resistant.

Trimmed Mean: The mean of the data values left after “trimming” a specific percentage of the smallest and largest data values from the data set.
● More resistant than the mean but still sensitive to specific data values.

How to Compute a 5% Trimmed Mean
1. Order the data from smallest to largest.
2. Delete the bottom 5% of the data and the top 5% of the data. Note: If the calculation of 5% of the number of data values does not produce a whole number, round to the nearest integer.
3. Compute the mean of the remaining 90% of the data.
● The same procedure works for any other trimming percentage.

Distributions and Averages
● When a data distribution is mound-shaped and symmetrical, the values of the mean, median, and mode are the same or almost the same.
● For skewed-left distributions, the mean is less than the median and the median is less than the mode.
● For skewed-right distributions, the mode is the smallest value, the median is the next largest, and the mean is the largest.

Weighted Averages
“Sometimes we wish to average numbers, but we want to assign more importance, or weight, to some of the numbers. For instance, suppose your professor tells you that your grades will be based on a midterm and a final exam, each of which is based on 100 possible points. However, the final exam will be worth 60% of the grade and the midterm only 40%. How could you determine an average score that would reflect these different weights? The average you need is the weighted average” (96).

3.2 Measures of Variation
“An average is an attempt to summarize a set of data using just one number. As some of our examples have shown, an average taken by itself may not always be very meaningful. We need a statistical cross-reference that measures the spread of the data” (102).

Range: The difference between the largest and smallest values of a data distribution.

Variance and Standard Deviation
“We need a measure of the distribution or spread of data around an expected value (either the sample mean x̄ or the population mean μ). Variance and standard deviation provide such measures.”
● Sample standard deviation and sample variance are used to describe the spread of data about the mean.
● Standard deviation and variance can also be computed for a population (just make sure to use the proper symbols!).
  Sample: s² = Σ(x - x̄)² / (n - 1), s = √(s²)
  Population: σ² = Σ(x - μ)² / N, σ = √(σ²)
● If the mean is rounded before it is used in the formula, the computed value of the standard deviation will change slightly.

What Do Measures of Variation Tell Us?
Measures of variation give information about the spread of the data.
● The range tells us the difference between the highest data value and the lowest. It tells us about the spread of the data but does not tell us whether most of the data are close to the mean.
● The sample standard deviation is based on the difference between each data value and the mean of the data set. The magnitude of each data value enters into the calculation. The formula tells us to compute the difference between each data value and the mean, square each difference, add up all the squares, divide by n - 1, and then take the square root of the result. The standard deviation gives an average of how far the data are spread out around the mean. A smaller standard deviation indicates that the data tend to be closer to the mean.
● The variance tells us the square of the standard deviation. As such, it is also a measure of data spread around the mean.

Population Parameters
● The formula for the population mean is the same as the formula for the sample mean, just with different symbols (μ and N in place of x̄ and n).

Coefficient of Variation
Coefficient of Variation: Expresses the standard deviation as a percentage of the sample or population mean: CV = (s / x̄) × 100% for a sample, or CV = (σ / μ) × 100% for a population.
● The numerator and denominator have the same units.
  ○ Because the units cancel, the coefficient of variation can be used to compare the variability of two different populations.
“The coefficient of variation can be thought of as a measure of the spread of the data relative to the average of the data” (110).

Chebyshev’s Theorem
● When dealing with a symmetrical, bell-shaped distribution, one can make definite conclusions about the proportion of the data that must lie within a certain number of standard deviations on either side of the mean.
● However, Chebyshev’s Theorem applies to all distributions (skewed, symmetric, etc.) and helps describe the data spread about the mean.
● Chebyshev’s Theorem refers to the minimal percentage of data that must fall within the specified number of standard deviations of the mean (111).

What Does Chebyshev’s Theorem Tell Us?
● The minimum percentage of data that falls within any specified number of standard deviations on either side of the mean. For k standard deviations (k > 1), that minimum is 1 - 1/k².
● No matter what the data distribution, at least 75% of the data lie within 2 standard deviations of the mean.
● A minimum of 88.9% of the data fall between the values 3 standard deviations below the mean and 3 standard deviations above the mean. This implies that a maximum of 11.1% of the data fall beyond 3 standard deviations of the mean. Such values might be suspected outliers, particularly for a mound-shaped symmetric distribution (111).

Thoughts About Averages
● Averages do not tell much about the way data are distributed about the mean.
● Combining an average (such as the mean) with the variance and standard deviation gives a more complete picture of a data set.

Easier Computation with Grouped Data:

3.3 Percentiles and Box-and-Whisker Plots

Quartiles: Special percentiles used so frequently that we want to adopt a specific procedure for their computation.

How to Compute Quartiles
1. Order the data from smallest to largest.
2. Find the median. This is the second quartile, Q2.
3. The first quartile, Q1, is then the median of the lower half of the data; that is, it is the median of the data falling below the Q2 position (and not including Q2).
4. The third quartile, Q3, is the median of the upper half of the data; that is, it is the median of the data falling above the Q2 position (and not including Q2).

Interquartile Range (IQR): IQR = Q3 - Q1
● A useful measure of data spread utilizing relative position.
● Indicates the spread of the middle half of the data.

Box-and-Whisker Plots
5-Number Summary: Lowest value, Q1, median (Q2), Q3, highest value.
● Box-and-whisker plots provide another useful technique from exploratory data analysis for describing data.

How to Make Box-and-Whisker Plots
1. Draw a vertical scale to include the lowest and highest data values.
2. To the right of the scale, draw a box from Q1 to Q3.
3. Include a solid line through the box at the median level.
4. Draw vertical lines, called whiskers, from Q1 to the lowest value and from Q3 to the highest value.

Why Box-and-Whisker Plots Are Helpful
● They give a graphic picture of how the data are spread about the median.
● They show the location of the middle half of the data, which helps you identify whether the distribution is skewed or symmetrical.
● They help identify outliers.
● They indicate the values of the 5-Number Summary.
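The quartile procedure and the 5-number summary from Section 3.3 can be sketched in Python as follows. This is a minimal illustration, not the textbook's own code: the function names and the sample data are invented here, and the sketch assumes at least two data values. It follows the notes' rule of taking medians of the lower and upper halves without including the Q2 position; other software may use slightly different quartile conventions and give slightly different answers.

```python
def median_of_ordered(ordered):
    """Median of a list that is already sorted from smallest to largest."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(data):
    """Return (lowest, Q1, median, Q3, highest) using the steps in the notes."""
    if len(data) < 2:
        raise ValueError("this sketch assumes at least two data values")
    ordered = sorted(data)                 # Step 1: order the data
    n = len(ordered)
    half = n // 2
    q2 = median_of_ordered(ordered)        # Step 2: the median is Q2
    lower = ordered[:half]                 # data below the Q2 position
    # For an odd count, skip the middle value so Q2 itself is not included.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    q1 = median_of_ordered(lower)          # Step 3: median of the lower half
    q3 = median_of_ordered(upper)          # Step 4: median of the upper half
    return ordered[0], q1, q2, q3, ordered[-1]


def interquartile_range(data):
    """IQR = Q3 - Q1, the spread of the middle half of the data."""
    _, q1, _, q3, _ = five_number_summary(data)
    return q3 - q1


# Hypothetical sample data, chosen only for illustration.
values = [10, 2, 4, 9, 4, 5, 8, 6, 7]
print(five_number_summary(values))   # (2, 4.0, 6, 8.5, 10)
print(interquartile_range(values))   # 4.5
```

The five numbers returned are exactly the values needed to draw the box (Q1 to Q3), the median line, and the two whiskers.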