# Statistical Reasoning & Practice, Week 2 Notes 36-201

Being mathematically precise about describing quantitative distribution:
- measuring center (mean vs. median)
- measuring spread (standard deviation vs. IQR)

Using boxplots to compare distribu...
These 3-page class notes were uploaded by Monica Chang on Saturday, September 10, 2016. The notes belong to 36-201 at Carnegie Mellon University, taught by Gordon Weinberg in Fall 2016.

Date Created: 09/10/16
Week 2

Being mathematically precise about describing quantitative value distribution

Measuring center:

MEAN
- Mean: the arithmetic average
- The sample mean, which is a statistic, is denoted by $\bar{X}$ ("ex-bar"):

$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$

  where $n$ is the sample size
- The population mean, which is a parameter, is denoted by $\mu$ ("mu"):

$$\mu = \frac{x_1 + x_2 + \dots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N}$$

  where $N$ is the population size
- The mean is sensitive (nonresistant) to outliers; it is pulled towards the tail in a skewed distribution

MEDIAN
- The mean is not very useful when there are big outliers, so we use the median to measure center instead
- To find the median:
  1. Reorder the data values from smallest to largest
  2. If there is an odd number of values, the median is the middle value
  3. If there is an even number of values, the median is the average of the two middle values
- The median is relatively resistant to outliers and skewness; in a skewed distribution it is not pulled towards the tail as strongly as the mean

IN GENERAL
- In a symmetric distribution, the mean and median are almost the same
- For bell-shaped (symmetric) distributions, the mean is appropriate
- For distributions with severe outliers or skewness, use the median as the measure of center
- Use the mean as the measure of center whenever possible because it uses the data values exactly, so it is more mathematically useful

Measuring spread:

STANDARD DEVIATION
- Standard deviation: the typical (standard) deviation from the mean
- In words:

$$\text{standard deviation} = \sqrt{\text{adjusted average of } (\text{deviations})^2}$$

- For most data, about 2/3 of the data values are within one standard deviation of the mean
- Larger spread means larger standard deviation
- Standard deviation is a nonnegative number (zero only when all values are identical) with the same units as the data
- The sample standard deviation, which is a statistic, is denoted by $S$:

$$S = \sqrt{\frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n - 1}} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

  where $n$ is the sample size. We divide by $n - 1$ rather than $n$ because the minus 1 adjusts for working with a sample.
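The definitions of the mean, median, and sample standard deviation above can be checked numerically. The following is a minimal sketch using made-up data (the values and the helper name `sample_sd` are illustrative assumptions, not from the notes); it shows how one large outlier pulls the mean far more than the median, and that dividing by n - 1 matches Python's built-in `statistics.stdev`.

```python
import math
import statistics

# Hypothetical data: seven values, then the same values plus one large outlier.
values = [2, 4, 4, 5, 6, 7, 7]
with_outlier = values + [60]

def sample_sd(xs):
    """Sample standard deviation: square root of the sum of squared
    deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))

# Without the outlier, mean and median agree (both 5).
print(statistics.mean(values), statistics.median(values))

# With the outlier, the mean jumps to 11.875 while the median only moves to 5.5.
print(statistics.mean(with_outlier), statistics.median(with_outlier))

# The hand-written formula agrees with the library's sample standard deviation.
print(sample_sd(values), statistics.stdev(values))
```

In this made-up example the outlier more than doubles the mean but shifts the median by only half a unit, which is the sense in which the median is called resistant.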
- The sample variance, which is denoted by $S^2$, is the square of the sample standard deviation
- The population standard deviation, which is a parameter, is denoted by $\sigma$ ("sigma"):

$$\sigma = \sqrt{\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \dots + (x_N - \mu)^2}{N}} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

  where $N$ is the population size
- The population variance, which is denoted by $\sigma^2$, is the square of the population standard deviation
- Note that we typically denote random variables (e.g. sample statistics) with upper-case letters. The sample size, since it is not a varying estimate, is denoted with a lower-case $n$, while the population size, as the corresponding population attribute, is denoted with an upper-case $N$.

IQR
- The standard deviation is not very useful when there are severe outliers or skewness, so we use the IQR instead
- IQR = Q3 - Q1, which is the range spanned by the middle 50% of the data
  - Q1: the value below which 25% of the data points fall
  - Q3: the value below which 75% of the data points fall
- Five-number summary: min, Q1, median, Q3, max

HOW TO CHOOSE APPROPRIATE MEASURES FOR CENTER AND SPREAD:

| Data | Center | Spread |
| --- | --- | --- |
| Strongly skewed and/or has outliers | Median | IQR |
| Approximately symmetric with no strong outliers | $\bar{X}$ (mean) | $S$ (standard deviation) |

Boxplot:
- Boxplot: another way to display quantitative data, using the five-number summary
- How to make a boxplot: mark Q1, the median, and Q3, and connect them to make a box. Mark the lowest and highest values that are not suspected outliers (often these are the min and max) and draw "whisker" lines connecting them to the box. Mark any suspected outliers individually. Note: there is no single formula for finding the lowest and highest values that are not suspected outliers.
- Boxplots can display skewness but do not display as many characteristics (e.g. modality) as a histogram. They are useful for comparing multiple distributions using side-by-side boxplots.
  - e.g. if the difference between group centers is less than the variability within each group, there may be no indication of a difference between the groups, and vice versa.
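The five-number summary, IQR, and boxplot whiskers described above can be sketched in a few lines. Two things here are assumptions rather than part of the notes: quartile conventions differ between textbooks and software, and this sketch uses `statistics.quantiles` with `method="inclusive"`; and since the notes say there is no single formula for suspected outliers, the widely used 1.5 × IQR convention stands in as one possible choice. The data and the function name `five_number_summary` are made up for illustration.

```python
import statistics

def five_number_summary(xs):
    """Return (min, Q1, median, Q3, max).

    Assumption: quartiles come from statistics.quantiles with the
    'inclusive' method; other conventions can give slightly different values.
    """
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return min(xs), q1, med, q3, max(xs)

data = [1, 3, 4, 5, 6, 7, 9, 10, 30]  # hypothetical sample
lo, q1, med, q3, hi = five_number_summary(data)
iqr = q3 - q1  # range spanned by the middle 50% of the data

# Assumed convention (not from the notes): flag values more than
# 1.5 * IQR beyond the quartiles as suspected outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
suspected = [x for x in data if x < lower_fence or x > upper_fence]

# Whiskers extend to the lowest and highest values that are not suspected outliers.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whiskers = (min(inside), max(inside))

print((lo, q1, med, q3, hi), iqr, suspected, whiskers)
```

For this sample the summary is (1, 4.0, 6.0, 9.0, 30) with an IQR of 5.0; the value 30 is flagged, so the upper whisker stops at 10 rather than at the maximum.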

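The comparison of groups by side-by-side boxplots can also be read off numerically. The sketch below uses hypothetical scores for two groups and compares the gap between medians with the within-group IQRs. The cutoff used (gap smaller than the smaller IQR) is only an illustrative assumption for "less than the variability within each group", not a formal test, and the quartile method is again the assumed `inclusive` convention.

```python
import statistics

def center_and_spread(xs):
    """Median and IQR of one group (quartiles via the assumed 'inclusive' method)."""
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return med, q3 - q1

# Hypothetical scores for two groups.
group_a = [60, 62, 65, 70, 71, 75, 80, 82, 90]
group_b = [62, 64, 66, 72, 73, 77, 81, 85, 91]

med_a, iqr_a = center_and_spread(group_a)
med_b, iqr_b = center_and_spread(group_b)
gap = abs(med_a - med_b)

# Illustrative rule of thumb: a gap between medians that is small relative to
# the within-group IQRs gives little indication of a difference between groups.
small_gap = gap < min(iqr_a, iqr_b)

print(med_a, iqr_a, med_b, iqr_b, gap, small_gap)
```

Here the medians differ by 2 while each group's IQR is 15, so the boxes would overlap almost entirely and the plot would give little indication of a difference.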
