# STATS1350WeekFiveNotes.pdf Stats 1350

OSU

These 8 pages of class notes were uploaded by Alyssa Leathers on Friday, February 13, 2015. The notes belong to Stats 1350 at Ohio State University, taught by Ali Miller in Winter 2015.

Date Created: 02/13/15

STATS 1350, 2/12/15 1:36 PM

Week Five: Chapters 11 and 12, Graphs/Summary Statistics

## Describing Quantitative Data

- Focus on three key characteristics of the distribution when we describe a quantitative variable:
  1. Shape
  2. Center
  3. Spread
- How do we determine the center/spread of a distribution?
- Come back to talk about shape issues and why shape matters.

## Center

- Mean: the average. Add up the values and divide by the number of values in the data set.
- Mode: biggest peak/most common value.
- Median: middle number in the data set.
  - Order the data from smallest to greatest.
  - Count the number of observations, n.
  - Calculate the center position of the data set: (n + 1)/2.
  - If n is odd, the center is the exact middle point. If n is even, take the average of the two middle values (the "middle pair").

Examples

- What is the median/mean of data set A: 23, 25, 32.5, 33, 67, 1, -20?
  - Sorted: -20, 1, 23, 25, 32.5, 33, 67
  - Median = 25 (position (7 + 1)/2 = 4)
  - Mean = (add all)/7 = 161.5/7 ≈ 23.07
  - Median and mean are not the same.
- Median/mean of data set B: 1, 2, 4, 6, 8, 9, 12, 13
  - Median = (6 + 8)/2 = 7 (position 4.5, between the 4th and 5th numbers)
  - Mean = (add all)/8 = 55/8 ≈ 6.88

## Spread or Variability

- Standard deviation: roughly the average distance of the data values from the mean.
  - $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
- Range: overall distance between the minimum and maximum data values.
  - Max - min
  - Note: extremely sensitive to outliers.
- Interquartile range (IQR): range of the middle 50% of the data; the distance between Q1 (25th percentile) and Q3 (75th percentile).
  - Based on the median.

Find Standard Deviation

- Data: 0, 2, 4
  1. Calculate the mean: (0 + 2 + 4)/3 = 2
  2. Calculate the variance: find the distances from the mean, square them, add them up, and divide by n - 1: ((0 - 2)^2 + (2 - 2)^2 + (4 - 2)^2)/2 = (4 + 0 + 4)/2 = 4
  3. Go backwards with a square root: √4 = 2 = standard deviation

## Quartiles and Percentiles

- pth percentile: the value such that p% of the observations fall at or below it.
  - 25th percentile = 1st quartile (Q1): 25% of the data is smaller than Q1.
  - 50th percentile = median (M): half of the data is smaller than the median.
  - 75th percentile = 3rd quartile (Q3): 75% of the data is smaller than Q3.
- 5-number summary: Minimum, Q1, M, Q3, Maximum.
- How do you find quartiles?
  - "Mini medians": find the median of each half of the data set (split the data into two sets).
  - Leave the median out (only applies if n is odd).
  - Left side: Q1
  - Right side: Q3

5-number summary, Dataset A

- -20, 1, 25, 33, 67
- Q1 = 1 (lower half: -20, 1, 23)
- Warning: do not include the median in each half when finding quartiles (this is different in other books).
- M = 25 (no perfect quartile)
- Q3 = 33 (upper half: 32.5, 33, 67)

Boxplot, Dataset A

- Visual summary of what's going on in the 5-number summary.
- Get the 5-number summary, draw a number line, and create the box accordingly.
- Make the number scale first, before you make the box/arms.

5-number summary, Dataset B

- Data: 1, 2, 4, 6, 8, 9, 12, 13
- Q1 = (2 + 4)/2 = 3
- Q3 = (9 + 12)/2 = 10.5
- 5-number summary: 1, 3, 7, 10.5, 13

## Graphing quantitative data: histograms

- Looks like a bar graph, except the x-axis has a continuous numerical scale.
- No spaces between bars.
- Shows the distribution of a quantitative variable.
- Bars should be equal width.
- How many peaks does this distribution have?
  - 1: unimodal
  - 2: bimodal
  - 3+: multimodal
- Where is the long tail (the tail means skew)? To the right or to the left?
  - Right skewed: long tail on the right, median < mean
  - Symmetric: no skew, median = mean
  - Left skewed: long tail on the left, median > mean
  - Note: the mean goes with the tail; the median goes with the bump.

## Homework Review

- Correlation does not equal causation.
- Accurate scale: reliable.
- Look at answers that are plausible.
- Confounding/lurking variables mess up results.

Histogram examples

- Skewed left: tail to the left
- Skewed right: tail to the right
- Bimodal: two bumps

## Boxplots vs Histograms

- A modified boxplot shows outliers.
- Boxplots do a good job of:
  1. Showing symmetry vs skew
  2. Showing center (the median) and spread (the IQR)
  3. Making comparisons among groups
- But they don't show if a distribution is bimodal, and they don't pick up spikes in the histogram.
- Boxplots also tell us nothing about the size of our data set or the frequency of values that fall within different intervals.

Using boxplots for making comparisons

- Which drive type has the lightest cars? Front wheel.
  - Center is lower/box is low.
- For which drive type is the distribution most skewed? 4 wheel.
- For which are they most variable? 4 wheel.
  - Box is longer (middle 50% is more spread out); range is bigger.

## Judging Skew in Boxplots

In general, what should you look for in a boxplot to determine if the distribution is skewed?

- Tails in the histogram.
- How do the lines extending from the box compare in length? Is one line longer than the other? That suggests skew.
- Are there outliers in one particular direction?
- Is the median (M) closer to one of the quartiles (Q1/Q3), or is it directly between the quartiles? Closer to one quartile suggests skew.
  - If the median is closer to Q1: indication of right skew.
  - If the median is closer to Q3: indication of left skew.
- Skewed right: mean > median
- Symmetrical: mean = median
- Skewed left: mean < median

## What about outliers?

- Outliers: unusually large deviations from the overall pattern.
- How can we spot unusually large deviations? Distance from the center.
- Do outliers show up differently in boxplots/histograms?
- What do you think you should do if you have outliers? Will it affect your interpretation of the data?

Interquartile Range and Suspected Outliers

- IQR = Q3 - Q1 (middle 50% of the data points)
- High outliers: > Q3 + 1.5 × IQR
- Low outliers: < Q1 - 1.5 × IQR
- Not in book; on exam.

Suspected Outliers, Dataset A

- Min = -20, Q1 = 1, Median = 25, Q3 = 33, Max = 67
- IQR = 33 - 1 = 32
- Low outliers: < 1 - 1.5 × 32 = -47 (anything smaller than -47 is an outlier)
- High outliers: > 33 + 1.5 × 32 = 81 (anything greater than 81 is an outlier)
- Dataset A: no outliers
- Start by looking for outliers with a histogram or stemplot, but do the 1.5 × IQR check to know for sure.

Online Time Example

- 5-number summary: Min = 7, Q1 = 30, Median = 46.5, Q3 = 77, Max = 151
- Outliers
  - IQR = 77 - 30 = 47
  - High: > 147.5
  - Low: < -40.5
  - High outlier: 151
  - No low outliers
- Boxplot
  - Modified boxplot: arms stop at the largest/smallest values that are not outliers.
- What are the most appropriate measures of center and spread?
  - Mean/standard deviation or median/IQR (the only reasonable answers)
  - Here: the median and the IQR

## Describing Center and Spread

- 5-number summary (median and IQR)
  - Better for skewed distributions or if you have outliers
- Mean and standard deviation
  - Better for symmetric distributions without outliers

Resistance: not affected by skew/outliers

- ALL data values, including outliers, are involved in computing the mean and the standard deviation. Therefore, if there are outliers in a data set, they will have an impact on the values of the mean and the standard deviation.
  - The mean and the standard deviation are not resistant to outliers.
- To compute the median and the IQR, we look at the middle value in the data set or the range of the middle 50% of the data. Outliers are not included in our computations.
  - The median and IQR are more resistant to outliers than the mean and standard deviation.

Mean vs Median

- "The mean reading score (half scoring above and half below) was 508."
  - Not accurate; it should say median.
- When Numbers Mislead: skew messes with the mean.
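
The worked examples in these notes (the standard deviation of 0, 2, 4, the 5-number summaries, and the 1.5 × IQR checks) can be double-checked with a short Python sketch. It follows the "mini medians" method from class, leaving the overall median out when n is odd, so the quartiles may differ from what other books or software report. The function names are made up for illustration, and the Dataset A values assume the list reads 23, 25, 32.5, 33, 67, 1, -20.

```python
from math import sqrt
from statistics import mean, median


def sample_sd(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    m = mean(values)
    return sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the class 'mini medians' rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]  # values to the left of the median position
    # If n is odd, skip the median itself; if n is even, take the top half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


def outlier_fences(q1, q3):
    """Return the (low, high) cutoffs from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


dataset_a = [23, 25, 32.5, 33, 67, 1, -20]  # assumed reading of the notes
dataset_b = [1, 2, 4, 6, 8, 9, 12, 13]

summary_a = five_number_summary(dataset_a)
summary_b = five_number_summary(dataset_b)
fences_a = outlier_fences(summary_a[1], summary_a[3])
fences_online = outlier_fences(30, 77)  # Q1 and Q3 from the online time example
sd_small = sample_sd([0, 2, 4])

print("Dataset A:", summary_a, "fences:", fences_a)
print("Dataset B:", summary_b)
print("Online time fences:", fences_online)
print("SD of 0, 2, 4:", sd_small)
```

A value counts as a suspected outlier only if it falls outside the two fences, so for the online time example the maximum of 151 is flagged while the minimum of 7 is not.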

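
The point about resistance can be seen in a small experiment: add one extreme value to Dataset B and compare how far the mean and the median move. This is only an illustrative sketch; the added value of 100 is an assumption chosen for the demonstration and does not come from the class data.

```python
from statistics import mean, median

# Dataset B from the notes.
dataset_b = [1, 2, 4, 6, 8, 9, 12, 13]

# Hypothetical outlier: 100 is an assumed value, not from the notes.
with_outlier = dataset_b + [100]

# How far does each measure of center move when the outlier is added?
mean_shift = mean(with_outlier) - mean(dataset_b)
median_shift = median(with_outlier) - median(dataset_b)

print(f"mean:   {mean(dataset_b):.3f} -> {mean(with_outlier):.3f}")
print(f"median: {median(dataset_b)} -> {median(with_outlier)}")
print(f"mean shift: {mean_shift:.3f}, median shift: {median_shift}")
```

The mean is pulled toward the extreme value (it "goes with the tail"), while the median only moves to the next value in the ordered list.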
