Class Note for STAT 528 at OSU 15
This 24-page set of class notes was uploaded on Friday, February 6, 2015. The notes belong to a course at Ohio State University taught in the fall.
Date Created: 02/06/15
Stat 528, Autumn 2008, Elly Kaizar

Numerical summaries for data: describing distributions with numbers

Reading: Section 1.2

• Measures of the center: the mean, median, mode, trimmed mean
• What do we use, the mean or the median?
• Quartiles, a five-number summary, boxplots, outliers
• Measures of spread (variability): standard deviation, variance, IQR, and range
• Numerical summaries in MINITAB
• Changing the units of measurement

The sheep weights revisited
• Here is the dataset of the weights (in pounds) of 23 sheep:
  180 160 157 185 159 165 168 165 175 186 155 169 168 170 173 181 189 179 182 177 157 169 166
• Here is the stemplot:

  Stem-and-leaf of sheep  N = 23
  Leaf Unit = 1.0

     4   15  5779
     5   16  0
    (7)  16  5568899
    11   17  03
     9   17  579
     6   18  012
     3   18  569

• Can you summarize the features of this dataset?

Measures of the center: the mean
• Let $x_1, x_2, \ldots, x_n$ be our set of n observations.
• The mean of the observations is
  $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• $\sum_{i=a}^{b}$ is shorthand notation for "take the sum from i = a to b".
• For the sheep weights, $\sum_{i=1}^{23} x_i = 3935$ and n = 23. The mean of the data is $\bar{x} = 3935/23 \approx 171.09$ pounds.

The median
• The median M is the "exact midpoint" of the data:
  1. Sort the data in increasing order.
  2a. If the number of data points n is odd, the median M is the center value of the sorted data, i.e., the ((n+1)/2)th value in the sorted list. OR
  2b. If n is even, the median M is the average of the two center values, i.e., the average of the (n/2)th and (n/2 + 1)th values in the sorted list.
• For the sheep dataset the sorted weights are:
  155 157 157 159 160 165 165 166 168 168 169 169 170 173 175 177 179 180 181 182 185 186 189

The order statistics
• Suppose we have n data values $x_1, x_2, \ldots, x_n$.
• Define the order statistics $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ as the data sorted from smallest to largest:
  $x_{(1)}$ = smallest value
  $x_{(2)}$ = 2nd smallest value
  ...
  $x_{(n)}$ = largest value
• Ex: What are $x_{(1)}$ and $x_{(3)}$ for the sheep weight data?

Other measures of the center
• The mode(s) of a set of observations is the value (or values) that occurs most frequently. The modes of the sheep weights are 157, 165, 168, and 169 (each occurs twice).
• Suppose we remove the bottom α% and the top α% of the values from a set of observations. The α% trimmed mean is the mean of the remaining values. As α increases, the trimmed mean is less affected by outliers.
• Ex: What are the 10% and 20% trimmed means of the sheep weights?

What do we use: the mean or the median?
• Symmetric distributions: mean ≈ median
• Left-skewed distributions: mean < median
• Right-skewed distributions: mean > median
• Outliers (values that lie outside the main body of the data) can affect the mean too.
• The median is more robust (or resistant) for measuring the center of a distribution.
  If the distribution is symmetric without outliers, use the mean.
  If the distribution is skewed or contains outliers, use the median instead.
• Example: Applet "Mean vs. Median"

Example
[Figure: Adult American Weight. Density of weight in pounds, with the mean and the median marked. Source: NHANES 2005-2006.]

Quartiles
• The sample median M denotes the halfway point of the sorted observations (the 50% point): half of the data is below the median and half is above.
• The first quartile Q1 is the 25% point: 25% of the data lies below Q1 and 75% lies above.
• The third quartile Q3 is the 75% point.
• The interquartile range (IQR) is given by IQR = Q3 - Q1.
• To calculate quartiles (NOT standardized; different books and software use different rules):
  1. Calculate the sample median and split the sorted data in half. If n is odd, drop the median.
  2. Q1 is the median of the lower half of the data.
  3. Q3 is the median of the upper half of the data.

A five-number summary
• The five-number summary is a simple description of the data: Minimum, Q1, M, Q3, Maximum.
• For the sheep weights, the five-number summary is: Minimum = 155, Q1 = 165, M = 169, Q3 = 180, Maximum = 189.
• The boxplot illustrates this summary graphically.

Some example boxplots
[Figure: example boxplots, including sheep weights (in pounds) and eggs found in the nests of different bird species (e.g., sparrow, robin).]

Outliers
• An observation further than 1.5 × IQR from the closest quartile is called an outlier. Sometimes an observation further than 3 × IQR from the closest quartile is called an extreme outlier.
• MINITAB denotes outliers on a boxplot with an asterisk (*), separate from the main boxplot.
• Sometimes outliers are not included in the numeric summaries of data, such as in calculating the mean, median, etc.
• Important question in practice: should we really leave out data?

Outliers example
The amount of aluminum contamination (ppm) in plastic of a certain type was determined for a sample of 23 plastic specimens. The data (sorted) are:
  30 60 63 70 79 87 90 101 115 118 119 119 120 125 140 145 182 183 191 222 244 291 511
Construct a boxplot that shows outliers and comment on the features.
[Figure: boxplot of aluminium contamination (ppm).]

Measures of spread
• The range of the observations is range = largest value - smallest value.
• The interquartile range is IQR = Q3 - Q1, the range of the middle 50% of the sorted data.
• Variance
• Standard deviation

Measures of spread: the variance
• The variance $s^2$ of a set of observations is
  $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
• It measures the average of the squared deviations of the observations from the mean. Why squared? Why n - 1?
• Applet: http://hspm.sph.sc.edu/COURSES/J716/demos/LeastSquares/LeastSquaresDemo.html
• When $s^2 = 0$ we have no spread.
• As $s^2$ increases above 0, observations spread out further about the mean.
• BUT the variance has squared units compared to the original data.

The standard deviation
• The standard deviation s of a set of observations is the square root of the variance, i.e.,
  $s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$
• It has the same units as the observations.
• s = 0 corresponds to no spread.
• As s increases above 0, observations spread out further about the mean.
• s and $s^2$ are sensitive to outliers and skewness.

Calculating numerical summaries in MINITAB
To summarize our sheep dataset:
• Select Stat > Basic Statistics > Display Descriptive Statistics.
• A dialog box now appears.
• In "Variables", select the variable you want to summarize: either type the variable number (e.g., C1) into the box, OR in the right-hand panel click on "C1 sheep" and choose "Select".
• To produce the summaries, click OK in the dialog. You can also produce plots of the data by clicking on "Graphs".
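The summaries in this section can also be reproduced outside MINITAB. Below is a minimal Python sketch (standard library only; not part of the course software) for the sheep weights and the aluminum data. Assumptions: the helper names `quartiles`, `five_number_summary`, `trimmed_mean`, and `outliers` are hypothetical; quartiles follow the split-halves rule given in these notes, which is only one of several conventions; and the α-trimmed mean drops floor(αn) values from each end, which is also just one common convention.

```python
import math
import statistics

# Data from the notes.
sheep = [180, 160, 157, 185, 159, 165, 168, 165, 175, 186, 155, 169,
         168, 170, 173, 181, 189, 179, 182, 177, 157, 169, 166]  # pounds
aluminum = [30, 60, 63, 70, 79, 87, 90, 101, 115, 118, 119, 119,
            120, 125, 140, 145, 182, 183, 191, 222, 244, 291, 511]  # ppm


def quartiles(data):
    """Return (Q1, M, Q3) by the split-halves rule: Q1 and Q3 are the
    medians of the lower and upper halves; when n is odd the median
    itself is dropped from both halves."""
    x = sorted(data)
    half = len(x) // 2
    lower = x[:half]
    upper = x[half + 1:] if len(x) % 2 == 1 else x[half:]
    return statistics.median(lower), statistics.median(x), statistics.median(upper)


def five_number_summary(data):
    """Return (Minimum, Q1, M, Q3, Maximum)."""
    q1, m, q3 = quartiles(data)
    return min(data), q1, m, q3, max(data)


def trimmed_mean(data, alpha):
    """Alpha-trimmed mean, with alpha as a fraction (0.10 means 10%).
    Assumption: floor(alpha * n) values are dropped from each end."""
    x = sorted(data)
    k = math.floor(alpha * len(x))
    return statistics.mean(x[k:len(x) - k])


def outliers(data, multiplier=1.5):
    """Values further than multiplier * IQR from the closest quartile.
    multiplier=3 gives the 'extreme outlier' rule."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - multiplier * iqr, q3 + multiplier * iqr
    return [v for v in sorted(data) if v < low or v > high]


# Measures of center for the sheep weights
print("n      =", len(sheep))
print("mean   =", round(statistics.mean(sheep), 2))
print("median =", statistics.median(sheep))
print("modes  =", sorted(statistics.multimode(sheep)))
print("10% and 20% trimmed means =",
      round(trimmed_mean(sheep, 0.10), 2), round(trimmed_mean(sheep, 0.20), 2))

# Five-number summary, IQR, and the 1.5 x IQR / 3 x IQR outlier rules
q1, _, q3 = quartiles(sheep)
print("five-number summary =", five_number_summary(sheep), " IQR =", q3 - q1)
print("sheep outliers =", outliers(sheep))
print("aluminum five-number summary =", five_number_summary(aluminum))
print("aluminum outliers (1.5 x IQR) =", outliers(aluminum))
print("aluminum extreme outliers (3 x IQR) =", outliers(aluminum, 3))

# Spread: statistics.variance and statistics.stdev use the n - 1 divisor
print("variance =", round(statistics.variance(sheep), 2))
print("std dev  =", round(statistics.stdev(sheep), 2))
print("range    =", max(sheep) - min(sheep))
```

For the sheep data this should report a mean of 171.09, a median of 169, the five-number summary (155, 165, 169, 180, 189), and a standard deviation of 10.01, in agreement with the MINITAB session output; for the aluminum data, 511 is flagged under both the 1.5 × IQR and the 3 × IQR rules.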
Numerical summaries in MINITAB (cont.)
• The summaries are presented in the Session Window:

  Descriptive Statistics: sheep

  Variable   N  N*    Mean  SE Mean  StDev  Minimum      Q1  Median      Q3
  sheep     23   0  171.09     2.09  10.01   155.00  165.00  169.00  180.00

  Variable  Maximum
  sheep      189.00

• Some headings you might not know:
  N = number of observations
  N* = number of missing observations
  StDev = standard deviation
  SE Mean = standard error for the mean (ignore for just now)

What summaries do we use?
• The mean is a good measure of the center of symmetric distributions. For skewed distributions or distributions with outliers, it is better to use the median.
• The standard deviation is a good measure of the spread (or variability) of symmetric distributions. For skewed distributions or distributions with outliers, it is better to use the IQR; note that the range, which depends only on the two most extreme values, is itself sensitive to outliers.

Changing units of measurement: transformations
• When we collect data, what is the effect of changing the units of measurement? Is the distribution affected? Do the measures of center and spread change?
• Example: Suppose we measure the weight of Americans. How are the summaries of the data altered by our choice of the units of measurement, e.g., pounds vs. kilograms?

Linear transformations
• Let $x_1, x_2, \ldots, x_n$ be our data and let our transformed data be $y_1, y_2, \ldots, y_n$.
• A linear transform is defined by $y_i = a + b x_i$, i = 1, ..., n, where a and b are constants. Some examples of linear transformations include:

  x_i -> y_i              formula                    a          b
  cents -> dollars        y_i = x_i / 100            0          1/100
  pounds -> kilograms     y_i = 0.45 x_i             0          0.45
  Fahrenheit -> Celsius   y_i = (5/9)(x_i - 32)      -160/9     5/9

• A linear transformation shifts and scales the x-axis.
• Thus the shape of a distribution is not affected by a linear transform.

Linear transformations example
[Figure: Adult American Weight. Density plots of the same data with weight in pounds and with weight in kg.]

How do measures of center and spread change under a linear transform?
Remember $x_i \to a + b x_i$ for each i.
• For measures of center:
  $\bar{y} = a + b\,\bar{x}$
  $M_y = a + b\,M_x$
  $\mathrm{Mode}_y = a + b\,\mathrm{Mode}_x$
• For measures of spread:
  $s_y = |b|\,s_x$
  $s_y^2 = b^2 s_x^2$
  $\mathrm{Range}_y = |b|\,\mathrm{Range}_x$
  $\mathrm{IQR}_y = |b|\,\mathrm{IQR}_x$
  Here |b| means the absolute value of b.

Nonlinear transformations
• Statistical methods often perform better when a histogram or distribution has a particular shape; e.g., we often consider transformations of the data that make a histogram look more symmetric.
• Linear transformations preserve the shape of distributions, so we need to consider nonlinear transformations, e.g., logarithms, square roots, reciprocals.
[Figure: Adult American Weight. Density plots of weight in pounds and of log(weight in pounds).]
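The rules for linear transforms can be checked numerically. The following is a minimal Python sketch (standard library only) that applies $y_i = a + b x_i$ to the sheep weights and compares each summary of y with the value the rules predict. Assumptions: the helper names `iqr` and `check_linear` are hypothetical, the IQR uses the split-halves quartile rule from these notes, and the negative-slope transform (a = 10, b = -2) is an arbitrary illustration of why |b| appears, not a unit conversion.

```python
import math
import statistics

sheep = [180, 160, 157, 185, 159, 165, 168, 165, 175, 186, 155, 169,
         168, 170, 173, 181, 189, 179, 182, 177, 157, 169, 166]  # pounds


def iqr(data):
    """IQR via the split-halves quartile rule (median dropped when n is odd)."""
    x = sorted(data)
    half = len(x) // 2
    lower = x[:half]
    upper = x[half + 1:] if len(x) % 2 == 1 else x[half:]
    return statistics.median(upper) - statistics.median(lower)


def check_linear(data, a, b):
    """Return True if the summaries of y_i = a + b * x_i follow the rules:
    measures of center map through a + b * (.), spread scales by |b|."""
    y = [a + b * v for v in data]
    center_ok = (
        math.isclose(statistics.mean(y), a + b * statistics.mean(data))
        and math.isclose(statistics.median(y), a + b * statistics.median(data))
        and sorted(statistics.multimode(y))
        == sorted(a + b * m for m in statistics.multimode(data))
    )
    spread_ok = (
        math.isclose(statistics.stdev(y), abs(b) * statistics.stdev(data))
        and math.isclose(statistics.variance(y), b ** 2 * statistics.variance(data))
        and math.isclose(max(y) - min(y), abs(b) * (max(data) - min(data)))
        and math.isclose(iqr(y), abs(b) * iqr(data))
    )
    return center_ok and spread_ok


# Pounds -> kilograms with b = 0.45, then a hypothetical negative slope.
print(check_linear(sheep, 0.0, 0.45))
print(check_linear(sheep, 10.0, -2.0))
```

Both calls should print True. `math.isclose` is used instead of `==` because floating-point arithmetic introduces tiny rounding differences; with b = -2 the lower and upper halves swap roles, which is exactly why the spread rules use |b| rather than b.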