ST 370

North Carolina State University
STAT 370: Probability and Statistics for Engineers
Yichao Wu

Last class
- Factor, level, complete factorial study, fractional factorial study
- Valid, accurate, precise
- Simple random sampling

Stem-and-leaf plot
Homework 1 due Friday 16 at 5 PM

Stemplot
- Separate each observation into a stem, consisting of all but the final (rightmost) digit, and a leaf, the final digit. Stems may have as many digits as needed, but each leaf contains only a single digit.
- Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column.
- Write each leaf in the row to the right of its stem, in increasing order out from the stem.

Example: Midterm Scores of STAT 101
The following data set contains the midterm exam scores of STAT 101:
74 76 78 88 87 87 53 95 82 79 79 78 62 80 77 70 60 60 84 95 85 93 79
84 71 85 100 77 72 95 79 83 97 87 73 84 74 83 85 95 62 50 86 83 86 36

Its stem-and-leaf display:
 3 | 6
 4 |
 5 | 03
 6 | 0022
 7 | 012344677889999
 8 | 02333444555667778
 9 | 355557
10 | 0

Back-to-back Stemplot
Number of home run hits each season
Babe Ruth (New York Yankees, 1920-1934):
54 59 35 41 46 25 47 60 54 46 49 46 41 34 22
Mark McGwire (St. Louis Cardinals, 1986-2001):
49 32 33 39 22 42 9 9 39 52 58 70 65 32 29

Its back-to-back stemplot:
    Ruth |   | McGwire
         | 0 | 99
         | 1 |
      52 | 2 | 29
      54 | 3 | 22399
 9766611 | 4 | 29
     944 | 5 | 28
       0 | 6 | 5
         | 7 | 0

Splitting stems & rounding
- Splitting stems: for a moderate number of observations. Split each stem into two, one with leaves 0-4 and the other with leaves 5-9. This increases the number of stems and reduces the number of leaves per stem.
- Rounding: if many stems have no leaves or only one leaf, rounding may help. It is convenient to round (or even truncate) the data so that the final digit after rounding is suitable for a leaf. Do this when the data have many digits.

[Figure: two stemplots of spending (in dollars) at a supermarket]

Example: A study on litter size (mice)
Litter size: the number of offspring produced at one birth by an animal.
Data:
170 observations
68565453 577974343 664653966 675456656 W67577585 554787566 95376878
441357446 4767m3688 676676576 455935647 474455656 635565866 397577535
776835885 676677799 577559236 626878788 486875448

[Figure: stem-and-leaf plot for litter size]

Limitations of Stemplot
- Awkward for large data sets.
- Splitting stems or rounding is not very helpful.

Histogram (how)
- Choose intervals (bins) that cover the entire range of the data.
- Count the number of observations per interval.
- Draw rectangles with heights corresponding to the number in each interval (for a relative frequency histogram, the height is the relative frequency of the interval).
Notes
- Intervals must be nonoverlapping. Typically, an observation equal to a boundary value is put in the higher interval.
- Intervals must be contiguous (rectangles touch each other).
- Intervals must be of equal width.
- Often choose "nice" boundaries.

Example: A study on litter size
[Figure: histogram of litter size]

Example
The manager at Wendy's is interested in studying typical arrival patterns during lunch hour. She records the number of arrivals for 40 randomly selected 15-minute intervals over lunch hour and obtains the following data:
7 5 2 6 2 6 6 4 6 6 7 5 2 2 8 6 6 6 1 5
9 6 2 9 6 11 2 3 7 5 6 8 4 4 4 7 5 7 5 5
- Make a frequency and relative frequency table.
- Draw a histogram.

Example: Call Center Data
- Financial firm call center
- Calls handled by AVI within 60 seconds
- October: 666
- December: 523

[Figure: AVI service time data. Two frequency histograms, one for October and one for December. Horizontal axis: calling time (ticks at 6, 12, ..., 60); vertical axis: frequency.]

Notes for Making Histogram
- Choose the number of classes sensibly.
  - Too few classes: "skyscraper" graph.
  - Too many: "pancake" graph.
  - Sturges' rule: choose the number of classes $k$ such that $\log_2 n < k < \log_2 n + 1$, where $n$ is the sample size.
- Intervals must be of equal width.
- Areas of the bars are proportional to the frequency.

Examining Distributions
Overall
Pattern
- Shape
- Center (midpoint)
- Spread (range)
Deviations
- Outliers: some values that fall outside the overall pattern

Shapes of Distributions
- Graphs can help to determine shapes.
- Modes: peaks of a distribution
  - Unimodal: one peak
  - Bimodal: two peaks
- Symmetric or skewed

Shapes of Distributions
- Symmetric: a histogram in which the right half is a mirror image of the left half.
- Skewed to the right: a histogram in which the right tail is more stretched out than the left (long tail to the right).
- Skewed to the left: a histogram in which the left tail is more stretched out than the right (long tail to the left).
- Bell-shaped: a special case of symmetric distributions. The histogram looks like a bell.

[Figure: Shakespeare's words. Percent of Shakespeare's words vs. number of letters in word (1 to 12).]

[Figure: a bimodal histogram. Number of colleges vs. tuition and fees ($1000).]

[Figure: Shakespeare's words. Percent of Shakespeare's words vs. number of letters in word (1 to 12).]
Right skewed

Left skewed
[Figure: Iowa Test of Basic Skills vocabulary scores. Number of seventh graders vs. Iowa Test vocabulary score.]

[Figure: histogram for the study on litter size]

In-class exercise
The volume of a stock is the number of shares traded on a given day. The following data, given in millions so that 3.78 represents 3,780,000 shares traded, represent the volume of Altria Group stock traded for a random sample of 35 trading days in 2004.
3.78 6.06 5.32 3.04 10.32 3.38 10.96 8.74 5.75 3.25 5.64 3.38
5.53 4.50 4.35 5.34 6.57 5.00 7.25 4.74 7.97 5.02 6.92 7.57
7.16 6.52 9.70 3.01 8.40 6.23 6.07 4.88 4.43 3.56 5.58
a) Construct a frequency distribution of the data
   using bin widths of size 2.
b) Construct a relative frequency distribution of the data using bin widths of size 2.
c) Construct a frequency histogram and a relative frequency histogram of the data using bin widths of size 2.
d) On what percentage of the 35 days were at least 6 million shares traded?
e) Describe the shape of the distribution.

Measuring center: the mean
- Mean: average value.
- The sample mean $\bar{x}$: if the $n$ observations in a sample are $x_1, x_2, \ldots, x_n$, then their mean is
  $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Measuring center: the median
- Median: middle value or center point.
- The sample median $M$: the number such that half of the observations are smaller than it and the other half are larger; the midpoint of a distribution.
Procedure to calculate the median $M$:
1. Arrange all observations in order of size, from smallest to largest.
2. If the number of observations $n$ is odd, the median $M$ is the center observation in the ordered list.
3. If $n$ is even, then $M$ is the mean of the two center observations in the ordered list.
Note: $(n+1)/2$ is the location of the median, not the median itself.

Example: Fuel economy (miles per gallon) for 2001 two-seater cars
The highway mileages of 18 gasoline-powered two-seater cars:
13 13 16 19 21 21 23 23 24 26 26 27 27 27 28 28 30 30
Mean = ?  Median = ?
The highway mileages of 19 two-seater cars:
13 13 16 19 21 21 23 23 24 26 26 27 27 27 28 28 30 30 68
Mean = ?  Median = ?

Example: Salary Survey of UNC Graduates
- Survey a certain number of graduates from UNC.
- A lot of departments are surveyed.
- Question: which department produces students that earn the most, on average, ten years after they got their degrees?
- Answer: Geography (Michael Jordan)

Mean vs. Median
- Mean: easy to calculate; easy to work with algebraically; highly affected by outliers. Not a resistant measure.
- Median: can be time consuming to calculate; more resistant to a few extreme observations (sometimes outliers); robust.

Mode
- The most frequent value in the data.
- Important for categorical data.
- Possible to have more than one mode.

Mean, Median, and Mode
If the unimodal distribution is exactly
symmetric, the mean, the median, and the mode are exactly the same.
If the distribution is skewed, the three measures differ.
[Figure: skewed distributions with the positions of the mode and the mean marked]

Which one to use?
- Different by definition.
- Mean and median are unique, and only for quantitative variables.
- Mode is not unique. Mode is defined for categorical variables also.
- The choice depends on the shape of the distribution, the type of data, and the purpose of your study.
  - Skewed: median
  - Categorical: mode
  - Total quantity: mean

Outliers
Observations that lie outside the overall pattern of a distribution.
Possible reasons:
- Error in data entry (most likely reason)
  - Equipment failure
  - Human error
  - Missing value code
- Extraordinary individuals (Jordan's salary)

Handling Outliers
- Detect them using graphical and numerical methods.
- Check the data to make sure of correct entry.
- Reduce the influence of an outlier:
  - Delete the observation (BE CAREFUL).
  - Use transformations or robust methods.

[Figure: Speed of Light histogram. Frequency vs. time.]

Numerical Summary for Distributions
- Center: mean, median, mode
- Spread: five-number summary and boxplot, standard deviation
- Choose at least one from each category.

Take Home Message
- Examine distributions
  - Overall pattern
  - Shape: symmetric or skewed? How many modes? Bell-shaped?
  - Outliers
- Graphical tools for quantitative data: stemplot, histograms, boxplot (next time)
- Mean, median, mode, unimodal, bimodal
- Read Section 3.1 of Vardeman and Jobe
- Homework 1 due Friday 16 at 5 PM
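
Sketch: building a stemplot in code
The stemplot steps given earlier in these notes (split each observation into a stem and a single-digit leaf, list the stems from smallest to largest, write the leaves in increasing order) can be sketched in Python as below. This is only an illustration, not part of the course material: the function name stemplot is made up for this sketch, and it assumes non-negative integer data with a one-digit leaf. The data are the STAT 101 midterm scores from the example.

```python
from collections import defaultdict


def stemplot(values):
    """Return the rows of a stem-and-leaf display for non-negative integers.

    Hypothetical helper written for these notes; assumes a one-digit leaf.
    """
    leaves_by_stem = defaultdict(list)
    for value in sorted(values):
        # Stem: all but the final digit. Leaf: the final digit.
        stem, leaf = divmod(value, 10)
        leaves_by_stem[stem].append(leaf)

    rows = []
    # List every stem from smallest to largest, including stems with no leaves.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        rows.append(f"{stem:>2} | {leaves}".rstrip())
    return rows


# Midterm exam scores of STAT 101 (from the example in these notes).
scores = [
    74, 76, 78, 88, 87, 87, 53, 95, 82, 79, 79, 78, 62, 80, 77, 70,
    60, 60, 84, 95, 85, 93, 79, 84, 71, 85, 100, 77, 72, 95, 79, 83,
    97, 87, 73, 84, 74, 83, 85, 95, 62, 50, 86, 83, 86, 36,
]

for row in stemplot(scores):
    print(row)
```

Because the values are sorted before they are split, each row's leaves come out in increasing order automatically. Stems with no leaves (here, stem 4) are still printed so that gaps in the data stay visible.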

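
Sketch: frequency table and number of classes
For the Wendy's arrivals example, a frequency and relative frequency table can be tabulated as in the sketch below. It also evaluates Sturges' rule for the sample size. Two assumptions are worth flagging: the logarithm in Sturges' rule is taken to be base 2, and the rule is computed as floor(log2(n)) + 1, which picks the integer k in the stated range. The function name sturges_classes is made up for this sketch.

```python
import math
from collections import Counter

# Number of arrivals in 40 randomly selected 15-minute intervals
# (data from the Wendy's example in these notes).
arrivals = [
    7, 5, 2, 6, 2, 6, 6, 4, 6, 6, 7, 5, 2, 2, 8, 6, 6, 6, 1, 5,
    9, 6, 2, 9, 6, 11, 2, 3, 7, 5, 6, 8, 4, 4, 4, 7, 5, 7, 5, 5,
]

n = len(arrivals)
counts = Counter(arrivals)

# One row per value from the smallest to the largest observation, so a
# value that never occurs still appears with frequency 0.
table = [
    (value, counts[value], counts[value] / n)
    for value in range(min(arrivals), max(arrivals) + 1)
]

print(f"{'arrivals':>8} {'freq':>5} {'rel freq':>9}")
for value, freq, rel_freq in table:
    print(f"{value:>8} {freq:>5} {rel_freq:>9.3f}")


def sturges_classes(sample_size):
    """Suggested number of classes, assuming a base-2 logarithm."""
    return math.floor(math.log2(sample_size)) + 1


print("Suggested number of classes:", sturges_classes(n))
```

The relative frequencies are the frequencies divided by n, so they sum to 1. A histogram of these data would use the frequency (or relative frequency) column as the bar heights.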

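
Sketch: mean, median, and mode for the fuel economy example
The sketch below follows the median procedure from these notes (sort, then take the center observation or the mean of the two center observations) and applies it, together with the mean and mode, to the two-seater highway mileages with and without the value 68. The function name median_by_procedure is made up for this illustration; statistics.multimode is from the Python standard library (Python 3.8 or later).

```python
import statistics


def median_by_procedure(observations):
    """Median via the ordered-list procedure described in these notes."""
    ordered = sorted(observations)      # step 1: order from smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                      # step 2: n odd, take the center observation
        return ordered[middle]
    # step 3: n even, average the two center observations
    return (ordered[middle - 1] + ordered[middle]) / 2


# Highway mileages (mpg) of the two-seater cars in the example.
mpg_18 = [13, 13, 16, 19, 21, 21, 23, 23, 24, 26, 26, 27, 27, 27, 28, 28, 30, 30]
mpg_19 = mpg_18 + [68]                  # same list with the extreme value added

for label, data in (("18 cars", mpg_18), ("19 cars", mpg_19)):
    mean = sum(data) / len(data)
    median = median_by_procedure(data)
    modes = statistics.multimode(data)  # list, since the mode need not be unique
    print(f"{label}: mean = {mean:.2f}, median = {median}, mode(s) = {modes}")
```

Adding the single value 68 moves the mean by more than two miles per gallon while the median moves by one, which illustrates why the mean is described as not resistant to outliers.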
