# S301 Week 2 Lecture & Textbook Notes STAT-S301


These 3-page class notes were uploaded by Lauren Detweiler on Friday, January 23, 2015. They belong to STAT-S301 at Indiana University, taught by Hannah Bolte in Spring 2015.

Date Created: 01/23/15

S301 Textbook Week 2: Ch 3.3-3.4 & 4, pgs 32-67
Required concepts not covered in lecture

I. Area Principle: the area of a plot that shows data should be proportional to the amount of data
  a. Presenting relative sizes accurately
  b. Violations of the area principle include:
    i. Decorations that sacrifice accuracy
    ii. Baseline of the chart is not at zero
II. Best Practices
  a. Use a bar chart to show the frequencies of a categorical variable
  b. Use a pie chart to show the proportions of a categorical variable
  c. Keep the baseline of a bar chart at zero
  d. Preserve the ordering of an ordinal variable
  e. Respect the area principle
  f. Show the best plots to answer the motivating question
  g. Label your chart to show the categories and indicate whether some have been combined or omitted
III. Measures of Central Tendency
  a. Mean
    i. The average, found by dividing the sum of the values by the number of values. Shown as a symbol with a line over it, as in ȳ
    ii. Calculated by adding up the data and dividing by n, the number of values
  b. Median
    i. The median of an ordinal variable is the label of the category of the middle observation when you sort the values
    ii. Is not available unless the data can be put into order
    iii. The 50th percentile
    iv. If there is an even number of cases (n is even), the median is the average of the two values in the middle
  c. Mode
    i. The mode of a categorical variable is the most common category
IV. IQR (interquartile range)
  a. Distance between the 25th and 75th percentiles (calculate the difference: 75th minus 25th)
  b. A natural summary of the amount of variation to accompany the median
V. Boxplots
  a. A graphic consisting of a box, whiskers, and points that summarizes the distribution of a numerical variable using the median and quartiles
  b. Shows the five-number summary (the minimum, lower quartile, median, upper quartile, and maximum of a variable) in a graph

Lecture Week 2

I. See Week 2 Frequency Example.xlsx (frequency, in class)
II. To find median from frequency table
  a. median, quartile, percentile
III. To find mean from frequency table
  a. Bin*Freq (bin × frequency)
  b. Sum(bin*freq array)
    i. or combine both a and b by using sumproduct(bin, frequency)
  c. Then divide by # of observations
IV. Variance
  a. Sample variance
    i. Standard deviation (sample) is the square root of the sample variance
  b. Population variance
    i. Standard deviation (population) is the square root of the population variance
  c. To find variance: find the mean, find the deviations, square each one, sum them, and divide by n if population variance or n - 1 if sample variance. Taking the square root of the result then gives the standard deviation
V. To find the deviations of a list of numbers, take each entry on the original list and subtract the mean
  a. E.g. for 1, 2, 3, 4, 5, 6:
    i. Mean is 3.5
    ii. Deviations are -2.5, -1.5, -0.5, 0.5, 1.5, 2.5
      1. Sum of deviations ALWAYS = 0
VI. Skewness
  a. Mean higher than median for right skew
  b. Mean lower than median for left skew
  c. Right/left is based on which side the tail is on
VII. Boxplot (box-and-whisker plot)
  a. A way of displaying a distribution by its min, Q1, median, Q3, and max
    i. Also lets us determine outliers
  b. Steps
    i. Sort data from low to high
    ii. Calculate the first quartile (Q1), the median (Q2), and the third quartile (Q3)
    iii. Draw a box with its left edge at Q1 and its right edge at Q3
    iv. Draw a vertical line through the box at the median
    v. Compute the following limits:
      1. Lower limit: Q1 - 1.5 × (Q3 - Q1)
      2. Upper limit: Q3 + 1.5 × (Q3 - Q1)
    vi. Draw a line from Q1 to the lowest data value above the lower limit. Draw a line from Q3 to the highest data value below the upper limit
    vii. Any points outside the limits are considered outliers and should be plotted as individual points
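The variance steps from lecture, and the 1 through 6 deviations example, can be checked with a short Python sketch. The function names `deviations` and `variance` are made up for illustration and are not from the course spreadsheet.

```python
import math

data = [1, 2, 3, 4, 5, 6]  # the example list from the notes

def deviations(values):
    """Subtract the mean from each entry."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def variance(values, sample=True):
    """Square the deviations, sum them, divide by n - 1 (sample) or n (population)."""
    squared_sum = sum(d * d for d in deviations(values))
    divisor = len(values) - 1 if sample else len(values)
    return squared_sum / divisor

devs = deviations(data)
sample_var = variance(data, sample=True)
pop_var = variance(data, sample=False)
sample_sd = math.sqrt(sample_var)  # standard deviation = square root of variance
pop_sd = math.sqrt(pop_var)

print(devs, sum(devs))
print(sample_var, pop_var)
```

The deviations always sum to zero, which is why they are squared before being added up.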
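The frequency-table method for the mean (sum of bin × frequency, divided by the number of observations) can be sketched in Python. The table below is invented for illustration, and the sketch assumes each bin is a single value rather than a range. The median is found here by expanding the table back into a sorted list, which is one simple approach and not necessarily how it was done in the class spreadsheet.

```python
# Hypothetical frequency table: each bin value and how many times it occurs.
bins = [1, 2, 3, 4, 5]
freqs = [2, 3, 5, 3, 2]

def mean_from_frequency_table(bins, freqs):
    """Sum of bin * frequency, divided by the number of observations."""
    n = sum(freqs)
    sumproduct = sum(b * f for b, f in zip(bins, freqs))
    return sumproduct / n

def median_from_frequency_table(bins, freqs):
    """Expand the table into a sorted list and take the middle value(s)."""
    values = []
    for b, f in sorted(zip(bins, freqs)):
        values.extend([b] * f)
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    # Even number of cases: average the two middle values.
    return (values[mid - 1] + values[mid]) / 2

table_mean = mean_from_frequency_table(bins, freqs)
table_median = median_from_frequency_table(bins, freqs)
print(table_mean, table_median)
```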
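The boxplot steps (quartiles, the 1.5 × IQR limits, whiskers, outliers) can be sketched in Python. Quartile conventions differ between textbooks and software. This sketch assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the middle value when n is odd. The data set and the name `boxplot_summary` are hypothetical.

```python
def median(sorted_values):
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def boxplot_summary(values):
    data = sorted(values)                      # step i: sort low to high
    n = len(data)
    half = n // 2
    q1 = median(data[:half])                   # step ii: quartiles (assumed convention)
    q2 = median(data)
    q3 = median(data[half + 1:] if n % 2 else data[half:])
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr               # step v: limits
    upper_limit = q3 + 1.5 * iqr
    inside = [v for v in data if lower_limit <= v <= upper_limit]
    outliers = [v for v in data if v < lower_limit or v > upper_limit]
    return {
        "min": data[0], "q1": q1, "median": q2, "q3": q3, "max": data[-1],
        "lower_limit": lower_limit, "upper_limit": upper_limit,
        "whisker_low": inside[0],              # step vi: whiskers stop inside the limits
        "whisker_high": inside[-1],
        "outliers": outliers,                  # step vii: plotted as individual points
    }

summary = boxplot_summary([2, 4, 5, 6, 7, 8, 9, 10, 30])
print(summary)
```

In this made-up data set the value 30 lies above the upper limit, so the right whisker stops at 10 and 30 is plotted on its own.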
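The skewness rule from lecture (mean above the median for right skew, below it for left skew) can be tried on two small made-up lists. This is a rough check based only on comparing the mean and median, not a formal skewness statistic, and `skew_direction` is a hypothetical helper name.

```python
from statistics import mean, median

def skew_direction(values):
    """Compare mean and median, following the rule of thumb in the notes."""
    m, med = mean(values), median(values)
    if m > med:
        return "right"
    if m < med:
        return "left"
    return "none indicated"

right_tailed = [1, 2, 2, 3, 3, 4, 20]   # long tail on the high side
left_tailed = [-20, 1, 2, 2, 3, 3, 4]   # long tail on the low side

print(skew_direction(right_tailed))
print(skew_direction(left_tailed))
```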

