# Week 2 30002

KSU

These class notes (3 pages) for 30002 at Kent State University, taught by Dr. Eng in Summer 2015, were uploaded by Cole Wojdacz on Tuesday, February 2, 2016. Subject: Introductory Biostatistics in Public Health.

Date Created: 02/02/16

## Measures of Central Tendency

### Numerical Methods

It is often desirable to describe some particular characteristic of a data set numerically. Perhaps the most familiar measure of this sort is what is commonly referred to as the "average" or, more precisely, the arithmetic mean of a set of data. We now examine three distinct categories of such measures.

Measures of central tendency provide information about typical or average values of a data set.

- Mode: the value in a sample that has the largest frequency or percentage of occurrence; the score or scores that occur most frequently
- Median: the value in an ordered distribution that is in the exact middle
  - Corresponds to the 50th percentile; divides a data set into two equal parts
  - Holds 50% of values above and 50% of values below
  - Calculation:
    - Order the data
    - If n is odd, the median is the middle value
    - If n is even, the median is the mean of the two middle values
  - A data set can have only one median
  - Insensitive to extreme observations
- Mean: the arithmetic average; the sum of all the values divided by the number of values
  - Best known of the measures of central tendency
  - What most people refer to as the average
  - Sum of the data divided by the quantity of data
  - Sample mean: $\bar{x} = \frac{\sum x_i}{n}$
  - Population mean: $\mu = \frac{\sum x_i}{N}$

## Measures of Variability

It is important to be able to quantify the degree of spread or scatter in a data set. Measures of this sort are referred to as measures of variability or dispersion.

- Measures of variation: numerical values that indicate the dispersion or spread inherent in a data set
  - A small value indicates that the data are concentrated around the mean, so the mean is representative of the data set
  - Used when comparing the distributions of two or more sets of data
- Range: a function of only the largest and smallest scores in a data set
  - Exclusive range: the difference between the largest and smallest scores in the data
- Mean deviation: a highly intuitive measure of variability
  - Unlike the range, the mean deviation takes into account all the data for which variability is to be assessed, making it a more stable statistic
  - Deviation score: as with other measures of variability and other statistics, the mean deviation is based on what are termed deviation scores (deviations)
    - The score minus the data set mean
    - Small deviations = data clumped around the mean
    - Large deviations = spread-out data
  - Mean of deviation scores: plausibly, a reasonable representation of variability could be based on the average of these deviations
    - Spread-out data = larger average of deviations
    - This is an issue because the sum of the deviation scores always equals zero
    - This can be overcome by taking the absolute values of the deviations
    - The mean deviation is then the average of the absolute values of the deviations of a set of scores
- Variance: the average of the squared deviations of each observation in the set from the arithmetic mean
  - Less intuitive but generally more useful measure of variability
  - Expressed in squared units, not the units of the data being measured
  - Parameter: the average of the squared deviations of the scores that make up the population, $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
  - Statistic: the sample variance, computed from the scores in a sample
- Standard deviation: the square root of the variance
  - Follows from the equations for variance
  - Interpreted as "on average, how far each value deviates from the mean"
  - Most common measure of variability

## Measures of Distributional Shape

Certain aspects of a distribution's shape can be characterized numerically.

- Skew: the degree of asymmetry in a distribution
  - Various methods have been developed to numerically describe the amount of skew (or lack thereof) that characterizes a distribution
  - When the value for skew is negative or positive, the distribution is said to be negatively or positively skewed, respectively
  - When the value is zero, the distribution is said to be symmetric
  - A distribution is skewed in the direction of its tail
    - Right (positive) skew has a long tail on the right side of the graph
    - Left (negative) skew has a long tail on the left side of the graph
- Kurtosis: the peakedness of a distribution relative to the length and size of its tails
  - Leptokurtic: distributions with sharp peaks
  - Platykurtic: distributions with flattened middles
  - Kurtosis only applies to data with no more than one mode
  - Positive: indicates that the observations show greater peakedness and longer tails than those in the normal distribution
  - Negative: indicates that the observations show less peakedness and shorter tails than those in the normal distribution
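
The rules for the mode, median, and mean given under Measures of Central Tendency can be sketched in a few lines of Python. This is only an illustration: the function names and the `scores` data are made up for the example and do not come from the course.

```python
from collections import Counter


def mean(values):
    """Arithmetic mean: sum of the values divided by the number of values."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)          # step 1: order the data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # n is odd: take the middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # n is even


def modes(values):
    """All scores that occur most frequently (there can be more than one)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


scores = [2, 3, 3, 5, 7, 10]          # hypothetical data
with_outlier = [2, 3, 3, 5, 7, 100]   # same data with one extreme observation

print(mean(scores), median(scores), modes(scores))
print(mean(with_outlier), median(with_outlier))
```

Replacing 10 with 100 moves the mean from 5.0 to 20.0 while the median stays at 4.0, which is what "insensitive to extreme observations" means in practice.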

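The measures of variability described above can be checked with a short Python sketch. The function names and the data are hypothetical. One assumption to flag: the notes do not show the formula for the sample variance, so the `sample=True` option below uses the common convention of dividing by n - 1, which may or may not match the course's formula.

```python
import math


def deviation_scores(values):
    """Each score minus the data set mean."""
    m = sum(values) / len(values)
    return [x - m for x in values]


def exclusive_range(values):
    """Difference between the largest and smallest scores."""
    return max(values) - min(values)


def mean_deviation(values):
    """Average of the absolute values of the deviation scores."""
    devs = deviation_scores(values)
    return sum(abs(d) for d in devs) / len(devs)


def variance(values, sample=False):
    """Average squared deviation; divides by n - 1 if sample=True (assumed convention)."""
    devs = deviation_scores(values)
    divisor = len(devs) - 1 if sample else len(devs)
    return sum(d * d for d in devs) / divisor


def standard_deviation(values, sample=False):
    """Square root of the variance, back in the units of the data."""
    return math.sqrt(variance(values, sample))


scores = [2, 4, 4, 6, 9]                      # hypothetical data, mean = 5

print(sum(deviation_scores(scores)))          # deviations cancel out to zero
print(exclusive_range(scores))
print(mean_deviation(scores))
print(variance(scores), variance(scores, sample=True))
print(standard_deviation(scores))
```

For this data the deviations are -3, -1, -1, 1, 4. They sum to zero, which is why the mean deviation averages their absolute values and the variance averages their squares.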

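The notes mention that various methods exist for putting a number on skew and kurtosis without naming one. As an illustration only, the sketch below uses the moment-based definitions, with kurtosis reported relative to the normal distribution (so a normal distribution scores 0). This choice of formula, the function names, and the data are assumptions rather than the method used in the course.

```python
def central_moment(values, k):
    """Average of the k-th powers of the deviation scores."""
    m = sum(values) / len(values)
    return sum((x - m) ** k for x in values) / len(values)


def skewness(values):
    """Moment-based skew: third central moment over the variance to the power 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so the normal distribution is the zero point."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]                    # hypothetical data
right_tailed = [1, 1, 2, 2, 3, 10]             # long tail on the right
left_tailed = [-x for x in right_tailed]       # mirror image: long tail on the left

print(skewness(symmetric))       # zero for a symmetric data set
print(skewness(right_tailed))    # positive
print(skewness(left_tailed))     # negative
print(excess_kurtosis(symmetric))  # negative: flat, short-tailed data
```

The sign of the skew value follows the direction of the tail, and the evenly spread data set gives a negative kurtosis value, matching the platykurtic description above.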