Intro to Stats - Lectures 1-3

by: Amy Turk

About this Document

A combination of lecture notes and book notes, all from chapters 1-3.
These 10 pages of class notes were uploaded by Amy Turk on Saturday, April 2, 2016. The notes belong to MATH-10041-002 at Kent State University, taught by Dr. Joseph Minerovic in Spring 2016.


Date Created: 04/02/16
MATH CHAPTER ONE

● data and variation = involved in statistics
  ○ collecting info, summarizing, organizing, displaying
    ■ descriptive statistics
● probability statistics = using info to draw conclusions
  ○ inferential statistics
● population = the entire group being studied
  ○ not always people
● sample = only a portion of the population
  ○ key to studying stats
● parameter = numerical measure of a population
● statistic = numerical measure of a sample
● variable = characteristic of people/things
● understanding data
  ○ look at context
    ■ what/why/how/who
    ■ who were the people that the data was collected about
● 2 types of data
  ○ numerical = numbers
  ○ categorical = data that falls into a category
    ■ qualitative
● if you can perform meaningful mathematics with it, it’s probably numerical
  ○ 2 types:
    ■ discrete = values are counted, so the answer must be given in whole numbers
      ● no decimals
    ■ continuous = measurable… like a person’s weight
      ● can be a decimal
● binary data = yes or no, true or false, one of two answers

Sampling

● hope to have a random sample = everyone in the population had an equal opportunity to be in the sample
● techniques:
  ○ convenience sample: not a good technique
    ■ ex: giving the survey to the 3 people closest to you
  ○ simple random sampling: ex. putting everyone’s name in a hat
    ■ another example: using a random number table
  ○ cluster sampling: randomly selecting a cluster
    ■ after selecting a cluster, you ask everyone in the cluster
      ● ex. airlines selecting a flight and surveying everyone on the flight
  ○ systematic: decide how many people you want in the sample, divide the population size by that number, and use the result to determine how many people to skip each time
    ■ ex. every 11th person gets a candy bar
    ■ have to find a place to randomly start
  ○ stratified: separate the population into groups, then sample from each group
    ■ ex: 5 prizes, 4 to a male, 1 to a female
● variation = if you drew 2 circles, they would be similar but not exactly the same
● binary observation = two possible outcomes only
● it’s not always horrible to use convenience sampling… sometimes it’s the only sampling you can use

A good sample is one where everyone in the population is equally likely to be surveyed/chosen.

● variables in algebra class are used to represent numbers or types of numbers
  ○ variables in stats are used to represent characteristics
● stacked: when each line is dedicated to one person

    Blood pressure   Male (coded data)
    122              0
    117              1
    129              1
    108              0
    111              1
    133              0

● unstacked: each group gets its own column

    Male   Female
    117    122
    129    108
    111    133
    128    138

● two way table:

                 Females     Males
    Always       52          10
    Not Always   28 (35%)    9 (47.37%)

● do not use raw data to make conclusions
● frequency: the counts
● based on the raw counts in the table (28 vs. 9), it is not fair to say that females go without seatbelts more often than males
  ○ sometimes you have to convert to percents… when the sizes of the groups are different
● treatment variable: predictor variable
  ○ independent
  ○ ex. study time
● outcome variable: response variable
  ○ dependent
  ○ ex. test results
● treatment group: receives special medication
● control group: doesn’t get medication
  ○ don’t want the control group to know they’re not getting the treatment
  ○ placebo: fake pill
  ○ random assignment to both groups
● anecdotal evidence: testimony
● placebo effect: effect on a person who thinks and believes they’re taking something that’s helping them
● observational studies: when the researcher observes people in their natural environments and doesn’t try to interfere
  ○ researcher has no control
  ○ you cannot assume that something caused something else
● controlled experiment: when the researcher assigns the treatment, so you can test whether something causes something else
  ○ researcher has control
● in observational studies, there can be a confounding variable
  ○ variable that could be the real explanation for why something occurs
● controlled experiments
  ○ should have large sample sizes: at least 25
    ■ random assignment
  ○ double-blind: the researcher doesn’t even know who’s getting which pill, so they won’t act differently toward a certain group
  ○ use a placebo if possible

MATH CHAPTER TWO

● causation can never be determined from an observational study
  ○ there may be a confounding variable
● in a controlled experiment, you CAN determine causation
● once you gather info, the next step is to organize it and graphically display it
● frequency distribution table:

    Favorite Color   Tally   Frequency
    Red              14      14
    Blue             36      36
    Yellow           3       3
    Green            9       9
    Purple           26      26
    Orange           4       4

● relative frequency distribution: same table as a frequency distribution table plus a relative frequency column… frequency divided by the total
  ○ all relative frequencies add up to 1
● bar graph: y-axis is the frequency
  ○ spaces between the bars
  ○ make sure bars are equal width
  ○ start the scale at zero
● relative frequency bar graph: the picture will look the same as a normal bar graph
  ○ relative frequency (percent) on the y-axis
● pareto chart: bar graph that goes from tallest bar to shortest bar
  ○ organizes bars in descending order of counts
● pie chart: multiply relative frequency by 360 degrees, and you’ll know how big to make the slice
  ○ write out what percent each category is
● pictographs: bar graphs that use a visual (smiley face or star) and then a key
● with numerical data, you don’t use a pareto chart or bar graph
● StatCrunch… pareto chart - go to graph - bar graph - count descending
● frequency distributions and relative frequency distributions can be used for both types of data
  ○ only two displays that can be done with both
● bar graphs can never be used with numerical data
  ○ histograms can: bar graphs that touch
    ■ no spaces
● histogram
  ○ one mound = unimodal
  ○ two mounds = bimodal
  ○ more than two mounds = multimodal
  ○ shape:
    ■ symmetrical or skewed
      ● symmetric uniform
      ● symmetric u-shaped
      ● symmetric bell shaped
      ● left skewed: not symmetrical, with the longer tail on the left
      ● right skewed: the longer tail is on the right
  ○ potential outlier = you can’t declare anything an outlier until you verify it statistically
  ○ width of the class: bin
    ■ individual pieces of data are lost
● frequency polygon: found by connecting the midpoints of the tops of the bars on a histogram
● stem and leaf plot:
  ○ often called a stemplot
  ○ can give us individual pieces of data
  ○ efficient for small sets of data
    ■ not for large data sets
  ○ turn it on its side… histogram
● characteristics of a graph…
  ○ # of mounds
    ■ unimodal
    ■ bimodal
    ■ multimodal
  ○ symmetric or skewed
  ○ any outliers
● descriptive statistics: when you give a summary or generalization about the sample data
● inferential: when you use the sample to summarize or generalize about the population
● mu (μ) = population mean
● x with a line over it (x̄) = sample mean
● capital sigma (Σ): obtaining the sum
● the mean is the balancing point
● in skewed distributions, you have to be cautious about using the mean because it’s not typical
  ○ outliers will sway it
● deviation: how far a score is from the mean
  ○ the sum of the deviations about the mean will always be zero
● standard deviation
  ○ measure of the spread
  ○ roughly the average distance from the mean
  ○ lower standard deviation = lower variability
  ○ direction is not significant to the SD: it doesn’t matter whether a value is above or below the mean
  ○ outliers greatly affect the standard deviation
    ■ when a value is far away, it adds a lot to the standard deviation
  ○ the u-shape tends to have a greater SD
● variance
  ○ the square of the standard deviation
  ○ always in squared units, so we prefer the standard deviation
● empirical rule: 68-95-99.7% rule
  ○ for bell-shaped distributions, about 68% of values fall within 1 SD of the mean, 95% within 2 SDs, and 99.7% within 3 SDs
● z-score = score minus the mean, divided by the standard deviation
  ○ the higher the z-score, the higher the percentile ranking
  ○ the z-score is the equalizing score that levels the playing field when comparing values from two data sets
● outlier = more than 3 standard deviations away from the mean
  ○ 2-3 SDs away… value is considered to be unusual
● median - middle score
● quartile - split data into 4 groups
  ○ 1st quartile - lower quartile
  ○ median is the 2nd quartile
  ○ 3rd quartile - upper quartile
● interquartile range: take quartile 3 and subtract quartile 1
  ○ IQR = Q3-Q1
  ○ the interquartile range is not the same as the range
● a statistic is resistant if changing an extreme value higher or lower does not change the statistic
● resistant
  ○ median
  ○ interquartile range
  ○ quartiles
● not resistant
  ○ mean
    ■ changing any value will change the mean
  ○ SD
  ○ variance
● when the distribution is symmetric and unimodal, the preferred measures of center and dispersion are the mean and standard deviation
● when a distribution is skewed, use the median and interquartile range
● usual rule: when the mean is greater than the median, it’s skewed to the right
  ○ when the median is greater than the mean, it’s skewed to the left
● when comparing two distributions, you should always use the same measure of center and spread for both distributions
  ○ use the median and interquartile ranges for both
● potential outliers are more than 1.5 interquartile ranges below the first quartile or above the third, not above or below the median
● potential outliers are not the same as outliers
● in a boxplot, the vertical line inside the box marks the location of the median
● the length of the box is proportional to the interquartile range
● the whiskers extend to the most extreme values that are not potential outliers
● box plots are best for unimodal distributions
  ○ not for bimodal or multimodal
● the mean is not shown in a box plot
● the first quartile is the median of the bottom half of the data and divides the lowest 25% of the data from the highest 75%
● if the median is equal to the first or third quartile, then there is no separate vertical line in the middle of the boxplot
● values are considered potential outliers if they are between the inner and outer fences
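
The quartile, IQR, and box plot rules in the Chapter Two notes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the first eight readings are the blood pressure values from the unstacked table in Chapter One, the value 190 is a hypothetical reading added so that there is something to flag, and the quartiles use the "median of each half" approach (leaving the median out of both halves when the count is odd), which is one common textbook convention and may differ from what StatCrunch reports.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                # bottom half of the data
    upper = data[half + 1:] if n % 2 else data[half:]  # top half (skip the median if n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]

def fences(values):
    """Return the (lower, upper) fences: 1.5 IQRs below Q1 and above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def potential_outliers(values):
    """Values more than 1.5 IQRs below Q1 or above Q3."""
    low, high = fences(values)
    return [v for v in values if v < low or v > high]

# Eight readings from the notes plus one hypothetical reading (190).
readings = [117, 129, 111, 128, 122, 108, 133, 138, 190]

print(five_number_summary(readings))  # (108, 114.0, 128, 135.5, 190)
print(fences(readings))               # (81.75, 167.75)
print(potential_outliers(readings))   # [190]
```

With these numbers the whiskers of the box plot would run from 108 to 138, the most extreme values that are not potential outliers, and 190 would be plotted separately.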

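
The ideas of deviation, standard deviation, variance, and z-score from Chapter Two can be checked on a small data set. The sketch below uses the six blood pressure readings from the stacked table in Chapter One. It assumes the sample standard deviation (dividing by n - 1), which is what Python's `statistics.stdev` computes; the notes do not say which version the course uses. The helper name `z_score` is made up for this example.

```python
from statistics import mean, stdev

# Six blood pressure readings from the stacked table in the notes.
readings = [122, 117, 129, 108, 111, 133]

x_bar = mean(readings)                       # sample mean
deviations = [x - x_bar for x in readings]   # how far each score is from the mean
s = stdev(readings)                          # sample standard deviation (divides by n - 1)
variance = s ** 2                            # variance is the square of the SD

def z_score(value, center, spread):
    """Score minus the mean, divided by the standard deviation."""
    return (value - center) / spread

z_133 = z_score(133, x_bar, s)

print(x_bar)             # 120
print(sum(deviations))   # 0
print(round(variance, 1), round(s, 2), round(z_133, 2))
```

The deviations add up to zero, as the notes say they always will. A reading above the mean gets a positive z-score and a reading below the mean gets a negative one.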

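
The relative frequency, pareto chart, and pie chart rules from Chapter Two can be worked through with the favorite color table. This is a minimal sketch using only the counts from the notes; the variable names are chosen for this example.

```python
# Frequencies from the favorite color table in the notes.
counts = {"Red": 14, "Blue": 36, "Yellow": 3, "Green": 9, "Purple": 26, "Orange": 4}

total = sum(counts.values())

# Relative frequency = frequency divided by the total.
relative = {color: n / total for color, n in counts.items()}

# Pareto chart order: tallest bar to shortest bar.
pareto_order = sorted(counts, key=counts.get, reverse=True)

# Pie chart slice = relative frequency times 360 degrees.
pie_degrees = {color: r * 360 for color, r in relative.items()}

for color in pareto_order:
    print(f"{color:<7}{counts[color]:>4}{relative[color]:>8.3f}{pie_degrees[color]:>8.1f}")
```

The relative frequencies add up to 1 and the slices add up to 360 degrees, apart from tiny floating-point rounding.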

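
The point in Chapter One about converting counts to percents when group sizes differ can be seen with the seatbelt two way table. The sketch below uses the counts from the notes; the function name `percent_not_always` is made up for this example.

```python
# Counts from the two way table in the notes.
table = {
    "Females": {"Always": 52, "Not Always": 28},
    "Males": {"Always": 10, "Not Always": 9},
}

def percent_not_always(group_counts):
    """Percent of a group in the "Not Always" row, out of that group's total."""
    group_total = sum(group_counts.values())
    return 100 * group_counts["Not Always"] / group_total

percents = {group: percent_not_always(c) for group, c in table.items()}

for group, pct in percents.items():
    print(f"{group}: {pct:.2f}% not always")
```

More females than males fall in the "Not Always" row by raw count (28 vs. 9), but there are 80 females and only 19 males in the table, so the percent is actually lower for females.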