Intro to Stats - Lectures 1-3
This 10 page Class Notes was uploaded by Amy Turk on Saturday April 2, 2016. The Class Notes belongs to MATH-10041-002 at Kent State University taught by Dr. Joseph Minerovic in Spring 2016. Since its upload, it has received 22 views. For similar materials see Introductory Statistics in Math at Kent State University.
Date Created: 04/02/16
MATH CHAPTER ONE
● statistics involves data and variation
  ○ collecting info, summarizing, organizing, displaying
    ■ descriptive statistics
● using info from a sample to draw conclusions, with the help of probability
  ○ inferential statistics
● population = the entire group being studied
  ○ not always people
● sample = only a portion of the population
  ○ key to studying stats
● parameter = numerical measure of a population
● statistic = numerical measure of a sample
● variable = characteristic of people/things
● understanding data
  ○ look at context
    ■ what/why/how/who
    ■ who were the people that the data was collected about
● 2 types of data
  ○ numerical = numbers
  ○ categorical = data that falls into a category
    ■ qualitative
● if you can perform meaningful mathematics with it, it’s probably numerical
  ○ 2 types:
    ■ discrete = where the answer must be given in whole numbers
      ● no decimals
    ■ continuous = measurable, like a person’s weight
      ● can be a decimal
● binary data = yes or no, true or false, one of two answers

Sampling
● hope to have a random sample = everyone in the population had the opportunity to be in the sample
● techniques:
  ○ convenience sample: not a good technique
    ■ ex: giving the survey to the 3 people closest to you
  ○ simple random sampling: ex. putting everyone’s name in a hat
    ■ another example: using a random number table
  ○ cluster sampling: randomly selecting a cluster
    ■ after selecting a cluster, you ask everyone in the cluster
      ● ex. an airline selecting a flight and surveying everyone on the flight
  ○ systematic: decide how many people you want in the sample, divide the population size by that number, and use the result to determine how many people to skip each time
    ■ ex.
      every 11th person gets a candy bar
    ■ have to find a place to randomly start
  ○ stratified: separate the population into groups (strata)
    ■ ex: 5 prizes, 4 to a male, 1 to a female
● variation = if you drew 2 circles, they would be similar but not exactly the same
● binary observation = two possible outcomes only
● it’s not always horrible to use convenience sampling; sometimes it’s the only sampling you can use

A good sample is one where everyone in the population is equally likely to be surveyed/chosen.

● variables in algebra class are used to represent numbers or types of numbers
  ○ variables in stats are used to represent characteristics
● stacked: when each line is dedicated to one person
● stacked:

  Blood pressure measures   Male (coded data)
  122                       0
  117                       1
  129                       1
  108                       0
  111                       1
  133                       0

● unstacked:

  Male   Female
  117    122
  129    108
  111    133
  128    138

● two way table:

               Females     Males
  Always       52          10
  Not Always   28 (35%)    9 (47.37%)

● do not use raw counts to make conclusions
● frequency: the counts
● not fair to say, based on the raw counts in the table, that females go without seatbelts more often than males
  ○ sometimes you have to convert to percents: when the sizes of the groups are different
● treatment variable: predictor variable
  ○ independent
  ○ ex. study time
● outcome variable: response variable
  ○ dependent
  ○ ex.
    test results
● treatment group: receives special medication
● control group: doesn’t get medication
  ○ don’t want the control group to know they’re not getting the treatment
  ○ placebo: fake pill
  ○ random assignment to both groups
● anecdotal evidence: testimony
● placebo effect: effect on a person who believes they’re taking something that’s helping them
● observational studies: when the researcher observes people in their natural environments and doesn’t try to interfere
  ○ researcher has no control
  ○ you cannot assume that something caused something else
● controlled experiment: the researcher assigns the treatment, so it can show that something causes something else
  ○ researcher has control
● in observational studies, there can be a confounding variable
  ○ variable that could be the real explanation for why something occurs
● controlled experiments
  ○ should have large sample sizes: at least 25
    ■ random assignment
  ○ double-blind: neither the participants nor the researcher knows who’s getting which pill, so the researcher won’t act differently toward a certain group
  ○ use placebo if possible

MATH CHAPTER TWO
● causation can never be determined from an observational study
  ○ there may be a confounding variable
● in a controlled experiment, you CAN determine causation
● once you gather info, the next step is to organize it and graphically display it
● frequency distribution table:

  Favorite Color   Tally   Frequency
  Red              14      14
  Blue             36      36
  Yellow           3       3
  Green            9       9
  Purple           26      26
  Orange           4       4

● relative frequency distribution: same table as a frequency distribution table plus a relative frequency column (frequency divided by the total)
  ○ all relative frequencies add up to 1
● bar graph: y-axis is the frequency
  ○ spaces between the bars
  ○ make sure bars are equal width
  ○ start the scale at zero
● relative frequency bar graph: the picture will look the same as a normal bar graph
  ○ relative frequency (percent) on the y-axis
● pareto chart: bar graph that goes from tallest bar to shortest bar
  ○ organizes bars in descending order of counts
● pie chart: multiply relative
  frequency by 360 degrees, and you’ll know how big to make the slice
  ○ write out what percent each category is
● pictographs: bar graphs that use a visual (smiley face or star) and then a key
● with numerical data, you don’t use a pareto chart or bar graph
● StatCrunch, pareto chart: go to graph - bar graph - count descending
● frequency distributions and relative frequency distributions can be used for both types of data
  ○ only two displays that can be done with both
● bar graphs can never be used with numerical data
  ○ histograms can: bar graphs that touch
    ■ no spaces
● histogram
  ○ one mound = unimodal
  ○ shape:
    ■ symmetrical or skewed
      ● symmetric uniform
      ● symmetric u-shaped
      ● symmetric bell shaped
● potential outlier = you can’t declare anything an outlier until you verify it statistically
● two mounds: bimodal
● more than two mounds: multimodal
● left skewed: not symmetrical, with the longer tail on the left
● right skewed: longer tail on the right
● width of the class: bin
  ○ individual pieces of data are lost
● frequency polygon: found by connecting the midpoints of the tops of the bars on a histogram
● stem and leaf plot:
  ○ often called stemplot
  ○ can give us individual pieces of data
  ○ efficient for small sets of data
    ■ not for large data sets
  ○ turn it on its side and it looks like a histogram
● characteristics of a graph:
  ○ # of mounds
    ■ unimodal
    ■ bimodal
    ■ multimodal
  ○ symmetric or skewed
  ○ any outliers
● descriptive statistics: when you give a summary or generalization about the sample data
● inferential: when you try to summarize or generalize about the population
● mu (μ) = population mean
● x̄ (X with a line over it) = sample mean
● sigma (Σ): obtaining the sum
● the mean is the balancing point
● in skewed distributions, you have to be cautious about using the mean because it’s not typical
  ○ outliers will sway it
● standard deviation
  ○ measure of the spread
  ○ the sum of the deviations about the mean will always be zero
● variance
  ○ the square of the standard deviation
  ○ always in squared units, so we prefer the standard deviation
●
  empirical rule: 68-95-99.7% rule
● deviation: how far a score is from the mean
● sum of all the deviations is zero
● standard deviation: roughly the typical distance from the mean
● the higher the z-score, the higher the percentile ranking
● the z-score is the equalizing score that levels the playing field between two data sets
● median: middle score
● quartiles: split data into 4 groups
● median is the 2nd quartile
● 1st quartile: lower quartile
● 3rd quartile: upper quartile
● take quartile 3 and subtract quartile 1: interquartile range
● if a distribution is skewed, the preferred measure of center is the median
● interquartile range is not the same as range
● a statistic is resistant if changing an extreme value to a higher or lower value does not change the statistic
● the mean is not resistant
● outliers greatly affect the standard deviation
● when the distribution is symmetric and unimodal, the preferred measures of center and dispersion are the mean and standard deviation
● changing any value will change the mean
  ○ nonresistant
● a resistant statistic does not change when an extreme value changes
● IQR = Q3 - Q1
● when a distribution is skewed, use the median and interquartile range
● outlier = more than 3 standard deviations away from the mean
  ○ 2-3 SDs away: value is considered to be unusual
● when a value is far away, it adds a lot to the standard deviation
● z-score = score minus the mean, divided by the standard deviation
● usual rule: when the mean is greater than the median, it’s skewed to the right
  ○ when the median is greater than the mean, it’s skewed to the left
● lower standard deviation = lower variability
● direction is not significant to the SD (a deviation above the mean counts the same as one below)
● the u-shape tends to have a greater SD
● resistant
  ○ median
  ○ interquartile range
  ○ quartiles
● not resistant
  ○ mean
  ○ SD
  ○ variance
● the SD is preferred over the variance because the units for the variance are always squared
● the median is resistant to
  outliers
● when comparing two distributions, you should always use the same measure of center and spread for both distributions
  ○ use the median and interquartile ranges for both
● potential outliers are more than 1.5 interquartile ranges below the first quartile or above the third, not above or below the median
● potential outliers are not the same as outliers
● in a boxplot, the vertical line inside the box marks the location of the median
● the length of the box is proportional to the interquartile range
● the whiskers extend to the most extreme values that are not potential outliers
● box plots are best for unimodal distributions
  ○ not for bimodal or multimodal
● the mean is not shown in a box plot
● the first quartile is the median of the bottom half of the data and divides the lowest 25% of the data from the highest 75%
● if the median is equal to the first or third quartile, then there is no separate vertical line in the middle of the boxplot
● values are considered potential outliers if they are between the inner and outer fences
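
Worked example (not from lecture): the 1.5 × IQR rule for potential outliers can be sketched in Python as below. The function names `quartiles` and `potential_outliers` are made up for illustration. The first six values are the blood pressure readings from the Chapter One stacked table, and the 190 is a made-up extreme reading. The quartiles are taken as the medians of the bottom and top halves, which is one common convention; StatCrunch or a calculator may use a slightly different one.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3), taking Q1 and Q3 as the medians of the halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # bottom half (middle value left out when n is odd)
    upper = data[half + n % 2:]    # top half
    return median(lower), median(data), median(upper)

def potential_outliers(values):
    """Values more than 1.5 IQRs below Q1 or above Q3."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1                  # IQR = Q3 - Q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

# Six readings from the notes plus one hypothetical extreme value (190).
readings = [122, 117, 129, 108, 111, 133, 190]

print(quartiles(readings))           # (111, 122, 133)
print(potential_outliers(readings))  # [190]
```

Here IQR = 133 - 111 = 22, so the fences sit at 111 - 33 = 78 and 133 + 33 = 166. Only 190 falls outside them, and it is measured from the quartiles, not from the median.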
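
Worked example (not from lecture): a small Python sketch of the mean, deviations, standard deviation, variance, and z-score, using the six blood pressure readings from the Chapter One stacked table. It assumes the sample standard deviation (dividing by n - 1), which is what `statistics.stdev` computes; the notes do not say which version the class uses. The helper name `z_score` is made up for illustration.

```python
from statistics import mean, stdev

pressures = [122, 117, 129, 108, 111, 133]

x_bar = mean(pressures)                        # sample mean (balancing point)
deviations = [x - x_bar for x in pressures]    # how far each score is from the mean
s = stdev(pressures)                           # sample standard deviation (n - 1)
variance = s ** 2                              # variance is the square of the SD

def z_score(x, center, spread):
    """Score minus the mean, divided by the standard deviation."""
    return (x - center) / spread

print(x_bar)                   # 120
print(sum(deviations))         # 0, the deviations always cancel out
print(round(variance, 1))      # 97.6 (in squared units)
print(round(z_score(133, x_bar, s), 2))   # about 1.32
```

The reading of 133 has a positive z-score (above the mean) and 108 has a negative one, so 133 ranks at a higher percentile, matching the rule that a higher z-score means a higher percentile.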
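
Worked example (not from lecture): the relative frequency column, the pie chart slice sizes, and the Pareto ordering from Chapter Two can be sketched in Python with the favorite color counts from the frequency distribution table. The variable names are made up for illustration.

```python
# Frequencies from the favorite color table in the notes.
counts = {"Red": 14, "Blue": 36, "Yellow": 3, "Green": 9, "Purple": 26, "Orange": 4}

total = sum(counts.values())

# Relative frequency = frequency divided by the total.
relative = {color: n / total for color, n in counts.items()}

# Pie chart slice = relative frequency times 360 degrees.
angles = {color: rf * 360 for color, rf in relative.items()}

# Pareto chart order = categories sorted from largest count to smallest.
pareto_order = sorted(counts, key=counts.get, reverse=True)

for color in pareto_order:
    print(f"{color:<7} {counts[color]:>3} {relative[color]:.3f} {angles[color]:6.1f} degrees")
```

The relative frequencies add up to 1 and the slices add up to 360 degrees, apart from tiny floating-point rounding.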