# Stats 201 Notes, Week 2

AlliSlaten
CSU

## About this Document

These notes cover the material from week 2.
General Statistics
Kirk Ketelsen
These 2-page class notes were uploaded by AlliSlaten on Monday, September 5, 2016. They belong to STAT 201 at Colorado State University, taught by Kirk Ketelsen in Fall 2016.

Date Created: 09/05/16

## Lecture #2

### Bias

- Sampling Bias: a type of bias that arises when the sample is not representative of the population of interest
  - ex. if we measured only the heights of the CSU women's volleyball team, our sample average would overestimate the true average height of all US women
  - ex. a political poll that calls only landlines is biased, because people who still use landlines tend to be older, and age is systematically related to political views
- Self-Selection Bias: arises when people choose whether or not to be included in a sample; the reasons they choose to participate can make the sample unrepresentative
- Non-Response Bias: arises when certain types of respondents are more or less likely than others to answer a survey
- Simple Random Sample: a sample chosen so that every possible sample of the same size has an equal chance of being selected (so every unit has an equal chance of being included)
  - ex. drawing names out of a hat

### Experiments

- Observational Study: the researcher observes and records variable values without assigning treatments, often using data that already exist
  - ex. a study of records from the past is observational, since the researcher cannot go back and assign treatments
- Controlled Experiment: the researcher assigns members of the study to different groups, which are then subjected to different experimental conditions (treatments)
  - ex. treatment group vs. control group (placebo)
  - The researcher can control the factors in a controlled experiment
- Placebo Effect: a response that occurs because subjects believe they are being treated, even when they receive an inactive placebo; the group that receives the placebo is called the control group
- Correlation does NOT imply causation
- Confounding Variable: a variable that is not accounted for in the study but is related to both the explanatory and response variables, so it can help explain the observed association
- Blinding and double blinding are methods used to reduce the bias that occurs when researchers or subjects know which group each subject is in
- Double Blinding: neither the researchers nor the subjects know which subjects are in the treatment group and which are in the control group

## Lecture #3

- Location: where is the data set "located" along the number line? Where is the center?
- Spread: how dispersed (spread out) are the data?
- Outliers: are there any unusual values in the data set?
- Shape: what is the shape of the distribution of values in the data set?
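
A rough histogram is one informal way to eyeball location, spread, outliers, and shape all at once. Below is a minimal Python sketch of a text histogram; the helper name `text_histogram`, the bin width of 10, and the quiz scores are hypothetical choices for illustration, not something from the lecture, and the sketch assumes integer data.

```python
from collections import Counter


def text_histogram(data, bin_width):
    """Return one text row per bin, with one '*' per value (assumes integer data)."""
    # Label each value by the lower edge of its bin, e.g. 74 -> 70 when bin_width is 10.
    counts = Counter((x // bin_width) * bin_width for x in data)
    rows = []
    edge = min(counts)
    while edge <= max(counts):
        # Empty bins are still printed, so a gap before an unusual value stays visible.
        label = f"{edge}-{edge + bin_width - 1}"
        rows.append(f"{label:>7} | " + "*" * counts[edge])
        edge += bin_width
    return rows


if __name__ == "__main__":
    # Hypothetical quiz scores, made up for illustration.
    scores = [41, 72, 74, 75, 78, 79, 81, 82, 83, 85, 86, 88, 91]
    for row in text_histogram(scores, 10):
        print(row)
```

With these made-up scores, most values pile up in the 70s and 80s (the center), while the empty 50s and 60s rows leave the score of 41 isolated on the left, which is the kind of unusual value the "Outliers" question asks about.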

The most succinct way to describe the location of a data set is to identify its center. The mean and median are used to describe the center:

- Mean: the sample average, found by adding up all the data values and dividing by the sample size, x̄ = (x₁ + x₂ + ... + xₙ) / n
- Median: put the values in numerical order from smallest to largest; the middle value, at position (n + 1)/2, is the median (if n is even, average the two middle values)
  - ex. 19 observations → (19 + 1)/2 = 20/2 = 10, so the median is the 10th value in the ordered set
- Quartiles: break the ordered data set into 4 quarters; Q1 is the median of the values below the overall median, and Q3 is the median of the values above it
  - ex. 10 observations below the median → (10 + 1)/2 = 5.5, so Q1 is the average of the 5th and 6th of those values
- When the mean and median are the same or close, the distribution should look roughly symmetric
- Minimum: smallest value in the data set
- Maximum: biggest value in the data set
  - These could be outliers if they differ greatly from the mean (one common rule of thumb flags values more than 3 standard deviations above or below the mean)
- 5-Number Summary: minimum, Q1 (median of the bottom half of the data), median, Q3 (median of the top half of the data), maximum
- Box Plots:
  - about 50% of the values are in the shaded box (between Q1 and Q3)
  - about 25% of the values are above the box
  - about 25% of the values are below the box
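
The median and quartile rules above can be sketched in Python to compute a mean and a five-number summary. This is a minimal sketch under one assumption worth stating: when n is odd, it leaves the median itself out of both halves before finding Q1 and Q3. Some textbooks and statistical software include it or interpolate instead, so their quartiles can differ slightly. The function names and the data are hypothetical, chosen for illustration.

```python
def mean(values):
    """Sample average: sum of all values divided by the sample size n."""
    return sum(values) / len(values)


def median(sorted_values):
    """Median of an already sorted list, using the (n + 1)/2 position rule."""
    n = len(sorted_values)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1)/2 counting from 1 is index n // 2 counting from 0.
        return sorted_values[mid]
    # Even n: (n + 1)/2 lands halfway between two values, so average them.
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum); needs at least 2 values."""
    values = sorted(data)
    n = len(values)
    lower = values[: n // 2]        # values below the median position
    upper = values[(n + 1) // 2 :]  # values above the median position
    return (values[0], median(lower), median(values), median(upper), values[-1])


if __name__ == "__main__":
    # Hypothetical data, made up for illustration.
    data = [7, 1, 3, 9, 5, 11, 13]
    print("five-number summary:", five_number_summary(data))
    print("mean:", mean(data), "median:", median(sorted(data)))
```

For this made-up data the summary is (1, 3, 7, 11, 13), and the mean (7.0) equals the median (7), which matches the note that a close mean and median suggest a roughly symmetric distribution.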

