Statistics 201 Notes for Exam 1
This 8 page Study Guide was uploaded by Jessica Namesnik on Monday September 19, 2016. The Study Guide belongs to STAT-201 at Colorado State University taught by Kirk Ketelsen in Fall 2016. Since its upload, it has received 25 views. For similar materials see General Statistics in Statistics at Colorado State University.
Date Created: 09/19/16
Statistics Note Bundle, Exam 1

Statistics 201 8/25/16

Statistics: how data is collected, analyzed, and interpreted.
Descriptive statistics: describe a dataset, the data itself. They don't generalize the facts about the dataset to a larger group.
Inferential statistics: generalize from the data to a larger group (the generalizations are called inferences), so they contain uncertainty.
Data: information; takes the form of observed measurements or descriptions.
Stats: how we apply data to the real world.
Variables: items of interest that can take on different values; the type of measurement being taken.
Population: the entire overall group we are interested in.
Population of interest / target population: e.g., if we want the average height of US girls, the population of interest is US girls. Can be large or small.
Parameter: a number pertaining to a population (e.g., the average height of US girls).
Statistic: any number calculated from data, used to estimate a parameter.
Sample: the subset of the entire population we collect data on; the variable of interest is measured on its members.
Observation: a single member of a sample.
Census: measurements obtained from every member of a population.
Concerns: Is the sample large enough? Is the sample representative of the population of interest?

Statistics 201 8/30/16

Bias: when a statistic is computed in a way that makes it likely to differ systematically from the population parameter it was meant to estimate.
Sampling bias: when the sample isn't representative of the population of interest.
Self-selection bias: when individuals select themselves into the sample. E.g., when voting for the most talented musician and the musician votes for themselves.
Nonresponse bias: when certain types of respondents are more or less likely to respond to a survey, or to answer it honestly. E.g., high school kids raising their hands for a survey on virginity.
Simple random sample (SRS): a sample of a population in which each unit of the population has an equal chance of being selected. Why?
Because it can help overcome self-selection bias and sampling bias.
Observational study: variable values are observed and recorded from already existing data, with no intervention by the researcher.
Controlled experiment: the researcher assigns members of the study to different groups, which get different experimental conditions.
Treatment group: undergoes the procedure.
Control group: does not undergo the procedure.
Placebo effect: if a person believes a treatment will be beneficial, there is a chance they will experience the beneficial effect whether or not they are actually treated.
Correlation doesn't imply causation.
Confounding variable: a variable that helps explain the data but is not accounted for in the study.
Blinding: an attempt to eliminate bias by not telling the treatment and control groups which one is getting the treatment.
Double-blinding: neither the study participants nor the researcher know which group is the control group and which is the treatment group.

9/01/16

Location: where is the dataset "located" on a number line? Where is its center?
Spread: how dispersed is the data?
5-number summary: minimum, Q1, median, Q3, maximum.
Outliers: any unusual values in the dataset.
Shape: what is the shape of the distribution of values in the dataset?
Center: mean and median.
Mean: the average; the sum of the data divided by the sample size. Denoted by an x with a line above it, $\bar{x}$.
Sample size: the number of observations in a sample, denoted n.
$\bar{x} = \frac{\sum x_i}{n}$ (mean = sum of data / sample size)
Median: if you put the data values in order from smallest to largest, the value in the middle is the median. It separates the upper 50% from the lower 50%.
Computing the median: the rank (n + 1)/2 tells you which ordered observation is the median. If the rank is an integer (3, 5, etc.), go right to that position in the ordered dataset; otherwise, compute the average of the 2 surrounding observations. E.g., if the rank is 5, the 5th ordered observation is the median.
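A minimal Python sketch of the mean and of the rank rule for the median described above. The function names and the two small datasets are made up for illustration; they are not from the lecture.

```python
def mean(data):
    """Sample mean: sum of the data divided by the sample size n."""
    return sum(data) / len(data)


def median(data):
    """Median via the rank rule (n + 1) / 2 from the notes."""
    ordered = sorted(data)          # smallest to largest
    n = len(ordered)
    rank = (n + 1) / 2              # 1-based position in the ordered data
    if rank == int(rank):
        # Integer rank: go right to that ordered observation.
        return ordered[int(rank) - 1]
    # Fractional rank: average the 2 surrounding observations.
    below = ordered[int(rank) - 1]
    above = ordered[int(rank)]
    return (below + above) / 2


odd_data = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # n = 9, rank = 5
even_data = [2, 4, 6, 100]               # n = 4, rank = 2.5

print(median(odd_data))    # 5
print(median(even_data))   # 5.0
print(mean(even_data))     # 28.0
```

In the second dataset the single large value (100) pulls the mean up to 28 while the median stays at 5, which is why both measures of center are worth reporting.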
Lower quartile (Q1): below the median; separates the lower 25% of the data from the upper 75%.
Upper quartile (Q3): above the median; separates the lower 75% of the data from the top 25%.
To calculate: put parentheses on either side of the median to separate the lower and upper halves of the dataset. E.g., for 1,2,3,4,5,6,7,8,9: n = 9, rank = (9 + 1)/2 = 5, so the median is 5. So (1,2,3,4) 5 (6,7,8,9).
Q1 = the median of the lower half of the data (1,2,3,4), which is 2.5.
Q3 = the median of the upper half of the data (6,7,8,9), which is 7.5.
Extremes: minimum and maximum.
Box plot / box-and-whisker plot: drawn from min, Q1, median, Q3, max. The whiskers go to the min and max, or to the furthest values that are not outliers. 50% of the data is in the box, 25% below it, 25% above it.

Statistics 201 9/06/16

Dispersion: spread. Reported alongside information about location (mean or median).
Range: the difference between the max and the min. Range = max - min.
Positive/right skew: the mean is pulled to the right and is larger than the median.
Negative/left skew: the mean is pulled to the left and is smaller than the median.
Symmetric/bell shaped: the mean is approximately the same as the median.
IQR = Q3 - Q1. It is not affected by extreme values because it is calculated using values that lie close to the center of the dataset. The IQR is not used in inferential statistics but is useful as a descriptive statistic.
Variance: another measure of dispersion. Closely related to the standard deviation. Computed using all the data values in the dataset. Sensitive to outliers, but not as affected if there are a large number of values (observations) in the dataset.
Sum of squared deviations ("sum of squares" or "SS"): to calculate it for a single observation, subtract the mean from the observation and square the result. Do this for all of the observations and sum the results.
$SS = \sum (x_i - \bar{x})^2$
Sample variance ($s^2$): the average squared distance that a group of n points lies from the mean of the group (n is the number of observations).
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Sample standard deviation (s): the square root of the sample variance. It is roughly the average distance that a group of points lies from the mean.
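As a worked check of the quartile and variance definitions above, here is a short Python sketch applied to the dataset 1 through 9 from the notes. The function names are made up for illustration. The notes only show the split for odd n (the median is left out of both halves); cutting the data into two equal halves when n is even is an assumption, and other textbooks and software use slightly different quartile rules.

```python
import math


def median_sorted(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the 'halves' method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Odd n: skip the median itself. Even n (assumption): plain upper half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], median_sorted(lower), median_sorted(ordered),
            median_sorted(upper), ordered[-1])


def sample_variance(data):
    """s^2 = sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    ss = sum((x - x_bar) ** 2 for x in data)   # sum of squares
    return ss / (n - 1)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
minimum, q1, med, q3, maximum = five_number_summary(data)

print((minimum, q1, med, q3, maximum))   # (1, 2.5, 5, 7.5, 9)
print(maximum - minimum)                 # range: 8
print(q3 - q1)                           # IQR: 5.0
print(sample_variance(data))             # 7.5
print(round(math.sqrt(sample_variance(data)), 3))   # s: 2.739
```

Here the sum of squares is 60, so the sample variance is 60 / 8 = 7.5 and the sample standard deviation is its square root.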
If s is large, the data is highly dispersed and there is a high level of uncertainty. It is mainly used for statistical inference. What counts as "large" or "small" depends on the magnitude of the data itself.
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

Note (from last week's homework), types of variables:
Quantitative: numbers.
  Continuous: can take any numeric value, e.g., weight.
  Discrete: specific values only, usually integers with no decimals, e.g., number of visits to the doctor.
Qualitative: non-numbers.
  Nominal: no inherent order, e.g., eye color.
  Ordinal: inherent order, e.g., rank based on preference.

On Thursday, no real lecture. Mainly talked about football. Did one practice problem.

9/13/16 Statistics 201: Probability and normal distribution

Random event: something that may or may not occur, and that we can assign a probability to. Examples: a coin might fall on heads; tomorrow it might snow; the Broncos might win the Super Bowl; Gryffindor might win the House Cup.
Random variable "x": e.g., the roll of a die.
Complement: the nonoccurrence or opposite of an event. Examples: the coin might fall on tails; tomorrow it might not snow; the Broncos could lose the Super Bowl; Gryffindor might lose the House Cup.
Probability: quantifies the likelihood of a random event occurring. Usually stated as a percentage (e.g., 50%), formally as a proportion (e.g., 0.5). Must be between 0 (impossible) and 1 (certain); both extremes are rare.
Relative frequency: how often an event occurs as a proportion of how often it could potentially occur. It states that the probability of an outcome is the proportion of times the outcome would occur over the long run (if we were to keep repeating a random process indefinitely). Doesn't need to have a sample size or a denominator.
Probability notation: p(x) is the probability that event x occurs. 1 - p(x) is the probability that event x doesn't occur (the complement).
Standardization: finding the distance from the mean in terms of standard deviations (giving values a common unit).
z-score: a value that has been standardized in this manner. Its sign shows whether the data point lies above or below the mean.
(+) z-score = above the mean; (-) z-score = below the mean.
Magnitude: shows how far the data point is from the mean in terms of standard deviations. z-scores are unitless; they express distance from the mean in standard deviations.
$z = \frac{x - \bar{x}}{s}$
x is the value to be standardized, $\bar{x}$ is the sample mean, and s is the sample standard deviation. $\bar{x}$ and s are sample statistics, denoted with English letters.

9/15/16

Chebyshev's rule: at least $\left(1 - \frac{1}{k^2}\right) \times 100\%$ of the values in a distribution must lie within k standard deviations of the mean (for k > 1). Applies to any distribution.
Normal distribution (bell curve / Gaussian distribution): the most common distribution. Any variable that follows a normal distribution is said to be normally distributed.
Empirical rule: what percentage of the values of a normally distributed variable fall within 1, 2, and 3 standard deviations of the mean (about 68%, 95%, and 99.7%). Similar in spirit to Chebyshev's rule, but it only applies to normal distributions.
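To compare the two rules side by side, here is a hedged Python sketch. The function names and the example height (70, with mean 64 and standard deviation 3) are made up for illustration. The normal percentages are computed with `NormalDist` from Python's standard `statistics` module (Python 3.8+).

```python
from statistics import NormalDist


def z_score(x, x_bar, s):
    """Distance of x from the mean, in standard deviations."""
    return (x - x_bar) / s


def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations (any distribution)."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is only informative for k > 1")
    return 1 - 1 / k ** 2


def normal_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist()   # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


# Hypothetical example: a height of 70 when the sample mean is 64 and s is 3.
print(z_score(70, 64, 3))            # 2.0 (positive, so above the mean)

print(chebyshev_lower_bound(2))      # 0.75
print(round(normal_within(1), 4))    # 0.6827
print(round(normal_within(2), 4))    # 0.9545
print(round(normal_within(3), 4))    # 0.9973
```

Chebyshev's rule only guarantees at least 75% within 2 standard deviations, while a normal distribution has about 95% there. The empirical rule is tighter because it assumes a specific shape.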