QTM 100 Exam 1 Study Guide
This 10-page Study Guide was uploaded by Bethanie Tabachnik on Friday, March 11, 2016. The Study Guide belongs to QTM 100 at Emory University, taught by Dr. Gong in Fall 2015. Since its upload, it has received 432 views. For similar materials see Intro to Stat Inference in Quantitative Methods at Emory University.
Date Created: 03/11/16
QTM Exam 1 Study Guide

Lecture 2:
Foundations of statistics:
Anecdotal evidence:
  o Informal observations based on a small sample size; requires more formal study.
Why use a sample vs. a population?
  o Very difficult to get data on everyone - expensive, complex, hard to complete
  o Data on everyone is called a census
Descriptive vs. Inferential statistics
  o Descriptive - use numbers to summarize data
  o Inferential - used to draw conclusions about a population from samples
Statistic vs. Parameter
  o Statistic: numerical summary of the sample
  o Parameter: numerical summary of the population - true value is usually unknown

Describing variables:
Variable = any characteristic observed in a study
  o Categorical: descriptive words/phrases
      Dichotomous: only two outcomes (gender)
      Ordinal: natural ordering (grades in school)
  o Quantitative: observations take on a numerical value
      Discrete: countable set of possible values (counting)
      Continuous: infinitely many possible values in an interval (measuring)

Summarizing Categorical Variables:
Frequency table: shows the distribution (values, occurrences) of a categorical variable
  o BEST USED: categorical variable with few categories (ex. dichotomous)
  o Frequency = # of observations in a category
  o Proportion = # of observations in a category / total # of observations
  o Percentage = proportion x 100
  o Category w/ the highest frequency = modal category
Graphs for Categorical variables
  o Pie charts can be misleading (slices are hard to compare)
  o Bar plot is easy to read - x-axis has each category, y-axis is frequency

Lecture 3:
Graphs for Quantitative Variables:
Dot plot
  o 1 dot = 1 observation
      Easy to see the smallest, largest, most frequent value, etc.
Stem and Leaf plot
  o Stem is on the left, leaf on the right
  o Rightmost digit is the leaf
Histogram
  o Shows the distribution of a single variable to describe the overall pattern
  o Very common, esp.
    for continuous variables
  o X-axis shows a range of values divided into groups/intervals
      Summarizes data, does not give exact values
  o Good for a large data set
Time series plot
  o Displays data collected over time
  o Time is on the x-axis, variable of interest on the y-axis
  o When points are connected, trends are easy to identify

Describing the shape of distribution (overall pattern)
  o Unimodal = 1 peak
  o Symmetrical = mirror image
  o Bell shaped
  o Bimodal = 2 peaks
  o Left-skewed - left tail is longer, most data on the right
  o Right-skewed - right tail is longer, most data on the left
  o Uniform = all values seem equally likely
  o Outliers indicated by large gaps

Measuring the Center of Quantitative Data
Mean = average ($\bar{x}$) - most common
Median = middle value of the ordered data
  o odd n = median is the middle value
  o even n = median is the average of the two middle values
When mean and median are close, data are approximately symmetrical
In the presence of outliers:
  o Mean is pulled toward the outliers (much larger than the median when the outliers are large values)
      Highly affected by the outliers
  o Median is resistant to outliers
In highly skewed situations, median is preferred to describe the central tendency
  o Mean < median = left skew
  o Mean > median = right skew
Mode = value of the data that occurs most frequently

Measuring the variability of Quantitative data
Range = difference b/t largest & smallest observations
  o Not resistant to outliers and only accounts for 2 values, so not a good measure
Standard deviation: roughly the typical distance of an observation from the mean
  $S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$, where $\bar{x}$ = mean
  o The larger the SD, the greater the variability
  o Not resistant to outliers
  o Larger SD = more values are farther from the mean
  o One observation, SD is undefined.
    If all observations have the same value, SD = 0
Variance = S^2 (less common)

Lecture 4:
The Empirical Rule = when a distribution is unimodal, symmetrical, bell-shaped:
  o About 68% of observations fall within 1 SD of the mean
  o About 95% fall within 2 SDs of the mean
  o About 99.7% fall within 3 SDs of the mean
  o If any data is more than 3 SDs from the mean, it is a potential outlier or error
Z-score
  o $Z = \frac{\text{Value} - \text{Mean}}{\text{SD}}$
  o If |Z| > 3, this observation is a potential outlier
  o Can be negative = below the mean

Using measures of position
Percentile: value such that a given percent of observations fall at or below that value
  o Median = 50th percentile
  o Q1 (first quartile) = 25th percentile (median of the lower 50%)
  o Q3 (third quartile) = 75th percentile (median of the upper 50%)
Interquartile range (IQR)
  o IQR = Q3 - Q1
      Resistant to outliers
      Range of the middle half of the data
  o Defining outliers:
      Q1 - 1.5 x IQR = lower limit
      Q3 + 1.5 x IQR = upper limit
      Observations below the lower limit or above the upper limit are potential outliers

Histogram vs. Boxplot
Box plot and 5 number summary
  o Min, Q1, median, Q3, maximum
  o Whiskers: extend to the min and max values that are not outliers
      If longer on left = left skewed
      A box plot can show symmetry but cannot show whether the distribution is bell shaped
  o Outliers indicated with circles
Histogram

Associations
Response vs.
explanatory variable
  o Response variable depends on or is explained by the explanatory variable
      There is an association when there is a relationship b/t the two variables
      No association = independent
  o x = explanatory variable, y = response variable
  o Side-by-side boxplot is helpful for associations b/t 1 numerical and 1 categorical variable
  o Scatterplot is useful for 2 quantitative variables
  o For descriptive statistics, report mean +/- the SD (or median and IQR)
Two categorical variables
  o Contingency table
      Rows = explanatory variable
      Columns = response variable
  o Relevant descriptive statistics are called conditional proportions

Lecture 5/6:
Types of studies
  o Experimental: subjects are assigned to experimental conditions, then the outcome/response is recorded
      Treatments = experimental conditions
  o Observational
      Researchers observe both the response and explanatory variable w/o assigning treatments

Observational Studies
  o Participants must be a representative sample from the population
      So results can be generalized - otherwise they are biased
  o Bias = when results don't represent the population
      Sampling bias: sample may not be random
          Undercoverage = some part of the population is missing
      Non-response bias: when subjects refuse to participate
          Missing data: subjects choose not to respond to certain questions
      Response bias: subjects give inaccurate answers
          Subjects may lie
          Question may be subjective or misleading

Sampling methods
  o Sampling frame: list of subjects in the population from which the sample is taken
  o Sampling design: method used to collect the data
  o Random sampling methods
      Simple random sample = each individual equally likely to be sampled
          Most likely to be representative of the population of interest
          Unbiased
      Cluster sample = clusters are naturally occurring groups in the population
          Good when a reliable sampling frame is not available
          Take a random sample of individuals from within a random sample of clusters
      Stratified sample = population is divided into separate groups called strata (groups of similar
      individuals)
          Select a simple random sample from within each stratum
          Useful for comparing specific groups
          Ex: stratify on gender
  o Non-random sampling methods
      Volunteer sample
      Convenience sample
      Both are likely to suffer from undercoverage

3 possible explanations for an association
  o Either variable could be the explanatory or the response
  o Confounding variable = a third variable affects the association
  o Simpson's Paradox: the direction of association b/t the two variables changes after we analyze the data by levels of a third variable
  o In observational studies, association does not imply causation

Conducting an Experiment
Key concepts in experimental design:
  o Control: compare treatment of interest to a control group
  o Randomize: randomly assign subjects to treatment and control groups
  o Replicate: collect a sufficiently large sample or replicate the entire study
  o Block: account for variables known or suspected to affect the response of interest
  o Results should not be affected by a confounding variable
      If observed differences resulted from the treatment, the relationship is causal
Terminology
  o Placebo: fake treatment (control)
  o Placebo effect: showing change despite being on the placebo
  o Blinding: experimental units don't know which group they are in
  o Double blind: both experimental units and researchers don't know the group assignment - avoids BIAS
Multifactor Experiments
  o Categorical explanatory variables in an experiment may be referred to as factors - things we can impose on the experimental units
  o Blocking - used for a variable you cannot randomly assign
      Separate subjects into blocks (groups)
      Randomly assign treatment within each block
      Example: gender

Lecture 7:
Probability basics:
Random phenomena: everyday events where the outcome is uncertain
  o Coin flip, rolling a die
  o Trial = each flip
  o Outcomes = head or tail (can have multiple)
Probability: how we quantify this uncertainty or randomness
  o Probability of an outcome is the proportion of times that outcome would occur in a long run of observations or trials
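The long-run-proportion idea can be illustrated with a small simulation. This is only a sketch under stated assumptions: the coin is assumed fair (P(head) = 0.5), and the function name `running_proportion_of_heads`, the seed, and the number of flips are hypothetical choices made for illustration, not part of the course material.

```python
import random


def running_proportion_of_heads(n_flips, seed=0):
    """Simulate fair coin flips; return the proportion of heads after each flip."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = 0
    proportions = []
    for flip_number in range(1, n_flips + 1):
        if rng.random() < 0.5:  # assumed fair coin: P(head) = 0.5
            heads += 1
        proportions.append(heads / flip_number)
    return proportions


props = running_proportion_of_heads(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    # The proportion bounces around for small n and settles near 0.5 for large n.
    print(f"after {n:>6} flips: proportion of heads = {props[n - 1]:.4f}")
```

With few flips the proportion can sit far from 0.5; only in the long run does it settle near the true probability.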
Law of large numbers: with more observations, the proportion with a certain outcome converges to the true probability of that outcome.
  o Gambler's fallacy: a coin is not "due" for a tail after a head!!

Working with probabilities
Sample space = set of all possible outcomes
Event = a particular outcome or set of outcomes
Disjoint = when events do not have any common outcomes
Finding probabilities
  o Classical method: P(A) = (# of outcomes in A) / (# of outcomes in the sample space), when all outcomes are equally likely
  o Empirical approach: P(A) = (# of times A occurred) / (# of trials observed)
  o A probability is always between 0 and 1
Union of events
  o "A or B" consists of outcomes that are in A, in B, or in both
      Ex: Deck of cards, Jack or Red
  o P(A or B) = P(A) + P(B) - P(A and B)
      Jack or Red: 4/52 + 26/52 - 2/52 = 28/52
Disjoint events: cannot happen at the same time (mutually exclusive)
  o Always dependent (knowing that one occurred tells you the other did not)
  o A coin toss cannot be both head and tail; you cannot both fail and pass a class; a card cannot be both an ace and a queen
  o P(A or B) = P(A) + P(B)
Non-disjoint events: can happen at the same time
  o Example: Jack and Red
Complement rule: the probability that something does not happen
  o P(A^c) = 1 - P(A)
  o How to use as a shortcut: P(at least 1 boy) = 1 - P(no boys)
Intersection: outcomes in both A and B
  o Jack and Red = P(A and B) = 2/52
Independent events: knowing the result of one event provides no useful knowledge about the result of the other
  o P(A and B) = P(A) x P(B)
  o Rules: A and B are independent if these hold (each one implies the other)
      P(A|B) = P(A)
      P(A and B) = P(A) x P(B)
Types of Probabilities
  o Marginal: based on a single variable - found in the margin of a contingency table
  o Joint: based on two or more variables - found inside the table
Conditional probability: probability of A given that B occurred
  o P(A|B) = P(A and B) / P(B)
General Multiplication Rule
  o P(A and B) = P(B) x P(A|B)
Interpreting probabilities:
  o An event is considered rare (unlikely to happen by random chance) when its probability is below 0.05

Lecture 8
Probability distributions:
Random variables: numerical measurement of the outcome of a random phenomenon
  o Can be described by a probability distribution
  o Discrete (viewed in a table)
      Random variable X takes discrete values
      The values are disjoint events
  o Continuous (viewed in a curve)
      Area under the curve = 1
      Probability of a person
      being exactly 1 value = 0
Normal Distributions: describe a continuous random variable
  o If random variable X follows a normal distribution we say X ~ N(µ, σ)
Standard Normal Distribution
  o Specific case of the normal distribution where mean µ = 0 and SD σ = 1
Using Z scores
  o The Z table always gives the probability of being less than z - if you need greater than, use 1 - P
  o If given P, look inside the table, then look in the margins for Z
  o What value has 15% above it? - look for 0.85 in the chart
  o Intervals: P(a < Z < b) = P(Z < b) - P(Z < a)
Assessing normality
  o Histogram: look for a bell shape
  o Q-Q plot: points follow a diagonal line; other patterns suggest other distributions

Lecture 9
Binomial Distribution
Used for: dichotomous categorical variables - discrete random variable
  o Has 1 of 2 possible outcomes (success vs. failure)
Conditions for binomial distribution
  o The trials are independent
  o The # of trials, n, is fixed
  o Each trial has 2 possible outcomes (success, failure)
  o Each trial has the same probability of success, p
X = number of successes in n trials
  o Characterized by parameters n, p
  o X ~ Bin(n, p)
      Mean (µ) = np
      SD (σ) = $\sqrt{np(1 - p)}$
Practice:

Lecture 10:
Terminology:
Population distribution: true distribution of a random variable in a population
  o We make assumptions - normally distributed, true proportions
  o Parameter = numerical summary of a population
      Mean = µ
      SD = σ
Data distribution: distribution of observed values from a sample
  o Example: observe 1000 SAT scores and summarize with statistics
  o Statistic = numerical summary of the sample
      Mean = $\bar{x}$
      Proportion = $\hat{p}$
      SD = S
  o Statistics vary from sample to sample
Sampling distribution: the probability distribution of a statistic
  o How you would expect a statistic to vary among similar studies
  o How close the sample mean is to the population mean
  o Usually of interest for the sample mean or sample proportion
How do they relate?
  o As the sample size increases:
      Sampling distribution takes a more normal shape (bell)
      Mean of the sampling distribution stays the same as the population mean
      SD of the sampling distribution decreases - most sample statistics are close to the center
      Data distribution looks more and more like the population distribution

Sample mean
  o Mean = µ
  o SD (variability) = $\frac{\sigma}{\sqrt{n}}$, where n = sample size
  o When the population distribution from which you are sampling is normally distributed, the sampling distribution of the sample mean is normal regardless of your sample size n
Central limit theorem:
  o As the sample size n increases, the sampling distribution of the sample mean approaches a normal distribution
      The sampling distribution will take on a bell shape
  o If the population is not normal and n is small, $\bar{x}$ is not approximately normally distributed
Practice:

Sampling Distribution of a Sample Proportion
Numerical summary for categorical data
  o $\hat{p}$ = sample proportion of successes (estimates the probability of success p)
  o Mean = p
  o SD = $\sqrt{\frac{p(1 - p)}{n}}$
  o Only if n is large, such that np is at least 10 (successes) and n(1 - p) is at least 10 (failures), is the distribution approximately normal
Interpreting results:
  o A result more than 2 standard deviations from the expected value has P < 0.05, so it is unlikely to have happened by chance
  o P > 0.05 means the result could plausibly have happened by chance
Practice:
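As a rough check on the two Lecture 10 formulas (SD of the sample mean = σ/√n and SD of the sample proportion = √(p(1 - p)/n)), the sketch below simulates many samples and compares the observed spread with the formula. Everything specific here is an assumption made for illustration: the right-skewed population (exponential with µ = 1 and σ = 1), the success probability p = 0.3, the sample sizes, the seed, and the helper names `sample_means` and `sample_proportions`.

```python
import math
import random
import statistics

MU, SIGMA = 1.0, 1.0   # mean and SD of the assumed Exponential(rate=1) population
P = 0.3                # assumed true probability of success
N_SAMPLES = 2000       # number of simulated samples per sample size


def sample_means(n, n_samples, rng):
    """Means of n_samples samples of size n from the skewed population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(n_samples)
    ]


def sample_proportions(n, p, n_samples, rng):
    """Sample proportions (p-hat) from n_samples samples of n success/failure trials."""
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(n_samples)
    ]


rng = random.Random(42)  # fixed seed so the run is repeatable

mean_results = {}
for n in (5, 30, 100):
    means = sample_means(n, N_SAMPLES, rng)
    mean_results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"x-bar, n={n:>3}: mean={mean_results[n][0]:.3f}, "
          f"SD={mean_results[n][1]:.3f}, sigma/sqrt(n)={SIGMA / math.sqrt(n):.3f}")

prop_results = {}
for n in (50, 200):  # both satisfy np >= 10 and n(1 - p) >= 10
    props = sample_proportions(n, P, N_SAMPLES, rng)
    prop_results[n] = (statistics.fmean(props), statistics.stdev(props))
    print(f"p-hat, n={n:>3}: mean={prop_results[n][0]:.3f}, "
          f"SD={prop_results[n][1]:.4f}, formula={math.sqrt(P * (1 - P) / n):.4f}")
```

The simulated means should stay near µ and p while the simulated SDs shrink roughly like 1/√n, which is the pattern described above; a histogram of `means` for n = 5 versus n = 100 would also show the shape becoming more bell-like.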