# Statistics Exam 1 Study Guide STA 2023

This 5 page Study Guide was uploaded by kyrabacon on Wednesday February 10, 2016. The Study Guide belongs to STA 2023 at University of Florida taught by Maria Ripol in Spring 2016.

Date Created: 02/10/16

Statistics Study Guide- Exam 1

- Feb 17, 8:20-10:00pm
- Covers chapters 1-6, workbook pages 1-48, and lectures
- Don't need to know formulas (except for a few) or the z-table; these are given during the test

## Ch 1- The Art and Science of Learning From Data

- Statistics- analyzing and translating data into knowledge of the world around us
  - Statistical inference
- Random Sampling- each member of the population has the same chance of being included in the sample
  - Parameters- numerical summary of a population
  - Statistics- numerical summary of a sample

## Ch 2- Exploring Data

- Categorical (groups) v. Quantitative (numbers) data
  - Discrete- possible values are separate numbers that can be listed (0, 1, 2, ...)
  - Continuous- possible values form an interval (infinitely many possible outcomes)
- Know graphs
  - Pie chart
  - Bar chart
  - Dotplot
  - Stemplot
  - Histogram
- Know shapes of graphs
  - Mound/ Bell
  - Uniform/ Rectangular
  - Bimodal
  - Skewed (right or left depending on direction of tail)
- Know how to figure out center, spread and outliers
- Averages
  - Mean ($\bar{x}$)- "balance point"
  - Median- "equal areas"
  - Mode- most common observation
- Which average to use
  - Bell-shaped- mean preferred
  - Skewed- median preferred
  - Bimodal- mode preferred (2 modes)
- Notation (n = # of observations)
  - $\bar{x} = \sum x_i / n$
- Range- max - min
- Variance ($s^2$)- average squared deviation
  - $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$ (n - 1 represents degrees of freedom)
- Standard Deviation (s) = square root of $s^2$
  - Be able to compute this on your calculator!
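As a way to check calculator work, the mean, variance, and standard deviation formulas can be sketched in a few lines of Python. The function name `sample_stats` and the data values are made up for illustration; they are not from the course materials.

```python
import math

def sample_stats(data):
    """Return (mean, sample variance, sample standard deviation)."""
    n = len(data)
    mean = sum(data) / n
    # Sample variance divides by n - 1 (degrees of freedom), not n.
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    sd = math.sqrt(variance)
    return mean, variance, sd

# Hypothetical data set, chosen so the arithmetic is easy to follow by hand.
scores = [2, 4, 4, 5, 10]
mean, variance, sd = sample_stats(scores)
print(mean, variance, sd)
```

Working it by hand: the deviations from the mean of 5 are -3, -1, -1, 0, 5, the squared deviations sum to 36, and 36 / (5 - 1) = 9, so s = 3.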
- Empirical Rule (for bell-shaped data; the curve's inflection points sit 1 sd from the mean)
  - About 68% within 1 sd of mean
  - About 95% within 2 sd of mean
  - About 99.7% within 3 sd of mean
- Quartiles- divide data into 4
  - Q1 = 25th percentile (median of lower half)
  - Q2 = median/ 50th percentile
  - Q3 = 75th percentile (median of upper half)
- IQR = spread of central 50% = Q3 - Q1
- 5 # summary = min, Q1, median/Q2, Q3, max
  - Know how to graph boxplot

## Ch 3- Contingency, Correlation and Regression

- Explanatory variable- predictor
- Response variable- outcome
- Contingency tables
  - Both variables are categorical
  - Displays frequencies
  - Computes percentages / determines association
- Conditional Proportions- divide each cell count by its row total, i.e. the total for that category of the explanatory variable (percentage of that group)
- Scatterplots
  - X = explanatory
  - Y = response
  - Gives direction (pos/neg) and strength (strong/weak)
  - $\hat{y} = a + bx$ (b is slope and a is y-int)
- Correlation (r)- direction and strength of STRAIGHT LINE relationship
  - Between -1 and 1
  - No units
  - Close to 0 = weak
  - Close to -1 or 1 = strong
  - Outliers have a strong effect on r, which is not resistant to them (high leverage points)
  - Correlation Coefficient: $r = \dfrac{1}{n - 1} \sum \left[\dfrac{x_i - \bar{x}}{s_x}\right]\left[\dfrac{y_i - \bar{y}}{s_y}\right]$
    - neg = big x with small y (and vice versa)
    - pos = big x with big y (and vice versa)
- Regression line = equation of line that best fits points
  - Points scattered
  - Uses $\hat{y} = a + bx$
  - y-int is only meaningful if x = 0 makes sense
- Least squares regression finds the line that minimizes prediction errors (the sum of squared residuals)
  - Residuals- difference between observed y and predicted y: $y - \hat{y}$
  - Distances above and below line cancel each other out (residuals sum to 0)
  - Passes through $(\bar{x}, \bar{y})$
  - $b = r \, (s_y / s_x)$
  - $a = \bar{y} - b\bar{x}$
- Coefficient of Determination = $r^2$ ($R^2$)
  - % variability in y explained by linear regression on x
- Extrapolation- making predictions with the regression line for an x value outside the range of the data
- Influential Outliers- any points where the x value is far away from the rest and the point falls far from the trend
- Correlation does not imply causation
  - Lurking variables- variables that are not shown
  - Correlation is used for quantitative data
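The correlation and least squares formulas from the Ch 3 notes can be sketched in Python as follows. The helper names (`mean`, `sd`, `correlation`, `least_squares`) and the data are hypothetical, picked only to illustrate the formulas.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """r = sum of products of z-scores of x and y, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sd(xs), sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

def least_squares(xs, ys):
    """Return (a, b) for y-hat = a + b*x, using b = r * s_y / s_x."""
    b = correlation(xs, ys) * sd(ys) / sd(xs)
    a = mean(ys) - b * mean(xs)
    return a, b

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
a, b = least_squares(xs, ys)
# Residuals are observed y minus predicted y; they should sum to about 0.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(r, a, b, sum(residuals))
```

For this data the slope works out to 0.6 and the intercept to 2.2, and the fitted line passes through the point of means (3, 4).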
- Simpson's Paradox- happens in contingency tables where a third variable added to the 2 categorical variables reverses the association (shows another trend)

## Ch 4- Gathering Data

- Statistical Inference- statement about a population based on a random, representative sample
- Experiments- researchers assign subjects to treatments
- Simple Random Sample- every set of n individuals has the same chance of being selected
- Margin of Error- approximately $1 / \sqrt{n}$
- Response variable- one that is measured
- Experimental units- people in study
- 3 Principles of Experimental Design
  - Control of variability
    - blind studies
    - avoid lurking variables/ confounding effects
  - Randomization
    - Random samples
  - Replication
    - # of experimental units
- Multifactor experiments- treatments are combinations of factors
- Matched Pairs Design- unit is matched with another unit with the same confounding variables and the two are compared against each other
- Cross-over- unit is its own matched pair and receives two treatments
- Blocked design- similar to matched pairs but with 3 or more treatments
- Types of observational studies
  - Cross-sectional- representative of population
  - Case-control studies- retrospective and match each case with pos/neg outcomes
  - Prospective studies- follow subjects over a longer period of time

## Ch 5- Probability

- Random phenomenon- cannot predict next outcome but a pattern appears in the long term
- Probability of an outcome is the proportion of times the outcome occurs in the long run
- Independent trials
- Sample space- set of all possible outcomes
  - S = {..., ..., ...}
- Events- outcome or group of outcomes
- Rules for Pairs of Events
  - Complement rule- probability an event does not happen is 1 - probability it will happen
    - $P(A^c) = 1 - P(A)$
  - Disjoint events- two events have no outcome in common
    - $P(A \text{ or } B) = P(A) + P(B)$
  - Intersection- overlap of two events (outcomes in both A and B)
    - $P(A \text{ and } B) = P(A \cap B)$, which equals 0 when A and B are disjoint
  - Union- all outcomes in A or B
    - $P(A \text{ or } B) = P(A) + P(B) - P(A \cap B)$
  - Independence- if two events are independent, then knowledge about one event tells us nothing about the other event
    - Multiplication Rule- if A and B are independent, then $P(A \text{ and } B) = P(A) \times P(B)$
  - Conditional Probability- $P(A \mid B) = P(A \cap B) / P(B)$

## Ch 6- Probability Distributions

- Random Variable- numerical measurement of the outcome of a random phenomenon
  - Discrete RV- list of separate possible outcomes
    - Each probability falls between 0 and 1
    - Probabilities add up to 1
    - $\mu = \sum x \, P(x)$
  - Continuous RV- infinite # of possible outcomes
    - Represented by smooth curve
    - Areas under curve are probabilities; total area = 1
- Normal distribution
  - Bell-shaped and symmetric with center at mean
  - Spread given by SD, represents curvature
  - $X \sim N(\mu, \sigma)$ (~ = distributed, N = normal)
  - Empirical Rule
    - $\mu$ is mean, $\sigma$ is sd, so 1, 2 or 3 sds away from the mean is $\mu - k\sigma$ or $\mu + k\sigma$ for k = 1, 2 or 3
    - We use Z table for sds that are not 1, 2 or 3
- $Z = (x - \mu) / \sigma$ (Remember this one!!)
  - Pos scores = above mean
  - Neg scores = below mean
- Standard normal distribution = when mean = 0 and sd = 1
  - $Z \sim N(0, 1)$
  - Z-table shows cumulative probability to the left of z
  - Areas same as probabilities and proportions
  - Finding normal probabilities- standardize x using z-score, then draw shaded area and determine area using z-table
  - Finding value of x given a proportion- draw picture, look up the cumulative area in the middle (body) of the z-table to find z, then solve for $x = \mu + z\sigma$
- Binomial Distribution
  - Conditions needed
    - 2 outcomes (discrete)
    - Fixed # of trials (n)
    - Trials independent
    - Outcome is "success/failure" or "yes/no"
    - Each trial has same probability of success
  - X = number of successes/yes
  - Distributed binomially [$X \sim \text{bin}(n, p)$]
  - Formulas
    - $P(x) = \dbinom{n}{x} p^x (1 - p)^{n - x}$ (exactly x successes out of n)
    - $\dbinom{n}{x} = \dfrac{n!}{x! \, (n - x)!}$ (# ways of arranging x successes out of n)
      - n! = factorial = n(n-1)(n-2)(n-3)…(1), with 0! = 1
    - mean = $\mu = np$
    - standard deviation = $\sigma = \sqrt{np(1 - p)}$
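The z-score and binomial formulas from Ch 6 can be sketched in Python for checking hand calculations. The function names and example numbers below are made up, and `normal_cdf` stands in for a z-table lookup by using the standard library's `math.erf`; on the exam the printed z-table is used instead.

```python
import math

def z_score(x, mu, sigma):
    """Standardize x: how many sds x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def normal_cdf(z):
    """Cumulative probability to the left of z for Z ~ N(0, 1).
    Plays the role of the z-table (an assumption of this sketch)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def binomial_pmf(x, n, p):
    """P(exactly x successes out of n) for X ~ bin(n, p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_mean_sd(n, p):
    """Mean np and standard deviation sqrt(np(1 - p))."""
    return n * p, math.sqrt(n * p * (1 - p))

# Hypothetical normal example: mean 100, sd 15, observed x = 130.
z = z_score(130, 100, 15)
area_left = normal_cdf(z)

# Hypothetical binomial example: 10 trials, p = 0.5, exactly 3 successes.
prob_3 = binomial_pmf(3, 10, 0.5)
mu, sigma = binomial_mean_sd(10, 0.5)
print(z, area_left, prob_3, mu, sigma)
```

Here z comes out to 2, so the area to the left is about 0.977, which agrees with the Empirical Rule: 95% within 2 sd leaves about 2.5% in each tail.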

