# STAT 2004: Final Exam Study Guide

Virginia Tech

This 12 page Study Guide was uploaded by Mara DePena on Thursday May 5, 2016. The Study Guide belongs to STAT 2004 at Virginia Polytechnic Institute and State University taught by Metzger in Spring 2016. Since its upload, it has received 57 views. For similar materials see Introductory Statistics in Statistics at Virginia Polytechnic Institute and State University.

Date Created: 05/05/16

STAT 2004 STUDY GUIDE: FINAL EXAM

TABLE OF CONTENTS

- Course Logistics/About the Exam
- Populations and Samples
- Sampling Methods
- Experimental Design
- Visualizing Numerical Data
- Distribution
- Boxplots
- Robustness
- Probability
- Probability Distribution
- Symbols
- Z-Score Problems
- Evaluating Normality
- Bernoulli Distribution
- Binomial Distribution
- Standard Error and Confidence Intervals
- Statistical Hypotheses
- Making a Non-95% Confidence Interval
- Summary of Approaching Hypotheses Problems
- Central Limit Theorem
- T-Distribution
- Hypothesis Testing Rationale
- Chi-Square Test
- Basics of Linear Regression
- Finding a P-Value

## COURSE LOGISTICS/ABOUT THE EXAM

- You must bring a calculator to the exam. The exam will consist of multiple choice and short answer questions.
- Statistics is the study of how best to collect, analyze, and draw conclusions from data.
- Data consists of observations, and these observations form the backbone of a statistical investigation.

## POPULATIONS AND SAMPLES

- Population- Represents all people or things of interest.
- Sample- Observed/measured subset of a population.
- Summary statistic- A single number that summarizes a large amount of data.
- Variables- Measured or observed characteristics of data.
  - Categorical variable- Responses themselves are categories.
    - Nominal- Unordered levels.
    - Ordinal- Ordered levels.
  - Numerical variable- Counts/measures information. Can take a wide range of numerical values. It is sensible to add, subtract, or average these values.
    - Discrete- Finite, countable scale. Can only take numerical values with jumps. (Ex: 1, 2, 3, 4…)
    - Continuous- Continuous scale. (Height, weight, etc.)
  - Associated/dependent/correlated- When two variables show some connection with one another.
  - Independent- When two variables are not associated.
  - Correlation is not causation.
  - Explanatory variable- In scientific terms, this is the independent variable.
  - Response variable- In scientific terms, this is the dependent variable.
  - Confounding variable- Variable that is correlated with both the explanatory and response variables.

## SAMPLING METHODS

- We seek to randomly select samples from a population.
- Bias- When a sample is skewed (for example, toward a person's interests) so that it no longer represents the population.
  - Non-response bias- Results can be skewed when selected people do not respond.
- Sample frame- Numbered list/roster of all potential observations.
- Simple random sample- Most basic random sample. Equivalent to using a raffle. All observations have an equal chance of being chosen.
- Stratified random sample- Divide-and-conquer sampling strategy. Population is divided into groups called strata by demographics/subgroups. Similar cases are grouped together.
  - Random samples are drawn from each stratum.
- Cluster sample- Population is divided into clusters, often but not always by location.
  - All members of the selected clusters are measured/given treatment.

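The three sampling methods above can be sketched in a few lines of Python. This is only an illustration: the sample frame (40 numbered students with a class year and a dorm) is made up, the function names are hypothetical, and the cluster sample assumes that whole clusters are chosen at random before every member of a chosen cluster is measured.

```python
import random

# Hypothetical sample frame: 40 numbered students, 10 per class year, 8 per dorm.
YEARS = ["freshman", "sophomore", "junior", "senior"]
frame = [{"id": i, "year": YEARS[i % 4], "dorm": f"dorm_{i % 5}"} for i in range(40)]


def simple_random_sample(frame, k, rng):
    """Every observation has an equal chance of being chosen (like a raffle)."""
    return rng.sample(frame, k)


def stratified_sample(frame, key, k_per_stratum, rng):
    """Split the frame into strata by `key`, then draw a random sample from each stratum."""
    strata = {}
    for row in frame:
        strata.setdefault(row[key], []).append(row)
    picked = []
    for members in strata.values():
        picked.extend(rng.sample(members, k_per_stratum))
    return picked


def cluster_sample(frame, key, n_clusters, rng):
    """Pick whole clusters at random (assumption), then keep every member of them."""
    clusters = sorted({row[key] for row in frame})
    chosen = set(rng.sample(clusters, n_clusters))
    return [row for row in frame if row[key] in chosen]


rng = random.Random(2004)  # fixed seed so the sketch is repeatable
srs = simple_random_sample(frame, 8, rng)
strat = stratified_sample(frame, "year", 2, rng)
clus = cluster_sample(frame, "dorm", 2, rng)
print(len(srs), len(strat), len(clus))  # prints: 8 8 16
```

The stratified sample always contains exactly two students from each class year, while the simple random sample may over- or under-represent a year by chance.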
## EXPERIMENTAL DESIGN

- Observational study- A data analysis where data is collected in a way that does not directly interfere with how the data arises.
- Experiment- Used to investigate the possibility of causation. Has an explanatory and a response variable. (Independent and dependent in scientific terms.)
  - Randomized- When individuals are randomly assigned to a group.
  - Placebo- Fake treatment.
- Prospective study- Identifies individuals and collects information as events unfold.
- Retrospective study- Collects data after events have taken place.
- Principles of Experiment Design:
  1. Randomization- Subjects sampled randomly, treatments/control assigned randomly.
  2. Replication- Large sample size based on cost/convenience.
  3. Error control- Eliminate/account for any differences in the sample.
     - Placebo
     - Blinding- Subjects do not know if they are in the treatment or control group.
     - Double-blinding- Researcher also doesn’t know who is treatment/control.
     - Blocking- Group subjects into blocks that share some other variable.
- Treatments are applied to experimental units.
- Response is measured on observational units.
  - Ex: If you modify the temperature in several fish tanks and record the heart rate of fish in different temperature tanks, the tanks are the experimental units while the fish are the observational units.

## VISUALIZING NUMERICAL DATA

- Dot plot
- Histogram
  - Sorts values into bins and provides a view of data density.
  - Right-skewed- Data trails off to the right.
  - Left-skewed- Data trails off to the left.
  - Unimodal- One prominent peak.
  - Bimodal- Two peaks.
  - Multimodal- More than two peaks.
- Scatterplot
  - Provides a case-by-case view of data for two numerical variables.
  - [Figure: example scatterplot, vertical axis 0 to 150, horizontal axis 0 to 15]
- Stem-and-leaf plot
  - Data set: {1, 2, 4, 7, 7, 7, 12, 15, 18, 22, 24}

| Stem | Leaves |
|---|---|
| 2 | 2, 4 |
| 1 | 2, 5, 8 |
| 0 | 1, 2, 4, 7, 7, 7 |

## DISTRIBUTION

- Describes shape, center, and spread/variation of data. [Figures: example distribution curves; imagine histograms that fit the depicted curves.]
- Sample standard deviation- Tells you how spread out your data is. It is the square root of the variance.

## BOXPLOTS

- A boxplot uses a five number summary consisting of the median ($Q_2$), minimum, maximum, and 25th ($Q_1$) and 75th ($Q_3$) percentiles. It summarizes a data set while also plotting unusual observations known as outliers.
  - Outliers- Observations that are extreme relative to the rest of the data.
- Below is an example of a boxplot depicting test scores. Boxplots are usually vertical, but this one is depicted horizontally.
  - The first step is to draw the median. The second step is to draw a rectangle to represent the middle 50% of the data.

[Figure: horizontal boxplot of test scores, axis ticks at 50, 75, and 100]

- Interquartile range (IQR)- $Q_3 - Q_1$. It is the length of the box in the boxplot.
- IQR Method
  - One of the many methods for identifying outliers.
    - Lower cutoff- $Q_1 - (1.5 \times \text{IQR})$
    - Upper cutoff- $Q_3 + (1.5 \times \text{IQR})$
    - These upper and lower cutoffs bound the whiskers attached to the box. Any points outside of the whisker range are considered outliers and are labeled with a dot.

## ROBUSTNESS

- Robust estimate- Strong/effective in all/most situations and conditions. Outliers do not change it very much. The median and IQR are considered robust estimates.

## PROBABILITY

The proportion of times an outcome would occur if the process were repeated infinitely many times.

- Sample space- Represents possible outcomes.
- Law of Large Numbers- As the number of trials increases, the estimate gets closer to the true probability. In other words, as a sample size increases, a statistic gets closer to the parameter it is estimating.
- Example problem one: Find P(7) (probability of rolling a 7) with two fair independent dice.
  - How can a 7 be rolled?
    - Outcomes in the sample space S that give a 7: {(1,6), (2,5), (3,4), (6,1), (5,2), (4,3)}
  - Mutually exclusive/disjoint- Cannot both happen together. (Ex: Sanders and Bush cannot both be elected president.) In this example, you cannot get two of these results at the same time.
    - Addition Rule of Disjoint Outcomes- If A1 and A2 represent two disjoint outcomes, then the probability that one of them occurs is given by P(A1 or A2) = P(A1) + P(A2).
      - In this example we add the individual probabilities for each pair, and multiply the probabilities for each die.
      - There are six pairs, each with probability (1/6) x (1/6), so $P(7) = 6 \times \left(\tfrac{1}{6} \times \tfrac{1}{6}\right) = \tfrac{6}{36} = \tfrac{1}{6}$
- A summary of the rules…
  - If A and B are disjoint…
    - P(A or B)= P(A) + P(B)
  - If A and B are independent (knowledge of one doesn’t affect knowledge of the other)…
    - P(A and B)= P(A) x P(B)
  - If A and B are not disjoint…
    - P(A or B)= P(A) + P(B) - P(A and B)
- Example problem two:
  - We have a 52 card deck. It consists of the numbers 2-10 and jacks, queens, kings, and aces. Suits are diamond and heart (red), and club and spade (black).
  - What is the probability of having a red King?
    - There are 52 cards, and two red Kings: the King of Diamonds and the King of Hearts. Therefore P(red King)= 2/52.
  - What is the probability of drawing a card that is red or a King?
    - P(red or King)= P(R) + P(K) - P(R and K)
      - (26/52) + (4/52) - (2/52)= 28/52
- Venn diagrams can be used as a visual aid when solving probability problems.
  - We can make a Venn diagram for the previous example. One circle represents the red cards, the other the Kings. The overlap represents Kings that are also red cards, and the rectangle represents the remainder of the deck. People who are more visual and less math-minded may prefer drawing a Venn diagram to working out the formulas.
  - [Venn diagram: 24 red cards that are not Kings, 2 red Kings in the overlap, 2 black Kings, and 24 remaining cards outside both circles]

## PROBABILITY DISTRIBUTION

A table or graph showing all possible outcomes and their probabilities.

- Ex: Roll two dice. There are 11 possible sums.

| X (sum) | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| P(X) | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36 |

- Note that not all probability distributions will follow such a curve. Also note that all probabilities add up to one, or to 100%.
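As a check on the two-dice table and the card example, the probabilities can be rebuilt by listing every outcome and counting. This is a minimal sketch using exact fractions; the variable names are illustrative and not part of the course material.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (die 1, die 2) outcomes for two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
counts = Counter(a + b for a, b in outcomes)

# Probability distribution of the sum: count of pairs divided by 36.
dist = {total: Fraction(n, len(outcomes)) for total, n in sorted(counts.items())}
for total, p in dist.items():
    print(total, p)  # fractions print in lowest terms, e.g. 6/36 shows as 1/6

p_seven = dist[7]
total_probability = sum(dist.values())

# Card example: P(red or King) by counting the union of the two sets.
ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["diamonds", "hearts", "clubs", "spades"]
deck = list(product(ranks, suits))
red = {card for card in deck if card[1] in ("diamonds", "hearts")}
kings = {card for card in deck if card[0] == "K"}
p_red_or_king = Fraction(len(red | kings), len(deck))

# Same answer from the general addition rule P(R) + P(K) - P(R and K).
by_rule = (Fraction(len(red), 52) + Fraction(len(kings), 52)
           - Fraction(len(red & kings), 52))
print(p_seven, total_probability, p_red_or_king, by_rule)
```

Counting the union directly and applying the addition rule give the same 28/52, which is the point of subtracting the overlap.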
- Possible short answer question for the midterm:
  - Describe the distribution of two fair coin flips.
    - S= {HH, HT, TH, TT}
    - Each outcome has probability 1/4.
    - [Figure: bar chart of probability for HH, HT, TH, and TT, each bar at 1/4]

## SYMBOLS

| Symbol | Meaning |
|---|---|
| ???? | Median |
| $\bar{x}$ | Mean |
| $\mu$ | Population mean |
| ^ (hat) | Estimation |
| $\sigma^2$ | Variance |
| $\sigma$ | Standard deviation |

## Z-SCORE PROBLEMS

- Z-score- How many standard deviations away from the mean the observation is.
  - Z= (observation - mean) / standard deviation
  - There is a chart that lists the z-scores, and using this chart you can determine the percentage of observations under the left part of the curve corresponding to that z-score.
- The steps to solving a problem involving z-score:
  - Start with an observation
  - Find the z-score(s)
  - Look up the corresponding probabilities in the chart
    - “Less than”: Keep value from table
    - “More than”: 100% - table value
    - “Between”: Difference in table values.

## EVALUATING NORMALITY

- 68-95-99.7 (Empirical) Rule
  - For any normal distribution, about 68% of observations fall within one standard deviation of the mean, 95% within two, and 99.7% within three.

## BERNOULLI DISTRIBUTION

- In a Bernoulli distribution, an outcome has two possibilities: success or failure.
  - Success- What we were interested in happened.
  - Success is represented by a 1, while failure is represented by a 0.
- Probability of success is represented by a p.
- For a Bernoulli random variable:
  - X ~ Bernoulli (p)
  - The expectation of a Bernoulli distribution = the probability of success. (E[X]=p)
  - Variance [X] = p (1-p)
  - SD [X] = the square root of the variance
- Categorical nominal:
  - You can put the number of successes and failures into a bar plot, with one bar for 0 and one for 1.

## BINOMIAL DISTRIBUTION

- When you choose more than one observation from a Bernoulli distribution, the number of successes follows a binomial distribution.
- Y ~ Binomial (n, p), with Y being a random variable, n being the number of observations and p being the probability of success.
  - E[Y]= n x p
  - V[Y]= n x p x (1-p)
- Ex: Randomly select 3 VT students. What is the probability that 2 of the 3 are freshmen?
  - T ~ Binomial (3, .25)
    - .25 is the probability an individual is a freshman, assuming the four classes are divided equally throughout the student population.
  - $(.25)^2 \times (.75)^1 \times 3 = 14.1\%$
    - The probability of a student being a freshman is raised to the number of freshmen.
    - The probability of a student being from another class is raised to the number of students from another class.
    - This is multiplied by the number of combinations: 3 in this case.
- In order to determine the number of combinations, you need a calculator.
  - TI-30:
    - Type in the number of people total, click PROB, click nCr, hit enter, then type in the number you are choosing from the total number
  - TI-84:
    - Type in the number of people total, click MATH, click PROB, click nCr, hit enter, then type in the number you are choosing from the total number
- If a question asks for the probability of “at most __ out of __,” you must calculate the probability of each count up to that number and add them together. (For example, “at most 2 out of 3” needs three probabilities: 0, 1, and 2.)
- If a question asks for the probability of “at least __ out of ___,” you could add all the qualifying probabilities, or take 100% minus the probability of the opposite.
- You can approximate a binomial distribution with a normal curve as well.
  - Say X ~ Binomial (50, .9)
    - Multiply 50 and .9 together for the mean value (45).
    - Then, find the variance of the binomial distribution (50 x .9 x .1 = 4.5) and take its square root to find the standard deviation (2.12).
    - You now have the two values for the normal curve: N (45, 2.12)
    - You can use this to find the z-score and use your chart to find the probability.
- Binomial distributions can be done by hand or by normal approximation.
  - By hand:
    - Baseball game. N=9 innings
    - P(rain)= .4. Find the probability it rains in 7 or fewer innings.
    - Opposite of 7 or fewer is 8 or more. This is easier to calculate.
    - $\binom{9}{8}(.4)^8(.6)^1 + \binom{9}{9}(.4)^9(.6)^0 = .0038$ or .38%
    - 100% - .38% = 99.62%
  - By normal approximation:
    - Water officials claim the water is 90% clean.
    - X=single sample
    - X ~ Bern (.9)
    - We take 200 samples.
    - Y=many samples
    - Y ~ Binom (200, .9)
    - Find the probability of 175 or fewer clean samples.
      - This would be a pain by hand.
    - Use normal approximation.
      - E[Y]= 200 x .9 = 180
      - Var[Y]= .9 x .1 x 200 = 18
      - SD[Y]= 4.24
      - N (180, 4.24)
    - Z= (175-180)/4.24= -1.18
    - Z = -1.18 → table value .1190, or about 12%
- Individuals have more variation than means.

## STANDARD ERROR AND CONFIDENCE INTERVALS

- Standard deviation- How individuals vary.
- Standard error- How much means/averages vary. It is the standard deviation divided by the square root of n (equivalently, the square root of the variance divided by n).
- Point estimate- A single value used to estimate a parameter.
- Sampling distribution- Distribution of a statistic (such as the sample mean) across many repeated samples.
- Confidence interval- A plausible range of values for the population parameter.
  - For example: If you have a 95% confidence interval, [1, 5], you would say, “I am 95% confident that the true mean lies between 1 and 5.” You could also say, “If we repeat the sample many times, about 95% of our confidence intervals would include the true mean.” However, you cannot say that there is a 95% probability that the true mean is within 1 and 5, because the true mean is not a random variable. Therefore, you cannot make probability statements about it.
- Alpha level- The % of confidence intervals that will miss the true parameter.
  - For a 95% confidence interval, the alpha level is 5%.
- The larger the confidence interval, the better.

## STATISTICAL HYPOTHESES

Research questions we wish to test using statistical evidence.

- Ex: It’s always windy in Blacksburg.
  - Suppose $W_{US}$ represents wind speed in the US. $W_{US} \sim N(7, 3)$
  - Null hypothesis- $\mu_{Blacksburg} \le 7$ mph
  - Alternative hypothesis- $\mu_{Blacksburg} > 7$ mph
- The alternative hypothesis is what you think the truth is. The null hypothesis always includes an equal sign.
- Writing hypotheses:
  - Make alternative based on what you want to show.
  - Make null the opposite.

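The binomial worked examples earlier in the guide (the three VT students, the nine-inning baseball game, and the 200 water samples) can be reproduced with a short script. This is a sketch, not the course's method: `binom_pmf` and `normal_cdf` are illustrative helper names, and the normal CDF computed with `math.erf` stands in for the printed z-table.

```python
from math import comb, erf, sqrt


def binom_pmf(k, n, p):
    """P(exactly k successes in n trials): combinations x p^k x (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_cdf(z):
    """Area to the left of z under the standard normal curve (the z-table value)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


# Three VT students: probability that exactly 2 of 3 are freshmen, p = .25.
p_two_freshmen = binom_pmf(2, 3, 0.25)

# Baseball: P(rain in 7 or fewer of 9 innings) = 1 - P(8 or more), p = .4.
p_8_or_more = binom_pmf(8, 9, 0.4) + binom_pmf(9, 9, 0.4)
p_7_or_fewer = 1 - p_8_or_more

# Water samples: Y ~ Binom(200, .9), P(Y <= 175) by normal approximation.
n, p = 200, 0.9
mean = n * p
sd = sqrt(n * p * (1 - p))
z = (175 - mean) / sd
approx = normal_cdf(z)

# The "pain by hand" version: add up all 176 exact binomial terms.
exact = sum(binom_pmf(k, n, p) for k in range(176))

print(round(p_two_freshmen, 3), round(p_8_or_more, 4), round(z, 2), round(approx, 3))
```

The exact sum and the normal approximation will not agree perfectly, because the approximation replaces a discrete, slightly skewed distribution with a smooth symmetric curve.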
## MAKING A NON-95% CONFIDENCE INTERVAL

- Determine what the leftover or alpha level is.
- Determine the cutoff z-score (z*) using the z-table.
  - Look it up and use it to determine x, the cutoff.

## SUMMARY OF APPROACHING HYPOTHESES PROBLEMS

- Create hypotheses
- Set alpha level (use the given value; use 0.05 if it is not given)
- If the scenario is two sided (equal or not equal to)
  - Build a confidence interval from alpha, x bar, and the standard error
  - Is the null hypothesis µ beyond the confidence interval?
    - If yes, reject the null
    - If no, fail to reject the null
- If the scenario is one sided (less than or greater than)
  - Build distribution as if null were true
  - Determine cutoff based on alpha, mu, standard error
  - Is your x bar beyond the cutoff?
    - If yes, reject the null
    - If no, fail to reject the null

## CENTRAL LIMIT THEOREM

- No matter the observation distribution, for a large n, $\bar{x} \sim N(\mu, \text{standard error})$. (x bar follows a normal distribution.)
  - This math won’t be tested, but the concept is important.

## T-DISTRIBUTION

- Has more outliers (heavier tails) than the normal distribution. Handle problems the same way you would handle a normal distribution, but use the t-table chart.
  - Degrees of freedom- How many pieces of info? It is equal to n-1, and follows the t as a subscript when the distribution is written. For example, with 7 degrees of freedom the distribution would be written as $t_7(\mu, \text{S.E.})$.
- Use a T-test when sigma is unknown.
- Use cutoffs to test, not p-values.

## HYPOTHESIS TESTING RATIONALE

- Make an assumption. Is what you observed likely or unlikely?
- Compare what you observe to what you expect.

## CHI-SQUARE TEST

- How different is what we observed from what we expected?
- For each category, calculate $\frac{(\text{observed} - \text{expected})^2}{\text{expected}}$. Add these values together for the chi-square value.
- The degrees of freedom for a chi-square value is the number of categories minus 1.
  - Look up the cutoff chi-square value on the table.
  - Is the chi-square value you calculated larger than the value on the table?
    If so, reject the null. If not, fail to reject the null.

## BASICS OF LINEAR REGRESSION

- Create an equation of a line that fits through data points as well as possible.
- X is the independent variable while Y is the dependent variable.
- Questions to ask yourself:
  - Are the points perfectly straight?
  - What is the best line?
  - What are the slope and y-intercept?
  - What’s the correlation (r)?
    - r is between -1 and +1.
- Fit a line through the points.
  - $y = mx + b$ is changed to $y = \beta_0 + \beta_1 x$, with $\beta_0$ being the y-intercept and $\beta_1$ being the slope.
  - $\beta_0$ and $\beta_1$ are considered true parameters. Use estimates when using statistics, with the hats over them ($\hat{\beta}_0$ and $\hat{\beta}_1$).
- Residual- How far is the line from the actual point?
- Best line- Minimizes the sum of the squared residuals.
- You need to know how to estimate parameters for the final.
- Inference- Making decisions based on data.

## FINDING A P-VALUE

- To find the p-value, calculate the z-score from your observation, and look up its probability on the table. That is the p-value. (For a “more than” alternative, use 100% minus the table value, as in the z-score steps.)
- P-value- Probability of observing data as extreme as or more extreme than what was observed, assuming $H_0$ is true.
- If the P-value is less than the alpha, you reject the null hypothesis. If it is greater than alpha, you fail to reject the null.
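The p-value steps above (find the z-score, look up its probability, compare with alpha) can be sketched as follows. The numbers are invented for illustration: a sample mean of 8.2 mph and a standard error of 0.6 are assumptions, tested against the null value of 7 mph from the wind example. The function name `p_value` is hypothetical, the normal CDF replaces the printed table, and the two-sided branch (doubling the smaller tail) is a common convention that the guide itself does not spell out.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Area to the left of z under the standard normal curve (the z-table value)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def p_value(x_bar, mu_0, standard_error, alternative):
    """Return (z, p) for a z-test of the sample mean x_bar against the null mean mu_0."""
    z = (x_bar - mu_0) / standard_error
    left = normal_cdf(z)
    if alternative == "less":        # "less than": keep the table value
        p = left
    elif alternative == "greater":   # "more than": 100% minus the table value
        p = 1 - left
    elif alternative == "two-sided":  # assumed convention: double the smaller tail
        p = 2 * min(left, 1 - left)
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p


# Hypothetical data: H0 says mu <= 7 mph, the alternative says mu > 7 mph.
z, p = p_value(x_bar=8.2, mu_0=7.0, standard_error=0.6, alternative="greater")
alpha = 0.05
decision = "reject the null" if p < alpha else "fail to reject the null"
print(round(z, 2), round(p, 4), decision)
```

With these made-up numbers the observation sits about two standard errors above the null mean, so the upper-tail probability falls below the 5% alpha level.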
