# ST 260 final exam review ST 260

UA

This 8 page Study Guide was uploaded by Jia Liu on Sunday May 1, 2016. The Study Guide belongs to ST 260 at University of Alabama - Tuscaloosa taught by in Spring 2016. Since its upload, it has received 13 views. For similar materials see Statistical Data Analysis in Statistics at University of Alabama - Tuscaloosa.

Date Created: 05/01/16

## Section #1-9 Review

### Section #1-3

How can we define statistics?

• The science of decision-making in the face of uncertainty
• The science of transforming data into information for making decisions
• The science of summarizing, modeling, and interpreting data for making decisions

Statistical Inference = Drawing conclusions about the population (parameters) from the sample (statistics).

1. Categorical (Qualitative) Variable
   • Examples: male/female, registered to vote/not, ethnicity, eye color
2. Quantitative Variables
   – Discrete: usually take on integer values but can take on fractions when the variable allows (counts, how many)
   – Continuous: can take on any value at any point along an interval (measurements, how much). Ex: weight, height, income

• Cross-sectional data is collected around the same time, but across different sections (or groups).
• Time series data is collected over several time periods, with only one observation per time period.

Categorical Data

1. Tabular Displays: Frequency Distribution, Relative Freq. Dist., Percent Freq. Dist., Crosstabulation
2. Graphical Displays: Bar Chart, Side-by-Side Bar Chart, Stacked Bar Chart, Pie Chart

Quantitative Data

1. Tabular Displays: Frequency Distribution, Relative Freq. Dist., % Freq. Dist., Cumulative Freq. Dist., Cum. Rel. Freq. Dist., Cum. % Freq. Dist., Crosstabulation
2. Graphical Displays: Dot Plot, Histogram (binning), Stem-and-Leaf Display, Scatter Diagram

• The relative frequency bar chart looks the same as the bar chart, but shows the proportion of observations in each category rather than the counts.
• A histogram is similar to a bar chart, with the bin counts used as the heights of the bars. Note: there are no gaps between bars unless there are actual gaps in the data.
• Before making a histogram, the Quantitative Data Condition must be satisfied: the data values are of a quantitative variable whose units are known.
• Caution: Categorical data cannot be displayed in a histogram, and quantitative data cannot be displayed in a bar chart or a pie chart.
• The marginal distribution of a variable in a crosstabulation is the total count that occurs when the value of that variable is held constant.
• In a contingency table, when the distribution of one variable is the same for all categories of another variable, we say that the variables are independent.

Because of the possibility of Simpson's Paradox:

• Examine both aggregated and unaggregated crosstabulation data
• Check for "hidden variables" that will give different results
• Use the tabulation (aggregated or unaggregated) that gives better insight
  – Better yet, show both

Descriptive Statistics for Quantitative Data

Which measures of center and spread should be used for a distribution?

• If the shape is skewed or has outliers, the median and IQR should be reported.
• If the shape is unimodal and symmetric, with no outliers, the mean and standard deviation and possibly the median and IQR should be reported.
• Always pair the median with the IQR and the mean with the standard deviation.
• If there are multiple modes, try to determine if the data can be split into separate groups.
• If there are unusual observations, point them out and report the mean and standard deviation with and without those values.

Standardizing

Hypergeometric distribution (sampling without replacement), with $n \in \{1, 2, \ldots, N\}$:

$$P(X = k) = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}}$$

Here $N$ is the population size, $m$ is the number of successes in the population, $n$ is the sample size, and $k$ is the number of successes in the sample.

The hypergeometric distribution is not comprised of Bernoulli trials because the outcomes of previous trials influence the probabilities of subsequent trials.

A continuous random variable is a random variable that may take on any value in some interval [a, b]. The distribution of the probabilities can be shown with a curve, f(x), called a probability density function (pdf).

The Normal Distribution is the most important distribution in statistics!
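As a quick check on the hypergeometric formula given earlier in this section, here is a minimal Python sketch. The function name `hypergeom_pmf` and the numbers (20 items, 5 of them "successes", a sample of 4) are made up for illustration and are not from the course notes.

```python
from math import comb

def hypergeom_pmf(k, N, m, n):
    """P(X = k): k successes in a sample of n drawn without replacement
    from a population of N items, m of which are successes."""
    # Values of k outside the possible range have probability zero.
    if k < max(0, n - (N - m)) or k > min(n, m):
        return 0.0
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

# Hypothetical example: 20 items, 5 successes, sample 4 without replacement.
N, m, n = 20, 5, 4
probs = [hypergeom_pmf(k, N, m, n) for k in range(n + 1)]
for k, p in enumerate(probs):
    print(f"P(X = {k}) = {p:.4f}")
print(f"sum of probabilities = {sum(probs):.4f}")
```

The probabilities over all possible values of k add up to 1, which is a useful sanity check when working such a problem by hand.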
• Bell-shaped and unimodal (one mode)
• Symmetric, not skewed
• Mean = Median = Mode
• $P(X \le \mu) = P(X \ge \mu) = 0.5$
• $P(X \le \mu - a) = P(X \ge \mu + a)$

The Z-transformation (or Z-score) is used to transform any normal distribution to a standard normal (with mean = 0, sd = 1). ** "distance from mean in standard deviation units"

If $X \sim N(\mu, \sigma)$ and $Z = \frac{X - \mu}{\sigma}$, then $Z \sim N(0, 1)$.

• $\Pr(Z > a) = \Pr(X > \mu + a\sigma)$
• $\Pr(Z > a) = 1 - \Pr(Z \le a)$
• $\Pr(Z < a) = \Pr(Z \le a)$

### Section #7-8

• Populations have Parameters (PoP)
• Samples provide Statistics (SaS)
• Statistics are used to estimate Parameters (SP)

| Population Parameters (Real thing) | Sample Statistics (Estimators) |
|---|---|
| Mean ($\mu$) | Sample mean ($\bar{x}$) |
| Distribution/Percentiles | Sample percentiles |
| Median | Sample median |
| Proportion ($p$) | Sample proportion ($\bar{p}$) |
| Variance ($\sigma^2$) | Sample variance ($s^2$) |

Statistical Inference = Drawing conclusions about the population (parameters) from the sample (statistics).

The sample should be representative of the population.

• Sampling Bias: when the summary characteristics of the sample differ from the corresponding characteristics of the population it is trying to represent
• The size of the sample is important, not the size of the population.

Types of Sampling:

• Random Sampling: it's random when no one can guess which elements will be chosen
• Stratified Sampling: group elements into homogeneous groups and take samples from within each group
• Convenience Sampling: choose the sample according to whatever is easiest
  – Ex: ask your friends if you are looking good
  – Bias can obviously be a big issue with this type of sampling

Large finite populations can often be treated as infinite.

A point estimator estimates the population parameter with a single number.

Review questions:

• If we take 100 samples (each of size n) from a population and construct 99% confidence intervals for the population proportion from each sample, what is the expected number of intervals that will contain the true population parameter? 99
• If we repeatedly take samples (of size n = 10) from a population and construct 90% confidence intervals for the population proportion 100 times, what is the expected number of intervals that will not contain the true population parameter? 10 (no connection with n)
• For small finite populations, will the confidence interval be wider or narrower than in infinite populations? Narrower, because the finite population correction reduces the standard error.
• What equation is the point estimator for the population mean? $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• Given a 95% and an 80% confidence interval from the same sample, which one is the 95% interval? The wider one.
• What is the critical value (z*) required to get an 80% confidence interval for a population proportion? 1.28

Standard errors:

• Standard error by plugging in the sample proportion: $SE(\bar{p}) = \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$
• Standard error for sample averages, plugging in the sample standard deviation: $SE(\bar{x}) = \frac{s}{\sqrt{n}}$

$\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$, or $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.

But if the standard deviation $\sigma$ has to be estimated with $s$, then $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ has a t-distribution with df = n − 1.

Confidence Interval for averages/means: CI = point estimate ± margin of error

• $MOE(\bar{x}) = t^* \times SE(\bar{x})$, where $SE(\bar{x}) = \frac{s}{\sqrt{n}}$
• $CI(\bar{x}) = \bar{x} \pm t^* \frac{s}{\sqrt{n}}$
• Sample size for a mean: $n = \frac{(t^*)^2}{MOE^2} \times s^2$ (use $\sigma^2$ in place of $s^2$ when $\sigma$ is known)
• Sample size for a proportion: $n = \frac{(z^*)^2\, \bar{p}(1-\bar{p})}{MOE^2}$

### Section #9

1. Scatterplots: direction, form, strength, outliers. x-axis: explanatory, predictor, or independent variable. y-axis: response, or dependent variable.

2. Correlation: measures the strength of the linear association between x and y (quantitative variables). Strength = how tightly the points follow a straight line. $-1 \le r \le 1$

Correlation equation (correlation coefficient):

$$r = \frac{\sum z_x z_y}{n-1}, \quad z_x = \frac{x - \bar{x}}{s_x}, \quad z_y = \frac{y - \bar{y}}{s_y}$$

Sample correlation coefficient from the sample covariance:

$$r = \frac{s_{xy}}{s_x s_y}, \quad s_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n-1}$$

• r = 1 or r = −1 is a perfect linear relationship; r = 0 is a lack of linear relationship (no linear pattern).
• The sign of the correlation gives the direction of the relationship.
• Correlation has no units, so shifting the data, scaling it by a positive factor, standardizing, or even swapping the variables has no effect on the numerical value.
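The two correlation formulas above can be compared numerically. The following is a minimal Python sketch using only the standard library; the function names and the hours/score data are made up for illustration and are not part of the course notes.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum(z_x * z_y) / (n - 1), using sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

def covariance(xs, ys):
    """Sample covariance s_xy = sum((x - x_bar)(y - y_bar)) / (n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: hours studied and exam score for six students.
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 60, 68, 71]

r = correlation(hours, score)
r_from_cov = covariance(hours, score) / (stdev(hours) * stdev(score))

# Shifting and positive rescaling (hours -> minutes plus 15) leave r unchanged,
# and so does swapping the roles of x and y.
minutes_shifted = [60 * h + 15 for h in hours]
r_shifted = correlation(minutes_shifted, score)
r_swapped = correlation(score, hours)

print(f"r (z-scores)       = {r:.4f}")
print(f"r (covariance)     = {r_from_cov:.4f}")
print(f"r (shifted/scaled) = {r_shifted:.4f}")
print(f"r (swapped)        = {r_swapped:.4f}")
```

All four printed values should agree up to rounding, which illustrates that r depends only on the standardized values.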
A large correlation is not a sign of a causal relationship. Correlation ≠ causation (lurking variable).

** Don't correlate categorical variables.

3. Linear model: $\hat{y} = b_0 + b_1 x$, where $b_0$ is the intercept, $b_1$ is the slope, and $\hat{y}$ represents an approximate or predicted value.
   • True Regression Equation: $y = \beta_0 + \beta_1 x + \text{error}$
   • Estimated Regression Equation: $\hat{y} = b_0 + b_1 x$
   • Regression equation rewritten in standardized form: $\hat{z}_y = r z_x$

4. Residuals: $e = y - \hat{y}$, where $y$ is the observed value and $\hat{y}$ is the predicted value.

5. Least squares line: The regression (best fit) line doesn't pass through all the points, but it is the best compromise in the sense that the sum of squares of the residuals is the smallest possible. The slope tells us the change in y per unit change in x. When plotted against the predicted values, the residuals should show no pattern, no direction, and no change in spread.

6. $b_1 = r \frac{s_y}{s_x}$, so the slope gets its sign from the correlation. $b_0 = \bar{y} - b_1 \bar{x}$

7. Variance in model: $0 \le r^2 \le 1$. $r^2$ is the coefficient of determination (percentage of variation): the fraction of the data's variance accounted for by the model, which shows how well the model fits. If the correlation were 1.0, the model would predict y perfectly; the residuals would all be zero and have no variance. If the correlation were 0, the model would predict the mean of y for all x-values, and the residuals would have the same variability as the original data. $r = (\text{sign of } b_1)\sqrt{r^2}$

8. Assumptions: Models are useful only when specific assumptions are reasonable. We check conditions that provide information about the assumptions:
   1) Quantitative Data Condition: linear models only make sense for quantitative data, so don't be fooled by categorical data recorded as numbers.
   2) Linearity Assumption (check the Linearity Condition): the two variables must have a linear association, or a linear model won't mean a thing.
   3) Outlier Condition: outliers can dramatically change a regression model.
   4) Equal Spread Condition: check a residual plot for equal scatter for all x-values.

9. Confidence Interval: the regression equation $\hat{y} = b_0 + b_1 x$ gives a point estimate for the mean of y at a particular value of x, and the confidence interval is an interval estimate of that mean. Prediction Interval: gives an interval estimate of the value of a new y at a given x. It is wider than the confidence interval because we are estimating the value of a single observation, not the average.

** Slope: a slope of k means that for every extra unit of x, the y of the (subject) is predicted to increase or decrease by k units of y. OR: The slope is k. Based on this model, each additional unit of x tends to require an additional k units of y.

** Intercept: an intercept of m is the value of the regression line when x = 0. If that value is not reasonable (meaningful), the reason is that x would not realistically be 0.
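The slope, intercept, residual, and $r^2$ formulas from this section can be tied together in a short Python sketch. The function name `fit_line` and the data are made up for illustration; this is one way to check hand calculations under those assumptions, not the course's own method.

```python
from statistics import mean, stdev

def fit_line(xs, ys):
    """Least squares line via b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    r = s_xy / (s_x * s_y)
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1, r

# Hypothetical data: hours studied (x) and exam score (y).
x = [1, 2, 3, 4, 5, 6]
y = [52, 55, 61, 60, 68, 71]

b0, b1, r = fit_line(x, y)
predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, predicted)]  # e = y - y_hat

# Fraction of the variance in y accounted for by the model.
ss_res = sum(e ** 2 for e in residuals)
ss_tot = sum((yi - mean(y)) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(f"y_hat = {b0:.3f} + {b1:.3f} x")
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
print("residuals:", [round(e, 3) for e in residuals])
```

Two checks follow from the least squares property: the residuals sum to zero (up to rounding), and the $r^2$ computed from the residuals matches the square of the correlation.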
