# StatsforScience_Exam3StudyGuide Math 3339

UH

This 13-page study guide was uploaded by Aishwarya Juttu on Thursday, July 14, 2016. It belongs to Math 3339 (Statistics for the Sciences) at the University of Houston, taught by Prof. C. Poliak in Summer 2016.

Date Created: 07/14/16

## Exam 3 Study Guide

### Lecture 12 concepts

- **Confidence interval**: a range of plausible values for the unknown population parameter. If we repeated the sampling process many times, we would expect about C% of the resulting intervals to contain the parameter.
  - As the confidence level C increases, the margin of error (the width of the interval) increases.
  - As the confidence level C decreases, the margin of error (the width of the interval) decreases.
  - As the sample size increases, the width of the interval decreases.
- **Point estimate**: the value calculated from the sample data to estimate the unknown population parameter.
  - Sample proportion: $\hat{p} = x/n$, where $x$ is the number of successes in the sample and $n$ is the sample size.
  - Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
- **Interpreting a confidence interval**:
  - Write down all given values (sample average, sample size, margin of error).
  - Subtract and add the margin of error to the point estimate to get the lower and upper limits of the interval.
  - Write the interpretation: "We are C% confident that [the parameter in the question] is between [lower limit] and [upper limit]."
  - In R, for a proportion: `phat + c(-1, 1) * qnorm((1 + C)/2) * sqrt(phat * (1 - phat) / n)`.
- **Conditions for a confidence interval for a proportion**:
  1. The sample must be an SRS from the population of interest.
  2. The population must be at least 10 times the size of the sample.
  3. The number of successes and the number of failures must each be at least 10 (both $n\hat{p} \ge 10$ and $n(1 - \hat{p}) \ge 10$).
- **Confidence level**: the percentage of times (in repeated sampling) that the confidence interval actually contains the parameter.
  - $C = 100(1 - \alpha)\%$
  - The confidence level will be given in the problem.
  - If it is not given, assume 95%.
- **Critical value**: the cutoff point such that the middle area under the density curve is $1 - \alpha$.
  - For proportions, the critical value is $z^*$.
  - $z^*$ can be found from a z-table or with `qnorm((1 + C)/2)`, where C is the confidence level as a decimal.
- **Standard error**: the estimated standard deviation of the estimate.
  - Sample proportion: $SE(\hat{p}) = \sqrt{\hat{p}(1 - \hat{p})/n}$.
  - Sample mean: $SE(\bar{x}) = s/\sqrt{n}$, where $s$ is the sample standard deviation.
- **Margin of error**: $m$ = critical value × standard error.
  - For a sample mean: $m$ = critical value × $SE(\bar{x})$.
  - For a sample proportion: $m = z^* \sqrt{\hat{p}(1 - \hat{p})/n}$.
- **Conditions for estimating the population mean**:
  1. The sample must be an SRS.
  2. The population distribution must be Normal, or the sample must be large ($n \ge 30$) so that, by the Central Limit Theorem, the sampling distribution of $\bar{x}$ is approximately Normal.
- **Degrees of freedom (df) and the t distribution**: the shape of the t distribution changes with the sample size through its degrees of freedom.
  - Used for inference about the population mean when the population standard deviation σ is unknown.
  - The distribution is bell-shaped.
  - $df = n - 1$.
  - Use a t-table or `qt(probability, df)`.
- **Critical value for the population mean**:
  - For UNKNOWN σ it is $t^*$, where the area under the t curve between $-t^*$ and $+t^*$ is the confidence level $C = 1 - \alpha$: `qt((1 + C)/2, df)`.
  - For KNOWN σ it is $z^*$, where the area under the Normal curve between $-z^*$ and $+z^*$ is the confidence level $C = 1 - \alpha$: `qnorm((1 + C)/2)`.
- **Margin of error for the population mean**: recall $m$ = critical value × standard error.
  - For UNKNOWN σ: $m = t^* \dfrac{s}{\sqrt{n}}$ with $df = n - 1$.
  - For KNOWN σ: $m = z^* \dfrac{\sigma}{\sqrt{n}}$.
- **One-sample t procedure in R** (gives the confidence interval): `t.test(x, conf.level = C)`.

### Lecture 13 concepts

- **Chi-squared distribution (a special gamma distribution)**: X has a chi-squared distribution if its pdf is the gamma density with $\alpha = \nu/2$ and $\beta = 2$.
  The pdf of a chi-squared random variable is $f(x) = \dfrac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1} e^{-x/2}$ for $x > 0$.
  - If X has a chi-squared distribution, we write $X \sim \chi^2(\nu)$.
  - $\nu$ is called the number of degrees of freedom (df).
- **Theorem 6.7**: if $X_1 \sim \text{Gamma}(\alpha_1, \beta)$ and $X_2 \sim \text{Gamma}(\alpha_2, \beta)$ are independent, then $X_1 + X_2 \sim \text{Gamma}(\alpha_1 + \alpha_2, \beta)$.
- **Corollaries of Theorem 6.7**
- **Confidence interval for σ**: its lower and upper limits are the square roots of the corresponding limits of the interval for $\sigma^2$.
  - $\sigma^2$ = population variance; $\sqrt{\sigma^2} = \sigma$ = population standard deviation.
  - Lower limit for $\sigma^2$: `(n - 1) * s^2 / qchisq((1 + C)/2, n - 1)`
  - Upper limit for $\sigma^2$: `(n - 1) * s^2 / qchisq((1 - C)/2, n - 1)`
- **Confidence limits from given statistics, large samples ($n > 30$)**, where $SE = s/\sqrt{n}$ for a mean (for a proportion, $SE = \sqrt{\hat{p}(1 - \hat{p})/n}$):
  - lcl = point estimate − `qnorm((1 + C)/2)` × SE
  - ucl = point estimate + `qnorm((1 + C)/2)` × SE
  - Interval: `c(lcl, ucl)`; for an interval on a variance, `sqrt(c(lcl, ucl))` gives the interval for the standard deviation.
- **Confidence limits from given statistics for a mean, small samples ($n < 30$)**:
  - lcl = point estimate − `qt((1 + C)/2, n - 1)` × SE
  - ucl = point estimate + `qt((1 + C)/2, n - 1)` × SE
  - Interval: `c(lcl, ucl)`; for an interval on a variance, `sqrt(c(lcl, ucl))` gives the interval for the standard deviation.
- **Normal quantile plot**: a graph used to assess whether data are approximately Normal before using the t interval or the confidence interval for a variance. If the points follow a straight line, the data are approximately Normal.
  - In R: `qqnorm(x)` followed by `qqline(x)`.

### Lecture 14 concepts

- **Null hypothesis**: assumes no effect for the parameter being tested, $H_0: \mu = \mu_0$. The null hypothesis always states equality with the assumed value.
- **Alternative hypothesis**: the statement we suspect is true instead of the null: $H_a: \mu < \mu_0$, $\mu > \mu_0$, or $\mu \ne \mu_0$. A test of significance is designed to assess the strength of the evidence against the null hypothesis.
  - Test to show that the mean is lower than what is assumed: a left-tailed test, $H_a: \mu < \mu_0$.
  - Test to show that the mean is greater than what is assumed: a right-tailed test, $H_a: \mu > \mu_0$.
  - Test to show that the mean is not equal to (either higher or lower than) what is assumed.
    This is called a two-tailed test, $H_a: \mu \ne \mu_0$.
- **Decisions and errors**:
  - Either reject the null hypothesis in favor of the alternative hypothesis (RTH0), or fail to reject the null hypothesis (FTRH0).
  - Type 1 error: rejecting the null hypothesis when it is in fact true (usually treated as worse than a Type 2 error).
  - Type 2 error: not rejecting the null hypothesis when it is false.
- **Test statistic**: measures the difference between the data and what is expected under the null hypothesis.
  - For UNKNOWN σ, about the population mean: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ with $df = n - 1$.
  - For KNOWN σ, about the population mean: $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$.
  - Rejection region: the set of values of the test statistic that lead to rejecting the null hypothesis.
  - The critical value is the boundary of the rejection region.
  - Left-tailed, $H_a: \mu < \mu_0$: area α in the left tail.
  - Right-tailed, $H_a: \mu > \mu_0$: area α in the right tail.
  - Two-tailed, $H_a: \mu \ne \mu_0$: area α/2 in each tail.
- **Significance level**: denoted α and compared with the p-value to decide whether to reject or fail to reject $H_0$.
  - Reject $H_0$ if the p-value is less than or equal to α; the data are then statistically significant at level α.
  - Do not reject $H_0$ if the p-value is larger than α.
  - If α is not given, assume it is 0.05.
  - p-value below 1%: very strong evidence against $H_0$ (in favor of $H_a$).
  - p-value between 1% and 5%: strong evidence against $H_0$.
  - p-value between 5% and 10%: weak evidence against $H_0$.
  - p-value above 10%: little or no evidence against $H_0$.
- **P-value**: the probability, computed assuming $H_0$ is true, that the test statistic would take a value at least as extreme (in the direction of $H_a$) as the one actually observed.
  - If $H_a: \mu < \mu_0$: P-value = P(Z < test statistic), `pnorm(z)`.
  - If $H_a: \mu > \mu_0$: P-value = P(Z > test statistic), `1 - pnorm(z)`.
  - If $H_a: \mu \ne \mu_0$: P-value = P(Z < −|test statistic|) + P(Z > +|test statistic|) = 2P(Z > |test statistic|), `2 * pnorm(-abs(z))`.
  - Replace Z with T if a t-test is used (use `pt(t, df)` in place of `pnorm(z)`).
- **Conditions for tests**:
  - Z-test:
    1. An SRS of size n from the population.
    2. Known population standard deviation σ.
    3. Either a Normal population or a large sample ($n \ge 30$).
  - T-test:
    1. An SRS of size n from the population.
    2. Unknown population standard deviation.
    3. Either a Normal population or a large sample ($n \ge 30$).
- **Matched pairs test**: used when comparing corresponding (paired) values in the data; it can only be used when the two samples are dependent on one another. $\mu_d$ = mean of the differences.
  - Conditions for a matched pairs t-test:
    - Each sample is an SRS of size n from the same population.
    - The test is conducted on paired data (the samples are NOT independent).
    - Unknown population standard deviation.
    - Either a Normal population or large samples ($n \ge 30$).
  - Hypotheses: $H_0: \mu_d = 0$ and $H_a: \mu_d \ne 0$, $\mu_d < 0$, or $\mu_d > 0$.
- **Inference for a population proportion**: $p_0$ represents the given population proportion, and the hypotheses are
  - $H_0: p = p_0$
  - $H_a: p \ne p_0$, $p < p_0$, or $p > p_0$
  - Conditions:
    1. The sample must be an SRS from the population of interest.
    2. The population must be at least 10 times the size of the sample.
    3. The number of successes and the number of failures must each be at least 10 (both $n\hat{p} \ge 10$ and $n(1 - \hat{p}) \ge 10$).
  - If the conditions are met, use the z-test: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$.

### Lecture 15 concepts

- **One-sample t-test in R**: `t.test(x, mu = mu0, alternative = "two.sided")`, where `x` is the data and `mu0` is the hypothesized mean µ0.
- **Bivariate data**: data on two different variables.
  - Response variable: measures an outcome of a study; the dependent variable (y-axis).
  - Explanatory variable: explains or influences the response variable; the independent variable (x-axis).
- **Scatterplots**: best for two quantitative variables, `plot(explanatory, response)`.
  - Direction: positive or negative pattern.
  - Form: straight-line relationship or no pattern at all.
  - Strength: how closely the points follow a single stream (strong, moderate, or weak association).
  - Outliers: values outside the overall trend.
- **Correlation coefficient**: measures the direction and strength of a straight-line relationship between two quantitative variables; written r for a sample, `cor(x, y)`.
  - If we know the covariance between x and y: $r = \dfrac{\operatorname{cov}(x, y)}{s_x s_y}$.
  - Positive correlation = positive association; negative
    correlation = negative association.
  - r is always between −1 and 1.
  - Values close to 0 indicate a very weak linear relationship.
  - Values close to −1 indicate a very strong negative linear relationship.
  - Values close to 1 indicate a very strong positive linear relationship.
- **Regression line**: a straight line that describes how a response variable y changes as an explanatory variable x changes.
- **Least-squares regression line (LSRL)**: the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible. The population model is $Y = \beta_0 + \beta_1 x + \varepsilon$, where
  - y = dependent (response) variable
  - x = independent (explanatory) variable
  - $\beta_0$ = population intercept
  - $\beta_1$ = population slope
  - $\varepsilon$ = random error, with mean 0
  - In R: first `name.lm = lm(y ~ x)`, then `summary(name.lm)`.
- **Coefficient of determination** ($R^2$): measures how successful the regression equation is at predicting the y variable; the closer to 1 (100%), the better the equation.
  - Interpretation format: "[R²] of the variability in the y variable is explained by the LSRL with the x variable."
  - In the `summary()` output, "Multiple R-squared" is $R^2$.
- **Residual**: the difference between the observed value of the y variable and the value predicted by the regression line.
  - residual = observed y − predicted y
  - The closer the residuals are to zero, the better we are at predicting the y variable.
  - Linear model: `name.lm = lm(y ~ x)`; residual plot: `plot(x, resid(name.lm))`.
  - Curved pattern: the relationship is not linear.
  - Increasing spread: as x increases, predictions of y become less accurate for larger x.
  - Decreasing spread: as x increases, predictions of y become more accurate for larger x.
  - Individual points with large residuals are outliers.
  - Use $R^2$ and the residuals to decide whether the equation is a good way to predict the y variable.
- **Residual standard error**: $s = \sqrt{\dfrac{SSE}{n - 2}}$, the typical size of a residual.
- **Error sum of squares**: $SSE = \sum (y_i - \hat{y}_i)^2$.
- **Total sum of squares**: a quantitative measure of the total amount of variation in the observed values, $SST = \sum (y_i - \bar{y})^2$.
- **Regression sum of squares**: the amount of the total variation that is explained by the model, $SSR = \sum (\hat{y}_i - \bar{y})^2$, so that $SST = SSR + SSE$.
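The sums of squares above can be checked numerically. The R sketch below is a minimal example on made-up data (the vectors `x` and `y` are hypothetical, not from the course): it computes SSE, SST, SSR, r² and the residual standard error by hand and compares them with what `summary()` and `anova()` report for the same `lm` fit.

```r
# Hypothetical example data (not from the course): x = explanatory, y = response
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)
n <- length(y)

name.lm <- lm(y ~ x)         # least-squares regression line
y.hat   <- fitted(name.lm)   # predicted y values on the LSRL

SSE <- sum((y - y.hat)^2)        # error (residual) sum of squares
SST <- sum((y - mean(y))^2)      # total sum of squares
SSR <- sum((y.hat - mean(y))^2)  # regression sum of squares

r.squared <- SSR / SST           # coefficient of determination
s <- sqrt(SSE / (n - 2))         # residual standard error, df = n - 2

# Compare the hand calculations with R's own output (each should print TRUE)
print(all.equal(SST, SSR + SSE))
print(all.equal(r.squared, summary(name.lm)$r.squared))
print(all.equal(s, summary(name.lm)$sigma))
print(all.equal(c(SSR, SSE), unname(anova(name.lm)[["Sum Sq"]])))
```

For a simple linear regression with an intercept, r² computed this way also equals `cor(x, y)^2`.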
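- **Coefficient of determination** ($r^2$): $r^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$.
- **Analysis of variance table**: `anova(name.lm)`.
- **Conditions for regression inference**:
  - The sample is an SRS from the population.
  - The relationship is linear.
  - The standard deviation of the responses about the population line is the same for all values of the explanatory variable.
  - The response varies Normally about the population regression line.
- **T-test for the significance of $\beta_1$**:
  - Hypotheses: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \ne 0$.
  - Test statistic: $t = \dfrac{\text{observed} - \text{hypothesized}}{\text{standard error of observed}} = \dfrac{b_1 - 0}{SE_{b_1}}$, with $df = n - 2$.
  - P-value: from a two-tailed test.
  - Decision: reject $H_0$ if p-value ≤ α.
  - Conclusion: if $H_0$ is rejected, we conclude that the x variable can be used to predict the y variable.
- **Confidence interval for $\beta_1$**: for a range of plausible values for the slope, use $b_1 \pm t^* \times SE_{b_1}$.
  - $t^*$ is from Table D with $df = n - 2$.
  - In R: `confint(name.lm, level = C)`.
- **Inferences with $\hat{\mu}_y$**:
  - Let $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x^*$, where $x^*$ is some fixed value of x.
  - Mean of $\hat{Y}$: $E(\hat{Y}) = \beta_0 + \beta_1 x^*$.
  - Variance of $\hat{Y}$: $V(\hat{Y}) = \sigma^2 \left[\dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{S_{xx}}\right]$, where $S_{xx} = \sum (x_i - \bar{x})^2$.
  - $\hat{Y}$ has a Normal distribution.
  - The $100(1 - \alpha)\%$ confidence interval for $\mu_Y$, the expected value of Y at a specific value $x^*$, is $\hat{y} \pm t_{\alpha/2,\,n-2}\; s \sqrt{\dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{S_{xx}}}$, where s is the residual standard error.
  - In R: `predict(name.lm, newdata = data.frame(x = xstar), interval = "confidence", level = C)`; the column name in `newdata` must match the predictor's name in the model.
- **F-distribution**: the distribution of a ratio of independent chi-squared variables, each divided by its degrees of freedom; it can be used to test $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \ne 0$.
  - $F = \dfrac{U/\nu_1}{V/\nu_2}$
  - $U = SSR/\sigma^2 \sim \chi^2$ with $\nu_1 = 1$ df.
  - $V = SSE/\sigma^2 \sim \chi^2$ with $\nu_2 = n - 2$ df.
  - U and V are independent.
  - $F = t^2$, and the p-value is the same as for the two-sided t-test of $\beta_1$.
  - The F statistic and its p-value appear in the analysis of variance table, `anova(name.lm)`.

### Formulas

- Sample proportions:
  - Sample proportion: $\hat{p} = x/n$, where x is the number of successes and n is the sample size.
  - Standard error for a sample proportion: $SE(\hat{p}) = \sqrt{\hat{p}(1 - \hat{p})/n}$.
  - Margin of error for a sample proportion: $m = z^* \sqrt{\hat{p}(1 - \hat{p})/n}$.
- Sample mean:
  - Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
  - Standard error for a sample mean: $SE(\bar{x}) = s/\sqrt{n}$, where s is the sample standard deviation.
  - Margin of error for a sample mean: m = critical value × standard error.
  - For UNKNOWN σ: $m = t^* \dfrac{s}{\sqrt{n}}$ with $df = n - 1$.
  - For KNOWN σ: $m = z^* \dfrac{\sigma}{\sqrt{n}}$.
- Test statistics:
  - For UNKNOWN σ about the population mean: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$; use a t-table or `qt(probability, df)`.
  - For KNOWN σ about the population mean: $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$.
- Special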
  gamma distribution (chi-squared) confidence interval:
  - $\sigma^2$ = population variance; $\sqrt{\sigma^2} = \sigma$ = population standard deviation.
  - Lower limit for $\sigma^2$: `(n - 1) * s^2 / qchisq((1 + C)/2, n - 1)`
  - Upper limit for $\sigma^2$: `(n - 1) * s^2 / qchisq((1 - C)/2, n - 1)`
- Z-test for a proportion: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$.
- Correlation from the covariance between x and y: $r = \dfrac{\operatorname{cov}(x, y)}{s_x s_y}$.
- LSRL: $Y = \beta_0 + \beta_1 x + \varepsilon$.
- Residual = observed y − predicted y.
- Residual standard error: $s = \sqrt{SSE/(n - 2)}$.
- Error sum of squares: $SSE = \sum (y_i - \hat{y}_i)^2$.
- Total sum of squares (total variation in the observed values): $SST = \sum (y_i - \bar{y})^2$.
- Regression sum of squares (variation explained by the model): $SSR = \sum (\hat{y}_i - \bar{y})^2$.
- Coefficient of determination: $r^2 = SSR/SST$.
- Confidence interval for $\beta_1$: $b_1 \pm t^* \times SE_{b_1}$, with $t^*$ from Table D with $df = n - 2$.
- Inferences with $\hat{\mu}_y$:
  - Let $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x^*$, where $x^*$ is some fixed value of x.
  - Mean of $\hat{Y}$: $\beta_0 + \beta_1 x^*$.
  - Variance of $\hat{Y}$: $\sigma^2 \left[\dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{S_{xx}}\right]$.
  - The $100(1 - \alpha)\%$ confidence interval for $\mu_Y$ at $x^*$: $\hat{y} \pm t_{\alpha/2,\,n-2}\; s \sqrt{\dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{S_{xx}}}$.
- F-distribution: $F = \dfrac{U/\nu_1}{V/\nu_2}$.

### R-studio

- Confidence interval for a proportion: `phat + c(-1, 1) * qnorm((1 + C)/2) * sqrt(phat * (1 - phat) / n)`
- z-scores: `qnorm((1 + C)/2)`
- Critical values:
  - unknown σ: `qt((1 + C)/2, df)`
  - known σ: `qnorm((1 + C)/2)`
- Confidence limits from given statistics, large samples (n > 30), where SE = s/√n for a mean:
  - lcl = point estimate − `qnorm((1 + C)/2)` × SE
  - ucl = point estimate + `qnorm((1 + C)/2)` × SE
  - Interval: `c(lcl, ucl)`; for an interval on a variance, `sqrt(c(lcl, ucl))` gives the standard-deviation interval.
- Confidence limits from given statistics for a mean, small samples (n < 30):
  - lcl = point estimate − `qt((1 + C)/2, n - 1)` × SE
  - ucl = point estimate + `qt((1 + C)/2, n - 1)` × SE
  - Interval: `c(lcl, ucl)`; for an interval on a variance, `sqrt(c(lcl, ucl))` gives the standard-deviation interval.
- Normal quantile plot: `qqnorm(x)` then `qqline(x)`
- P-value:
  - If $H_a: \mu < \mu_0$: `pnorm(z)`
  - If $H_a: \mu > \mu_0$: `1 - pnorm(z)`
  - If $H_a: \mu \ne \mu_0$: 2P(Z > |test statistic|), `2 * pnorm(-abs(z))`
- One-sample t-test: `t.test(x, mu = mu0, alternative = "two.sided")`, where `x` is the data and `mu0` is µ0.
- Scatterplot: `plot(explanatory, response)`
- Correlation coefficient: `cor(x, y)`
- LSRL: `name.lm = lm(y ~ x)`, then `summary(name.lm)`
- Residuals: fit `name.lm = lm(y ~ x)`, then plot with `plot(x, resid(name.lm))`
- Analysis of variance table: `anova(name.lm)`
- Confidence interval for $\beta_1$: `confint(name.lm, level = C)`
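The interval and test recipes in the R-studio list can be checked against R's built-in `t.test()`. The sketch below uses a small hypothetical sample `x`, a hypothetical null value `mu0 = 12`, and hypothetical counts of 58 successes in 100 trials; none of these numbers come from the course. It computes a t interval and a two-sided t-test p-value by hand, the chi-squared interval for a variance (and its square root for σ), and the z interval for a proportion.

```r
# Hypothetical sample, not from the course notes
x  <- c(12.1, 11.4, 13.0, 12.7, 11.9, 12.5, 13.3, 12.2, 11.8, 12.9)
C  <- 0.95                  # confidence level as a decimal
n  <- length(x)
SE <- sd(x) / sqrt(n)       # standard error of the sample mean

# t interval for the mean (sigma unknown), df = n - 1
t.star     <- qt((1 + C) / 2, df = n - 1)
ci.mean    <- mean(x) + c(-1, 1) * t.star * SE
ci.builtin <- as.numeric(t.test(x, conf.level = C)$conf.int)  # drop the conf.level attribute

# Two-sided one-sample t test of H0: mu = mu0 (mu0 = 12 is hypothetical)
mu0       <- 12
t.stat    <- (mean(x) - mu0) / SE
p.hand    <- 2 * pt(-abs(t.stat), df = n - 1)
p.builtin <- t.test(x, mu = mu0, alternative = "two.sided")$p.value

# Chi-squared interval for the variance; square roots give the interval for sigma
var.ci <- (n - 1) * var(x) / qchisq(c((1 + C) / 2, (1 - C) / 2), df = n - 1)
sd.ci  <- sqrt(var.ci)

# z interval for a proportion: 58 successes in 100 trials (hypothetical counts)
successes <- 58
trials    <- 100
phat      <- successes / trials
ci.prop   <- phat + c(-1, 1) * qnorm((1 + C) / 2) * sqrt(phat * (1 - phat) / trials)

print(rbind(ci.mean, ci.builtin))  # the two rows should match
print(c(p.hand = p.hand, p.builtin = p.builtin))
print(sd.ci)
print(ci.prop)                     # about 0.483 to 0.677
```

Passing `c((1 + C)/2, (1 - C)/2)` to `qchisq()` returns the larger quantile first, so dividing by it gives the lower limit first and `var.ci` comes out in increasing order.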
