S301 Exam 2 Study Guide
This 14-page study guide was uploaded by Lauren Detweiler on Sunday, March 29, 2015. The study guide belongs to STAT-S301 at Indiana University, taught by Hannah Bolte in Spring 2015. Since its upload, it has received 555 views.
Date Created: 03/29/15
Prof. Hannah Bolte
Chapters 13-16

Weeks 7-8: Chapters 13, 14.1, and 15.1-15.2

Required concepts not covered in lecture

I. Vocabulary/Notation
   a. $N$: population size
   b. $n$: sample size
   c. Strata: subsets of similar items in the population
   d. $\hat{p}$: the estimate of the population proportion based on your sample (the observed sample proportion)
   e. $p$: the actual (usually unknown) population proportion
II. 13.1
   a. Two surprising properties
      i. The best way to get a representative sample is to pick members of the population at random
      ii. Larger populations don't require larger samples
   b. Sources of bias
      i. Bias: systematic error in choosing the sample
   c. Randomization: selecting a subset of the population at random
      i. Ensures that, on average, a sample mimics the population
III. Sampling methods
   a. Simple Random Sample (SRS): a sample of $n$ items chosen by a method that has an equal chance of picking any sample of size $n$ from the population
      i. The standard against which we compare other sampling methods
   b. Voluntary Response: a sample consisting of individuals who volunteer when given the opportunity to participate in a survey
   c. Convenience Sampling: a sampling method that selects individuals who are readily available
IV. Best Practices: Samples and Surveys
   a. Randomize
   b. Plan carefully
   c. Match the sampling frame to the target population
   d. Keep focused
   e. Reduce the amount of nonresponse
   f. Pretest your survey
V. Confidence Intervals
   a. Confidence interval: a range of values for a parameter that is compatible with the data in a sample
   b. Degrees of freedom
      i. Controls the shape of Student's t-distributions
      ii. The denominator of an estimate of $\sigma^2$; $n - 1$ in confidence intervals for the mean

Required concepts covered in lecture

Week 7

I. Types of Samples
   a. Stratified Sample
      i. Divide your population according to your demographics of interest
      ii. Use SRS to select members from each stratum according to the original sample frame distribution
   b. Clustered Sample
      i. Sometimes the population conveniently sorts itself into smaller groups
      ii. Use
SRS to randomly select clusters, then use SRS to sample within the cluster
   c. Stratified Clustered Sample
      i. Sometimes the population conveniently sorts itself into strata
      ii. Use SRS to randomly select clusters according to the percent of the population represented by each stratum, then use SRS to sample within the cluster
   d. Voluntary Response
      i. Caution: those who volunteer might not always be good representatives of those who do not volunteer
   e. Convenience
      i. Caution: sometimes it can be tempting to just use whatever is easiest
II. Evaluating the Reliability of Sample Data
   a. Questions to consider for surveys
      i. What was the sampling frame?
      ii. Is the sample a simple random sample?
      iii. What was the rate of nonresponse?
      iv. How was the question worded?
      v. Did the interviewer affect the results?
      vi. Does survivor bias affect the survey?
III. Sampling Variation
   a. Samples won't necessarily have the same mean as the population
   b. The CLT assures us that the larger the sample, the more likely it is that our sample mean will be close to the true population mean
   c. Sample variation is to be expected
      i. It is the cost of using a sample to represent the population
   d. Sample variation should NOT be a result of bias
IV. Sampling Error
   a. A statistic calculated from the sample will not generally be exactly the same as a parameter calculated from the population
   b. Sampling error: the difference between the statistic and the parameter, assuming no bias
      i. Sampling error of the sample mean
         1. For a simple random sample, the sample mean is neither systematically too high nor systematically too low
         2. The sampling error is the sample mean $\bar{X}$ minus the population mean $\mu$
V. Interpreting Variation in Production
   a. Many tests in manufacturing are destructive, i.e., the samples are ruined
   b. HALT (Highly Accelerated Life Testing): how long until it breaks, in fast-forward
   c. Is it random variation (not meaningful) or a change in process (meaningful)?
   d. Even when machinery is functioning properly, there will be variation among HALT scores, recorded as
number of tests passed
   e. We need to understand what to expect for HALT scores
      i. E.g., on average a device passes $\mu = 7$ tests before breaking, with a standard deviation $\sigma = 4$
   f. The Benefits of Averaging
      i. Due to the math behind the CLT:
         1. The sample-to-sample variance among mean HALT scores is smaller than the variance among individual HALT scores
         2. The distribution of mean HALT scores appears more bell-shaped than the distribution of individual HALT scores
VI. The Effect of Sample Size on the Variance
   a. As $n$ increases, the variance of the potential sample means decreases: $Var(\bar{X})$ for large $n$ < $Var(\bar{X})$ for small $n$
   b. Outliers are averaged in with more and more average values and hence have less and less influence on the sample mean
VII. Standard Error of the Mean
   a. Standard error: the standard deviation of the sample means
      i. Proportional to the standard deviation of the original observations
      ii. Notations for standard error: $SD(\bar{X}) = SE(\bar{X}) = \sigma_{\bar{X}} = \sigma/\sqrt{n}$
VIII. Sample Means
   a. If the underlying population is normally distributed, the proportionality is $\sigma_{\bar{X}} = \sigma/\sqrt{n}$
   b. Even if the underlying population is nowhere near normal, precisely $\sigma_{\bar{X}} = \sigma/\sqrt{n}$
IX. Formula for the Sampling Distribution
   a. $\bar{X} \sim N(\mu, \sigma^2/n)$, read as: "The distribution of the sample means is approximately normal with a mean of $\mu$ and a variance of $\sigma^2/n$"
X. Summary
   a. If the population is normally distributed:
      i. No matter the sample size, the sample mean is normally distributed
      ii. The sample mean should equal the population mean on average
      iii. The SD of the sample mean is equal to the population SD divided by the square root of the sample size
   b. If the population is NOT normally distributed:
      i. The sample mean is approximately normally distributed if the sample size is large enough
      ii. The sample mean should equal the population mean on average
      iii. The SD of the sample mean is equal to the population SD divided by the square root of the sample size
XI. Statistical Estimation
   a. The goal is to describe the population as best you can using the information in a sample
XII.
Interval Estimation
   a. We want to construct intervals in such a way that 95% of the time the intervals we make will contain the true population values and 5% of the time they do not
      i. I.e., we want to be 95% confident that our interval contains the true value
   b. Interval estimation: giving a range of values as your estimate
   c. Point estimation: giving a single value as your estimate
      i. Since a single value is unlikely to be exactly correct, accompany a point estimate with a statement about how big the error is likely to be
      ii. This means we need to know the standard error of our estimate: $SE(\hat{p}) = \sqrt{p(1-p)/n}$
      iii. If we don't know $p$, use $\hat{p}$ as the estimate for $p$
XIII. Confidence Intervals for p
   a. If you want your range based on $\hat{p}$ to include $p$, extend out in each direction from $\hat{p}$:
         90% of the time: 1.645 standard errors
         95% of the time: 1.96 standard errors
         99% of the time: 2.58 standard errors
      i. $\hat{p} \pm z_{\alpha/2} \cdot SE(\hat{p})$, where $\alpha$ = 1 minus the value you chose from column 1, and $z_{\alpha/2}$ is the value that produces the required percentile from the standard normal distribution (column 2)
   b. Some common choices for $z_{\alpha/2}$:

         $1-\alpha$    $\alpha$    $z_{\alpha/2}$ = NORM.INV($1-\alpha/2$, 0, 1)
         95%           5%          $z_{0.025}$ = NORM.INV(0.975, 0, 1) ≈ 1.96
         90%           10%         $z_{0.05}$ = NORM.INV(0.95, 0, 1) ≈ 1.645
         99%           1%          $z_{0.005}$ = NORM.INV(0.995, 0, 1) ≈ 2.5758

   c. Steps to calculate a confidence interval
      1. Make sure your sample is large enough, that is, $n\hat{p}$ and $n(1-\hat{p})$ are both ≥ 10
      2. Decide what you want your confidence level to be (common choices are 90%, 95%, and 99%)
      3. Determine the resulting $\alpha/2$ (e.g., $\alpha$ = 5% gives $\alpha/2$ = 2.5%)
      4. Identify the value for $z_{\alpha/2}$ that goes with your $\alpha$ (e.g., NORM.INV(0.975, 0, 1) ≈ 1.96 for a 95% CI; NORM.INV(0.025, 0, 1) returns the same magnitude with a negative sign)
      5. Find your standard error, i.e., $\sqrt{p(1-p)/n}$, substituting $\hat{p}$ for $p$
      6. Calculate your confidence interval, e.g., $\hat{p} - 1.96 \cdot se$ to $\hat{p} + 1.96 \cdot se$
XIV. Confidence Intervals for $\mu$ With $\sigma$ Known
   a. Confidence intervals for $\mu$ follow the same rule as confidence intervals for $p$:
         $p$: $\hat{p} \pm z_{\alpha/2} \cdot SE(\hat{p})$, where $SE(\hat{p}) = \sqrt{p(1-p)/n}$
         $\mu$: $\bar{X} \pm z_{\alpha/2} \cdot SE(\bar{X})$, where $SE(\bar{X}) = \sigma/\sqrt{n}$
XV. Estimating with $\mu$ and $\sigma$ Unknown
   a. Suppose you know nothing about a population (not mean or variance); all you have is data from one sample (the mean of $n$ individual scores, randomly chosen)
   b. How well does
the sample distribution represent the true, unknown population distribution?
      i. The estimate for the mean $\mu$ of the population of observations is the sample mean $\bar{X}$ you observe
      ii. If you don't know $\sigma$, the standard error $SE = \sigma/\sqrt{n}$ is estimated using the sample standard deviation $s$: $se = s/\sqrt{n}$
XVI. Student's t Distribution
   a. The test statistic for the sample means is $Z = (\bar{X} - \mu)/\sigma_{\bar{X}} = (\bar{X} - \mu)/(\sigma/\sqrt{n})$
   b. What if we don't know $\sigma$? Then we use $S$ and call it $T_{n-1}$ instead of $Z$: $T_{n-1} = (\bar{X} - \mu)/(S/\sqrt{n})$
   c. Each test statistic $T_{n-1}$ has its own distribution, dependent only on $n$
      i. Wide when $n$ is small; approaches the Standard Normal (z) Distribution when $n$ is large
XVII. Differences Between t and z Distributions
   a. In reality, we frequently deal with t rather than z distributions
   b. t distributions are based on less precise information
   c. $t_{\alpha/2,n-1}$ is the value necessary to produce the required percentile in the sampling distribution for $T_{n-1}$ for a given $\alpha$ and $n$
   d. You can't just memorize the $t_{\alpha/2,n-1}$ for the common $\alpha$'s like you can with $z_{\alpha/2}$, because they change based on your sample size
   e. The larger your sample size $n$, the closer $t_{\alpha/2,n-1}$ gets to $z_{\alpha/2}$
XVIII. Confidence Intervals for $\mu$ With $\sigma$ Unknown
   a. Confidence intervals for $\mu$ with $\sigma$ unknown follow nearly the same rules as confidence intervals for $\mu$ with $\sigma$ known:
         $\sigma$ known: $\bar{X} \pm z_{\alpha/2} \cdot SE(\bar{X})$, where $SE(\bar{X}) = \sigma/\sqrt{n}$
         $\sigma$ unknown: $\bar{X} \pm t_{\alpha/2,n-1} \cdot se(\bar{X})$, where $se(\bar{X}) = s/\sqrt{n}$
   b. Main difference: we always have to look up the appropriate $t_{\alpha/2,n-1}$, since it is dependent on both our desired confidence level and our sample size

Required concepts covered in lecture

Week 8

I. Sampling Distribution
   a. Once you have your sample, IF it is randomized and unbiased, we can count on some general guidelines to give us insight on the underlying population
      i. Your sample mean is one from all possible sample means (each calculated from one possible sample), the distribution of which is normal
      ii. The distribution of sampling means is centered around $\mu$
      iii. The distribution of sampling means has a variance of $\sigma^2/n$
      iv. The SD of the distribution of sampling means is called the standard error
      v. On average, $\bar{X} = \mu$. Your best guess for $\mu$ is $\bar{X}$
      vi. If we don't know $\sigma$, $S$ is our best guess
         1. Upper case (I know it): $SE(\bar{X}) = \sigma/\sqrt{n}$
         2. Lower case (I'm not sure): $se(\bar{X}) = s/\sqrt{n}$
II. Confidence Intervals for p
   a. Before you calculate confidence intervals using $\hat{p} \pm z_{\alpha/2} \cdot se(\hat{p})$, you should estimate a 95% CI using the empirical rule
III. Estimating with $\mu$ and $\sigma$ Unknown
   a. Suppose you know nothing about a population (not mean or variance); all you have is data from one sample (the mean of $n$ individual scores, randomly chosen)
      i. To estimate a CI for $\mu$, make 2 adjustments:
         1. Adjust your estimate of the distribution of $\bar{X}$ to reflect the lesser accuracy
            a. Use $s$ to estimate the standard error: $se(\bar{X}) = s/\sqrt{n}$
            b. This increases the standard error: $s/\sqrt{n} > \sigma/\sqrt{n}$
            c. Increasing the standard error of $\bar{X}$ widens the distribution of $\bar{X}$
         2. Adjust the distribution to which you compare the distribution of $\bar{X}$
            a. Use percentiles from a t distribution instead of the z distribution
            b. The t distribution is wider than the z-distribution
      ii. Then compare the wide estimate for $\bar{X}$ with the duly wide t distribution
IV. Student's t Distribution
   a. The family of t distributions models the distribution of sample means for SMALL SAMPLES when the underlying population is normally distributed
   b. Think of it as one sampling distribution that adjusts its shape according to sample size
   c. It's bell-shaped
   d. It's useful for interpreting sample means, just as is the z-distribution
   e. It requires that we first standardize our sample mean: $t = (\bar{X} - \mu)/s_{\bar{X}}$
   f. Just like getting a z-score, but instead of $\sigma$ use $s$
   g. The t distribution ≈ the z-distribution around $n = 30$, so the empirical rule still applies except in cases of very small sample sizes
V. Student's t Distribution vs Standard Normal (z) Distribution
   a. Differences between t and z distributions
      i. t distributions are more common in practice
      ii. t distributions are based on less accurate info, reflected by a wider spread
      iii. Just as $P(Z > z_{\alpha/2}) = \alpha/2$, $P(T_{n-1} > t_{\alpha/2,n-1}) = \alpha/2$
      iv. You can't just memorize the $t_{\alpha/2,n-1}$ for the common $\alpha$'s like you can with $z_{\alpha/2}$, because they change
based on your sample size
      v. The larger your sample size $n$, the closer $t_{\alpha/2,n-1}$ gets to $z_{\alpha/2}$
VI. Common Confusions
   a. Most errors of interpretation can be spotted by remembering that a confidence interval offers a range for a population parameter
   b. CIs describe the population parameter, not the data in this sample or in another sample

Week 9: Chapters 15.3-15.5 and 16.1-16.2

Required concepts not covered in lecture

I. Common Confusions in Interpreting Confidence Intervals (CIs)
   [Figure: Excel screenshot of a histogram, Count vs. Balance. A picture from the textbook, pg. 367. The 95% CI is shown by the two red lines.]
   a. "Ninety-five percent of all customers keep a balance of 1520 to 2460."
      The confidence interval doesn't contain many data, much less 95% of the balances. The confidence interval gives a range for the population mean $\mu$, not the balance of an individual.
   b. "The mean balance of 95% of samples of this size will fall between 1520 and 2460."
      The confidence interval describes $\mu$, an unknown constant, not the means of other samples.
   c. "The mean balance $\mu$ is between 1520 and 2460."
      Closer, but still incorrect. The average balance in the population does not have to fall between 1520 and 2460. This is a 95% confidence interval. It might not contain $\mu$.
II. Margin of Error
   a. The 95% CI for $\mu$ replaces the percentile from the t-distribution with 2: $\bar{x} - 2s/\sqrt{n}$ to $\bar{x} + 2s/\sqrt{n}$
   b. Margin of Error (ME): the extent of this interval to either side of $\bar{X}$ (or similarly around $\hat{p}$): $ME = 2s/\sqrt{n}$
   c. Three factors determine the Margin of Error
      i. Level of confidence
      ii. Variation of the data
      iii. Number of observations
III. Hypothesis Testing
   a. $\alpha$-level: probability of a Type I error
   b. $C$: the critical value for the sample mean $\bar{X}$ between the retain and reject regions
   c. p-values: smallest $\alpha$-level at which $H_0$ can be rejected
   d. Power: the probability that a test can reject $H_0$
IV. The relationship between errors, $\alpha$-levels, $C$, p-values, power
   a. See above
V. z test for proportions and $p_0$
   a. z statistic: the number of standard errors that
separate the test statistic from the region specified by $H_0$:
      $z$ = (deviation of test statistic from $H_0$) / (standard error of test statistic) $= (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}$
   b. z test: test of $H_0$ based on a count of standard errors separating $H_0$ from the test statistic

Required concepts covered in lecture

I. Choosing the Sample Size
   a. Suppose I want a 95% confidence interval, but I want the CI to have some particular margin of error
   b. Rearrange the equation for margin of error to solve for $n$, the sample size: $n = 4\sigma^2/(\text{Margin of Error})^2$
   c. We have to choose $n$ before doing anything
      i. If you don't know $\sigma$, obtain an estimate using a pilot sample
II. Determining Sample Size for Proportions
   a. For a study about $p$, no need for a pilot sample
   b. Use $p = 0.5$, which results in the largest possible value for $\sigma$: $\sigma = \sqrt{p(1-p)} = \sqrt{0.5 \cdot 0.5} = \sqrt{0.25} = 0.5$
   c. Sample sizes for various margins of error (95% coverage for $p$):

         n         Margin of error
         100       10%
         400       5%
         625       4%
         1112      3%
         2500      2%
         10000     1%

III. From Confidence Intervals to Hypothesis Testing
   a. Statistics uses observed data to make informed suppositions about a population
      i. Confidence intervals provide a range of plausible values for a population parameter
      ii. Hypothesis tests consider the plausibility of a specific claim (claims are called hypotheses)
         1. $H_0$ ("H-naught", "H-null"): null hypothesis
         2. $H_a$ (sometimes written as $H_1$): alternative hypothesis
IV. Attributes of the Null Hypothesis $H_0$
   a. Gets the benefit of the doubt
   b. Associated with no change in course
   c. The default belief or status quo
   d. Origination: http://en.wikipedia.org/wiki/Lady_tasting_tea
V. Examples of the Null Hypothesis $H_0$
   a. In practice, the null hypothesis is exemplified when:
      - Fisher presumed the lady couldn't tell the difference unless she adequately convinced him otherwise
      - Without evidence of cheating, I presume you have your own clicker
      - We don't fix the widget machine unless we have adequate reason to believe it is malfunctioning
      - Those accused of a crime are innocent unless a jury of peers is convinced of their guilt beyond a reasonable doubt
VI. Attributes of the Alternative Hypothesis $H_a$
   a. Requires us
to take some action
   b. Change our action
   c. Change our belief
   d. Carries the burden of proof
   e. Is the only other option
VII. Examples of Null and Alternative Hypotheses
   a. In hypothesis testing, the null and alternative hypotheses must be mutually exclusive and exhaustive
   b. Either the null is true or the alternative is true
      - NULL: The tea drinker cannot distinguish when the milk was added
        ALTERNATIVE: The tea drinker can distinguish when the milk was added
      - NULL: You are not answering clicker questions for someone else
        ALTERNATIVE: You are answering clicker questions for someone else
      - NULL: The widget machine is fine
        ALTERNATIVE: The widget machine is not fine
      - NULL: You are innocent of the crime
        ALTERNATIVE: You are guilty of the crime
VIII. Test Statistics
   a. Statistical tests rely on the sampling distribution of the test statistic that estimates the parameter specified in the null and alternative hypotheses:
         $z = (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}$  or  $t = (\bar{X} - \mu_0)/(s/\sqrt{n})$
   b. The test statistic tells how many standard errors your sample mean is from the expectation under the null
   c. Key question: what is the chance of getting a test statistic this far from the center of the null distribution if the null is true?
VI. Hypothesis Testing: $\alpha$-Levels and p-Values
   a. To test your hypothesis:
      i. Specify your threshold for reasonable doubt. This is your $\alpha$-level
      ii. Identify the critical value for your chosen $\alpha$-level between retaining and rejecting the null:

         Test                            Left-tailed (≥ vs <)    Right-tailed (≤ vs >)    Two-tailed (= vs ≠)
         Sign                            -                       +                        ±
         Proportion, or $\sigma$ known   $z_{\alpha}$            $z_{\alpha}$             $z_{\alpha/2}$
         $\sigma$ unknown                $t_{\alpha,n-1}$        $t_{\alpha,n-1}$         $t_{\alpha/2,n-1}$

      iii. Collect your data. Identify your sample statistics: $\bar{x}$ and $s$, or $\hat{p}$
      iv. Calculate your test statistic: $t_{n-1}$ or $z$
      v. Consider the probability of observing the test statistic under the null hypothesis
         1. If the test statistic is more extreme than the critical value, reject $H_0$ in favor of $H_a$. Otherwise, retain $H_0$
         2. Equivalent to finding the probability of observing the test statistic under the null is
< $\alpha$; we call this probability the p-value
         3. Equivalent to having observed a value for $\bar{X}$ that exceeds the critical value between the retain and reject regions; we call this value $C$
VII. Type I and Type II Errors
   a. But what if I'm wrong?
      i. When making a decision, two things can go wrong:
         1. You reject the null when it's really true: a Type I error
         2. You retain the null when it's really false: a Type II error
      ii. Type I errors are worse than Type II errors

                                 Retain the null    Reject the null
         Null is really true     ✓                  Type I Error
         Null is really false    Type II Error      ✓

         ✓ indicates a correct decision
VIII. Type I Errors
   a. You changed your belief, took action, or stated something was meaningful when it wasn't
   b. Worst type of error we can make
   c. We limit our probability of making a Type I error to $\alpha$
   d. The $\alpha$-level chosen is our willingness to commit a Type I error
IX. Type II Error
   a. Retaining the null when you shouldn't
   b. You erred on the side of caution and hence failed to act on an opportunity or take action, accept a new treatment's effect, or accept that something isn't working the way it should
   c. A good test has enough power to reject the null when it is false
X. $\alpha$-Levels, $C$, p-Values, Power
   [Figure: sketch of the null distribution showing the RETAIN NULL region]
   a. What happens to our probability of a Type I error if we increase $\alpha$?
   b. What happens to $C$ if we increase $\alpha$?
   c. What happens to power if the true (unknown) distribution is very close to the null distribution?
   d. What happens to the probability of a Type II error if our test has little power to reject the null?
   e. What happens to our power if we increase $n$?

Week 10: Chapters 16.2-16.5

Required concepts not covered in lecture

I. Hypothesis Testing
   a. $\alpha$-level: probability of a Type I error
   b. $C$: the critical value for the sample mean $\bar{X}$ between the retain and reject regions
   c. p-values: smallest $\alpha$-level at which $H_0$ can be rejected
   d. Power: the probability that a test can reject $H_0$
II. Important Reminder
   a. Significant does not mean important

Required concepts covered in
lecture

I. Testing $H_0$ for Proportions vs Means

                           Testing $H_0$ with z test                         Testing $H_0$ with z test ($\sigma$ known; not covered in book)    Testing $H_0$ with t test
   Sample type             proportion                                        mean                                                               mean
   Hypotheses              $H_0: p \le p_0$, $H_a: p > p_0$                  $H_0: \mu \le \mu_0$, $H_a: \mu > \mu_0$                           $H_0: \mu \le \mu_0$, $H_a: \mu > \mu_0$
   Test statistic          $z = (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}$         $z = (\bar{X} - \mu_0)/(\sigma/\sqrt{n})$                          $t_{n-1} = (\bar{X} - \mu_0)/(s/\sqrt{n})$
   Compare to              $z_{\alpha}$                                      $z_{\alpha}$                                                       $t_{\alpha,n-1}$

   Note that although this table shows a set of hypotheses for a right-tailed test (≤ vs >), they could also be left-tailed (≥ vs <) or two-tailed (= vs ≠).
II. t Test for the Mean and $\mu_0$
   a. SE vs se
      i. SE uses the population SD $\sigma$ and produces a z statistic
      ii. se estimates the SE by using the sample SD $s$ in place of $\sigma$ and produces a t statistic
   b. t statistic
      i. The number of estimated standard errors from $\bar{X}$ to $\mu_0$:
         $t$ = (deviation of sample statistic from $H_0$) / (estimated standard error of sample statistic) $= (\bar{X} - \mu_0)/(s/\sqrt{n})$
   c. t test
      i. A test that uses a t statistic as the test statistic
III. Statistically Significant vs Economically Significant
   Critical value for a test at $\alpha$-level = 5% vs observed value from the sample ($\bar{x} = 1520$, $\mu_0 = 1500$):
      Using t-stat: $t_{2499} = (1520 - 1500)/se(\bar{X}) \approx 1.78 > 1.6455 = t_{0.05,2499}$ = T.INV(0.95, 2499), so REJECT
      Using p-val: p-val = 1 - T.DIST(1.78, 2499, TRUE) = 0.0374 < 0.05 = $\alpha$, so REJECT
      Using c: $c = 1500 + 1.6455 \cdot se(\bar{X}) = 1518.46 < 1520 = \bar{x}$, so REJECT
   [Figure: sampling distribution under $H_0$ marking $C$ = 1518.46, the observed mean 1520, the p-value (3.74%), and $\alpha$ = .05]
   a. Other things to consider
      i. Sensitivity of rents to economic factors
         1. Impact on rental agency's costs
         2. Impact on renter's financial stability
      ii. Sensitivity to other external factors
         1. Political
         2. Environmental
         3. Legal
      iii. Opportunity costs
         1. Other, more profitable expansions
         2. Other, more stable investments
         3. Unforeseen opportunities
   b. Statistically significant does not mean important
      i. The size of the sample affects the p-value of a test. With enough data, even a trivial difference from $H_0$
leads to a statistically significant outcome
      ii. Statistical significance does not mean that you have made an important or meaningful discovery
      iii. Statistical significance should not be the only thing on which you base business decisions. Use your brain, intuition, research, consultants, experts, etc.
IV. Confidence Intervals vs Hypothesis Testing
   a. Statistically significant does not mean important
      i. A confidence interval provides a range of parameter values that are compatible with the observed data
      ii. A test provides a precise analysis of a specific hypothesized value for a parameter
      iii. Most people understand the implications of CIs more readily than tests
         1. The difference is subtle
      iv. For a two-tailed test, a CI will produce the same result if 1 - $\alpha$-level = confidence level. For a one-tailed test, a CI will produce the same result if 1 - $\alpha$-level = confidence level, assuming the test statistic is not extreme in the opposite direction of the reject region
         1. A CI may not produce the same result if 1 - $\alpha$-level ≠ confidence level
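
Worked sketch: the confidence-interval steps for a proportion and the sample-size rule $n = 4\sigma^2/(\text{Margin of Error})^2$ from this guide can be checked numerically. The following is a minimal Python sketch, not part of the original notes. The function names and the example sample ($\hat{p} = 0.5$, $n = 400$) are hypothetical, and Python's standard-library `statistics.NormalDist().inv_cdf` stands in for the spreadsheet normal-inverse function used in the guide. Note that the sample-size rule uses the rounded multiplier 2 rather than 1.96, as the guide does.

```python
from fractions import Fraction
from math import ceil, sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def proportion_ci(p_hat, n, confidence=0.95):
    """Return (low, high) for p_hat +/- z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    # Sample-size condition from the guide: n*p_hat and n*(1 - p_hat) both >= 10.
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("sample too small for the normal approximation")
    se = sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error, p_hat in place of p
    margin = z_critical(confidence) * se
    return p_hat - margin, p_hat + margin


def sample_size_for_proportion(margin_percent):
    """Return n = 4 * sigma^2 / ME^2 with sigma^2 = 0.25 (p = 0.5), rounded up.

    margin_percent must be a whole-number percentage, e.g. 3 for a 3% margin.
    """
    margin = Fraction(margin_percent, 100)  # exact arithmetic avoids float rounding
    sigma_squared = Fraction(1, 4)          # p * (1 - p) at p = 0.5
    return ceil(4 * sigma_squared / margin ** 2)


if __name__ == "__main__":
    for level in (0.90, 0.95, 0.99):
        print(level, round(z_critical(level), 4))
    print(proportion_ci(0.5, 400))  # hypothetical sample, for illustration only
    for pct in (10, 5, 4, 3, 2, 1):
        print(pct, sample_size_for_proportion(pct))
```

The sample sizes this produces for margins of 10%, 5%, 4%, 3%, 2%, and 1% match the table in the guide. Exact fractions are used there because a floating-point division could land a hair above a whole number and round up one too far.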