Reading Wrap Ups/Final Exam Review

by: Patrece Savino


About this Document

These notes are for all of the chapters covered in the course. Fill in formulas as needed to help study!
Math, Statistics, MATH 109, wake forest, notes

This 49 page Bundle was uploaded by Patrece Savino on Sunday February 14, 2016. The Bundle belongs to MTH 109 at Wake Forest University taught by Rouse in Winter 2016.

Patrece Savino
Math 109, Professor Rouse

Chapter 1 Reading Wrap Up

1.1 Individuals and Variables
- Data: contains information about a group of individuals and is organized in variables
  o Individuals: the objects described by the set of data; people, animals, or things
  o Variables: characteristics or information on the individual; a variable takes different values for different individuals
- Ask the following questions when planning a statistical study:
  o Who? How many?
  o What? Variables? Units of measurement?
  o Where?
  o When?
  o Why? Purpose?
- Categorical variable: the individual is grouped or categorized
- Quantitative variable: numeric values for which arithmetic operations (adding, averaging, etc.) make sense
- Unit of measurement: the units in which a quantitative variable is recorded
- Refer to Figures 1.1 and 1.3 for examples of data on individuals

1.2 Categorical Variables: Pie Charts and Bar Graphs
- Exploratory data analysis: statistical tools and ideas that help us examine data and determine its purpose
  o Examine variables individually
  o Examine relationships between variables
  o Graph data
- Distribution: the distribution of a variable is what values it takes and how often it takes them
- Distribution of a categorical variable: lists the categories and gives the count or percent of individuals who fall into each category
- Refer to example 1.2 for information on the distribution of variables
- Pie charts show the distribution of a categorical variable visually
- Bar graphs represent categories visually by using height
- Refer to example 1.3 for information on bar graphs

1.3 Quantitative Variables: Histograms
- Histogram: a graph of the distribution of a quantitative variable
  o Refer to figure 1.4 for the steps to creating a histogram
  o Horizontal axis: values of the variable, in its units of measurement
  o Vertical axis: count (or percent) of individuals in each class

1.4 Interpreting Histograms
- When graphing data:
  o Observe the overall pattern by considering size, shape, and how much the data vary
  o Identify outliers
  o The midpoint is the center of the distribution
  o The smallest-to-largest spread of values describes the variability
  o Symmetric distribution: the right and left sides are approximately mirror images
  o Skewed to the right: the right side of the histogram extends further than the left
  o Skewed to the left: the left side of the histogram extends further than the right
- Important things to note when describing a distribution:
  o Shape: peaks, dips, etc.
  o Center: midpoint of the distribution
  o Variability: scale of the data
  o Outliers
- Reference "Apply Your Knowledge" and figures 1.5-1.7
  o Note the classes that were organized

1.5 Quantitative Variables: Stemplots
- Steps to creating a stemplot:
  o Separate each observation into a stem, consisting of all but the digit furthest to the right
  o The digit furthest to the right is the leaf
  o Write the stems in a vertical column in ascending order
  o List each leaf to the right of its stem in ascending order
  o Note that each stem may have multiple leaves
  o You may split stems to reduce the number of leaves; each stem then appears more than once
- Refer to Apply Your Knowledge

1.6 Time Plots
- Displays change over time
- Time plot: plots each observation of a variable against the time it was measured
- Time always goes on the x axis
- Cycles: up-and-down movements on the graph
- Trend: the overall pattern over a longer course of time
- Time series data: shows the change of a single variable over time
- Cross-sectional data: data on different individuals at one point in time (displayed by histograms)
- College tuition costs are generally trending upward

Learning Objectives
- Be able to identify individuals and variables
- Be able to identify categorical and quantitative variables
- Be able to identify units of measurement in data sets
- Pie charts and bar graphs display the distribution of categorical variables
- Histograms and stemplots display the distribution of quantitative variables
- Describe patterns and deviations:
  o Shape
  o Center
  o Variability
  o Symmetry
  o Skew
- Be able to identify outliers
- Be able to observe variables over time using a time plot
  o Reveals trends and cycles
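Since the notes say to fill in material as needed: here is a minimal sketch of drawing a histogram and reading off its center and variability, assuming numpy and matplotlib are available. The exam scores are made-up data for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical exam scores (a quantitative variable)
scores = np.array([55, 62, 68, 70, 71, 73, 75, 76, 78, 80,
                   81, 83, 84, 85, 87, 88, 90, 92, 95, 99])

# Horizontal axis: values of the variable; vertical axis: counts per class
plt.hist(scores, bins=5, edgecolor="black")
plt.xlabel("Exam score")
plt.ylabel("Count of students")
plt.title("Histogram of exam scores")
plt.show()

# Center (midpoint) and variability (smallest-to-largest spread)
print("median:", np.median(scores))
print("range:", scores.min(), "to", scores.max())
```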
Chapter 20 Reading Wrap Up

20.1 Conditions for Inference about a Mean
- Conditions for inference about a mean:
  o We can regard our data as a simple random sample (SRS) from the population
  o Observations from the population have a Normal distribution with mean mu and standard deviation sigma. In practice, it is enough that the distribution be symmetric and single-peaked unless the sample is very small. Both mu and sigma are unknown parameters.
- Standard error: when the standard deviation of a statistic is estimated from data, the result is called the standard error of the statistic. The standard error of the sample mean x-bar is s/sqrt(n).

20.2 The t Distributions
- If we knew the value of the standard deviation sigma, we would base our confidence intervals and tests for mu on the one-sample z statistic z = (x-bar - mu0)/(sigma/sqrt(n))
- In practice, we don't know sigma, so we substitute the standard error s/sqrt(n) of x-bar for its standard deviation. The statistic that results does not have a Normal distribution; it has a distribution called a t distribution.
- The one-sample t statistic and the t distributions:
  o Draw an SRS of size n from a large population that has the Normal distribution with mean mu and standard deviation sigma. The one-sample t statistic t = (x-bar - mu)/(s/sqrt(n)) has the t distribution with n - 1 degrees of freedom.
- There is a different t distribution for every sample size, so we specify a particular t distribution by giving its degrees of freedom, t(n - 1)
- Figure 20.1 illustrates facts about the t distributions:
  o The density curves are similar in shape to the standard Normal curve: symmetric about 0, single-peaked, and bell-shaped
  o The variability is a bit greater than that of the standard Normal distribution; the t distributions have more probability in the tails and less in the center than the standard Normal
  o As the degrees of freedom increase, the density curve approaches the N(0,1) curve ever more closely, because s estimates sigma more accurately as the sample size increases
- Ex. 20.1
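A small sketch of the heavier-tails fact above, assuming scipy is available: P(T > 2) is larger than P(Z > 2) and shrinks toward it as the degrees of freedom grow.

```python
from scipy import stats

# t distributions have more probability in the tails than the standard Normal;
# the gap closes as the degrees of freedom increase.
for df in (2, 5, 10, 30, 100):
    print(f"df={df:3d}  P(T > 2) = {stats.t.sf(2, df):.4f}")
print(f"Normal   P(Z > 2) = {stats.norm.sf(2):.4f}")
```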
20.3 The One-Sample t Confidence Interval
- The one-sample t confidence interval:
  o Draw an SRS of size n from a large population having unknown mean mu. A level C confidence interval for mu is x-bar +/- t* s/sqrt(n), where t* is the critical value for the t(n - 1) density curve with area C between -t* and t*. This interval is exact when the population distribution is Normal and is approximately correct for large n in other cases.
- Ex. 20.2
- The one-sample t confidence interval has the form: estimate +/- t* SE(estimate)

20.4 The One-Sample t Test
- The one-sample t test:
  o Draw an SRS of size n from a large population having unknown mean mu. To test the hypothesis H0: mu = mu0, compute the one-sample t statistic t = (x-bar - mu0)/(s/sqrt(n)).
  o In terms of a variable T having the t(n - 1) distribution, the P-value for a test of H0 against Ha: mu > mu0 is P(T >= t); against Ha: mu < mu0 it is P(T <= t); and against Ha: mu ≠ mu0 it is 2P(T >= |t|).
  o These P-values are exact if the population distribution is Normal and are approximately correct for large n in other cases.
- Ex. 20.3

20.6 Matched Pairs t Procedures
- In a matched pairs design, subjects are matched in pairs and each treatment is given to one subject in each pair. Another situation calling for matched pairs is before-and-after observations on the same subjects.
- To compare the responses to the two treatments in a matched pairs design, find the difference between the responses within each pair. Then apply the one-sample t procedures to these differences. Ex. 20.4

20.7 Robustness of t Procedures
- Robust procedures:
  o A confidence interval or significance test is called robust if the confidence level or P-value does not change very much when the conditions for use of the procedure are violated
- Using the t procedures:
  o Except in the case of small samples, the condition that the data are an SRS from the population of interest is more important than the condition that the population distribution is Normal
  o Sample size less than 15: use t procedures if the data appear close to Normal. If the data are clearly skewed or if outliers are present, do not use them.
  o Sample size at least 15: the t procedures can be used except in the presence of outliers or strong skewness
  o Large samples: the t procedures can be used even for clearly skewed distributions when the sample size is large, roughly n >= 40
- Ex. 20.5

20.8 Resampling and Standard Errors
- Ex. 20.6, 20.7
- Standard errors can be estimated using resampling. This is useful when dealing with statistics and parameters other than means.
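A minimal sketch of the one-sample t interval and test from 20.3-20.4, plus a bootstrap standard error in the spirit of 20.8. The data values are made up for illustration; numpy and scipy are assumed available.

```python
import numpy as np
from scipy import stats

x = np.array([26.1, 24.8, 27.3, 25.5, 26.9, 24.2, 25.8, 26.4])  # made-up sample
n, xbar, s = len(x), x.mean(), x.std(ddof=1)
se = s / np.sqrt(n)                          # standard error of x-bar

# 20.3: level 95% interval is x-bar +/- t* s/sqrt(n), with t* from t(n-1)
tstar = stats.t.ppf(0.975, df=n - 1)
print("95% t CI:", (xbar - tstar * se, xbar + tstar * se))

# 20.4: one-sample t test of H0: mu = 25 against the two-sided alternative
t = (xbar - 25) / se
print("t =", t, "P =", 2 * stats.t.sf(abs(t), df=n - 1))
# equivalently: stats.ttest_1samp(x, 25)

# 20.8: resampling (bootstrap) standard error of a statistic, e.g. the median
boot = [np.median(np.random.choice(x, size=n, replace=True)) for _ in range(2000)]
print("bootstrap SE of median:", np.std(boot, ddof=1))
```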
Chapter 21 Reading Wrap-Up

21.1 Two-Sample Problems
- The goal of inference is to compare the responses to two treatments or to compare the characteristics of two populations
- We have a separate sample from each treatment or population (ex. 21.1)
- There is no matching of the two samples
- The two samples are assumed to be independent and can be of different sizes

21.2 Comparing Two Population Means
- Conditions for inference comparing two means:
  o We have two SRSs from two distinct populations. The samples are independent; that is, one sample has no influence on the other. (Matching violates independence, for example.) We measure the same response variable for both samples.
  o Both populations are Normally distributed. The means and standard deviations of the populations are unknown.
- Ex. 21.2

21.3 Two-Sample t Procedures
- The standard deviation of the difference in sample means is sqrt(sigma1^2/n1 + sigma2^2/n2)
- Because we don't know the population standard deviations, we estimate them by the sample standard deviations from our two samples. The result is the standard error, or estimated standard deviation, of the difference in sample means: SE = sqrt(s1^2/n1 + s2^2/n2)
- When we standardize the estimate by dividing it by its standard error, the result is the two-sample t statistic
- The two-sample t procedures (ex. 21.3):
  o Draw an SRS of size n1 from a large Normal population with unknown mean mu1, and draw an independent SRS of size n2 from another large Normal population with unknown mean mu2. A level C confidence interval for mu1 - mu2 is given by (x-bar1 - x-bar2) +/- t* sqrt(s1^2/n1 + s2^2/n2)
  o Here t* is the critical value for confidence level C for the t distribution with degrees of freedom from either Option 1 (software) or Option 2 (the smaller of n1 - 1 and n2 - 1)
  o To test the hypothesis H0: mu1 = mu2, compute the two-sample t statistic t = (x-bar1 - x-bar2)/sqrt(s1^2/n1 + s2^2/n2)
  o Find P-values from the t distribution with degrees of freedom from either Option 1 or Option 2 (ex. 21.4, 21.5)

21.5 Robustness Again
- Two-sample t procedures are more robust (less sensitive to departures from our conditions for inference for comparing two means) than one-sample t methods, particularly when the distributions are not symmetric

21.6 Details of the t Approximation
- The exact distribution of a two-sample t statistic is not a t distribution
- Approximate distribution of the two-sample t statistic:
  o The distribution of the two-sample t statistic is very close to the t distribution with degrees of freedom df given by df = (s1^2/n1 + s2^2/n2)^2 / [ (s1^2/n1)^2/(n1 - 1) + (s2^2/n2)^2/(n2 - 1) ]
  o This approximation is accurate when both sample sizes n1 and n2 are 5 or larger
- Ex. 21.6

21.7 Avoid the Pooled Two-Sample t Procedures
- Caution: never use the pooled t procedures if you have software that will implement Option 1

21.8 Avoid Inference about Standard Deviations
- Caution: unlike the t procedures for means, the F test for standard deviations is extremely sensitive to non-Normal distributions
- Caution: we do not recommend trying to make inferences about population standard deviations in basic statistical practice

21.9 Permutation Tests
- If experimental units are assigned to two treatment groups completely at random, and our null hypothesis is "no treatment effect," we can test hypotheses using sample means as follows (see the sketch after this chapter). First, list all possible ways units can be assigned to treatment groups. Second, based on the data obtained, for each possible assignment determine what the difference in means would be. Under the null hypothesis, each of these is equally likely. Third, determine the distribution of these possible outcomes by listing all the different possible mean differences and their corresponding probabilities.
- This procedure is sometimes called a permutation test, and the resulting sampling distribution is called the permutation distribution
- Practical issues: P-values may not be small, listing all possible outcomes is tedious, and permutation tests are most likely to be useful with small to moderate size experiments because of the robustness of t procedures
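A sketch of the permutation-test recipe from 21.9, listing every possible assignment of units to groups. The response values are hypothetical; numpy is assumed available.

```python
from itertools import combinations
import numpy as np

treatment = [24, 29, 31, 27]   # hypothetical responses, treatment group
control = [20, 22, 25, 23]     # hypothetical responses, control group
pooled = np.array(treatment + control)
observed = np.mean(treatment) - np.mean(control)

# List every way 4 of the 8 units could have formed the treatment group;
# under H0 ("no treatment effect") each assignment is equally likely.
diffs = []
for idx in combinations(range(len(pooled)), len(treatment)):
    mask = np.zeros(len(pooled), dtype=bool)
    mask[list(idx)] = True
    diffs.append(pooled[mask].mean() - pooled[~mask].mean())

# Permutation P-value: proportion of assignments whose mean difference
# is at least as large as the one actually observed (one-sided)
print("observed:", observed, " P =", np.mean(np.array(diffs) >= observed))
```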
Chapter 15 Reading Wrap Up

15.1 Parameters and Statistics
- A parameter is a number that describes the population. In practice, the value of a parameter is not known because we can rarely examine the entire population.
- A statistic is a number that can be computed from the sample data without making use of any unknown parameters. In practice, we often use a statistic to estimate an unknown parameter. Ex. 15.1, 15.2
- Statistics come from samples; parameters come from populations

15.2 Statistical Estimation and the Law of Large Numbers
- As we continue to take larger and larger samples, the statistic x-bar (the sample mean) is guaranteed to get closer and closer to the parameter mu: the law of large numbers
- The Law of Large Numbers: draw observations at random from any population with finite mean mu. As the number of observations drawn increases, the mean x-bar of the observed values tends to get closer and closer to the mean mu of the population. Ex. 15.3
  o The law of large numbers can be proven mathematically from the basic laws of probability. The behavior of x-bar is similar to the idea of probability: in the long run, the proportion of outcomes taking any value gets close to the probability of that value, and the average outcome gets close to the population mean.

15.3 Sampling Distributions
- "What would happen if we took many samples of 10 subjects from this population?" (ex. 15.2)
  o Take a large number of samples of size 10 from the population
  o Calculate the sample mean x-bar for each sample
  o Make a histogram of the values of x-bar
  o Examine the shape, center, and variability of the distribution displayed in the histogram (ex. 15.4)
- The population distribution of a variable is the distribution of values of the variable among all the individuals in the population
- The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population
- CAUTION: the population distribution describes the individuals that make up the population. A sampling distribution describes how a statistic varies in many samples from the population.

15.4 The Sampling Distribution of x-bar
- Suppose that x-bar is the mean of an SRS of size n drawn from a large population with mean mu and standard deviation sigma. Then the sampling distribution of x-bar has mean mu and standard deviation sigma/sqrt(n).
- These facts about the mean and standard deviation of the sampling distribution of x-bar are true for any population, not just for some special class such as Normal distributions. They have important implications for statistical inference:
  o The mean of the statistic x-bar is always equal to the mean mu of the population. That is, the sampling distribution of x-bar is centered at mu. In repeated sampling, x-bar will sometimes fall above the true value of the parameter mu and sometimes below, but there is no systematic tendency to overestimate or underestimate the parameter. This makes the idea of lack of bias in the sense of "no favoritism" more precise. Because the mean of x-bar is equal to mu, we say that the statistic x-bar is an unbiased estimator of the parameter mu.
  o An unbiased estimator is "correct on the average" in many samples. How close the estimator falls to the parameter in most samples is determined by the variability of the sampling distribution. If individual observations have standard deviation sigma, then sample means x-bar from samples of size n have standard deviation sigma/sqrt(n). That is, averages are less variable than individual observations.
  o Not only is the standard deviation of the distribution of x-bar smaller than the standard deviation of individual observations, but it gets smaller as we take larger samples. The results of large samples are less variable than the results of small samples.
  o CAUTION: the standard deviation of the sampling distribution gets smaller only at the rate sqrt(n). To cut the standard deviation of x-bar in half, we must take four times as many observations, not just twice as many.
- If individual observations have the N(mu, sigma) distribution, then the sample mean x-bar of an SRS of size n has the N(mu, sigma/sqrt(n)) distribution (ex. 15.5)
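A simulation sketch of the "take many samples" recipe from 15.3 and the facts in 15.4 (mean mu, standard deviation sigma/sqrt(n)). The population parameters here are made up; numpy is assumed available.

```python
import numpy as np
rng = np.random.default_rng(0)

mu, sigma, n = 10, 4, 25   # hypothetical Normal population and sample size

# Take many samples of size n and record each sample mean x-bar
xbars = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)

# The simulated sampling distribution is centered at mu
# with standard deviation close to sigma/sqrt(n) = 4/5 = 0.8
print("mean of x-bars:", xbars.mean())
print("sd of x-bars:  ", xbars.std(ddof=1))
```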
15.5 The Central Limit Theorem
- As the sample size increases, the distribution of x-bar changes shape: it looks less like that of the population and more like a Normal distribution
- Draw an SRS of size n from any population with mean mu and finite standard deviation sigma. The central limit theorem says that when n is large, the sampling distribution of the sample mean x-bar is approximately Normal: x-bar is approximately N(mu, sigma/sqrt(n)). Ex. 15.6
- Means of random samples are less variable than individual observations
- Means of random samples are more Normal than individual observations (ex. 15.7, 15.8)

15.6 Sampling Distributions and Statistical Significance
- Any statistic we can calculate from a sample will have a sampling distribution (ex. 15.9, 15.10, 15.11)

Detailed Learning Outcomes

Chapter 16 Reading Wrap Up
- Statistical inference provides methods for drawing conclusions (making inferences) about a population from sample data
  o It uses the language of probability to determine how trustworthy our conclusions are
  o The two most common types of statistical inference are confidence intervals and tests of significance
- Simple conditions for inference about a mean:
  1) We have an SRS from the population of interest. There is no nonresponse or other practical difficulty. The population is large compared to the size of the sample.
  2) The variable we measure has an exactly Normal distribution N(mu, sigma) in the population.
  3) We don't know the population mean mu, but we do know the population standard deviation sigma.
- Caution! The conditions that we have a perfect SRS, that the population is exactly Normal, and that we know the population sigma are all unrealistic.

16.1 The Reasoning of Statistical Estimation
- Ex. 16.1, Body Mass Index of Young Women:
  o The sampling distribution of x-bar tells us how close to mu the sample mean x-bar is likely to be
  o Statistical estimation turns that information around to say how close to x-bar the unknown population mean mu is likely to be
  o The interval of numbers between the values x-bar +/- 0.6 is a 95% confidence interval for mu

16.2 Margin of Error and Confidence Interval
- The 95% confidence interval for the mean BMI of young women is x-bar +/- 0.6. The sample results tell us that x-bar is 26.8, so our confidence interval is 26.8 +/- 0.6.
- Form of confidence intervals: estimate +/- margin of error
- The estimate is our guess for the value of the unknown parameter (26.8 in ex. 16.1)
- The margin of error (+/- 0.6) shows how accurate we believe our guess is, based on the variability of our estimate
- Our confidence level is 95% because x-bar +/- 0.6 catches the unknown parameter in 95% of all possible samples
- A level C confidence interval for a parameter has two parts:
  o An interval calculated from the data, usually of the form estimate +/- margin of error
  o A confidence level C, which gives the probability that the interval will capture the true parameter value in repeated samples. That is, the confidence level is the success rate of this method.
- Interpreting a confidence level (ex. 16.2):
  o The confidence level is the success rate of the method that produces the interval. We don't know whether the 95% confidence interval from a particular sample is one of the 95% that capture mu or one of the unlucky 5% that miss.
  o To say that we are 95% confident that the unknown mu lies between 26.2 and 27.4 is shorthand for "We got these numbers using a method that gives correct results 95% of the time."
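A simulation sketch of the "success rate of the method" interpretation in 16.2. The mu = 26.8 and the +/- 0.6 margin echo ex. 16.1; the sigma = 7.5 and n = 654 used here are illustrative assumptions (chosen to reproduce roughly that margin), not values stated in the notes.

```python
import numpy as np
from scipy import stats
rng = np.random.default_rng(1)

mu, sigma, n = 26.8, 7.5, 654       # sigma and n are assumed for illustration
zstar = stats.norm.ppf(0.975)
m = zstar * sigma / np.sqrt(n)      # margin of error, roughly 0.6

# Repeat the sampling method many times and count how often the
# interval x-bar +/- m captures the true mu: about 95% of the time.
hits = 0
for _ in range(10_000):
    xbar = rng.normal(mu, sigma / np.sqrt(n))   # sample mean of one SRS
    hits += (xbar - m <= mu <= xbar + m)
print("proportion of intervals that capture mu:", hits / 10_000)
```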
16.3 Confidence Intervals for a Population Mean
- This section reduces the reasoning above to a formula for the unknown mean mu of a population
- To find the 95% confidence interval for the mean BMI of young women, we caught the central 95% of the Normal sampling distribution by going two standard deviations (the 68-95-99.7 rule) out from the mean in both directions. The confidence level 95% determined how many standard deviations we go out to capture the central 95%.
- To find a level C confidence interval:
  o We first catch the central area C under the Normal sampling distribution
  o How many standard deviations must we go out in both directions from the mean to capture the central area C?
  o Because any Normal distribution can be put on the standard scale, we can obtain the needed value from the standard Normal curve
- The critical value z* is chosen so that the standard Normal curve has area C between -z* and z*
- A level C confidence interval for the mean mu of a Normal population with known standard deviation sigma, based on an SRS of size n, is given by x-bar +/- z* sigma/sqrt(n). Ex. 16.3
- Confidence intervals: the four-step process
  o STATE: What is the practical question that requires estimating a parameter?
  o PLAN: Identify the parameter, choose a level of confidence, and select the type of confidence interval that fits your situation
  o SOLVE: Carry out the work in two phases:
    - Check the conditions for the interval you plan to use:
      - SRS (ex. 16.3): We don't have an actual SRS from the population of all patrons of this restaurant. Scientists often act as if subjects are SRSs if there is nothing special about how the subjects were obtained.
      - Normal distribution: The psychologists expect from past experience that measurements like this on patrons of the same restaurant under the same conditions will follow approximately a Normal distribution. We can't look at the population, but we can examine the sample.
      - Known sigma: It is really unrealistic to suppose that we know that sigma = 2. We will see in a later chapter that it is easy to do away with the need to know sigma.
    - Calculate the confidence interval
  o CONCLUDE: Return to the practical question to describe your results in this setting

16.4 How Confidence Intervals Behave
- The z confidence interval, x-bar +/- z* sigma/sqrt(n), for the mean of a Normal population illustrates several important properties that are shared by all confidence intervals in common use:
  o The user chooses the confidence level! The margin of error follows from this choice.
  o We would like a high confidence level and a small margin of error
    - High confidence says that our method almost always gives correct answers
    - A small margin of error says that we have pinned down the parameter quite precisely
  o How do we get a smaller margin of error? The margin of error gets smaller when (ex. 16.4):
    - z* gets smaller. A smaller z* is the same as a lower confidence level C (figure 16.3). Caution! There is a trade-off between the confidence level and the margin of error. To obtain a smaller margin of error from the same data, you must be willing to accept lower confidence.
    - sigma is smaller. The standard deviation sigma measures the variation in the population. You can think of the variation among individuals in the population as noise that obscures the average value mu. It is easier to pin down mu when sigma is small.
    - n gets larger. Increasing the sample size n reduces the margin of error for any confidence level. Larger samples thus allow more precise estimates. However, Caution! Because n appears under a square root sign, we must take four times as many observations to cut the margin of error in half.
  o In practice, we can control the confidence level and the sample size, but we cannot control sigma
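A sketch of how the margin of error m = z* sigma/sqrt(n) reacts to the confidence level and the sample size, as described in 16.4. The sigma = 15 is an arbitrary illustrative value; scipy is assumed available.

```python
import numpy as np
from scipy import stats

sigma = 15   # assumed population standard deviation, for illustration only
for C in (0.90, 0.95, 0.99):
    zstar = stats.norm.ppf((1 + C) / 2)   # area C between -z* and z*
    for n in (25, 100, 400):
        m = zstar * sigma / np.sqrt(n)
        print(f"C={C:.0%}  n={n:3d}:  m = {m:.2f}")
# Lower confidence or larger n shrinks m; quadrupling n cuts m in half.
```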
Chapter 17 Reading Wrap Up
- The second type of statistical inference is tests of significance, and they have a different goal: to assess the evidence provided by data about some claim concerning a population parameter (ex. 17.1)
- The Reasoning of a Statistical Test applet animates example 17.1
- The basic idea of significance tests is simple: an outcome that would rarely happen if a claim were true is good evidence that the claim is not true

17.1 The Reasoning of Tests of Significance
- Ex. 17.2 shows that a cola has an average sweetness loss of x-bar = 0.3. Now, we make a claim against it. We seek evidence that there is a sweetness loss, so the claim we test is that there is no loss. In that case, the mean loss for the population of all trained tasters would be mu = 0.
  o If the claim mu = 0 is true, the sampling distribution of x-bar from the 10 tasters is Normal with mean mu = 0 and standard deviation sigma/sqrt(n), here sigma/sqrt(10)
  o This is similar to the calculations in Chapters 15 and 16. Figure 17.1 shows this sampling distribution. We can judge whether any observed x-bar is surprising by locating it on this distribution.
  o For this cola, 10 tasters had mean loss x-bar = 0.3. It is clear in figure 17.1 that an x-bar this large is not particularly surprising. It could easily occur just by chance when the population mean is mu = 0. That 10 tasters found x-bar = 0.3 is not strong evidence that this cola loses sweetness.
  o Ex. 17.3: the taste test for the new cola produced x-bar = 1.02, which is very far out on the tail of the Normal curve, so far out that an observed value this large would rarely occur just by chance if the true mu were 0. This observed value is good evidence that the true mu is greater than 0, that is, that the cola lost sweetness.

17.2 Stating Hypotheses
- Null and alternative hypotheses:
  o The claim tested by a statistical test is called the null hypothesis. The test is designed to assess the strength of the evidence against the null hypothesis. Usually the null hypothesis is a statement of "no effect" or "no difference."
  o The claim about the population that we are trying to find evidence for is the alternative hypothesis. The alternative hypothesis is one-sided if it states that a parameter is larger than or smaller than the null hypothesis value. It is two-sided if it states that the parameter is different from the null value (it could be either larger or smaller).
- Caution! Hypotheses always refer to a population parameter, not to a particular sample outcome. Be sure to state H0 and Ha (what we hope to find evidence for) in terms of population parameters.
- For examples 17.2 and 17.3, we are seeking evidence for a loss of sweetness. The null hypothesis says "no loss" on the average in a large population of tasters. The alternative hypothesis says "there is a loss." So the hypotheses are H0: mu = 0 and Ha: mu > 0.
  o The alternative hypothesis is one-sided because we are only interested in whether the cola lost sweetness. Ex. 17.4
- Caution! The hypotheses should express the hopes or suspicions we have before we see the data. It is cheating to first look at the data and then frame hypotheses to fit what the data show.
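A sketch of the reasoning in 17.1: locate each observed x-bar on the sampling distribution under H0 and ask how surprising it is. The notes do not give sigma for the cola example, so sigma = 1 is assumed here purely for illustration.

```python
import numpy as np
from scipy import stats

sigma, n = 1.0, 10                 # sigma = 1 is an assumption; n = 10 tasters
sd_xbar = sigma / np.sqrt(n)       # sd of x-bar under H0: mu = 0

# How surprising is each observed mean sweetness loss if H0 is true?
for xbar in (0.3, 1.02):
    p = stats.norm.sf(xbar, scale=sd_xbar)
    print(f"P(x-bar >= {xbar} | mu = 0) = {p:.4f}")
# 0.3 could easily occur by chance; 1.02 would be very rare under H0
```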
17.3 The P-Value and Statistical Significance
- The idea of stating a null hypothesis that we want to find evidence against may seem odd at first. Think about it like a criminal trial: the defendant is "innocent until proven guilty," so the null hypothesis is innocence, and the prosecution must try to provide convincing evidence against this null hypothesis.
- The P-value is the probability that measures the strength of the evidence against a null hypothesis
- Test statistic and P-value:
  o A test statistic calculated from the sample data measures how far the data diverge from what we would expect if the null hypothesis H0 were true. Large values of the statistic show that the data are not consistent with H0.
  o The probability, computed assuming that H0 is true, that the test statistic would take a value as extreme or more extreme than that actually observed is called the P-value of the test. The smaller the P-value, the stronger the evidence against H0 provided by the data. Ex. 17.5, 17.6
- Caution! Failing to find evidence against H0 means only that the data are not inconsistent with H0, not that we have clear evidence that H0 is true. Only data that are inconsistent with H0 provide evidence against H0.
- Statistical significance:
  o If the P-value is as small as or smaller than alpha, we say that the data are statistically significant at level alpha. The quantity alpha is called the significance level or the level of significance.
- Caution! "Significant" in the statistical sense does not mean "important." It means simply "not likely to happen just by chance." The significance level alpha makes "not likely" more exact.

17.4 Tests for a Population Mean
- The big idea is the reasoning of a test: data that would rarely occur if the null hypothesis H0 were true provide evidence that H0 is not true.
- Tests of significance: the four-step process
  o STATE: What is the practical question that requires a statistical test?
  o PLAN: Identify the parameter, state the null and alternative hypotheses, and choose the type of test that fits your situation
  o SOLVE: Carry out the test in three phases:
    1. Check the conditions for the test you plan to use
    2. Calculate the test statistic
    3. Find the P-value
  o CONCLUDE: Return to the practical question to describe your results in this setting
- z test for a population mean:
  o Draw an SRS of size n from a Normal population that has unknown mean mu and known standard deviation sigma. To test the null hypothesis that mu has a specified value, H0: mu = mu0,
  o Calculate the one-sample z statistic z = (x-bar - mu0)/(sigma/sqrt(n))
  o In terms of a variable Z having the standard Normal distribution, the P-value for a test of H0 against Ha: mu > mu0 is P(Z >= z); against Ha: mu < mu0 it is P(Z <= z); and against Ha: mu ≠ mu0 it is 2P(Z >= |z|). Ex. 17.7
- Caution! In this chapter, we are acting as if the "simple conditions" from page 274 are true. In practice, you must verify these conditions.
  o SRS (ex. 17.2): The most important condition is that the 72 executives in the sample are an SRS from the population of all middle-aged male executives in the company. We should check this requirement by asking how the data were produced.
  o Normal distribution: We should also examine the distribution of the 72 observations to look for signs that the population distribution is not Normal.
  o Known sigma: It is really unrealistic to suppose that we know that sigma = 15. We, again, will see in a later chapter that it is easy to do away with the need to know sigma.

17.5 Significance from a Table
- Significance from a table of critical values (ex. 17.8):
  o To find the approximate P-value for any z statistic, compare z (ignoring its sign) with the critical values z* at the bottom of Table C. If z falls between two values of z*, the P-value falls between the two corresponding values of P in the "one-sided P" or the "two-sided P" row of Table C.
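A minimal sketch of the one-sample z test from 17.4. The data are made up and sigma is assumed known; scipy is assumed available.

```python
import numpy as np
from scipy import stats

x = np.array([0.4, 2.2, -1.3, 1.2, 0.1, 2.3, 0.7, 1.1, 1.9, 1.6])  # made-up data
mu0, sigma = 0, 1.0                  # H0 value; sigma assumed known

z = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))
print("z =", z)
print("one-sided P (Ha: mu > mu0):", stats.norm.sf(z))
print("two-sided P (Ha: mu != mu0):", 2 * stats.norm.sf(abs(z)))
```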
17.6 Resampling: Significance from a Simulation
- Ex. 17.9, 17.10
- First, we must resample in the same manner as we obtained our data. If our data were obtained by an SRS, we must resample by taking repeated SRSs from the population distribution determined by the null hypothesis.
- Second, resampling only provides an estimate of a P-value. Repeat the resampling and you will obtain a different estimate. The accuracy of the estimate is improved by taking a larger number of samples to estimate the sampling distribution.
- Finally, resampling requires the use of software. Some software packages are more convenient for resampling than others.

Detailed Learning Outcomes

Chapter 22 Reading Wrap Up
- Questions about the proportion of some outcome in a population (ex. 22.1, 22.2)
- The sample proportion p-hat:
  o The statistic that estimates the parameter p is the sample proportion p-hat = (count of successes in the sample)/(size of the sample)
  o The sampling distribution of a sample proportion:
    - Draw an SRS of size n from a large population that contains proportion p of successes
    - Let p-hat be the sample proportion of successes
    - Then:
      - The mean of the sampling distribution is p
      - The standard deviation of the sampling distribution is sqrt(p(1 - p)/n)
      - As the sample size increases, the sampling distribution of p-hat becomes approximately Normal. That is, for large n, p-hat has approximately the N(p, sqrt(p(1 - p)/n)) distribution (ex. 22.3)
- Large-sample confidence intervals for a proportion:
  o The confidence interval is p-hat +/- z* (standard deviation of p-hat)
  o We don't know the value of p, so we replace the standard deviation with the standard error of p-hat: sqrt(p-hat(1 - p-hat)/n)
  o Large-sample confidence interval for a population proportion:
    - Draw an SRS of size n from a large population that contains an unknown proportion p of successes. An approximate level C confidence interval for p is p-hat +/- z* sqrt(p-hat(1 - p-hat)/n)
    - Here z* is the critical value, and the counts of successes and failures in the sample must both be at least 15 (ex. 22.4)
- Choosing the sample size:
  o The margin of error for a large-sample confidence interval is m = z* sqrt(p*(1 - p*)/n)
  o Here are two ways to get the guessed value p*:
    - Use a guess p* based on a pilot study or on past experience with similar studies
    - Use p* = 0.5 as the guess. The margin of error m is largest when p* = 0.5, so this guess is conservative in the sense that if we get any other p-hat when we do our study, we will get a smaller margin of error than planned.
  o Sample size for desired margin of error:
    - The level C confidence interval for a population proportion p will have a margin of error approximately equal to a specified value m when the sample size is n = (z*/m)^2 p*(1 - p*), where p* is a guessed value for the sample proportion (ex. 22.5)
- Significance tests for a proportion:
  o The test statistic for the null hypothesis H0: p = p0 is the sample proportion p-hat standardized using the value p0 specified by H0
  o Significance tests for a proportion:
    - Draw an SRS of size n from a large population that contains an unknown proportion p of successes. To test the hypothesis H0: p = p0, compute the z statistic z = (p-hat - p0)/sqrt(p0(1 - p0)/n)
    - In terms of a variable Z having the standard Normal distribution, the approximate P-value for a test of H0 against the chosen alternative is found as for other z tests (ex. 22.6, 22.7)
- Plus four confidence intervals for a proportion:
  o Add four imaginary observations, two successes and two failures, then use the large-sample formula with the new counts (ex. 22.8)
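A sketch of the Chapter 22 procedures: the large-sample and plus-four intervals and the one-proportion z test. The counts are illustrative; numpy and scipy are assumed available.

```python
import numpy as np
from scipy import stats

successes, n, C = 84, 200, 0.95              # illustrative counts
phat = successes / n
zstar = stats.norm.ppf((1 + C) / 2)

# Large-sample interval: p-hat +/- z* sqrt(p-hat(1-p-hat)/n)
se = np.sqrt(phat * (1 - phat) / n)
print("large-sample CI:", (phat - zstar * se, phat + zstar * se))

# Plus four interval: add two successes and two failures first
p4 = (successes + 2) / (n + 4)
se4 = np.sqrt(p4 * (1 - p4) / (n + 4))
print("plus four CI:  ", (p4 - zstar * se4, p4 + zstar * se4))

# z test of H0: p = 0.5 against the two-sided alternative
z = (phat - 0.5) / np.sqrt(0.5 * 0.5 / n)
print("z =", z, " P =", 2 * stats.norm.sf(abs(z)))
```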
Chapter 23 Reading Wrap Up
- Two-sample problems: proportions (ex. 23.1, 23.2)
  o Subscript notation: p1 and p-hat1 for the first population and its sample, p2 and p-hat2 for the second
- The sampling distribution of a difference between proportions:
  o To use p-hat1 - p-hat2 for inference, we must know its sampling distribution. Here are the facts we need:
    - When the samples are large, the distribution of p-hat1 - p-hat2 is approximately Normal
    - The mean of the sampling distribution is p1 - p2
    - The standard deviation of the distribution is sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2)
- Large-sample confidence intervals for comparing proportions:
  o The standard error of the statistic p-hat1 - p-hat2 is SE = sqrt(p-hat1(1 - p-hat1)/n1 + p-hat2(1 - p-hat2)/n2)
  o When n1 and n2 are large, an approximate level C confidence interval for p1 - p2 is (p-hat1 - p-hat2) +/- z* SE (ex. 23.3)
- Significance tests for comparing proportions:
  o The null hypothesis says there is no difference between the two populations: H0: p1 = p2
  o The alternative hypothesis says what kind of difference we expect (ex. 23.4)
  o Significance test for comparing two proportions:
    - To test the hypothesis that there is no difference, find the pooled proportion p-hat of successes (the combined count of successes divided by the combined sample size), then compute the z statistic z = (p-hat1 - p-hat2)/sqrt(p-hat(1 - p-hat)(1/n1 + 1/n2))
    - In terms of a variable Z having the standard Normal distribution, the P-value for a test of H0 against the chosen alternative follows as in the one-proportion case (ex. 23.5)
- Plus four confidence intervals for comparing proportions:
  o To get the plus four confidence interval for the difference p1 - p2, add four imaginary observations: one success and one failure in each of the two samples. Then use the large-sample confidence interval with the new sample sizes and counts of successes (ex. 23.6)
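A sketch of the two-proportion procedures above, using the pooled proportion for the test as described. The counts are hypothetical; numpy and scipy are assumed available.

```python
import numpy as np
from scipy import stats

x1, n1 = 52, 120   # successes / sample size, group 1 (hypothetical)
x2, n2 = 38, 130   # group 2 (hypothetical)
p1, p2 = x1 / n1, x2 / n2

# Large-sample 95% CI for p1 - p2
se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
zstar = stats.norm.ppf(0.975)
print("95% CI for p1-p2:", (p1 - p2 - zstar * se, p1 - p2 + zstar * se))

# z test of H0: p1 = p2 using the pooled proportion
pooled = (x1 + x2) / (n1 + n2)
z = (p1 - p2) / np.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
print("z =", z, " two-sided P =", 2 * stats.norm.sf(abs(z)))
```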
Detailed Learning Outcomes

Reading Wrap Up Chapter 6: Two-Way Tables
- Two-way tables describe relationships between categorical variables (ex. 6.1)
- Row variables are described in the rows of the table
- Column variables are described in the columns of the table

6.1 Marginal Distributions
- To analyze two-way tables:
  o First look at the distribution of each variable separately
    - "How often did each outcome occur?"
    - Use the "total" column
  o If any totals are missing, calculate them (ex. 6.2)
  o Marginal distributions are the distributions of each categorical variable relative to the table total

6.2 Conditional Distributions
- CAUTION: marginal distributions tell us nothing about the relationship between two variables
- To describe the relationship between two variables, we must calculate certain percentages
- Marginal vs. conditional distributions (ex. 6.3):
  o The marginal distribution of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table
  o The conditional distribution of a variable is the distribution of values of that variable among only the individuals who have a given value of the other variable; there is a separate conditional distribution for each value of the other variable
- Segmented bar graphs (see figure 6.3): each bar is divided into parts, each part representing a different category
- Mosaic plot (see figure 6.4): a variation of a segmented bar graph; the bars have different widths corresponding to the proportion of people in each of the three preferred lifestyle categories, and each bar is segmented (adults' preferred lifestyle based on the sex and education table)
- CAUTION: no single graph (such as a scatterplot) portrays the form of the relationship between categorical variables. No single numerical measure (such as correlation) summarizes the strength of the association.
  o If there is an explanatory-response relationship, compare the conditional distributions of the response variable for the separate values of the explanatory variable (Apply Your Knowledge 6.3-6.5)

6.3 Simpson's Paradox
- The effects of lurking variables can change or reverse relationships between categorical variables (ex. 6.4) (Apply Your Knowledge 6.6-6.7)
- An association or comparison that holds for all of several groups can reverse direction when the data are combined to form a single group. This reversal is called Simpson's Paradox.

Learning Outcomes
1. Identify the individuals and variables in a dataset.
2. Classify variables as categorical or quantitative.
3. Generate appropriate visualizations from raw data.
4. Calculate appropriate summary statistics from raw data.
5. Determine appropriate summary statistics to use depending upon the question of interest and the data.
6. Interpret all numerical quantities and visualizations in context to tell the story of the data.
7. Recognize that correlation does not imply causation.
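A sketch of marginal and conditional distributions computed from a two-way table of counts. The table is hypothetical and pandas is assumed available.

```python
import pandas as pd

# Hypothetical two-way table: row variable = sex, column variable = lifestyle
table = pd.DataFrame({"City": [30, 50], "Suburb": [45, 40], "Rural": [25, 10]},
                     index=["Female", "Male"])

total = table.values.sum()
# Marginal distributions: each variable separately, relative to the table total
print(table.sum(axis=1) / total)   # marginal distribution of sex
print(table.sum(axis=0) / total)   # marginal distribution of lifestyle

# Conditional distributions of lifestyle, one for each value of sex
# (each row divided by its own total)
print(table.div(table.sum(axis=1), axis=0))
```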
Reading Wrap Up Chapter 18

18.1 Conditions for Inference in Practice
- Caution! Any confidence interval or significance test can be trusted only under specific conditions (ex. 18.1, 18.2)
- The simple conditions from the previous unit apply
- The data must come from a process to which the laws of probability apply
- Caution! If your data don't come from a random sample or a randomized comparative experiment, your conclusions may be challenged
- Caution! There is no simple rule for deciding when you can act as if a sample is an SRS. Pay attention to these cautions:
  o Practical problems such as nonresponse in samples or dropouts from an experiment can hinder inference even from a well-designed study
  o Different methods are needed for different designs
  o There is no cure for fundamental flaws like voluntary response surveys or uncontrolled experiments
- Caution! Any inference procedure based on sample statistics like the sample mean x-bar that are not resistant to outliers can be strongly influenced by a few extreme observations

18.2 Cautions about Confidence Intervals
- Caution! The margin of error in a confidence interval ignores everything except the sample-to-sample variation due to choosing the sample randomly
- The margin of error in a confidence interval only covers random sampling errors
- Practical difficulties (e.g., undercoverage and nonresponse) are often more serious than random sampling error. The margin of error does not take these into account.

18.3 Cautions about Significance Tests
- How plausible is the null hypothesis?
  o Strong evidence = small P-value
- What are the consequences of rejecting the null hypothesis?
  o You are making a big change, so make sure that your evidence is strong enough to support this change
- Caution! There is no sharp border between "significant" and "nonsignificant," only increasingly strong evidence as the P-value decreases
- The evidence against the null is stronger when the hypothesis is one-sided, because it is based on the data plus information about the direction of possible deviations from the null. Two-sided tests multiply the P-value by 2.
- Caution! How important an effect is depends on the size of the effect as well as on its statistical significance
- Sample size affects statistical significance (ex. 18.3, 18.4):
  o Because large random samples have smaller chance variation, very small population effects can be highly significant when the sample is large
  o Because small samples have a lot of chance variation, even large population effects can fail to be significant when the sample is small
  o Statistical significance does not tell us whether an effect is large enough to be important. Statistical significance is not the same thing as practical significance.
- Caution! Running one test and reaching the 5% level of significance is reasonably good evidence that you have found something. Running 20 tests and reaching that level only once is not.

18.4 Planning Studies: Sample Size for Confidence Intervals
- Sample size for desired margin of error:
  o The z confidence interval for the mean of a Normal population will have a specified margin of error m when the sample size is n = (z* sigma/m)^2
- Caution! Notice that it is the size of the sample that determines the margin of error. The size of the population does not influence the sample size we need. (ex. 18.5)

18.5 Planning Studies: The Power of a Statistical Test
- Questions we must answer to decide how many observations we need (ex. 18.6, 18.7, 18.8, 18.9):
  o Significance level
  o Effect size
  o Power
- The power of a test against a specific alternative is the probability that the test will reject the null hypothesis at a chosen significance level (alpha) when the specified alternative value of the parameter is true
- Types of errors:
  o Type I error: we reject the null hypothesis when in fact the null is true
  o Type II error: we fail to reject the null hypothesis when the alternative is true
  o The significance level (alpha) of any fixed-level test is the probability of a Type I error
  o The power of a test against any alternative is the probability of correctly rejecting the null hypothesis for that alternative. It can be calculated as 1 minus the probability of a Type II error for that alternative.
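A sketch of the two planning calculations above: the sample-size formula from 18.4 and a simulated power estimate for a one-sided z test. All numbers are illustrative.

```python
import numpy as np
from scipy import stats

# 18.4: sample size for a desired margin of error, n = (z* sigma / m)^2
sigma, m, C = 15, 3, 0.95                    # illustrative values
zstar = stats.norm.ppf((1 + C) / 2)
print("needed n:", int(np.ceil((zstar * sigma / m) ** 2)))

# 18.5: power of a one-sided z test (H0: mu = 0, Ha: mu > 0) by simulation,
# assuming sigma = 1 and a true alternative mean of 0.8
alpha, mu_alt, n_obs = 0.05, 0.8, 10
rng = np.random.default_rng(2)
crit = stats.norm.ppf(1 - alpha)             # reject H0 when z >= crit
z = rng.normal(mu_alt, 1, size=(20_000, n_obs)).mean(axis=1) * np.sqrt(n_obs)
print("simulated power:", np.mean(z >= crit))
```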
Reading Wrap Up Chapter 12
- Caution: random samples eliminate bias from the act of choosing a sample, but they can still be wrong because of the variability that results when we choose at random. Ex. 12.1

12.1 The Idea of Probability
- Chance behavior is unpredictable in the short run but has a regular and predictable pattern in the long run (ex. 12.2)
- We call a phenomenon random if individual outcomes are uncertain but there is nonetheless a regular distribution of outcomes in a large number of repetitions
- The probability of any outcome of a random phenomenon is the proportion of times the outcome would occur in a very long series of repetitions
- Caution: the proportion in a small or moderate number of tosses can be far from the probability. Probability describes only what happens in the long run. We can never measure probability exactly.

12.3 Probability Models
- The idea of probability rests on the observed fact that the average result of many thousands of chance outcomes can be known with near certainty
- Basis for probability models:
  o A list of possible outcomes
  o A probability for each outcome
- The sample space S of a random phenomenon is the set of all possible outcomes
  o Ex. coin toss sample space S = {H, T}
- An event is an outcome or a set of outcomes of a random phenomenon. That is, an event is a subset of the sample space.
- A probability model is a mathematical description of a random phenomenon consisting of two parts: a sample space S and a way of assigning probabilities to events (ex. 12.4, 12.5)
- If all outcomes in a sample space are equally likely, we find the probability of an event by: P(event) = (count of outcomes in the event)/(count of outcomes in S)
- Ex. 12.5: Caution: comparing this S with figure 12.2 reminds us that we can change S by changing the detailed description of the random phenomenon we are describing

12.4 Probability Rules
1. Any probability is a number between 0 and 1
2. All possible outcomes together must have probability 1
3. If two events have no outcomes in common, the probability that one or the other occurs is the sum of their individual probabilities
4. The probability that an event does not occur is 1 minus the probability that the event does occur
- We can state these rules mathematically (ex. 12.6):
  o The probability P(A) of any event A satisfies 0 <= P(A) <= 1
  o If S is the sample space in a probability model, P(S) = 1
  o Two events A and B are disjoint if they have no outcomes in common and so can never occur together. If A and B are disjoint, P(A or B) = P(A) + P(B). This is the addition rule for disjoint events.
  o For any event A, P(A does not occur) = 1 - P(A)

12.5 Finite and Discrete Probability Models
- A probability model with a finite sample space is called finite
- To assign probabilities in a finite model:
  o List the probabilities of all the individual outcomes
  o These probabilities must be numbers between 0 and 1 and add exactly to 1
  o The probability of an event is the sum of the probabilities of the outcomes making up the event
- Discrete probability models include finite sample spaces as well as sample spaces that are infinite and equivalent to the set of all positive integers (ex. 12.7, 12.8)

12.6 Continuous Probability Models
- When we use a table of random digits to select a digit between 0 and 9, the finite probability model assigns a probability of 1/10 to each of the 10 possible outcomes
- If we want to choose a random number between 0 and 1, allowing any number between 0 and 1 to be the outcome, software random number generators will do this
  o The sample space is now S = {all numbers between 0 and 1} (ex. 12.9)
- A continuous probability model assigns probabilities as areas under a density curve. The area under the curve and above any range of values is the probability of an outcome in that range.
- Normal distributions are continuous probability models (ex. 12.10)

12.7 Random Variables
- A random variable is a variable whose value is a numerical outcome of a random phenomenon
- The probability distribution of a random variable X tells us what values X can take and how to assign probabilities to those values (ex. 12.11)
- There are two main types of random variables (corresponding to the two types of probability models):
  o Discrete random variable: has a finite list of possible outcomes
  o Continuous random variable: can take on any value in an interval, with probabilities given as areas under a density curve

12.8 Personal Probability
- A personal probability of an outcome is a number between 0 and 1 that expresses an individual's judgment of how likely the outcome is (ex. 12.12)
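A sketch of the long-run idea of probability from 12.1 and of a finite probability model from 12.5. The model's probabilities are made up; numpy is assumed available.

```python
import numpy as np
rng = np.random.default_rng(3)

# 12.1: the long-run proportion of heads settles near the probability 0.5
tosses = rng.integers(0, 2, size=100_000)      # 1 = heads, 0 = tails
for n in (10, 100, 10_000, 100_000):
    print(f"after {n:6d} tosses: proportion of heads = {tosses[:n].mean():.4f}")

# 12.5: a finite probability model is outcomes plus probabilities summing to 1
model = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}
assert abs(sum(model.values()) - 1) < 1e-12
# P(event) is the sum of the probabilities of its outcomes, e.g. P(X >= 3):
print("P(X >= 3) =", model[3] + model[4])
```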
Chapter 13 Reading Wrap Up

13.1 Independence and the Multiplication Rule
- A figure that shows the sample space S as a rectangular area and events as areas within S is called a Venn diagram (ex. 13.1)
- Two events A and B are independent events if knowing that one occurs does not change the probability that the other occurs. If A and B are independent, P(A and B) = P(A)P(B) (ex. 13.2, 13.3, 13.4)
- Caution: the multiplication rule P(A and B) = P(A)P(B) holds if A and B are independent, but not otherwise.
- Caution: the addition rule P(A or B) = P(A) + P(B) holds if A and B are disjoint, but not otherwise.
- Caution: be careful not to confuse disjointness and independence.

13.2 The General Addition Rule
- For any two events A and B: P(A or B) = P(A) + P(B) - P(A and B) (ex. 13.5)

13.3 Conditional Probability
- Ex. 13.6
- When P(A) > 0, the conditional probability of B given A is P(B|A) = P(A and B)/P(A)
- Caution: the conditional probability P(B|A) makes no sense if the event A can never occur, so we require that P(A) > 0. Be sure to keep in mind the distinct roles of the events A and B in P(B|A). (ex. 13.7)

13.4 The General Multiplication Rule
- The probability that both of two events A and B happen together can be found by P(A and B) = P(A)P(B|A)
- Here, P(B|A) is the conditional probability that B occurs, given the information that A occurs (ex. 13.8, 13.9)

13.5 Independence Again
- Two events A and B that both have positive probability are independent if P(B|A) = P(B)

13.6 Tree Diagrams
- Probability models often have several stages, with probabilities at each stage conditional on the outcomes of earlier stages. These models require us to combine several of the basic rules into a more elaborate calculation (ex. 13.10, 13.11)
- Starting from:
  o the probability of each source, and
  o the conditional probability of the outcome given each source,
  the tree diagram leads to the overall probability of the outcome

Chapter 14 Reading Wrap Up

14.1 The Binomial Setting and Binomial Distributions
- The distribution of a count depends on how the data are produced. Here is a common situation:
- The binomial setting:
  1. There is a fixed number n of observations
  2. The n observations are all independent. That is, knowing the result of one observation doesn't change the probabilities we assign to other observations.
  3. Each observation falls into one of just two categories: success and failure
  4. The probability of success, p, is the same for each observation
- The count X of successes in the binomial setting has the binomial distribution with parameters n and p. The parameter n is the number of observations, and p is the probability of a success on any one observation. The possible values of X are the whole numbers from 0 to n.
- Caution: the binomial distributions are an important class of discrete probability models. Pay attention to the binomial setting, because not all counts have binomial distributions. Ex. 14.1, 14.2

14.2 Bi
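A sketch of binomial probabilities for the setting in 14.1, assuming scipy is available. The n and p values are made up for illustration.

```python
from scipy import stats

# Binomial count X: n independent observations, success probability p each
n, p = 10, 0.25
X = stats.binom(n, p)

print("P(X = 3)  =", X.pmf(3))                # exactly 3 successes
print("P(X <= 3) =", X.cdf(3))                # 3 or fewer successes
print("mean =", X.mean(), " sd =", X.std())   # np and sqrt(np(1-p))
```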

