BNAD277 Study Guide Test 1
This 29 page Study Guide was uploaded by Kristin Koelewyn on Tuesday February 23, 2016. The Study Guide belongs to BNAD277 at University of Arizona taught by Dr. S. Umashankar in Spring 2016.
Date Created: 02/23/16
BNAD277 Test 1 Study Guide
Bnad277: Chapter 3a Notes
Numerical Measures - Location
- Two types of measures:
  o Measures of Location
  o Measures of Variability
- Measures of location:
  o Mean, median, mode, weighted mean, geometric mean, percentiles, quartiles
    ▪ If the measures are computed for data from a sample, they are called sample statistics.
    ▪ If the measures are computed for data from a population, they are called population parameters.
    ▪ A sample statistic is referred to as a point estimator of the corresponding population parameter.
- Mean:
  o Most important measure of location.
  o Provides a measure of central location.
  o The mean of a data set is the average of all the values.
  o The sample mean x̄ is the point estimator of the population mean μ.
    ▪ Formula for Sample Mean: \( \bar{x} = \frac{\sum x_i}{n} \), where n is the number of observations in the sample
    ▪ Formula for Population Mean: \( \mu = \frac{\sum x_i}{N} \), where N is the number of observations in the population
- Median:
  o The median of a data set is the value in the middle when the data items are arranged in ascending order.
  o The median is the preferred measure of central location when the data set has extreme values.
  o The median is used most often for annual income and property value data, because just a few extremely large incomes can inflate the mean.
    ▪ For an ODD number of observations, the median is the MIDDLE value.
    ▪ For an EVEN number of observations, the median is the AVERAGE of the two middle values.
- Mode:
  o The mode of a data set is the value that occurs with greatest frequency.
  o The greatest frequency can occur at two or more different values.
  o If the data have exactly two modes, the data are considered bimodal.
  o If the data have more than two modes, they are considered multimodal.
- Weighted Mean:
  o Sometimes, the mean is computed by giving each observation a weight that reflects its relative importance.
  o The choice of weights depends on the application.
  o For example, a 4-unit course weighs more than a 1-unit course toward GPA.
  o Other examples of weights are quantities such as pounds, dollars, or volume.
    ▪ Formula for Weighted Mean: \( \bar{x} = \frac{\sum w_i x_i}{\sum w_i} \)
    ▪ \( x_i \) = value of observation i
    ▪ \( w_i \) = weight for observation i
- Geometric Mean:
  o The geometric mean is calculated by finding the nth root of the product of n values.
  o Often used in analyzing growth rates in financial data.
  o Should be used when you want to determine the mean rate of change over several periods (years, quarters, weeks, days, etc.).
  o Other examples include changes in populations of species, crop yields, pollution levels, and birth/death rates.
    ▪ Formula for Geometric Mean: \( \bar{x}_g = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n} \)
    ▪ For growth rates, the \( x_i \) are growth factors (1 + rate), and the mean rate of change is \( \bar{x}_g - 1 \).
- Percentiles:
  o A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
  o For example, admission test scores for colleges are usually reported in terms of percentiles.
  o The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.
    ▪ First, arrange the data in ascending order.
    ▪ Then, compute the location \( L_p = \frac{p}{100}(n + 1) \).
    ▪ If \( L_p \) is an integer, the pth percentile is the value in position \( L_p \); otherwise, interpolate between the values in the two neighboring positions.
- Quartiles:
  o Quartiles are specific percentiles.
  o First Quartile = 25th Percentile
  o Second Quartile = 50th Percentile = Median
  o Third Quartile = 75th Percentile
- Excel Functions:
  o Mean Function: =AVERAGE(data cell range)
  o Median Function: =MEDIAN(data cell range)
  o Mode Function: =MODE.SNGL(data cell range)
  o Geometric Mean Function: =GEOMEAN(data cell range)
  o Percentile Function: =PERCENTILE.EXC(data range, p/100)
  o Quartile Function: =QUARTILE.EXC(data range, quartile number)
  o Sample Variance Function: =VAR.S(data cell range)
  o Sample Standard Deviation Function: =STDEV.S(data cell range)
- Measures of Variability:
  o Measures of variability are also called measures of dispersion.
  o For example, in choosing which vendor to buy from, it is good to consider the variability in delivery time as well as the average delivery time.
  o Measures of variability include: range, interquartile range, variance, standard deviation, and coefficient of variation.
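The location measures covered so far (mean, median, mode, weighted mean, geometric mean, percentiles) can be checked with a short Python sketch. It assumes Python 3.8+ for statistics.geometric_mean and statistics.multimode. The names percentile_exc and weighted_mean are hypothetical helpers written for this guide. The percentile helper follows the L = (p/100)(n + 1) location rule from the notes, the same convention that =PERCENTILE.EXC uses. The first data list is the range example from these notes. The GPA, bimodal, and growth-factor data are made up for illustration.

```python
import math
import statistics

def percentile_exc(data, p):
    """pth percentile using the location rule L = (p/100)(n + 1) from these notes."""
    xs = sorted(data)                    # first, arrange the data in ascending order
    n = len(xs)
    loc = (p / 100) * (n + 1)            # 1-based position in the sorted data
    if loc < 1 or loc > n:
        raise ValueError("percentile is outside the range this rule supports for n values")
    whole = math.floor(loc)
    frac = loc - whole
    if frac == 0:
        return xs[whole - 1]             # L is an integer: take the value in that position
    # otherwise interpolate between the values in positions whole and whole + 1
    return xs[whole - 1] + frac * (xs[whole] - xs[whole - 1])

def weighted_mean(values, weights):
    """x-bar = sum(w_i * x_i) / sum(w_i)"""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

data = [3, 6, 7, 9, 12, 19, 20]          # the range example from the notes
print(statistics.mean(data), statistics.median(data))
print(percentile_exc(data, 25), percentile_exc(data, 75))   # Q1 and Q3: 6 19

# Hypothetical data with two modes (bimodal)
print(statistics.multimode([1, 2, 2, 3, 3]))   # [2, 3]

# Hypothetical GPA: grade points weighted by course units (4, 3, and 1 units)
print(weighted_mean([4.0, 3.0, 2.0], [4, 3, 1]))   # 3.375

# Hypothetical growth factors (1 + rate) for three periods: +10%, -5%, +20%
g = statistics.geometric_mean([1.10, 0.95, 1.20])
print(f"mean growth rate per period: {g - 1:.2%}")
```

The geometric mean is applied to the growth factors, not to the raw rates. This is why the sketch subtracts 1 at the end to report the mean rate of change.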
- Range:
  o The range of a data set is the difference between the largest and smallest data values.
    ▪ Range = Largest Value - Smallest Value
  o Simplest measure of variability.
  o Very sensitive to the smallest and largest data values.
    ▪ Example: 3, 6, 7, 9, 12, 19, 20 (ascending order)
    ▪ 20 - 3 = 17 = Range
- Interquartile Range:
  o The interquartile range (IQR) of a data set is the difference between the third quartile and the first quartile.
  o It is the range for the middle 50% of the data.
  o It overcomes the sensitivity to extreme data values.
    ▪ IQR = Q3 - Q1
- Variance:
  o The variance is a measure of variability that utilizes all the data.
  o It is based on the difference between the value of each observation (\( x_i \)) and the mean (x̄ for a sample, μ for a population).
  o The variance is useful in comparing the variability of two variables.
  o The variance is the average of the squared differences between each data value and the mean (for a sample, the sum is divided by n - 1 rather than n).
    ▪ Sample Variance: \( s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \)
    ▪ Population Variance: \( \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \)
- Standard Deviation:
  o The standard deviation of a data set is the positive square root of the variance.
  o It is measured in the same units as the data, making it more easily interpreted than the variance.
    ▪ Sample Standard Deviation: \( s = \sqrt{s^2} \)
    ▪ Population Standard Deviation: \( \sigma = \sqrt{\sigma^2} \)
- Coefficient of Variation:
  o The coefficient of variation indicates how large the standard deviation is in relation to the mean.
    ▪ Sample: \( \left(\frac{s}{\bar{x}} \times 100\right)\% \); Population: \( \left(\frac{\sigma}{\mu} \times 100\right)\% \)
- Example of Sample Variance, Standard Deviation, and Coefficient of Variation:
  o Variance: \( s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16 \)
  o Standard Deviation: \( s = \sqrt{s^2} = \sqrt{2{,}996.16} = 54.74 \)
  o Coefficient of Variation: \( \left(\frac{s}{\bar{x}} \times 100\right)\% = \left(\frac{54.74}{590.80} \times 100\right)\% = 9.27\% \)
    ▪ The standard deviation is about 9% of the mean.
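The variability measures can be sketched the same way, again assuming Python 3.8+. In that version, statistics.quantiles with method="exclusive" uses the same (n + 1) positioning as QUARTILE.EXC. The data are the range example from the notes. The last lines re-check the worked example from its summary values, s² = 2,996.16 and x̄ = 590.80.

```python
import math
import statistics

# Range example from the notes, in ascending order
data = [3, 6, 7, 9, 12, 19, 20]

data_range = max(data) - min(data)    # largest value - smallest value = 17

# Quartiles with the (n + 1) rule; "exclusive" matches QUARTILE.EXC
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1                         # 19 - 6 = 13

s2 = statistics.variance(data)        # sample variance, divides by n - 1 (like =VAR.S)
s = statistics.stdev(data)            # sample standard deviation (like =STDEV.S)
cv = s / statistics.mean(data) * 100  # coefficient of variation, in percent

print(f"range={data_range}, IQR={iqr}, s^2={s2:.2f}, s={s:.2f}, CV={cv:.1f}%")

# Worked example from the notes, starting from s^2 = 2,996.16 and x-bar = 590.80
s_example = math.sqrt(2996.16)
cv_example = s_example / 590.80 * 100
print(f"s = {s_example:.2f}, CV = {cv_example:.3f}%")
```

If the unrounded s (54.737...) is carried through, the coefficient of variation comes out just under 9.265%, which rounds to 9.26%. The notes round s to 54.74 first and get 9.27%. Either way, the standard deviation is about 9% of the mean.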
Descriptive Statistics - Numerical Measures
- Measures of Distribution Shape, Relative Location, and Detecting Outliers
  o z-Scores (focusing a lot on this)
  o Chebyshev's Theorem
  o Empirical Rule
  o Detecting Outliers
- Five-Number Summaries and Box Plots
- Measures of Association Between Two Variables
- Data Dashboards: Adding Numerical Measures to Improve Effectiveness
- Distribution Shape: Skewness
  o (We will never be asked to compute skewness on the test.)
  o Skewness is an important measure of the shape of a distribution.
  o Formula for skewness: \( \text{Skewness} = \frac{n}{(n - 1)(n - 2)} \sum \left(\frac{x_i - \bar{x}}{s}\right)^3 \)
  o Skewness can easily be computed using statistical software (Excel).
  o Symmetric distribution (not skewed): skewness is zero, mean = median.
  o Moderately skewed left: skewness is negative, mean < median.
  o Moderately skewed right: skewness is positive, mean > median.
  o Highly skewed right: skewness is positive (often greater than 1.0).
  o Highly skewed left: skewness is negative (often less than -1.0).
- z-Scores:
  o The z-score is often called the standardized value.
  o It denotes the number of standard deviations a data value \( x_i \) is from the mean: \( z_i = \frac{x_i - \bar{x}}{s} \)
  o Excel's STANDARDIZE function can be used to compute the z-score.
  o An observation's z-score is a measure of the relative location of the observation in a data set.
  o A data value less than the sample mean will have a z-score less than zero.
  o A data value greater than the sample mean will have a z-score greater than zero.
  o A data value equal to the sample mean will have a z-score of zero.
- Chebyshev's Theorem:
  o Good to use when the distribution shape is unknown.
  o At least \( (1 - 1/z^2) \) of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.
  o Chebyshev's theorem requires z > 1, but z need not be an integer.
    ▪ At least 75% of the data values must be within z = 2 standard deviations of the mean.
    ▪ At least 89% of the data values must be within z = 3 standard deviations of the mean.
    ▪ At least 94% of the data values must be within z = 4 standard deviations of the mean.
- Empirical Rule:
  o Good to use when the distribution has a normal (bell) shape.
  o The empirical rule can be used to determine the approximate percentage of data values that are within a specified number of standard deviations of the mean.
  o The empirical rule is based on the normal distribution.
    ▪ 68.26% of the values of a normal random variable are within ±1 standard deviation of its mean.
    ▪ 95.44% of the values of a normal random variable are within ±2 standard deviations of its mean.
    ▪ 99.72% of the values of a normal random variable are within ±3 standard deviations of its mean.
- Detecting Outliers:
  o An outlier is an unusually small or unusually large value in a data set.
  o A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
  o It might be:
    ▪ An incorrectly recorded data value
    ▪ A data value that was incorrectly included in the data set
    ▪ A correctly recorded data value that belongs in the data set
  o Example: For apartment rents, the most extreme z-scores are -1.20 and 2.27. Using |z| ≥ 3 as the criterion for an outlier, there are no outliers in the data set.
- Five-Number Summaries and Box Plots:
  o Summary statistics and easy-to-draw graphs can be used to quickly summarize large quantities of data.
  o Two tools that accomplish this are five-number summaries and box plots.
  o A five-number summary consists of:
    ▪ Smallest Value
    ▪ First Quartile
    ▪ Median
    ▪ Third Quartile
    ▪ Largest Value
  o Example: Smallest Value = 525, First Quartile = 545, Median = 575, Third Quartile = 625, Largest Value = 715
- Box Plot:
  o Used to identify outliers without finding z-scores.
  o A box plot is a graphical summary of data that is based on a five-number summary.
  o A key to the development of a box plot is the computation of the median and the quartiles Q1 and Q3.
    ▪ Example: Q1 = 545, Q2 = 575, Q3 = 625
  o Limits are located (not drawn) using the interquartile range (IQR): lower limit = Q1 - 1.5(IQR), upper limit = Q3 + 1.5(IQR).
  o Data outside these limits are considered outliers.
  o The location of each outlier is shown with the symbol *.
    ▪ Example: Apartment Rents
  o Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.
- Measures of Association Between Two Variables:
  o Two descriptive measures of the relationship between two variables are the covariance and the correlation coefficient.
- Covariance:
  o The covariance is a measure of the linear association between two variables.
  o Positive values indicate a positive relationship (as x goes up, y goes up; as x goes down, y goes down).
  o Negative values indicate a negative relationship (as x goes up, y goes down; as x goes down, y goes up).
    ▪ Covariance for samples: \( s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \)
    ▪ Covariance for populations: \( \sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} \)
- Correlation Coefficient:
  o Correlation is a measure of linear association and not necessarily causation.
  o It means that one variable is associated with the other, not that one is the cause of the other.
    ▪ The sample correlation coefficient is computed as \( r_{xy} = \frac{s_{xy}}{s_x s_y} \) (population: \( \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \))
  o The coefficient can take on values between -1 and +1.
  o Values near -1 indicate a strong negative linear relationship.
  o Values near +1 indicate a strong positive linear relationship.
  o The closer the correlation is to zero, the weaker the relationship.
    ▪ Example:
      • From the data above, we can get:
- Data Dashboards: Adding Numerical Measures to Improve Effectiveness
  o Data dashboards are not limited to graphical displays.
  o The addition of numerical measures, such as the mean and standard deviation of key performance indicators (KPIs), to a data dashboard is often critical.
  o Dashboards are often interactive.
  o Drilling down refers to functionality in interactive dashboards that allows the user to access information and analyses at increasingly detailed levels.
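This is a minimal Python sketch of the Chapter 3b measures above: z-scores, Chebyshev's bound, box-plot limits, covariance, and correlation. All helper names are hypothetical. The box-plot limits use the 1.5 × IQR convention with the five-number summary values from the notes. The paired x/y data are made up for illustration.

```python
import math
import statistics

def z_scores(data):
    """z_i = (x_i - x-bar) / s for each value (like Excel's STANDARDIZE)."""
    xbar, s = statistics.mean(data), statistics.stdev(data)
    return [(x - xbar) / s for x in data]

def chebyshev_min_fraction(z):
    """Chebyshev: at least 1 - 1/z^2 of any data set lies within z standard deviations."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z**2

def box_plot_limits(q1, q3):
    """Lower and upper limits located 1.5 IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def sample_covariance(x, y):
    """s_xy = sum((x_i - x-bar)(y_i - y-bar)) / (n - 1)"""
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """r_xy = s_xy / (s_x * s_y)"""
    return sample_covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))

for z in (2, 3, 4):
    print(f"z = {z}: at least {chebyshev_min_fraction(z):.0%}")   # 75%, 89%, 94%

# Five-number summary values from the notes: Q1 = 545, Q3 = 625
lower, upper = box_plot_limits(545, 625)
outliers = [v for v in (525, 715) if v < lower or v > upper]   # smallest and largest values
print(lower, upper, outliers)   # 425.0 745.0 []

# Hypothetical paired data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(sample_covariance(x, y), round(correlation(x, y), 4))
```

With Q1 = 545 and Q3 = 625, the limits are 425 and 745. The smallest (525) and largest (715) values both fall inside them, which agrees with the z-score check in the notes that found no outliers.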
Bnad277: Chapter 6 Notes
Continuous Probability Distributions
- Uniform Probability Distribution
- Normal Probability Distribution
- Exponential Probability Distribution
- Continuous Probability Distributions:
  o A continuous random variable can assume any value in an interval on the real line or in a collection of intervals.
  o It is not possible to talk about the probability of the random variable assuming a particular value.
  o Instead, we talk about the probability of the random variable assuming a value within a given interval.
  o The probability of the random variable assuming a value within some given interval from \( x_1 \) to \( x_2 \) is defined to be the area under the graph of the probability density function between \( x_1 \) and \( x_2 \).
- Uniform Probability Distribution:
  o A random variable is uniformly distributed whenever the probability is proportional to the interval's length.
  o The uniform probability density function is: \( f(x) = \frac{1}{b - a} \) for \( a \le x \le b \), and \( f(x) = 0 \) elsewhere
    ▪ where a = smallest value the variable can assume and b = largest value the variable can assume
  o Expected Value of x: \( E(x) = \frac{a + b}{2} \)
  o Variance of x: \( \mathrm{Var}(x) = \frac{(b - a)^2}{12} \)
  o Example: Slater's Buffet
    ▪ Slater customers are charged for the amount of salad they take. Sampling suggests that the amount of salad taken is uniformly distributed between 5 ounces and 15 ounces.
    ▪ Use the uniform probability density function above, where x = salad plate filling weight: \( f(x) = \frac{1}{10} \) for \( 5 \le x \le 15 \), and \( f(x) = 0 \) elsewhere
    ▪ Expected Value of x: \( E(x) = \frac{5 + 15}{2} = 10 \) ounces
    ▪ Variance of x: \( \mathrm{Var}(x) = \frac{(15 - 5)^2}{12} = 8.33 \)
    ▪ Figure: Distribution for Salad Plate Filling Weight
- Area as a Measure of Probability:
  o The area under the graph of f(x) and probability are identical.
  o This is valid for all continuous random variables.
  o The probability that x takes on a value between some lower value \( x_1 \) and some higher value \( x_2 \) can be found by computing the area under the graph of f(x) over the interval from \( x_1 \) to \( x_2 \).
- Normal Probability Distribution:
  o The normal probability distribution is the most important distribution for describing a continuous random variable.
  o It is widely used in statistical inference.
  o It has been used in a wide variety of applications, including heights of people, rainfall amounts, test scores, and scientific measurements.
  o Abraham de Moivre, a French mathematician, derived the normal distribution in 1733 and included the result in later editions of The Doctrine of Chances.
    ▪ Normal Probability Density Function: \( f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x - \mu)^2 / (2\sigma^2)} \)
    ▪ where μ = mean, σ = standard deviation, π = 3.14159, and e = 2.71828
  o Characteristics:
    ▪ The distribution is symmetric; its skewness measure is zero.
    ▪ The entire family of normal probability distributions is defined by its mean and its standard deviation.
    ▪ The highest point on the normal curve is at the mean, which is also the median and the mode.
    ▪ The mean can be any numerical value: negative, zero, or positive.
    ▪ The standard deviation determines the width of the curve: larger values result in wider, flatter curves.
    ▪ Probabilities for the normal random variable are given by areas under the curve. The total area under the curve is 1 (.5 to the left of the mean and .5 to the right of the mean).
    ▪ Basis for the empirical rule:
      • 68.26% of values of a normal random variable are within ±1 standard deviation of its mean.
      • 95.44% of values of a normal random variable are within ±2 standard deviations of its mean.
      • 99.72% of values of a normal random variable are within ±3 standard deviations of its mean.
- Standard Normal Probability Distribution:
  o Characteristics:
    ▪ A random variable having a normal distribution with a mean of 0 and a standard deviation of 1 is said to have a standard normal probability distribution.
    ▪ The letter z is used to designate the standard normal random variable.
    ▪ Converting to the Standard Normal Distribution: \( z = \frac{x - \mu}{\sigma} \)
      • We can think of z as a measure of the number of standard deviations x is from μ.
- Using Excel to Compute Standard Normal Probabilities:
  o Excel has two functions for computing probabilities and z values for a standard normal distribution:
    ▪ NORM.S.DIST is used to compute the cumulative probability given a z value.
    ▪ NORM.S.INV is used to compute the z value given a cumulative probability.
      • The "S" in the function names reminds us that they relate to the standard normal probability distribution.
- Standard Normal Probability Distribution Continued:
  o Example: Pep Zone
    ▪ Pep Zone sells auto parts and supplies, including a popular multi-grade motor oil. When the stock of this oil drops to 20 gallons, a replenishment order is placed. The store manager is concerned that sales are being lost due to stockouts while waiting for a replenishment order.
    ▪ It has been determined that demand during replenishment lead-time is normally distributed with a mean of 15 gallons and a standard deviation of 6 gallons. The manager would like to know the probability of a stockout during replenishment lead-time. In other words, what is the probability that demand during lead-time will exceed 20 gallons? P(x > 20) = ?
      • Step 1: Convert x to the standard normal distribution: \( z = \frac{x - \mu}{\sigma} = \frac{20 - 15}{6} = .83 \)
      • Step 2: Find the area under the standard normal curve to the left of z = .83: P(z ≤ .83) = .7967
      • Step 3: Compute the area under the standard normal curve to the right of z = .83: P(z > .83) = 1 - .7967 = .2033
    ▪ The manager wants the probability of a stockout to be no more than .05.
      • First find the z value that cuts off an area of .05 in the upper tail, using the complement of the tail area (1 - .05 = .95): \( z_{.05} = 1.645 \)
      • Then convert \( z_{.05} \) to the corresponding value of x: \( x = \mu + z_{.05}\sigma = 15 + 1.645(6) = 24.87 \), or about 25 gallons
    ▪ By raising the reorder point from 20 gallons to 25 gallons on hand, the probability of a stockout decreases from about .20 to .05. This is a significant difference.
  o Using Excel to Compute Normal Probabilities:
    ▪ NORM.DIST is used to compute the cumulative probability given an x value.
    ▪ NORM.INV is used to compute the x value given a cumulative probability.
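The Chapter 6 uniform and normal calculations above can be reproduced with Python's standard library. This sketch assumes Python 3.8+ for statistics.NormalDist, whose cdf and inv_cdf methods play the roles of NORM.DIST and NORM.INV. The 12-to-15-ounce interval and the uniform_prob helper are hypothetical additions for illustration. The code does not round z to .83 the way the table lookup does, so its stockout probability comes out near .202 rather than .2033.

```python
from statistics import NormalDist

# Uniform distribution: Slater's Buffet, a = 5 to b = 15 ounces
a, b = 5, 15
density = 1 / (b - a)            # f(x) = 1/(b - a) = 0.1 on [a, b]
expected = (a + b) / 2           # E(x) = 10 ounces
variance = (b - a) ** 2 / 12     # Var(x) = 100/12, about 8.33

def uniform_prob(x1, x2):
    """Area of the rectangle over [x1, x2], clipped to [a, b]."""
    lo, hi = max(x1, a), min(x2, b)
    return max(hi - lo, 0) * density

print(uniform_prob(12, 15))      # hypothetical interval: about 0.3

# Normal distribution: Pep Zone lead-time demand, mean 15, standard deviation 6
demand = NormalDist(mu=15, sigma=6)
z = (20 - 15) / 6                     # about .83
p_stockout = 1 - demand.cdf(20)       # like 1 - NORM.DIST(20, 15, 6, TRUE)

# Reorder point that gives a .05 stockout probability
z_05 = NormalDist().inv_cdf(0.95)     # like NORM.S.INV(.95), about 1.645
reorder_point = 15 + z_05 * 6         # like NORM.INV(.95, 15, 6), about 24.87
print(f"P(stockout) = {p_stockout:.4f}, reorder point = {reorder_point:.2f}")
```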
- Exponential Probability Distribution:
  o The exponential probability distribution is useful in describing the time it takes to complete a task.
  o Exponential random variables can be used to describe: time between vehicle arrivals at a tollbooth, time required to complete a questionnaire, and distance between major defects in a highway.
  o In waiting line applications, the exponential distribution is often used for service times.
  o A property of the exponential distribution is that the mean and standard deviation are equal.
  o The exponential distribution is skewed to the right. Its skewness measure is 2.
    ▪ Density Function: \( f(x) = \frac{1}{\mu} e^{-x/\mu} \) for \( x \ge 0 \), where μ = expected value or mean
    ▪ Cumulative Probabilities: \( P(x \le x_0) = 1 - e^{-x_0/\mu} \), where \( x_0 \) = some specific value of x
  o Example: Al's Full-Service Pump: The time between arrivals of cars at Al's full-service gas pump follows an exponential probability distribution with a mean time between arrivals of 3 minutes. Al would like to know the probability that the time between two successive arrivals will be 2 minutes or less.
    ▪ \( P(x \le 2) = 1 - e^{-2/3} = 1 - .5134 = .4866 \)
  o Relationship Between the Poisson and Exponential Distributions:
    ▪ The Poisson distribution describes the number of occurrences per interval; the exponential distribution describes the length of the interval between occurrences.
Chapter 7: Sampling and Sampling Distributions
- Introduction:
  o An element is the entity on which data are collected.
  o A population is the collection of all the elements of interest.
  o A sample is a subset of the population.
  o The sampled population is the population from which the sample is drawn.
  o A frame is a list of the elements that the sample will be selected from.
  o The reason we select a sample is to collect data to answer a research question about a population.
  o The sample results provide only estimates of the values of the population characteristics.
  o The reason is simply that the sample contains only a portion of the population.
  o With proper sampling methods, the sample results can provide good estimates of the population characteristics.
- Selecting a Sample from a Finite Population:
  o Finite populations are often defined by lists such as:
    ▪ Organization membership roster
    ▪ Credit card account numbers
    ▪ Inventory product numbers
  o A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.
  o Replacing each sampled element before selecting subsequent elements is called sampling with replacement.
  o Sampling without replacement is the procedure used most often.
  o In large sampling projects, computer-generated random numbers are often used to automate the sample selection process.
    ▪ Example: St. Andrew's College received 900 applications for admission in the upcoming year from prospective students. The applicants were numbered, from 1 to 900, as their applications arrived. The Director of Admissions would like to select a simple random sample of 30 applicants.
      • Step 1: Assign a random number to each of the 900 applicants (generated by Excel's RAND function).
      • Step 2: Select the 30 applicants corresponding to the 30 smallest random numbers.
- Sampling from an Infinite Population:
  o Sometimes we want to select a sample, but find it is not possible to obtain a list of all elements in the population.
  o As a result, we cannot construct a frame for the population.
  o Hence, we cannot use the random number selection procedure.
  o Most often this situation occurs in infinite population cases.
  o Populations are often generated by an ongoing process where there is no upper limit on the number of units that can be generated.
  o Some examples of ongoing processes, with infinite populations, are:
    ▪ parts being manufactured on a production line
    ▪ transactions occurring at a bank
    ▪ telephone calls arriving at a technical help desk
    ▪ customers entering a store
  o In the case of an infinite population, we must select a random sample in order to make valid statistical inferences about the population from which the sample is taken.
  o A random sample from an infinite population is a sample selected such that the following conditions are satisfied:
    ▪ Each element selected comes from the population of interest.
    ▪ Each element is selected independently.
- Point Estimation:
  o Point estimation is a form of statistical inference.
  o In point estimation we use the data from the sample to compute a value of a sample statistic that serves as an estimate of a population parameter.
  o We refer to x̄ as the point estimator of the population mean μ, s as the point estimator of the population standard deviation σ, and p̄ as the point estimator of the population proportion p.
- Practical Advice:
  o The target population is the population we want to make inferences about.
  o The sampled population is the population from which the sample is actually taken.
  o Whenever a sample is used to make inferences about a population, we should make sure that the target population and the sampled population are in close agreement.
- Sampling Distribution of x̄:
  o The sampling distribution of x̄ is the probability distribution of all possible values of the sample mean x̄.
  o Expected value of x̄: \( E(\bar{x}) = \mu \)
    ▪ When the expected value of the point estimator equals the population parameter, we say the point estimator is unbiased.
  o Standard deviation of x̄:
    ▪ Finite Population: \( \sigma_{\bar{x}} = \sqrt{\frac{N - n}{N - 1}} \left(\frac{\sigma}{\sqrt{n}}\right) \)
    ▪ Infinite Population: \( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \)
  o When the population has a normal distribution, the sampling distribution of x̄ is normally distributed for any sample size.
  o In most applications, the sampling distribution of x̄ can be approximated by a normal distribution whenever the sample size is 30 or more.
  o In cases where the population is highly skewed or outliers are present, samples of size 50 may be needed.
- Central Limit Theorem:
  o When the population from which we are selecting a random sample does not have a normal distribution, the central limit theorem is helpful in identifying the shape of the sampling distribution of x̄: for large samples, the sampling distribution of x̄ can be approximated by a normal distribution.
    ▪ Example: What is the probability that a simple random sample of 30 applicants will provide an estimate of the population mean SAT score that is within ±10 of the actual population mean μ?
      • Step 1: Calculate the z-value at the upper endpoint of the interval: z = (1707 - 1697)/15.96 = .63
      • Step 2: Find the area under the curve to the left of the upper endpoint: P(z < .63) = .7357
      • Step 3: Calculate the z-value at the lower endpoint of the interval: z = (1687 - 1697)/15.96 = -.63
      • Step 4: Find the area under the curve to the left of the lower endpoint: P(z < -.63) = .2643
      • Step 5: Calculate the area under the curve between the lower and upper endpoints of the interval: P(-.63 < z < .63) = P(z < .63) - P(z < -.63) = .7357 - .2643 = .4714
- Sampling Distribution of p̄:
  o Expected value of p̄: \( E(\bar{p}) = p \)
  o Standard deviation of p̄:
    ▪ Finite Population: \( \sigma_{\bar{p}} = \sqrt{\frac{N - n}{N - 1}} \sqrt{\frac{p(1 - p)}{n}} \)
    ▪ Infinite Population: \( \sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}} \)
  o Example: Recall that 72% of the prospective students applying to St. Andrew's College desire on-campus housing. What is the probability that a simple random sample of 30 applicants will provide an estimate of the population proportion of applicants desiring on-campus housing that is within plus or minus .05 of the actual population proportion?
    ▪ For our example, with n = 30 and p = .72, the normal distribution is an acceptable approximation because np = 30(.72) = 21.6 ≥ 5 and n(1 - p) = 30(.28) = 8.4 ≥ 5.
    ▪ Here \( \sigma_{\bar{p}} = \sqrt{\frac{.72(1 - .72)}{30}} = .082 \).
    ▪ Step 1: Calculate the z-value at the upper endpoint of the interval: z = (.77 - .72)/.082 = .61
    ▪ Step 2: Find the area under the curve to the left of the upper endpoint: P(z < .61) = .7291
    ▪ Step 3: Calculate the z-value at the lower endpoint of the interval: z = (.67 - .72)/.082 = -.61
    ▪ Step 4: Find the area under the curve to the left of the lower endpoint: P(z < -.61) = .2709
    ▪ Step 5: Calculate the area under the curve between the lower and upper endpoints of the interval: P(-.61 < z < .61) = P(z < .61) - P(z < -.61) = .7291 - .2709 = .4582
      • The probability that the sample proportion will be within ±.05 of the population proportion is .4582.
- Stratified Random Sampling:
  o The population is first divided into groups of elements called strata.
  o Each element in the population belongs to one and only one stratum.
  o Best results are obtained when the elements within each stratum are as much alike as possible.
  o A random sample is taken from each stratum.
  o Formulas are available for combining the stratum sample results into one population parameter estimate.
  o Advantage: If strata are homogeneous, this method is as precise as simple random sampling but with a smaller total sample size.
  o Example: The basis for forming the strata might be department, location, age, industry type, etc.
- Cluster Sampling:
  o The population is first divided into separate groups of elements called clusters.
  o Ideally, each cluster is a representative small-scale version of the population.
  o A simple random sample of the clusters is then taken.
  o All elements within each sampled (chosen) cluster form the sample.
  o Example: A primary application is area sampling, where clusters are city blocks or other well-defined areas.
  o Advantage: The close proximity of the elements can be cost effective.
  o Disadvantage: This method generally requires a larger total sample size than simple or stratified random sampling.
- Systematic Sampling:
  o If a sample size of n is desired from a population containing N elements, we might sample one element for every N/n elements in the population.
  o We randomly select one of the first N/n elements from the population list.
  o We then select every (N/n)th element that follows in the population list.
  o This method has the properties of a simple random sample, especially if the list of the population elements is a random ordering.
  o Advantage: The sample usually will be easier to identify than it would be if simple random sampling were used.
  o Example: Selecting every 100th listing in a telephone book after the first randomly selected listing.
- Convenience Sampling:
  o It is a nonprobability sampling technique. Items are included in the sample without known probabilities of being selected.
  o The sample is identified primarily by convenience.
  o Example: A professor conducting research might use student volunteers to constitute a sample.
  o Advantage: Sample selection and data collection are relatively easy.
  o Disadvantage: It is impossible to determine how representative of the population the sample is.
- Judgment Sampling:
  o The person most knowledgeable on the subject of the study selects elements of the population that he or she feels are most representative of the population.
  o It is a nonprobability sampling technique.
  o Example: A reporter might sample three or four senators, judging them as reflecting the general opinion of the senate.
  o Advantage: It is a relatively easy way of selecting a sample.
  o Disadvantage: The quality of the sample results depends on the judgment of the person selecting the sample.
Chapter 8: Interval Estimation
- Margin of Error and the Interval Estimate:
  o A point estimator cannot be expected to provide the exact value of the population parameter.
  o An interval estimate can be computed by adding and subtracting a margin of error to the point estimate.
    ▪ Point Estimate ± Margin of Error
  o The purpose of an interval estimate is to provide information about how close the point estimate is to the value of the parameter.
  o The general form of an interval estimate of a population mean is x̄ ± Margin of Error.
- Interval Estimate of a Population Mean: σ Known
  o In order to develop an interval estimate of a population mean, the margin of error must be computed using either:
    ▪ The population standard deviation σ, or
    ▪ The sample standard deviation s
  o σ is rarely known exactly, but often a good estimate can be obtained based on historical data or other information.
  o We refer to such cases as the σ known case.
  o There is a 1 - α probability that the value of a sample mean will provide a margin of error of \( z_{\alpha/2}\sigma_{\bar{x}} \) or less.
  o Interval Estimate of μ: \( \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \)
  o Values of \( z_{\alpha/2} \) for the Most Commonly Used Confidence Levels:
    ▪ 90% confidence: α = .10, α/2 = .05, \( z_{.05} = 1.645 \)
    ▪ 95% confidence: α = .05, α/2 = .025, \( z_{.025} = 1.960 \)
    ▪ 99% confidence: α = .01, α/2 = .005, \( z_{.005} = 2.576 \)
  o Meaning of Confidence:
    ▪ Because 90% of all the intervals constructed using \( \bar{x} \pm 1.645\sigma_{\bar{x}} \) will contain the population mean, we say we are 90% confident that the interval \( \bar{x} \pm 1.645\sigma_{\bar{x}} \) includes the population mean μ.
    ▪ We say that this interval has been established at the 90% confidence level.
    ▪ The value .90 is referred to as the confidence coefficient.
  o Example: Discount Sounds: Discount Sounds has 260 retail outlets throughout the United States. The firm is evaluating a potential location for a new outlet, based in part on the mean annual income of the individuals in the marketing area of the new location. A sample of size n = 36 was taken; the sample mean income is $41,100. The population is not believed to be highly skewed. The population standard deviation is estimated to be $4,500, and the confidence coefficient to be used in the interval estimate is .95.
    ▪ 95% of the sample means that can be observed are within \( \pm 1.96\sigma_{\bar{x}} \) of the population mean μ.
    ▪ The margin of error is: \( z_{.025} \frac{\sigma}{\sqrt{n}} = 1.96 \left(\frac{4{,}500}{\sqrt{36}}\right) = 1{,}470 \)
    ▪ Interval estimate of μ is: \( 41{,}100 \pm 1{,}470 \), or $39,630 to $42,570
      • We are 95% confident that the interval contains the population mean.
    ▪ In order to have a higher degree of confidence, the margin of error and thus the width of the confidence interval must be larger.
  o Adequate Sample Size:
    ▪ In most applications, a sample size of n = 30 is adequate.
    ▪ If the population distribution is highly skewed or contains outliers, a sample size of 50 or more is recommended.
    ▪ If the population is not normally distributed but is roughly symmetric, a sample size as small as 15 will suffice.
    ▪ If the population is believed to be at least approximately normal, a sample size of less than 15 can be used.
- Interval Estimate of a Population Mean: σ Unknown
  o If an estimate of the population standard deviation σ cannot be developed prior to sampling, we use the sample standard deviation s to estimate σ.
  o This is the σ unknown case.
  o In this case, the interval estimate for μ is based on the t distribution.
  o t Distribution:
    ▪ William Gosset, writing under the name "Student", is the founder of the t distribution.
    ▪ Gosset was an Oxford graduate in mathematics and worked for the Guinness Brewery in Dublin.
    ▪ He developed the t distribution while working on small-scale materials and temperature experiments.
    ▪ The t distribution is a family of similar probability distributions.
    ▪ A specific t distribution depends on a parameter known as the degrees of freedom.
    ▪ Degrees of freedom refer to the number of independent pieces of information that go into the computation of s.
    ▪ A t distribution with more degrees of freedom has less dispersion.
    ▪ As the degrees of freedom increase, the difference between the t distribution and the standard normal probability distribution becomes smaller and smaller.
    ▪ For more than 100 degrees of freedom, the standard normal z value provides a good approximation to the t value.
    ▪ The standard normal z values can be found in the infinite degrees of freedom row of the t distribution table.
    ▪ Interval Estimate: \( \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} \), where 1 - α is the confidence coefficient and \( t_{\alpha/2} \) is based on n - 1 degrees of freedom
  o Example: Apartment Rents: A reporter for a student newspaper is writing an article on the cost of off-campus housing. A sample of 16 one-bedroom apartments within a half-mile of campus resulted in a sample mean of $750 per month and a sample standard deviation of $55. Let us provide a 95% confidence interval estimate of the mean rent per month for the population of one-bedroom apartments within a half-mile of campus. We will assume this population to be normally distributed.
    ▪ At 95% confidence, α = .05, and α/2 = .025.
    ▪ \( t_{.025} \) is based on n - 1 = 16 - 1 = 15 degrees of freedom.
    ▪ In the t distribution table we see that \( t_{.025} = 2.131 \).
    ▪ Interval Estimate: \( \bar{x} \pm t_{.025} \frac{s}{\sqrt{n}} = 750 \pm 2.131 \left(\frac{55}{\sqrt{16}}\right) = 750 \pm 29.30 \)
    ▪ We are 95% confident that the mean rent per month for the population of one-bedroom apartments within a half-mile of campus is between $720.70 and $779.30.
  o Adequate Sample Size:
    ▪ Usually, a sample size of n = 30 is adequate when using the expression \( \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} \) to develop an interval estimate of a population mean.
    ▪ If the population distribution is highly skewed or contains outliers, a sample size of 50 or more is recommended.
    ▯ If the population is not normally distributed but is roughly symmetric, a sample size as small as 15 will suffice.
    ▯ If the population is believed to be at least approximately normal, a sample size of less than 15 can be used.
- Summary of Interval Estimation Procedures for a Population Mean:
  o σ known (or a good estimate of σ can be developed before sampling): \( \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \)
  o σ unknown (σ estimated by the sample standard deviation s): \( \bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} \)
- Sample Size for an Interval Estimate of a Population Mean:
  o Let E = the desired margin of error.
  o E is the amount added to and subtracted from the point estimate to obtain an interval estimate.
  o If a desired margin of error is selected prior to sampling, the sample size necessary to satisfy the margin of error can be determined.
  o Margin of Error: \( E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \)
  o Necessary Sample Size: \( n = \frac{(z_{\alpha/2})^2\sigma^2}{E^2} \)
  o The Necessary Sample Size equation requires a value for the population standard deviation σ.
  o If σ is unknown, a preliminary or planning value for σ can be used in the equation:
    ▯ 1. Use the estimate of the population standard deviation computed in a previous study.
    ▯ 2. Use a pilot study to select a preliminary sample, and use the sample standard deviation from that study.
    ▯ 3. Use judgment or a “best guess” for the value of σ.
  o Example: Discount Sounds: Recall that Discount Sounds is evaluating a potential location for a new retail outlet, based in part on the mean annual income of the individuals in the marketing area of the new location. Suppose that Discount Sounds’ management team wants an estimate of the population mean such that there is a .95 probability that the sampling error is $500 or less. How large a sample size is needed to meet the required precision?
  o At 95% confidence, \( z_{.025} = 1.96 \). Recall that σ = 4,500.
    ▯ \( n = \frac{(1.96)^2(4500)^2}{(500)^2} = 311.17 \), which is rounded up to the next whole number, 312.
    ▯ A sample size of 312 is needed to reach a desired precision of +/- $500 at 95% confidence.
- Interval Estimate of a Population Proportion:
  o The general form of an interval estimate of a population proportion is: \( \bar{p} \pm \text{margin of error} \)
  o The sampling distribution of p̄ plays a key role in computing the margin of error for this interval estimate.
  o The sampling distribution of p̄ can be approximated by a normal distribution whenever \( np \ge 5 \) and \( n(1-p) \ge 5 \).
  o Normal Approximation: p̄ is approximately normal with mean p and standard error \( \sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}} \).
  o Interval Estimate: \( \bar{p} \pm z_{\alpha/2}\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} \)
  o Example: Political Science, Inc.: Political Science, Inc. (PSI) specializes in voter polls and surveys designed to keep political office seekers informed of their position in a race. Using telephone surveys, PSI interviewers ask registered voters who they would vote for if the election were held that day.
  o In a current election campaign, PSI has just found that 220 registered voters, out of 500 contacted, favor a particular candidate. PSI wants to develop a 95% confidence interval estimate for the proportion of the population of registered voters that favor the candidate.
    ▯ \( \bar{p} = \frac{220}{500} = .44 \), so the interval is \( .44 \pm 1.96\sqrt{\frac{.44(1-.44)}{500}} = .44 \pm .0435 \).
  o PSI is 95% confident that the proportion of all voters that favor the candidate is between .3965 and .4835.
- Sample Size for an Interval Estimate of a Population Proportion:
  o Margin of Error: \( E = z_{\alpha/2}\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} \)
  o Solving for the sample size n, with a planning value p* used in place of p̄, we get the Necessary Sample Size: \( n = \frac{(z_{\alpha/2})^2 p^*(1-p^*)}{E^2} \)
  o Example: Political Science, Inc.: Suppose that PSI would like a .99 probability that the sample proportion is within +/- .03 of the population proportion. How large a sample size is needed to meet the required precision? (A previous sample of similar units yielded .44 for the sample proportion.)
  o At 99% confidence, \( z_{.005} = 2.576 \). Recall that p* = .44.
    ▯ \( n = \frac{(2.576)^2(.44)(.56)}{(.03)^2} = 1816.7 \), which is rounded up to 1817.
  o A sample size of 1817 is needed to reach a desired precision of +/- .03 at 99% confidence.
    ▯ We used .44 as the best estimate of p in the preceding expression. If no information is available about p, then .5 is often assumed, because p*(1 - p*) is largest at p* = .5 and so gives the largest (most conservative) sample size. If we had used p* = .5, the computed value would have been 1843.27, so the recommended n would have been 1844.
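The z values used throughout these notes (1.645 for 90%, 1.96 for 95%, 2.576 for 99%) come from the standard normal table. As a small sketch, Python's standard-library `statistics.NormalDist` can reproduce them. \( z_{\alpha/2} \) is the point with upper-tail area \( \alpha/2 \), which is `inv_cdf(1 - alpha/2)`. The helper name `z_critical` is hypothetical and is not part of the course material.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Hypothetical helper: the z value leaving alpha/2 in the upper tail,
    where alpha = 1 - confidence coefficient."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for conf in (0.90, 0.95, 0.99):
    print(conf, round(z_critical(conf), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576
```

The unrounded values differ slightly from the table (for example, about 1.95996 instead of 1.96). The hand calculations in these notes use the rounded table values.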
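The σ known interval in the Discount Sounds example can be checked with a few lines of arithmetic. This is a minimal sketch that applies \( \bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n} \) with the table value \( z_{.025} = 1.96 \). The function name `mean_ci_sigma_known` is a hypothetical label for illustration.

```python
import math

def mean_ci_sigma_known(x_bar, sigma, n, z):
    """Interval estimate of a population mean when sigma is known:
    x_bar +/- z * sigma / sqrt(n). Returns (margin, (lower, upper))."""
    margin = z * sigma / math.sqrt(n)
    return margin, (x_bar - margin, x_bar + margin)

# Discount Sounds: n = 36, sample mean 41,100, sigma = 4,500, z.025 = 1.96 (table value)
margin, (low, high) = mean_ci_sigma_known(41_100, 4_500, 36, 1.96)
print(f"margin of error = {margin:,.0f}")        # margin of error = 1,470
print(f"interval = ({low:,.0f}, {high:,.0f})")   # interval = (39,630, 42,570)
```

Raising the confidence coefficient only changes `z`. With 2.576 in place of 1.96, the margin grows, which matches the note that higher confidence means a wider interval.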
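For the Apartment Rents example (σ unknown), the only change is that s replaces σ and a t value with n - 1 degrees of freedom replaces z. Python's standard library has no t-distribution quantile function, so this sketch hard-codes the table value \( t_{.025} = 2.131 \) quoted in the notes. If SciPy is available, `scipy.stats.t.ppf(0.975, df=15)` returns about 2.1314. The function name `mean_ci_sigma_unknown` is hypothetical.

```python
import math

def mean_ci_sigma_unknown(x_bar, s, n, t):
    """Interval estimate when sigma is unknown: x_bar +/- t * s / sqrt(n),
    where t is looked up at n - 1 degrees of freedom. Returns (margin, (lower, upper))."""
    margin = t * s / math.sqrt(n)
    return margin, (x_bar - margin, x_bar + margin)

# Apartment rents: n = 16, sample mean 750, s = 55; t.025 with 15 df = 2.131 (table value)
n = 16
df = n - 1  # degrees of freedom used for the table lookup
margin, (low, high) = mean_ci_sigma_unknown(750, 55, n, 2.131)
print(f"df = {df}, margin = {margin:.2f}, interval = ({low:.2f}, {high:.2f})")
# df = 15, margin = 29.30, interval = (720.70, 779.30)
```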
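The necessary sample size for estimating a mean, \( n = (z_{\alpha/2})^2\sigma^2/E^2 \), is a one-line formula. The only subtle step is that a fractional result is always rounded up, since rounding down would give a margin of error slightly larger than E. This sketch uses `math.ceil` for that step and the Discount Sounds planning value σ = 4,500. The function name `sample_size_mean` is hypothetical.

```python
import math

def sample_size_mean(z, sigma, E):
    """n = z^2 * sigma^2 / E^2, rounded up to the next whole observation."""
    return math.ceil((z ** 2) * (sigma ** 2) / (E ** 2))

# Discount Sounds: z.025 = 1.96, planning value sigma = 4,500, desired margin E = 500
print(sample_size_mean(1.96, 4_500, 500))  # 312  (raw value 311.17)
```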
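The PSI interval for a population proportion can be sketched the same way: \( \bar{p} \pm z_{\alpha/2}\sqrt{\bar{p}(1-\bar{p})/n} \). The notes state the normal-approximation condition in terms of the unknown p. As an assumption, this sketch checks it with p̄ as a stand-in, which is a common practical substitute. The function name `proportion_ci` is hypothetical.

```python
import math

def proportion_ci(successes, n, z):
    """Interval estimate of a population proportion: p_bar +/- z * sqrt(p_bar(1 - p_bar)/n).
    Returns (p_bar, margin, (lower, upper))."""
    p_bar = successes / n
    # Normal-approximation check (np >= 5 and n(1 - p) >= 5), using p_bar in place of p
    if n * p_bar < 5 or n * (1 - p_bar) < 5:
        raise ValueError("normal approximation to the sampling distribution of p_bar is not justified")
    margin = z * math.sqrt(p_bar * (1 - p_bar) / n)
    return p_bar, margin, (p_bar - margin, p_bar + margin)

# PSI: 220 of 500 registered voters favor the candidate; z.025 = 1.96
p_bar, margin, (low, high) = proportion_ci(220, 500, 1.96)
print(f"p_bar = {p_bar:.2f}, margin = {margin:.4f}, interval = ({low:.4f}, {high:.4f})")
# p_bar = 0.44, margin = 0.0435, interval = (0.3965, 0.4835)
```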
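The PSI sample-size calculation, \( n = (z_{\alpha/2})^2 p^*(1-p^*)/E^2 \), can be sketched as follows. Running it with both planning values shows why p* = .5 is the conservative default: p*(1 - p*) peaks at .25 when p* = .5. The function name `sample_size_proportion` is hypothetical.

```python
import math

def sample_size_proportion(z, p_star, E):
    """n = z^2 * p*(1 - p*) / E^2, rounded up to the next whole observation."""
    return math.ceil((z ** 2) * p_star * (1 - p_star) / (E ** 2))

# PSI: z.005 = 2.576 (99% confidence), desired margin E = .03
print(sample_size_proportion(2.576, 0.44, 0.03))  # 1817  (raw value about 1816.7)
print(sample_size_proportion(2.576, 0.50, 0.03))  # 1844  (raw value about 1843.27)
```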