MSIT 3000 Exam II Study Guide
This 8 page Study Guide was uploaded by Jessica Su on Tuesday October 18, 2016. The Study Guide belongs to MSIT 3000 at University of Georgia taught by Megan Lutz in Fall 2016. Since its upload, it has received 266 views.
Date Created: 10/18/16
Confidence Interval for Proportions (Qualitative data)
• Sampling distribution: the distribution of a sample statistic; it describes the long-run behavior of the statistic
  o Sampling distribution for proportions (\( \hat{p} \)): we only collect one sample
    § Use a simulation to generate imaginary data on a computer and see what other outcomes the sample statistic might have
    § As we repeat the simulation enough times, we get many values of \( \hat{p} \) and can plot them to understand the distribution
• How to describe proportion data
  o Shape: unimodal, bimodal, symmetric, left- or right-skewed
  o Center: the mean of \( \hat{p} \), which equals the population proportion p
  o Spread: \( SD(\hat{p}) = \sqrt{pq/n} \), where \( q = 1 - p \)
    § When \( \hat{p} \) is used in place of the unknown p, this becomes the standard error, \( SE(\hat{p}) = \sqrt{\hat{p}\hat{q}/n} \)
    § It tells us how much \( \hat{p} \) varies from sample to sample
• Conditions:
  o Independence assumption: the sampled values must be independent of each other
    § Randomization: the data were collected through an SRS or other unbiased sampling
    § 10% condition: the sample is less than 10% of the population
  o Sample size assumption: for the sampling distribution to be approximately normal, the sample size n must be sufficiently large
    § Success/Failure condition: at least 10 successes and 10 failures in the sample
      • np ≥ 10 and nq ≥ 10
• Confidence interval: a range of values calculated from a sample that can be used to estimate a population parameter
  o Confidence level: a value chosen to represent how confident we are in the method
    § "___% of all samples of size ___ produce intervals that contain the true ____."
  o \( \hat{p} \pm z^* \sqrt{\hat{p}\hat{q}/n} \)
  o Interpretation: "We are 98% confident that the true proportion of _______ is between _______ and _______."
• Margin of error
  o \( ME = z^* \sqrt{\hat{p}\hat{q}/n} \)
  o Increasing the sample size n makes the margin of error smaller; increasing the confidence level makes it bigger
• Sample size
  o \( n = \left(\frac{z^*}{ME}\right)^2 \hat{p}\hat{q} \)
  o To find the biggest (most conservative) n, use 0.5 for both p and q
  o Always round up! Ex. 156.2 becomes 157
• Remember:
  o p is the population parameter: fixed and unknown
  o \( \hat{p} \) is the sample proportion: known, and changes from sample to sample

Hypothesis Test for Proportion
[H] State your hypotheses and define the parameter in context
  • H0: \( p = p_0 \), where \( p_0 \) comes from the context of the problem
  • HA: \( p < p_0 \) (left-tailed test) OR \( p > p_0 \) (right-tailed test) OR \( p \neq p_0 \) (two-tailed test: after you find one tail area, multiply it by two)
[MA] State (and check) your assumptions
  • Independence
    o Randomization
    o 10% condition
  • Success/Failure: at least 10 successes and 10 failures
    o \( np_0 \ge 10 \) and \( nq_0 \ge 10 \)
[M] Mechanics
  • Calculate the test statistic
    o \( z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} \)
  • Calculate the p-value
    o From JMP
[C] Conclusion
  • If p-value < α, reject H0; the observed result would be too unlikely if H0 were true
  • If p-value > α, fail to reject H0; the data are consistent with H0, but this does not prove H0 is true

Example: A random sample of 1000 workers indicated that 488 have invested in an individual retirement account (IRA). National data from 10 years ago suggested that 44% of workers invested in IRAs. Has the proportion changed in the past 10 years? Use α = 0.05.
[H]
  • H0: p = 0.44, where p is the true proportion of workers who invest in IRAs
  • HA: p ≠ 0.44
[MA]
  • Independence
    o Randomization: met
    o 10% condition: 1000 is less than 10% of all workers
  • Success/Failure
    o (0.44)(1000) = 440 ≥ 10
    o (0.56)(1000) = 560 ≥ 10
[M]
  • \( z = \dfrac{0.488 - 0.44}{\sqrt{(0.44)(0.56)/1000}} = 3.0579 \)
  • Go to JMP and do a two-tailed test (because HA: p ≠ 0.44)
  • x > q, value = 3.0579
  • Probability = 0.0011
  • 0.0011 * 2 = 0.0022
  • That's your p-value!
[C]
  • Because the p-value of 0.0022 is less than α = 0.05, we reject the null hypothesis. There is sufficient evidence to indicate that the proportion of workers who invest in IRAs has changed in the past 10 years.
  • We cannot say it increased, because our hypothesis test was two-tailed. To conclude that something has increased, we need a right-tailed hypothesis test.
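The proportion formulas above can be checked outside JMP. The sketch below assumes Python 3.8+ and uses only the standard library: `statistics.NormalDist` stands in for JMP's normal-distribution calculator, and the function names (`z_star`, `prop_ci`, `prop_sample_size`, `one_prop_z_test`) are hypothetical helpers for illustration, not part of JMP or the course materials.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, sd 1


def z_star(confidence):
    """Critical value z* for a two-sided confidence level, e.g. 0.95 -> about 1.96."""
    return Z.inv_cdf(1 - (1 - confidence) / 2)


def prop_ci(successes, n, confidence):
    """One-proportion z-interval: p_hat +/- z* * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)  # standard error uses p_hat, not p
    me = z_star(confidence) * se        # margin of error
    return p_hat - me, p_hat + me


def prop_sample_size(me, confidence, p_guess=0.5):
    """n = (z*/ME)^2 * p * q, always rounded UP; p = 0.5 gives the largest n."""
    n = (z_star(confidence) / me) ** 2 * p_guess * (1 - p_guess)
    return ceil(n)


def one_prop_z_test(successes, n, p0, alternative="two-sided"):
    """z = (p_hat - p0) / sqrt(p0 * q0 / n); the SD uses the hypothesized p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    if alternative == "greater":    # HA: p > p0, right-tailed
        p_value = 1 - Z.cdf(z)
    elif alternative == "less":     # HA: p < p0, left-tailed
        p_value = Z.cdf(z)
    else:                           # HA: p != p0, double the tail area
        p_value = 2 * (1 - Z.cdf(abs(z)))
    return z, p_value


# IRA example: 488 of 1000 workers, p0 = 0.44, two-tailed test
z, p = one_prop_z_test(488, 1000, 0.44)
print(round(z, 4), round(p, 4))  # should agree with the hand/JMP values above
```

Computing the two-tailed p-value as twice one tail area mirrors the "multiply it by two" step in JMP.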
Confidence Intervals for Means (quantitative data)
• Sampling distribution of \( \bar{y} \): the distribution of the means of repeated samples
  o Standardized using s, it follows a t distribution, defined by its degrees of freedom
    § df = n - 1
• Population distribution: the distribution of all individuals
  o Ex. When rolling a die, the population distribution is the outcomes 1, 2, 3, 4, 5, and 6
• How do we describe data on a mean?
  o Shape
  o Center: the sampling distribution of \( \bar{y} \) is centered at μ, the population mean
  o Spread: \( SE(\bar{y}) = s/\sqrt{n} \)
    § s is the sample standard deviation and changes from sample to sample
• Central Limit Theorem: if the sample size is large enough, the sampling distribution of \( \bar{y} \) is approximately normal and centered at the population mean μ, even when the population itself is not normal
• Assumptions:
  o Independence
    § Randomization condition (same as proportions)
    § 10% condition (same as proportions)
  o Normal sampling distribution
    § CLT: n > 30 OR
    § Sample is unimodal and symmetric OR
    § Population is approximately normal
• Confidence interval: \( \bar{y} \pm t^*_{n-1} \dfrac{s}{\sqrt{n}} \)
• t-score: \( t = \dfrac{\bar{y} - \mu}{s/\sqrt{n}} \)
• \( t^*_{n-1} \) is found in JMP: t-distribution → input the values and calculate
• Margin of error: \( ME = t^*_{n-1} \dfrac{s}{\sqrt{n}} \)
  o Or, to find the sample size: \( n = \left(\dfrac{z^* s}{ME}\right)^2 \)

Term | Population | Sample | Sampling Distribution
Mean | μ | \( \bar{y} \) | \( \mu_{\bar{y}} \)
Standard Deviation | σ | s | \( s/\sqrt{n} \) (also the standard error)

• Example 1: After collecting data from thousands of purchases at a large grocery store, a reasonable estimate of the true population mean is $65.45.
  o If a random sample of 55 purchases from this store is taken and the standard deviation is $20.67, what is the probability that the average of this sample would be greater than $70?
    Given: μ = 65.45, n = 55, s = 20.67; find P(\( \bar{y} \) > 70)
  o t-score = \( \dfrac{\bar{y} - \mu}{s/\sqrt{n}} = \dfrac{70 - 65.45}{20.67/\sqrt{55}} = 1.6325 \)
  o df = 55 - 1 = 54
  o Go to the t-distribution in JMP, input the values and calculate the probability
    § df = 54, x > q, value = 1.6325
    § Probability = 0.0542
• Example 2: How many observations need to be sampled to be within 2.5 of the true mean with 90% confidence, if the standard deviation is 27?
  o Given: ME = 2.5, 90% confidence → z* = 1.645, σ = 27
  o \( n = \left(\dfrac{z^* \sigma}{ME}\right)^2 = \left(\dfrac{(1.645)(27)}{2.5}\right)^2 = 315.6 \), so n = 316 (always round up!)

Hypothesis Test for Means
[H] H0: \( \mu = \mu_0 \)
    HA: \( \mu > \mu_0 \) OR \( \mu < \mu_0 \) OR \( \mu \neq \mu_0 \)
[MA] § Independence
       o Randomization
       o 10% condition
     § Normality:
       o n > 30 OR
       o sample is unimodal and symmetric OR
       o population is approximately normal
[M] § t-score = \( \dfrac{\bar{y} - \mu_0}{s/\sqrt{n}} \)
    § The p-value is based on the t-distribution with n - 1 df; the direction comes from HA
[C] § If p-value < α, then we reject H0
    § If p-value > α, we fail to reject H0

Example: The technology committee at a small college has stated that the average time spent by students per lab visit has increased from 55 minutes. Therefore, they are recommending an increase in lab fees. The Student Council would like to verify the committee's claim before agreeing to an increase in lab fees. They randomly select 28 student lab visits and record the time in minutes. The data are bell-shaped with a mean of 62.67 minutes and a standard deviation of 23 minutes. Use α = 0.05.
Given: \( \mu_0 \) = 55, n = 28, \( \bar{y} \) = 62.67, s = 23, α = 0.05
[H] H0: μ = 55, where μ is the true average time spent by students per lab visit
    HA: μ > 55
[MA] § Independence: yes
       o Randomization: yes
       o 10%: yes, 28 lab visits are less than 10% of all student lab visits
     § Normality: yes
       o "Data are bell-shaped"
[M] § test value = \( \dfrac{\bar{y} - \mu_0}{s/\sqrt{n}} = \dfrac{62.67 - 55}{23/\sqrt{28}} = 1.7646 \)
    § p-value: in JMP, use df = 27, x > q, value = 1.7646
      o Probability = 0.0445
[C] We reject the null hypothesis. Since 0.0445 is less than α = 0.05, we have sufficient evidence that the average time spent by students per lab visit has increased. This supports the technology committee's recommendation to increase the lab fee.
What happens if α = 0.01? We fail to reject the null hypothesis. Since 0.0445 is greater than α = 0.01, we have insufficient evidence that the average time spent by students per lab visit has increased, so the lab fee should not be increased.
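The t-based examples above rely on JMP's t-distribution calculator. As a rough cross-check, here is a minimal Python sketch that assumes no SciPy is available: it computes the right-tail area P(T > t) by integrating the t density with Simpson's rule, which is a numerical approximation chosen for self-containment, not how JMP computes it. All function names are hypothetical.

```python
from math import ceil, exp, lgamma, log, pi, sqrt
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))


def t_sf(t, df, steps=2000):
    """Right-tail area P(T > t), the quantity JMP reports for 'x > q'.

    Integrates the density from 0 to |t| with Simpson's rule (steps must be
    even) and uses the symmetry of the t distribution around 0.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def t_score(y_bar, mu, s, n):
    """t = (y_bar - mu) / (s / sqrt(n)); compare against df = n - 1."""
    return (y_bar - mu) / (s / sqrt(n))


def mean_sample_size(me, confidence, sd):
    """n = (z* * sd / ME)^2, rounded up; uses z because t* would depend on n."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z_star * sd / me) ** 2)


# Example 1: P(y_bar > 70) for the grocery purchases, df = 55 - 1 = 54
t1 = t_score(70, 65.45, 20.67, 55)
print("Example 1:", round(t1, 4), round(t_sf(t1, 54), 4))

# Example 2: ME = 2.5, 90% confidence, standard deviation 27
print("Example 2: n =", mean_sample_size(2.5, 0.90, 27))

# Lab-time test: HA: mu > 55 is right-tailed, df = 28 - 1 = 27
t2 = t_score(62.67, 55, 23, 28)
print("Lab test:", round(t2, 4), round(t_sf(t2, 27), 4))
```

For a left-tailed HA, use `1 - t_sf(t, df)`; for a two-tailed HA, double the tail area beyond |t|.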
Decision \ State of the Real World | H0 True | H0 False
Fail to Reject H0 | Correct Decision | Type II Error (β)
Reject H0 | Type I Error (α) | Correct Decision (Power)

Example: A statistics professor is concerned about high withdrawal rates in an introductory course. A salesperson suggests that he try a statistics software package that gets students more involved with computers. The software is expensive, so the salesperson lets the professor pilot it for a semester to see if withdrawal rates decrease. At the end of the semester, the professor will purchase the software or return it. Consider whether the software actually decreases withdrawal rates.
a. State the null and alternative hypotheses
   Null: The withdrawal rate did not change
   Alternative: The withdrawal rate decreased
b. Type I error
   We conclude that withdrawal rates decreased because of the new software, but the software actually did not change the rate. So the professor pays for the software, and it was for nothing.
c. Type II error
   We fail to reject the null, meaning we conclude that the withdrawal rate did not change, but it actually decreased. So the professor misses an opportunity to help students learn better.
d. Tip: write out the null and alternative hypotheses and the chart above when identifying the error

Two-Sample (Independent Samples) t-test
• Independent samples: the individuals in the two groups are different people (or objects)
[H] H0: \( \mu_1 - \mu_2 = 0 \) (\( \mu_1 = \mu_2 \))
    HA: \( \mu_1 - \mu_2 > 0 \) (\( \mu_1 > \mu_2 \)) OR \( \mu_1 - \mu_2 < 0 \) (\( \mu_1 < \mu_2 \)) OR \( \mu_1 - \mu_2 \neq 0 \) (\( \mu_1 \neq \mu_2 \))
[MA] • Independence
       o Randomization
       o 10% condition
     • Normality
       o n > 30 OR sample is unimodal and symmetric OR population is approximately normal
       o BOTH samples have to meet this condition
[M] The test results are provided in the question; no need to do any math
[C] Same as before

Example: Consumers increasingly make food purchases based on nutritional values. In the July 2011 issue, Consumer Reports compared the calorie content of meat and beef hot dogs. The researchers purchased samples of several different brands. The meat hot dogs averaged 111.7 calories compared to 135.4 for the beef hot dogs. Is there a difference in mean calorie content? A random sample yields a p-value of 0.024. Let α = 0.05. What is your conclusion?
§ Question to ask yourself: are the samples independent? That is, could the calorie values of the meat hot dogs be affected by those of the beef hot dogs? If not, the samples are independent, so continue with the two-sample t-test.
[H] H0: \( \mu_1 - \mu_2 = 0 \), where \( \mu_1 \) = mean calories of meat hot dogs and \( \mu_2 \) = mean calories of beef hot dogs
    HA: \( \mu_1 - \mu_2 \neq 0 \)
[MA] Assume all conditions are met; the problem gives only the results
[M] Already given (p-value = 0.024)
[C] We reject the null hypothesis because the p-value of 0.024 is less than α = 0.05. There is sufficient evidence to indicate that the mean calorie content of meat hot dogs is different from that of beef hot dogs.

Paired t-test (Dependent Samples)
• Dependent samples: the same individual or object is measured twice
[H] H0: \( \mu_d = 0 \)
    HA: \( \mu_d \neq 0 \) OR \( \mu_d < 0 \) OR \( \mu_d > 0 \)
    Be careful with the order of subtraction so that the direction of HA is right

Example: Consumers complain that airline fares are on the rise even as complaints about on-time performance, canceled flights, and lost luggage increase. A recent study sampled prices for flights originating in 10 different cities. Prices were compared for 2011 and 2013 to see if there was a significant difference between the 2 years.
• This is a dependent sample, because prices for the same 10 cities were measured in both 2011 and 2013
[H] H0: \( \mu_d = 0 \); there is no difference in the mean price of flights originating in these 10 cities between 2011 and 2013
    HA: \( \mu_d \neq 0 \); there is a difference between the 2 years
a. Given t = -0.67 and p-value = 0.535, state your conclusion.
   Fail to reject the null hypothesis, because the p-value of 0.535 is larger than α = 0.05. There is insufficient evidence of a difference in flight prices between 2011 and 2013.
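Deciding between the two-sample and paired t-test matters because they compute the standard error differently. The sketch below assumes plain Python lists of measurements; the function names are hypothetical, and the two-sample version uses the unpooled standard error \( \sqrt{s_1^2/n_1 + s_2^2/n_2} \). The paired version is simply a one-sample t-test on the differences.

```python
from math import sqrt
from statistics import mean, stdev


def two_sample_t(x, y):
    """Independent samples: t = (mean_x - mean_y) / sqrt(s_x^2/n_x + s_y^2/n_y)."""
    se = sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return (mean(x) - mean(y)) / se


def paired_t(x, y):
    """Dependent (paired) samples: one-sample t-test on the differences d = x - y.

    Returns (t, df) with df = number of pairs - 1. The order x - y fixes the
    sign of t, so it must match the direction written in HA.
    """
    if len(x) != len(y):
        raise ValueError("paired data need the same number of observations")
    d = [a - b for a, b in zip(x, y)]
    t = mean(d) / (stdev(d) / sqrt(len(d)))
    return t, len(d) - 1


# Made-up toy numbers (not from the guide), analyzed both ways
after = [2, 4, 5, 7]
before = [1, 2, 3, 4]
print("paired:", paired_t(after, before))
print("two-sample:", two_sample_t(after, before))
```

With these toy values the paired t (about 4.90) is much larger than the two-sample t (about 1.63), because pairing removes the variation between individuals. That is why the "is this the same individual measured twice?" question comes first.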
Confidence Interval for the Difference Between Two Means
• Remember:
  o If you are dealing with proportions, categorical data, or rates, use z, which is on the left side of the formula sheet
  o If you are dealing with means and quantitative data, use t, which is on the right side of the formula sheet
    § The exception is finding n: we use z for that
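One way to see why means use t instead of z is to compare the critical values directly. This is a minimal sketch, assuming no SciPy: the t tail area is approximated with Simpson's rule and t* is found by bisection, both numerical stand-ins for JMP's calculator, and the function names are hypothetical.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))


def t_sf(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def z_crit(confidence):
    """z* for a two-sided confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def t_crit(confidence, df):
    """t* with P(T > t*) = (1 - confidence) / 2, found by bisection."""
    tail = (1 - confidence) / 2
    hi = 1.0
    while t_sf(hi, df) > tail:  # widen the bracket until it contains t*
        hi *= 2
    lo = 0.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_sf(mid, df) > tail:
            lo = mid  # tail area still too big, so t* lies further right
        else:
            hi = mid
    return (lo + hi) / 2


# t* is larger than z* for small samples and approaches z* as df grows
for df in (5, 10, 30, 100):
    print(df, round(t_crit(0.95, df), 3), round(z_crit(0.95), 3))
```

The larger t* reflects the extra uncertainty from estimating σ with s, so t-intervals are wider than z-intervals at the same confidence level.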