Fall 2008 Stats 250 Exam 1 with detailed Solutions and Explanations STATS 250

Statistics 350 Fall 2008 Exam 1 Explanations

1. Observe that the distribution is roughly bell-shaped.
a. The bulk of the data lies in the 100-120 range, so the median should too, and only 110 is reasonable. Another way to see this is to realize that in a symmetric distribution the mean and median should be the same.
b. Since the histogram is bell-shaped, we can apply the empirical rule. Roughly 95% of the measurements lie between 80 and 140. By the empirical rule, these numbers should each be 2 standard deviations from the mean, making them 4 standard deviations apart. So the standard deviation should be (140 - 80)/4 = 15.
c. The sketch is to the right. Since the histogram is bell-shaped, the quantiles should be approximately equal to the normal quantiles, so the points on the qq-plot should fall close to a straight line.

2.
a. Owning a computer is a categorical variable, as the response ('yes' or 'no') is not a measurement. Time spent using a computer is a quantitative continuous variable, since it can be measured and any measurement between 0 and the number of minutes in a week makes sense. Which social networking sites a respondent uses is a categorical variable.
b.
i. The shortest list had about 24 friends on the social networking account, while 75% of the lists had at least 177 friends. The value 24 may be an outlier, but it is still the shortest list. In a boxplot, 75% of the data is at or above the lower quartile (bottom of the box), here 177. Any number 20-25 is fine for the first blank, and 175-180 would be reasonable for the second.
ii. $\bar{x} = 226$. Read this from the table. We use $\bar{x}$ instead of $\mu$ because it is a sample statistic.
iii. Correct: "On average, the number of friends on a social networking account varied from the mean by about 87 friends." The second choice is incorrect because a standard deviation is not the average distance between data points. The first choice is too vague and does not make it clear that the distances being 'averaged' are those between data points and the mean.

3.
a.
H0: p = 0.50 and Ha: p > 0.50, where p is the population proportion of UM college students that would vote for Obama (or the proportion of all UM college students that would vote for Obama).
b. Since the sample size is large (more than 10 responses for each candidate), we may use the large-sample z-test. The test statistic is:
$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.526 - 0.5}{\sqrt{0.5(1 - 0.5)/3000}} = 2.85$.
Note: $p_0 = 0.5$ is from the null hypothesis, while $\hat{p} = 1578/3000 = 0.526$ is the sample proportion. The p-value is P(Z > 2.85) = 1 - 0.9978 = 0.0022. On the sketch, this is the area under the standard normal curve to the right of 2.85.
c. Decision: Reject H0. Conclusion: There is sufficient evidence to say that a majority of all UM students (represented by this sample) would vote for Obama. … or … There is sufficient evidence to conclude that the population proportion of all UM students who would vote for Obama is greater than 0.5. Make sure to distinguish the decision from the conclusion. Also, be sure your conclusion is a statement about the population parameter and is in the context of the problem.
d. If the decision is incorrect, it would be a Type 1 error. Recall that a Type 1 error is incorrectly rejecting the null hypothesis, and that our decision was to reject H0.

4.
a.
i. X, the number of red candies in a bag of 15 M&M candies, is Binomial(n = 15, p = 0.20). Each of the n = 15 candies is a 'trial' with an independent chance of 'success' (being red) of p = 0.20.
ii. E(X) = np = 15(0.20) = 3 candies.
iii. $P(X \ge 1) = 1 - P(X = 0) = 1 - \binom{15}{0}(0.20)^{0}(0.80)^{15} = 1 - 0.0352 = 0.9648$. Use the complement rule and the binomial probability formula (see page 1 of the yellow card).
b.
i. Since we have a larger sample size, and 100(0.20) = 20 ≥ 10 and 100(0.80) = 80 ≥ 10, the distribution of X will be approximately normal with a mean of 100(0.20) = 20 and a standard deviation of $\sqrt{100(0.20)(0.80)} = \sqrt{16} = 4$, which can be expressed as N(20, 4).
ii. P(X ≤ 15) = P(Z ≤ -1.25) = 0.1056. Here z = (15 - 20)/4 = -1.25.
iii. Less than 0.5, since 30 is greater than 20 (the mean number of red candies). A picture may help.
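The arithmetic in problems 3 and 4 can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names (`normal_cdf`, `one_proportion_z`, `binom_pmf`) are illustrative choices and not part of the exam or of any course software. The normal CDF is computed from the error function rather than read from a z-table, so the results may differ from the table-based answers in the last decimal place.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_proportion_z(successes, n, p0):
    """Large-sample z-test of H0: p = p0 versus Ha: p > p0."""
    p_hat = successes / n
    # The standard error uses p0 because the test assumes H0 is true.
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 1 - normal_cdf(z)  # upper-tail area
    return z, p_value


def binom_pmf(k, n, p):
    """Binomial probability P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Problem 3: 1578 of 3000 sampled students voted for Obama, p0 = 0.50.
z, p_value = one_proportion_z(1578, 3000, 0.50)

# Problem 4a: small bag, X ~ Binomial(15, 0.20).
expected_red = 15 * 0.20
p_at_least_one = 1 - binom_pmf(0, 15, 0.20)  # complement rule

# Problem 4b: large bag, normal approximation to Binomial(100, 0.20).
mean_100 = 100 * 0.20
sd_100 = math.sqrt(100 * 0.20 * 0.80)
p_at_most_15 = normal_cdf((15 - mean_100) / sd_100)

print(f"3b:     z = {z:.2f}, p-value = {p_value:.4f}")
print(f"4a.ii:  E(X) = {expected_red:.1f}")
print(f"4a.iii: P(X >= 1) = {p_at_least_one:.4f}")
print(f"4b.ii:  P(X <= 15) is about {p_at_most_15:.4f}")
```

Following the solution above, 4b.ii is computed without a continuity correction; applying one would give a slightly different approximation.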
5.
a. $0.87 \pm 1.96\sqrt{\frac{0.87(1 - 0.87)}{3824}} = 0.87 \pm 1.96(0.0054) = 0.87 \pm 0.0107$, which gives the interval (0.8593, 0.8807).
b. Yes. The reasonable values in the interval are all greater than 0. … or … All reasonable estimates of $p_1 - p_2$ (students less instructors) are greater than zero.

6. We have to worry about nonresponse bias due to the large number who did not respond.

7. No. This is a time plot, not a histogram.

8.
a. True. The significance level, α, is the probability of incorrectly rejecting H0. The more severe this Type 1 error, the lower we would set the significance level.
b. False. A p-value is not the probability that H0 is true.
c. True. Consider the picture to the right: for a uniform distribution on 0 to 20 minutes, P(wait > 15) = (20 - 15)/20 = 0.25.
d. False. When using the Z statistic to test a hypothesis about a population proportion, we use the normal distribution as an approximation to the binomial distribution of the number of successes in the sample. The population itself does not need to be normal.

9.
Interval (0.371, 0.629): 99%
Interval (0.402, 0.598): 95%
Interval (0.418, 0.582): 90%
The higher the confidence level, the wider the interval.

10. This is an observational study, since randomization was not used. Number of sick days is the response variable, since it is what we are primarily interested in. Age of employee is the explanatory variable. We want to know if age explains number of sick days.

11. Compute 400 - 180 - 40 - 72 = 108 people who did not enjoy the movie and did not buy popcorn. It is also a good idea to write in the marginal totals:

                          Enjoyed the movie?
                          Yes     No      Total
Bought popcorn?   Yes     180     72      252
                  No      40      108     148
                  Total   220     180     400

a. P(Enjoy) = (number who enjoyed)/Total = 220/400 = 0.55
b. P(Enjoy | Popcorn) = 180/252 = 0.7143; use only the row for 'Yes, bought popcorn'.
c. This is the 'No' / 'No' entry, 108. Could also do: 400 - 220 = 180, so 180 - 72 = 108.
d.
Yes, these results do suggest that popcorn enhances the moviegoer's experience because … the probability of enjoying the movie was higher among those who bought popcorn (0.71) than the overall enjoyment rate of 0.55 … or … P(enjoy | popcorn) = 0.71 is higher than P(enjoy) = 0.55 … or … P(enjoy | popcorn) = 0.71 > P(enjoy | no popcorn) = 40/148 = 0.27.

12.
a. This is an experiment because randomization was used.
b. The estimate of the common population proportion is $\hat{p} = \frac{44 + 16}{100 + 50} = \frac{60}{150} = 0.40$.
c. Test statistic: $z = \frac{0.44 - 0.32}{\sqrt{0.40(0.60)\left(\frac{1}{100} + \frac{1}{50}\right)}} = 1.41$; p-value = P(Z > 1.41) = 0.0793.
d. The statistical decision at a 5% significance level is: Fail to reject H0, as 0.0793 > 0.05.
e. There is insufficient evidence to say that the new drug has a higher pain relief rate than the placebo in the population. However, the results were marginally significant, and a second study involving more people may offer different results.
f. Correct: Use a 10% significance level. Increasing alpha will give a test with higher power (but also raises the chance of a Type 1 error).
Correct: Use more patients. Increasing the sample size is the best way to increase the power of the test.
Incorrect: Repeat the study many times. Repetition does not increase the power of the test, since we must correct for multiple comparisons.

Statistics 350 Fall 2008 Exam 1

1. A doctor collects a large set of heart rate measurements that approximately follow a normal distribution. He provides you with the following summaries from SPSS:

Statistics: Heart Rate
N Valid      1000
N Missing    0
Mean         110
Minimum      65.0
Maximum      155

a. Which of the following is most likely to be the median of the distribution? Clearly circle your one answer. [1]
5     15     35     90     110     135
b. Which of the following is most likely to be the standard deviation of the distribution? Clearly circle your one answer. [2]
5     15     35     90     110     135
c. What would a qq-plot of this data look like (roughly)? [1]
[Blank axes for the sketch: Expected Normal Value versus Observed value]

2.
Today's typical undergraduate student is often characterized as preferring teamwork, experiential activities, and the use of technology. An ECAR (Educause Center for Applied Research) study was published on technology use among undergraduate students. The study used survey and interviewer data to create a portrait of today's students' experiences with and skill using information technology.
a. Listed below are some of the response variables that were measured in this study. For each of these, determine whether it is categorical, quantitative discrete, or quantitative continuous. [1 pt each] Clearly circle your answer.
• Technology ownership: Do you own a computer?
categorical     quantitative discrete     quantitative continuous
• Time (per week in minutes) spent using a computer for writing documents (word processing).
categorical     quantitative discrete     quantitative continuous
• Of which social networking site(s) are you a member? (facebook, myspace, friendster, etc.)
categorical     quantitative discrete     quantitative continuous
b. Another question on the survey asked for the number of friends on their social networking account. Below and at the right is a summary of some responses. Use these results to answer the following questions.

Statistics: Number of Friends
Mean              226
Std. Deviation    87

[Boxplot of Number of Friends; vertical axis from 0 to 400]

i. Complete the sentence: [2]
The shortest list had about ___________ friends on the social networking account, while 75% of the lists had at least ____________ friends.
ii. Report the average number of friends on a social networking account. Include the appropriate symbol in your answer. [2]
______ = ______
iii. The standard deviation reported above is 87. Consider the following interpretations of this value and clearly circle the correct interpretation(s). [2]
• The number of friends on a social networking account varies by about 87, on average.
• The average distance between the numbers of friends on social networking accounts is roughly 87 friends.
• On average, the number of friends on a social networking account varied from the mean by about 87 friends.

3. Obama versus McCain: A mock presidential election is planned on the UM college campus. An enterprising Statistics 350 student plans to use the results of the mock election to test the hypothesis that Obama would have a majority of votes among all UM college students. The significance level of the test is set at 5%. A random sample of 3000 college students at the UM was selected to take part in this mock election, which resulted in 1578 voting for Obama.
a. State the hypotheses to be tested and completely define the parameter. [4]
H0: __________________________ Ha: ___________________________
where the parameter _______ is defined (in the context of this problem) as …
b. Provide the value of the test statistic (including its symbol) and the corresponding p-value. Include a sketch to show how the p-value is found. [5]
Sketch (include labels):
Test Statistic = ______ (symbol) = ______________________
p-value = ________________________
c. Give the decision and corresponding conclusion by circling the appropriate statements: [2]
The statistical decision at a 5% significance level is:     Reject H0     Fail to reject H0
Therefore, there   is / is not   sufficient evidence to say that a majority of all students (represented by this sample) would vote for Obama.
d. If this decision in part (c) is incorrect, it would be a (circle one)     Type 1     Type 2     error. [1]

4. The website http://global.mms.com/cai/mms/faq.html reports that in a package of milk chocolate M&M's®, 20% of the candies are red. Let X = the number of red candies in a randomly selected bag of milk chocolate M&M's®.
a. Suppose you randomly select one small bag that contains 15 M&M candies.
i. What is the distribution of X? Include all relevant features to completely specify the distribution. [3]
ii. How many red candies would you expect to find in this bag?
[1] Final answer: _______________
iii. What is the probability of finding at least one red candy? [3]
Final answer: _______________
b. Suppose now that you randomly select a larger bag (one bag is never enough...) with 100 M&M candies.
i. What is the approximate distribution of X? Include all relevant features to completely specify the distribution. [3]
ii. What is the approximate probability of finding no more than 15 red candies? [3]
Final answer: _______________
iii. Without doing any calculations, what is the approximate probability of finding more than 30 red candies? Circle your one answer. [1]
Less than 0.5     Equal to 0.5     More than 0.5

5. Appreciation for cTools: In a recent survey, a random sample of 3824 UM students resulted in 87% agreeing that cTools is a valuable tool.
a. Calculate the 95% (general, not conservative) confidence interval for estimating the population proportion of all UM students who agree that cTools is a valuable tool. [3]
Final answer: ______________
b. In a similar survey of instructors, 81% of the 1639 instructors agreed that cTools is a valuable tool. The 6% difference in rates (87% for students less 81% for instructors) would lead to a 95% confidence interval for the difference in population rates (students less instructors) of (3.8%, 8.2%). Based on this interval, does there appear to be a higher population proportion of students who appreciate cTools as compared to that for all instructors? [2]
Circle one:     Yes     No
Explain briefly:

6. For a survey about American diets, a random sample of 1000 people were contacted. Of the 1000 people, 140 people completed the questionnaire. The results of this study, if applied to all Americans, are questionable because of … (circle one) [1]
a large margin of error     nonresponse bias     selection bias     response bias

7. Blackberry subscribers was the title of this USA Today Snapshot regarding Blackberry email devices.
Based on the graph, is it appropriate to say that the distribution of the number of Blackberry subscribers is right skewed? [2]
Circle one:     Yes     No
Explain briefly:

8. Determine whether each statement is True or False. Clearly CIRCLE your answer. [1 point each]
a. The significance level, α, is determined prior to the study by considering the consequence of incorrectly rejecting H0.     True     False
b. If the test of H0: p = 0.75 versus Ha: p > 0.75 resulted in a p-value of 0.083, then the probability that H0 is true is 0.083.     True     False
c. If the time to wait for pharmacy help has a uniform distribution from 0 minutes to 20 minutes, then 25% of the customers are expected to wait more than 15 minutes.     True     False
d. One condition for using a Z statistic for testing hypotheses about a population proportion p is that the population has a normal distribution.     True     False

9. A researcher selects 100 subjects at random from a population, observes 50 successes, and calculates three confidence intervals. The confidence levels are 90%, 95%, and 99%, and the intervals are (0.402, 0.598), (0.371, 0.629), and (0.418, 0.582). Match each interval with its confidence level. Write your values clearly. [2]
Interval (0.371, 0.629) ________ % confidence level
Interval (0.402, 0.598) ________ % confidence level
Interval (0.418, 0.582) ________ % confidence level

10. Read the following description and indicate whether it is an observational study or an experiment, and which variable is the response and which is the explanatory variable. An employer is interested in determining if there is a relationship between the age of his employees and the number of sick days that the employee takes. In order to study this relationship, he collects this information from the personnel file for a random sample of employees. [2]
This is an (circle one)     observational study     experiment.
Number of sick days is the (circle one)     response     explanatory     variable.
Age of employee is the (circle one)     response     explanatory     variable.

11. "Does popcorn enhance a movie or not?" In August 2008, the BBC reported that the UK's largest arthouse cinema chain is planning to take popcorn off the concession stand menu. A spokesman for the movie chain claims that many people have asked them to ban popcorn, citing that it is messy, smelly, and loud. In an effort to save her favorite snack from being banned, Jessica decided to take a survey of people at one of the chain's theaters on a busy Friday night. The following table partially represents the data that she collected.

                          Enjoyed the movie?
                          Yes     No
Bought popcorn?   Yes     180     72
                  No      40

a. Suppose that the total number of people that Jessica surveyed is 400. What is the probability that someone randomly selected from this sample of moviegoers enjoyed the movie? [2]
Final answer: ______________
b. What is the probability that someone randomly selected from the sample enjoyed the movie, given that he or she bought popcorn? [2]
Final answer: ______________
c. How many of the people that Jessica surveyed did not buy popcorn and did not enjoy the movie? [2]
Final answer: ______________
d. Assuming this sample is representative of the moviegoing population, do these results support Jessica's goal of convincing the theater owners not to ban popcorn? Do the above results suggest that popcorn enhances the moviegoer's experience? Give your answer by completing one of the following statements, and include numerical support for your conclusion. [2]
Yes, these results do suggest that popcorn enhances the moviegoer's experience because …
OR
No, these results do not suggest that popcorn enhances the moviegoer's experience because …

12. Pain-Relief: You are on a project team to consider whether your company's new pain-relief drug (Drug X) should go to market. A study was conducted to compare the new pain-relief drug against a placebo.
The measured response is whether the pain from a dental procedure was relieved in 10 minutes (yes or no). The patients were assigned to one of the two treatment groups using a 2:1 randomization plan. The following results are to be used to test the hypotheses H0: p1 = p2 (no difference in the population pain relief rates) versus Ha: p1 > p2 (the new drug has a higher pain relief rate over placebo). Use a 5% significance level.

Study Results                                                  Group 1 = Drug X     Group 2 = Placebo
Number of patients assigned                                    100                  50
Number of patients that had pain relieved in 10 minutes (%)    44 (44%)             16 (32%)

a. This is an (circle one)     observational study     experiment. [1]
b. In computing the test statistic, we assume H0 is true, that is, p1 = p2 = p (the common population proportion). Using the results, provide the estimate of that common proportion. [2]
Final answer: ______________
c. Provide the value of the test statistic (including its symbol) and the corresponding p-value. Include a sketch to show how the p-value is found. [4]
Sketch (include labels):
Test Statistic = ______ (symbol) = ______________________
p-value = ________________________
d. Give the decision and corresponding conclusion by circling the appropriate statements: [1]
The statistical decision at a 5% significance level is:     Reject H0     Fail to reject H0
e. You are asked to report the conclusion of this study to the Vice President of your company. With the significance level of 5%, what should your conclusion be? As your boss appreciates brevity, provide 1-2 well-structured sentences. [2]
f. What could be done in a follow-up study to produce a test with higher power? Circle all correct answers. [2]
Repeat the study many times     Use a 10% significance level     Use more patients
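The pooled two-proportion z-test asked for in question 12 can be sketched as follows. This is an illustrative check using only the Python standard library; the helper names `normal_cdf` and `two_proportion_z` are assumptions made for this example, not course-provided functions. Because it keeps the unrounded z value instead of looking up z = 1.41 in a table, the p-value it prints comes out slightly below the table-based 0.0793 reported in the explanations.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z-test of H0: p1 = p2 versus Ha: p1 > p2."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Under H0 both groups share one proportion, so pool the successes.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 1 - normal_cdf(z)  # upper-tail area
    return pooled, z, p_value


# Drug X: 44 of 100 relieved; placebo: 16 of 50 relieved.
pooled, z, p_value = two_proportion_z(44, 100, 16, 50)

alpha = 0.05
decision = "Reject H0" if p_value < alpha else "Fail to reject H0"

print(f"pooled estimate = {pooled:.2f}")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print(f"decision at alpha = {alpha}: {decision}")
```

Rerunning the last step with `alpha = 0.10` shows the decision changing, which is the trade-off behind the "Use a 10% significance level" option in part (f).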

