Fall 2006 Stats 250 Exam 1 with detailed Solutions and Explanations
This 11-page Bundle was uploaded by Debra Tee on Monday September 26, 2016. The Bundle belongs to STATS 250 at University of Michigan taught by Brenda Gunderson in Fall 2016.
Date Created: 09/26/16
Statistics 350 Fall 2006 Exam 1 Solutions (Answers with Explanations)

1. a. This is an experiment because the migraine patients were randomly divided into groups.
b. Yes, the control group is the one in which the patients received no treatment.
c. The number of headache days is the response variable, since we are measuring the effectiveness of acupuncture and sham acupuncture by the extent to which they reduce the occurrence of migraine headaches.
d. The variable “age” would be an example of a(n) confounding (or lurking) variable. The effects attributed to (sham) acupuncture may in fact be due to age in the situation described here.

2. None of these is appropriate, because this is a bar graph, while the other choices are all words used to describe the shape of a histogram.

3. The value is a population parameter because it describes the entire staff.

4. a. Delivery time is a continuous, quantitative variable since it is a measurement.
b. To calculate the IQR, determine the 1st (Q1) and 3rd (Q3) quartiles (respectively, the bottom and top of the box) from the DFW Express boxplot. These values are Q1 = 30 and Q3 = 50. Calculate: IQR = Q3 – Q1 = 50 – 30 = 20 minutes.
c. The upper boundary is 1.5*IQR above Q3. Compute: Q3 + 1.5*IQR = 50 + 1.5*20 = 80 minutes.
d. The Q1 [bottom edge of the box] of the delivery times by Metro is equal to the median [middle line] of the delivery times by DFW Express.
e. For Carborne Carrier, 25% of the deliveries took longer than 58 minutes (57 and 59 minutes are okay too). We are looking for the top 25%, i.e., the values above the upper quartile, Q3.
f. For DFW Express, we see that 41 minutes is the median. Hence, 50% of the 56 total deliveries, or 28 deliveries (= 0.5*56), were made in under 41 minutes. See “Count (n)” in the table to find the total number of deliveries.
g. Recall: z = (x – x̄)/s, so that x = x̄ + s*z. From the table, for Metro we know x̄ = 52.0 and s = 15.0. Using these values along with z = –1.4, compute: x = 52.0 + 15.0*(–1.4) = 31 minutes.
h. The original sentence conveys the idea of an average distance, but it gives the mistaken impression that we are referring to the distances between the data points themselves. Instead, we must specify that this (rough) average is of the distances between the individual data points and their mean: “The average distance between the Metro delivery times and their mean is roughly 15 minutes.”
i. The variable mileage here is an example of a confounding variable. We should account for this confounder by adjusting the delivery times for mileage in an appropriate way before selecting a winner.

5. a. The population proportion of Ann Arbor residents who support extending the Art Fair by a day is unknown, since we only know the opinions of a sample rather than the entire population.
b. The (conservative) margin of error is: m = 1/√n = 1/√200 ≈ 0.0707.
c. “No, the CI is 0.55 ± 0.0707, which is not entirely above 50%.” Here the sample proportion is 110/200 = 0.55, so the interval runs from about 0.479 to 0.621. The key idea is recognizing that a confidence interval gives a range of reasonable values for the population proportion. Since the CI contains 50%, we should not conclude that a majority (nor a minority) advocate extending the Art Fair.
d. Solve m = 1/√n for n: n = (1/m)² = (1/0.03)² ≈ 1111.11. Since we can’t sample a fraction of a person, we should round up to 1112. We round up rather than down because a sample of size 1111 would give a margin of error slightly larger than 3%.

6. a. True. Recall the conditions for using the normal approximation to the Binomial: np > 10 and n(1 – p) > 10. For n = 50 and p = 0.7, np = 35 and n(1 – p) = 15, so both conditions are met.
b. False. The standard error for the sample proportion, p̂, is √(p̂(1 – p̂)/n), so it depends on both the estimate and the sample size n. With the same p̂ = 0.60, the larger sample size gives a smaller standard error for Researcher II (smaller by a factor of √2).
c. False. The confidence level (95%) never refers to a specific interval that has already been calculated. Rather, the confidence level refers to how often we would expect our confidence interval to contain the true population proportion in repeated samples. Since p is fixed (though unknown), the chance that it is in this specific interval is either 0% or 100%.
Want more review?
Visit http://www.rossmanchance.com/applets/Confsim/Confsim.html.

7. a. The key features are the axis labels (Density and X = wait time) and the shape (a rectangle of height 1/20 = 0.05 ranging from 0 to 20 minutes). The shaded boxes help answer the questions that follow.
[Figure: uniform density of height 0.05 over X = wait time (minutes), from 0 to 20, with marks at 0, 10, 15, and 20]
b. From 10 to 20 minutes we have a total shaded area of 10*0.05 = 0.5, so P(A) = 0.5 = ½.
c. The overlap between event A (at least 10 minutes) and event B (at most 15 minutes) is the region from 10 to 15 minutes, shown with diagonal bars in the sketch. The area of this region gives P(A and B) = 5*0.05 = 0.25 = ¼.
d. To find P(B | A), consider the fraction of the shaded region (A) that is also shaded with diagonal bars (B). This fraction is (5*0.05)/(10*0.05) = 5/10 = ½. You can also use the conditional probability formula explicitly: P(B | A) = P(A and B)/P(A) = 0.25/0.5 = ½.
e. P(A and C) is the probability of waiting exactly 10 minutes. Since our random variable (X = wait time) is continuous, this probability is 0. Recall that a line has zero area.
f. No, the events A and C are not mutually exclusive, since the value 10 is in both A and C.

8. a. Since we know the probability function of X is symmetric about 3, we can fill in the probabilities for X = 4, 5, and 6 to match those for X = 2, 1, and 0:

Value of X:    0     1     2     3     4     5     6
Probability:   0.05  0.10  0.20  0.30  0.20  0.10  0.05

To find P(X = 3), remember that probabilities must sum to 1. The computation could look like this: P(X = 3) = 1 – (2*0.05 + 2*0.10 + 2*0.20) = 1 – 0.70 = 0.30.
b. E(X) = 3, so E(revenue) = 3*6 = 18 dollars. E(X) is the expected number of magazines sold and $6 is the amount taken in per magazine. E(X) = 3 is given, but you could also find it directly (possibly using symmetry). Another way to think about it: if we sell 0 magazines we take in $0, if we sell 1 magazine we take in $6, and so on, so
E(money taken in) = 0(0.05) + 6(0.10) + 12(0.20) + 18(0.30) + 24(0.20) + 30(0.10) + 36(0.05) = $18.
c. Six magazines cost 6*$4 = $24, but the owner takes in only $18 on average, so she or he would lose $6 per week on average.

9. a. The question asks for the approximate distribution of p̂, which is Normal(p, √(p(1 – p)/n)). Substitute p = 0.3 and n = 100 to get N(0.3, 0.0458), since √(0.3*0.7/100) ≈ 0.0458.
b. A good way to begin is by sketching this normal curve and shading the area to the right of 0.35, which is P(p̂ > 0.35). First, find the z-score: z = (0.35 – 0.3)/0.0458 = 1.09. Therefore, P(p̂ > 0.35) = P(Z > 1.09) = 1 – 0.8621 = 0.1379.
Want more review like questions 9 and 10(a-c)? Visit www.rossmanchance.com/applets/NormalCalcs/NormalCalculations.html.

10. Again, you might want to make a sketch for this problem.
a. We want to find the stress-test score for which 25% of the students will have a lower score. First, look inside Table A.1 for the cumulative probability closest to 25% = 0.2500. (Find the number closest to 0.2500 inside the table, then use the numbers at the left of the row and the top of the column to get the z-score.) In this case z = –0.67 (corresponding to 0.2514 inside the table). Now use the mean (55) and standard deviation (2) to find which stress-test score is 0.67 standard deviations below the mean: –0.67 = (x – 55)/2, so x = 55 – 2*0.67 = 53.66 points.
b. P(X > 58) = P(Z > (58 – 55)/2) = P(Z > 1.5) = 1 – 0.9332 = 0.0668.
c. Since the students are selected at random, their scores are independent, so the first student’s score does not change the probability for the next student. The probability that a student scores less than 58 on the stress test is: P(X < 58) = P(Z < 1.5) = 0.9332.
d. i. Binomial random variable with n = 6 and p = probability a score is at least 58 points. Here each student either has a stress-test score of at least 58 (“success”) or does not (“failure”), and the number of trials (n = 6) is fixed. The normal distribution for individual test scores should be used to find the probability of success (p), but Y (the number of successes) follows a binomial distribution.
ii. From part (b) of this question we know the probability of “success” is p = 0.0668.
Now use the binomial probability formula: P(Y = 2) = C(6, 2)(0.0668)²(1 – 0.0668)⁴ = 15(0.0668)²(0.9332)⁴ ≈ 0.0508.

Statistics 350 Fall 2006 Exam 1

1. A study is designed to determine whether acupuncture really works (Source: Parade Magazine, July 9, 2006). A total of 300 migraine patients in Germany were randomly divided into three groups. In one group, acupuncture needles were placed at prescribed sites; the second group also was ‘needled’, but randomly (a procedure known as sham acupuncture). The third group received no treatment. All the needled patients underwent 12 ‘treatment’ sessions, each lasting 30 minutes. The researchers found that both the real and the sham acupuncture recipients reported 50% fewer headache days, while only 15% of those untreated felt better. It was concluded that the placement of the needles made no difference.
a. This study was: (circle one) an observational study / an experiment.
b. Was a control used in this study? (circle one) yes / no / can’t tell
c. For all patients in the study, the number of headache days was recorded. This variable is the: (circle one) response / explanatory variable.
d. Fill in the blank. Suppose the younger patients generally respond better to acupuncture and that the patients in the untreated group were mostly older (over 50), while those in the real and sham acupuncture groups were generally younger. The variable “age” would be an example of a(n) ______________ variable.

2. The following graph shows the distribution for the color of jelly beans in bags produced by the JB Company. Which of the following descriptions are appropriate for this distribution? (Circle all that apply.)
Symmetric / Uniform / Right-skewed / Left-skewed / None of these is appropriate

3. According to Starbuck’s International, 32.2% of its entire staff that work at the local coffee shops is Hispanic. This value of 32.2% is a: (circle all that apply)
sample statistic / sample parameter / population statistic / population parameter

4. A law firm in the Dallas-Fort Worth area prepares contracts and other legal documents and uses a courier service to deliver the documents to its many clients. Recently a partner reported that a few complaints have come in from some of their best clients about delayed contract deliveries. The current courier service used is Metro Delivery. There are two other courier services that have opened up in the past 2 years. Should they keep using Metro or consider a new one? To help address this question, a study was conducted over a one-month period in which all three couriers were used. When a delivery was required, one of the couriers was randomly selected. One of the responses of interest was the total delivery time, that is, the time (in minutes) from when the order is phoned in to when the documents are delivered to the destination. Below are the boxplots and some additional numerical summaries for comparing the total delivery times for the three couriers.
[Figure: side-by-side boxplots of Total Delivery Time (minutes) for the three couriers]

One Variable Summary
                      DFW Express   Carborne Carrier   Metro Delivery
Mean                  41.7          46.9               52.0
Variance              269.1         274.5              227.5
Standard Deviation    16.4          16.6               15.0
Count (n)             56            65                 61

For any answers below, when appropriate, INCLUDE YOUR UNITS.
a. What type of variable is total delivery time? Circle one: Categorical / Quantitative
b. Compute the approximate IQR of the delivery times for DFW Express.
IQR = ________________________________
c. Give the value of the upper boundary (or fence) that would be used to show why the longest delivery time for DFW Express is an outlier. Show all work.
Upper boundary or fence = ______________________
d. Fill in the blank. The Q1 of the delivery times by Metro is equal to the _________________ of the delivery times by DFW Express.
e. Fill in the blank. For Carborne Carrier, 25% of the deliveries took longer than ________________.
f. How many deliveries by DFW Express were made in under 41 minutes? _______________________
g. One of the Metro deliveries resulted in a standardized z-score of –1.4. What is the corresponding actual delivery time?
Final answer: ________________
h. The following is an incorrect interpretation of the standard deviation of Metro’s delivery times. Enhance the sentence to make it correct by inserting a few words.
The average distance between the Metro delivery times is roughly 15 minutes.
i. DFW Express appears to be the winner, as it had the lowest mean delivery time. Another variable measured was mileage, the distance in miles from the law firm to the client. At the right are boxplots that compare the mileage for the deliveries by these three firms in the study. Which of the following statements is an appropriate conclusion? Clearly circle one.
• DFW Express is still the winner as it had the lowest median mileage.
• DFW Express is still the winner as mileage was not a response variable.
• The response variable of delivery time should be adjusted for mileage before any winner is selected.

5. A survey was sponsored by the Ann Arbor News. A random sample of 200 Ann Arbor residents resulted in 110 stating they support extending the Art Fair by one day.
a. What is the value of the population proportion of Ann Arbor residents who support extending the Art Fair by one day?
Final answer: ____________________
b. What is the margin of error for this survey, that is, a measure of accuracy of the sample proportion of Ann Arbor residents who support extending the Art Fair by one day?
Final answer: ____________________
c. Do these numbers allow the Ann Arbor News to conclude that a majority of its citizens advocate extending the Art Fair? Circle your answer and provide a brief rationale.
Circle one: Yes / No
Because:
d. Suppose the survey will be repeated and the director of the Art Fair would like to ensure a 95% margin of error of at most 3%. What is the minimum number of people that should be surveyed?
Final answer: ____________________

6. Determine whether each of the following statements is true or false. Clearly circle your answer.
a. If X has a Binomial(50, 0.7) distribution, then the criteria to use the normal approximation are met.
True / False
b. Two independent researchers are each trying to estimate a population proportion p. Researcher I takes a random sample resulting in a sample proportion of 0.60. Researcher II takes a random sample that is twice as big and also results in a sample proportion of 0.60. Then, the standard errors for the two sample proportions will be the same.
True / False
c. A 95% confidence interval for p, the population proportion of college students who drive a foreign car, was found to be (0.524, 0.676). Thus, there is a 95% chance that the population proportion of college students who drive a foreign car will be between 0.524 and 0.676.
True / False

7. Suppose that the amount of time spent waiting for your bus to campus each day is a uniform random variable between 0 and 20 minutes.
a. Sketch a picture of the model for waiting time for the bus. Provide labels for each axis and some values along each axis.
Let’s define the following events:
• A is the event that you wait at least 10 minutes, that is, your waiting time is in the interval [10, 20].
• B is the event that you wait at most 15 minutes, that is, your waiting time is in the interval [0, 15].
• C is the event that you wait at most 10 minutes, that is, your waiting time is in the interval [0, 10].
Answer the following questions based on the information given above. Show all work.
b. What is P(A)? Final answer: ________________
c. What is P(A and B)? Final answer: ________________
d. What is P(B | A)? Final answer: ________________
e. What is P(A and C)? Final answer: ________________
f. Are the events A and C mutually exclusive? Circle one: Yes / No. Explain briefly.

8. The demand for a certain weekly magazine at a newsstand is a discrete random variable, X, with an expected value of 3 magazines sold per week.
Furthermore, the probability distribution function of X is symmetric about the value of 3. The magazine is sold for $6.00 per copy to the customers and it costs $4.00 per copy for the owner of the newsstand. At the beginning of each week, the owner of this newsstand buys 6 magazines to sell during the week.

Value of X:    0     1     2     3     4     5     6
Probability:   0.05  0.10  0.20  ____  ____  ____  ____

a. Fill in the missing values in the table above.
b. In dollars, what is the expected amount of money the owner of the newsstand will take in from the sales of this magazine per week?
Final answer: ________________
c. Explain briefly why the owner is not wise to buy 6 magazines at the beginning of each week. Note: long-winded explanations will be penalized!

9. Consider the population of all UM students. Suppose 30% of all UM students have their own laptop computer. A random sample of n = 100 UM students will be selected and the sample proportion that have their own laptop computer will be computed.
a. If you consider all possible samples of 100 UM students, what is the approximate distribution that describes the possible values for the sample proportion p̂? Be complete; give all aspects of the distribution.
b. Suppose that 35 of the 100 UM students surveyed stated they have their own laptop, for a sample proportion of 0.35. If the rate of students that have their own laptop is indeed 30%, how likely would it be to get a sample proportion of 0.35 or larger? Show all work.
Final answer: ________________

10. The BioPharm company has developed a stress test for college-aged students. Scores on the stress test for such students are approximately normally distributed with a mean of 55 points and a standard deviation of 2 points. Scores of 58 points or higher indicate a high level of stress and are of concern to doctors. A random sample of college-aged students will be selected and each will be given this stress test. Answer the following questions based on the information given. Show all work.
a. Complete the sentence. About 25% of college-aged students will have a stress test score of at most ________ points.
Final answer: ________________
b. What is the probability that the score for the first randomly selected student will be at least 58 points?
Final answer: ________________
c. Given that the first randomly selected student has a score of at least 58 points, what is the probability that the next randomly selected student scores below 58 points?
Final answer: ________________
d. Suppose the stress test is given to six randomly selected college-aged students. Let Y be the number of students with a high level of stress (a score of at least 58 points).
i. What kind of a random variable is Y? Clearly circle your answer.
• Normal random variable with a mean of 55 and a standard deviation of 2.
• Normal random variable with a mean of 55 and a standard deviation of 12 (6 students times 2).
• Binomial random variable with n = 6 and p = probability a score is at least 58 points.
• Binomial random variable with n = 6 and p = ½.
ii. What is the probability that exactly 2 of the 6 students will have a high level of stress (a score of at least 58 points)?
Final answer: ________________
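The arithmetic in the solutions above can be double-checked with a short script. The following is a minimal Python sketch, offered as a study aid rather than as part of the original solutions, using only the standard library (`math.comb` and `statistics.NormalDist` need Python 3.8 or later). It takes Q1 = 30 and Q3 = 50 as read off the DFW Express boxplot in the solutions, and `n_small` in the question 6b check is a hypothetical sample size, since the exam does not state one. Because `NormalDist` computes exact normal probabilities instead of rounding z to two decimals for a Table A.1 lookup, a couple of results may differ slightly in the last digit: about 0.1376 rather than 0.1379 in question 9b, and about 53.65 rather than 53.66 points in question 10a.

```python
import math
from statistics import NormalDist

# Question 4: Q1 and Q3 as read off the DFW Express boxplot in the solutions.
q1, q3 = 30, 50
iqr = q3 - q1                     # 20 minutes
upper_fence = q3 + 1.5 * iqr      # 80 minutes

# Question 4g: undo a z-score with x = mean + s * z (Metro: mean 52.0, s 15.0).
metro_time = 52.0 + 15.0 * -1.4   # about 31 minutes

# Question 5: conservative margin of error 1/sqrt(n) and the sample size needed.
p_hat = 110 / 200
moe = 1 / math.sqrt(200)
ci = (p_hat - moe, p_hat + moe)
n_needed = math.ceil((1 / 0.03) ** 2)  # always round up

# Question 6a: normal-approximation conditions for Binomial(50, 0.7).
n, p = 50, 0.7
conditions_met = n * p > 10 and n * (1 - p) > 10

# Question 6b: doubling n shrinks the standard error by a factor of sqrt(2).
n_small = 100  # hypothetical sample size; the exam does not give one
se_small = math.sqrt(0.6 * 0.4 / n_small)
se_large = math.sqrt(0.6 * 0.4 / (2 * n_small))


def uniform_prob(lo, hi, a=0.0, b=20.0):
    """P(lo <= X <= hi) for X ~ Uniform(a, b): overlap length / (b - a)."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0.0) / (b - a)


# Question 7: wait time ~ Uniform(0, 20).
p_a = uniform_prob(10, 20)
p_a_and_b = uniform_prob(10, 15)
p_b_given_a = p_a_and_b / p_a
p_a_and_c = uniform_prob(10, 10)  # a single point has zero probability

# Question 8: completed probability table, expected revenue and profit.
probs = [0.05, 0.10, 0.20, 0.30, 0.20, 0.10, 0.05]  # P(X = 0), ..., P(X = 6)
expected_sold = sum(x * px for x, px in enumerate(probs))
expected_revenue = 6.00 * expected_sold
expected_profit = expected_revenue - 6 * 4.00  # six copies at $4 each

# Question 9: p-hat is approximately Normal(p, sqrt(p(1 - p)/n)).
sd_p_hat = math.sqrt(0.3 * 0.7 / 100)
prob_at_least_035 = 1 - NormalDist(0.3, sd_p_hat).cdf(0.35)

# Question 10: stress scores ~ Normal(55, 2).
scores = NormalDist(55, 2)
q25 = scores.inv_cdf(0.25)        # 25th percentile
p_high = 1 - scores.cdf(58)       # P(score >= 58)
p_next_below = scores.cdf(58)     # by independence, unaffected by the first student

# Question 10d(ii): Y ~ Binomial(6, p_high), so P(Y = 2) = C(6, 2) p^2 (1 - p)^4.
p_y_eq_2 = math.comb(6, 2) * p_high**2 * (1 - p_high) ** 4

if __name__ == "__main__":
    print(f"IQR = {iqr} min, upper fence = {upper_fence} min, x = {metro_time:.1f} min")
    print(f"margin of error = {moe:.4f}, CI = ({ci[0]:.3f}, {ci[1]:.3f}), n = {n_needed}")
    print(f"np = {n * p:.0f}, n(1 - p) = {n * (1 - p):.0f}, conditions met: {conditions_met}")
    print(f"SE ratio (small/large sample) = {se_small / se_large:.4f}")
    print(f"P(A) = {p_a}, P(A and B) = {p_a_and_b}, P(B|A) = {p_b_given_a}, P(A and C) = {p_a_and_c}")
    print(f"E(X) = {expected_sold:.1f}, E(revenue) = ${expected_revenue:.2f}, E(profit) = ${expected_profit:.2f}")
    print(f"SD of p-hat = {sd_p_hat:.4f}, P(p-hat >= 0.35) = {prob_at_least_035:.4f}")
    print(f"25th percentile = {q25:.2f}, P(X >= 58) = {p_high:.4f}, P(X < 58) = {p_next_below:.4f}")
    print(f"P(Y = 2) = {p_y_eq_2:.4f}")
```

Running it with `python3` prints each intermediate value, so any single step of a hand calculation can be compared against it.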