stat200_final exam newA++
This 17 page Study Guide was uploaded by kimwood Notetaker on Friday November 13, 2015. The Study Guide belongs to a course at a university taught by a professor in Fall. Since its upload, it has received 29 views.
Date Created: 11/13/15
1. True or False. Justify for full credit. (25 pts)

(a) The normal distribution curve is always symmetric about its mean.
True. The normal density depends on x only through (x − μ)^2, so the curve is a mirror image on either side of μ. As a result the mean, median and mode coincide.

(b) If the variance of a data set is zero, then all the observations in this data set are identical.
True. The variance is the average squared deviation from the mean. A sum of squares is zero only when every term is zero, so every observation must equal the mean, and the observations are all identical.

(c) P(A AND A^c) = 1, where A^c is the complement of A.
False. A and A^c are mutually exclusive, so they cannot occur together and P(A AND A^c) = 0. The quantity that equals 1 is P(A OR A^c): every outcome in the sample space is either in A or in A^c, so P(A) + P(A^c) = 1.

(d) In hypothesis testing, if the p-value is less than the significance level, we do not have sufficient evidence to reject the null hypothesis.
False. If the p-value is less than the significance level, we reject the null hypothesis. We fail to reject the null hypothesis when the p-value is greater than the significance level.

(e) The volume of milk in a jug of milk is 128 oz. The value 128 is from a discrete data set.
False. Volume is a continuous measurement: it can take any value in an interval, not only whole numbers, so it is measured rather than counted.

2. Complete the frequency table with frequency and relative frequency. (5 pts)
The completed table:

Checkout time (in minutes)   Frequency   Relative frequency
1.0 – 1.9                    2           2/25 = 0.08
2.0 – 2.9                    8           8/25 = 0.32
3.0 – 3.9                    10          10/25 = 0.40
4.0 – 4.9                    5           5/25 = 0.20
Total                        25          1.00

3. What percentage of the checkout times was less than 3 minutes? (5 pts)
Checkout times under 3 minutes fall in the classes 1.0 – 1.9 and 2.0 – 2.9, which have relative frequencies 0.08 and 0.32. Together: 0.08 + 0.32 = 0.40, so 40% of the checkout times were less than 3 minutes.

4. In what class interval must the median lie? Explain your answer.
(5 pts) With n = 25 observations, the median is the (25 + 1)/2 = 13th observation in sorted order. The cumulative frequencies are 2, 10, 20 and 25, so the 11th through 20th observations fall in the third class. The median therefore lies in the class 3.0 – 3.9.

5. Assume that the largest observation in this dataset is 5.8. Suppose this observation were incorrectly recorded as 8.5 instead of 5.8. Will the mean increase, decrease, or remain the same? Will the median increase, decrease or remain the same? Why? (5 pts)
The mean will increase. The mean is the sum of the observations divided by the number of observations, so recording 8.5 instead of 5.8 raises the sum and therefore the mean. The mean is sensitive to extreme values.
The median will remain the same. It depends only on which value sits in the middle position of the sorted data. The largest observation is still the largest after the error, so the middle value does not change.

6. A random sample of STAT200 weekly study times in hours is as follows: 2 15 15 18 30
Find the sample standard deviation. (Round the answer to two decimal places. Show all work. Just the answer, without supporting work, will receive no credit.) (10 pts)
Sum = 2 + 15 + 15 + 18 + 30 = 80
Mean = sum / number of observations = 80/5 = 16
Deviations from the mean and their squares:

x      x − mean   (x − mean)^2
2      −14        196
15     −1         1
15     −1         1
18     2          4
30     14         196
Sum 80 0          398

The deviations sum to 0, as they must, and the squared deviations sum to 398.

\( s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{398}{4}} = \sqrt{99.5} \approx 9.97 \)

Sample standard deviation = 9.97 hours. For comparison, dividing by n instead of n − 1 gives \( \sqrt{398/5} = \sqrt{79.6} \approx 8.92 \), which is the population formula and not the quantity asked for here.

A fair coin is tossed 4 times.
7.
How many outcomes are there in the sample space? (5 pts)
Each toss has 2 possible results, so the sample space has 2^4 = 16 outcomes:
HHHH, HHHT, HHTH, HHTT, HTHH, HTHT, HTTH, HTTT, THHH, THHT, THTH, THTT, TTHH, TTHT, TTTH, TTTT

8. What is the probability that the third toss is heads, given that the first toss is heads? (10 pts)
Eight of the 16 outcomes start with heads: HHHH, HHHT, HHTH, HHTT, HTHH, HTHT, HTTH, HTTT. Of these, four also have heads on the third toss: HHHH, HHHT, HTHH, HTHT.
P(third toss is heads | first toss is heads) = (4/16) / (8/16) = 4/8 = 1/2

9. Let A be the event that the first toss is heads, and B be the event that the third toss is heads. Are A and B independent? Why or why not? (5 pts)
P(A) = 8/16 = 1/2 and P(B) = 8/16 = 1/2. Four outcomes have heads on both the first and third tosses, so P(A and B) = 4/16 = 1/4.
Since P(A and B) = 1/4 = (1/2)(1/2) = P(A)P(B), the events A and B are independent. Equivalently, P(B | A) = 1/2 = P(B): knowing the first toss does not change the probability for the third.

Refer to the following situation for Questions 10, 11, and 12. The boxplots below show the real estate values of single family homes in two neighboring cities, in thousands of dollars. For each question, give your answer as one of the following: (a) Tinytown; (b) BigBurg; (c) Both cities have the same value requested; (d) It is impossible to tell using only the given information. Then explain your answer in each case. (5 pts each)

10. Which city has greater variability in real estate values?
(b) BigBurg. It has the larger interquartile range. The interquartile range for Tinytown is 110 − 60 = 50, while the interquartile range for BigBurg is 110 − 45 = 65.

11. Which city has the greater percentage of households with values $85,000 and over?
(b) BigBurg. The distance between the maximum and the median is larger for BigBurg than for Tinytown.

12.
Which city has a greater percentage of homes with real estate values between $55,000 and $85,000?
(b) BigBurg.

Refer to the following information for Questions 13 and 14. Show all work. Just the answer, without supporting work, will receive no credit.
There are 1000 juniors in a college. Among the 1000 juniors, 200 students are taking STAT200, and 100 students are taking PSYC300. There are 50 students taking both courses.

13. What is the probability that a randomly selected junior is taking at least one of these two courses? (10 pts)
P(at least one) = P(STAT200) + P(PSYC300) − P(both)
= 200/1000 + 100/1000 − 50/1000 = 250/1000 = 0.25

14. What is the probability that a randomly selected junior is taking PSYC300, given that he/she is taking STAT200? (10 pts)
P(both) = 50/1000 = 0.05
P(STAT200) = 200/1000 = 0.20
P(PSYC300 | STAT200) = P(both) / P(STAT200) = 0.05 / 0.20 = 0.25

15. UMUC Stat Club is sending a delegation of 2 members to attend the 2015 Joint Statistical Meeting in Seattle. There are 10 qualified candidates. How many different ways can the delegation be selected?
The order of selection does not matter, so this is a combination:

\( \binom{10}{2} = \frac{10!}{(10 - 2)!\,2!} = \frac{10!}{8!\,2!} = \frac{10 \times 9}{2 \times 1} = 45 \)

There are 45 different ways.

16. Imagine you are in a game show. There are 4 prizes hidden on a game board with 10 spaces. One prize is worth $100, another is worth $50, and two are worth $10. You have to pay $20 to the host if your choice is not correct. Let the random variable x be the winning. Show all work. Just the answer, without supporting work, will receive no credit.
(a) What is your expected winning in this game?
x (winning, $)   P(x)
100              0.10
50               0.10
10               0.20
−20              0.60

The six spaces without a prize each cost $20, so that outcome is x = −20.

\( E(x) = \sum x\,P(x) = 0.10(100) + 0.10(50) + 0.20(10) + 0.60(-20) = 10 + 5 + 2 - 12 = 5 \)

Expected winning = $5

(b) Determine the standard deviation of x. (Round the answer to two decimal places) (10 pts)

x      P(x)   x^2     x^2 P(x)
100    0.1    10000   1000
50     0.1    2500    250
10     0.2    100     20
−20    0.6    400     240

\( \sum x^2 P(x) = 1000 + 250 + 20 + 240 = 1510 \)

\( \mathrm{Var}(x) = \sum x^2 P(x) - \left(\sum x\,P(x)\right)^2 = 1510 - 5^2 = 1485 \)

Standard deviation = \( \sqrt{1485} \approx 38.54 \)

17. Mimi just started her tennis class three weeks ago. On average, she is able to return 20% of her opponent's serves. Assume her opponent serves 8 times. Show all work. Just the answer, without supporting work, will receive no credit.
(a) Let X be the number of returns that Mimi gets. As we know, the distribution of X is a binomial probability distribution. What is the number of trials (n), probability of successes (p) and probability of failures (q), respectively? (5 pts)
n = 8, p = 0.20, q = 0.80

(b) Find the probability that she returns at least 1 of the 8 serves from her opponent. (10 pts)
\( P(X \ge 1) = 1 - P(X = 0) \)
\( P(X = 0) = \binom{8}{0}(0.20)^0(0.80)^8 = 0.16777 \)
\( P(X \ge 1) = 1 - 0.16777 = 0.8322 \)

(c) How many serves can she expect to return?
E(X) = np = 8 × 0.20 = 1.6
On average she can expect to return 1.6 of the 8 serves.

Refer to the following information for Questions 18, 19, and 20. Show all work. Just the answer, without supporting work, will receive no credit.
The IQ scores are normally distributed with a mean of 100 and a standard deviation of 15.

18. What is the probability that a randomly selected person has an IQ between 85 and 115? (10 pts)
For a single observation the z-score is \( z = (x - \mu)/\sigma \):

\( P(85 < X < 115) = P\left(\frac{85 - 100}{15} < Z < \frac{115 - 100}{15}\right) = P(-1 < Z < 1) = 0.8413 - 0.1587 = 0.6826 \)

19. Find the 9 t percentile of the IQ distribution. (5 pts)

20. If a random sample of 100 people is selected, what is the standard deviation of the sample mean? (5 pts)
\( \sigma_{\bar{x}} = \sigma/\sqrt{n} = 15/\sqrt{100} = 15/10 = 1.5 \)

21.
A random sample of 100 light bulbs has a mean lifetime of 3000 hours. Assume that the population standard deviation of the lifetime is 500 hours. Construct a 95% confidence interval estimate of the mean lifetime. Show all work. Just the answer, without supporting work, will receive no credit. (10 pts)

\( \bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 3000 \pm 1.96 \cdot \frac{500}{\sqrt{100}} = 3000 \pm 1.96(50) = 3000 \pm 98 \)

C.I. = [2902, 3098] hours

22. Consider the hypothesis test given by \( H_0: p = 0.5 \), \( H_1: p < 0.5 \). In a random sample of 225 subjects, the sample proportion is found to be \( \hat{p} = 0.51 \).
(a) Determine the test statistic. Show all work; writing the correct test statistic, without supporting work, will receive no credit.

\( z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{0.51 - 0.5}{\sqrt{(0.5)(0.5)/225}} = \frac{0.01}{0.0333} = 0.30 \)

(b) Determine the p-value for this test. Show all work; writing the correct P-value, without supporting work, will receive no credit.
The test is left-tailed, so the p-value is the area to the left of the test statistic. From the standard normal table:
p-value = P(Z < 0.30) = 0.6179

(c) Is there sufficient evidence to justify the rejection of \( H_0 \) at the 0.01 level? Explain.
No. The p-value 0.6179 is greater than 0.01, so we do not have sufficient evidence to reject the null hypothesis at α = 0.01.

23. A new prep class was designed to improve AP statistics test scores. Five students were selected at random. The numbers of correct answers on two practice exams were recorded; one before the class and one after. The data are recorded in the table below. We want to test if the numbers of correct answers, on average, are higher after the class.

Number of Correct Answers
Subject   Before the class   After the class
1         12                 14
2         15                 18
3         9                  11
4         12                 10
5         12                 12

Is there evidence to suggest that the mean number of correct answers after the class exceeds the mean number of correct answers before the class? Assume we want to use a 0.01 significance level to test the claim.
(a) Identify the null hypothesis and the alternative hypothesis.
Let d = (after) − (before) for each student, and let \( \mu_d \) be the mean of these differences.
Null hypothesis: the mean number of correct answers after the class is no higher than before the class, \( H_0: \mu_d \le 0 \).
Alternative hypothesis: the mean number of correct answers after the class is higher than before the class, \( H_1: \mu_d > 0 \).

(b) Determine the test statistic. Show all work; writing the correct test statistic, without supporting work, will receive no credit.
The same five students took both exams, so the two samples are paired and the test is a t test on the differences.
Mean before the class = (12 + 15 + 9 + 12 + 12)/5 = 12
Mean after the class = (14 + 18 + 11 + 10 + 12)/5 = 13
Differences d: 2, 3, 2, −2, 0
Mean difference = (2 + 3 + 2 − 2 + 0)/5 = 1
Sum of squared deviations = (2 − 1)^2 + (3 − 1)^2 + (2 − 1)^2 + (−2 − 1)^2 + (0 − 1)^2 = 1 + 4 + 1 + 9 + 1 = 16

\( s_d = \sqrt{\frac{16}{5 - 1}} = 2 \)

\( t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{1}{2/\sqrt{5}} = \frac{1}{0.8944} \approx 1.118 \), with n − 1 = 4 degrees of freedom.

(c) Determine the p-value. Show all work; writing the correct critical value, without supporting work, will receive no credit.
The test is right-tailed, so the p-value is the area to the right of t = 1.118 under the t distribution with 4 degrees of freedom. From the t table, 1.118 lies between the critical values for tail areas 0.25 and 0.10, so 0.10 < p-value < 0.25 (approximately 0.16).

(d) Is there sufficient evidence to support the claim that the mean number of correct answers after the class exceeds the mean number of correct answers before the class? Justify your conclusion. (20 pts)
No. The p-value is greater than 0.01, so we do not reject the null hypothesis. We do not have sufficient evidence to conclude that the mean number of correct answers after the class exceeds the mean number of correct answers before the class.

24. A random sample of 4 professional athletes produced the following data where x is the number of endorsements the player has and y is the amount of money made (in millions of dollars).

x   0   1   3   5
y   3   2   3   8

(a) Find an equation of the least squares regression line.
Show all work; writing the correct equation, without supporting work, will receive no credit. (15 pts)
With n = 4:
Sum of x = 0 + 1 + 3 + 5 = 9
Sum of y = 3 + 2 + 3 + 8 = 16
Sum of xy = 0 + 2 + 9 + 40 = 51
Sum of x^2 = 0 + 1 + 9 + 25 = 35
The regression equation has the form \( \hat{y} = a + bx \).

\( b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{4(51) - 9(16)}{4(35) - 9^2} = \frac{60}{59} \approx 1.017 \)

\( a = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - (\sum x)^2} = \frac{16(35) - 9(51)}{4(35) - 9^2} = \frac{101}{59} \approx 1.712 \)

Therefore the equation is \( \hat{y} = 1.712 + 1.017x \).

(b) Based on the equation from part (a), what is the predicted value of y if x = 4? Show all work and justify your answer.
Substitute x = 4 into the equation:
\( \hat{y} = 1.712 + 1.017(4) = 5.78 \)
The predicted amount is about 5.78 million dollars. Since x = 4 lies within the observed range of x (0 to 5), this is an interpolation.

25. Randomly selected nonfatal occupational injuries and illnesses are categorized according to the day of the week that they first occurred, and the results are listed below. Use a 0.05 significance level to test the claim that such injuries and illnesses occur with equal frequency on the different days of the week. Show all work and justify your answer.

Day      Mon   Tue   Wed   Thu   Fri
Number   22    22    20    19    17

(a) Identify the null hypothesis and the alternative hypothesis.
Null hypothesis: the injuries and illnesses occur with equal frequency on each day, \( H_0 \): the distribution is uniform.
Alternative hypothesis: the injuries and illnesses do not occur with equal frequency on each day, \( H_1 \): the distribution is not uniform.

(b) Determine the test statistic. Show all work; writing the correct test statistic, without supporting work, will receive no credit.
The total count is 22 + 22 + 20 + 19 + 17 = 100. Under the null hypothesis each day has expected count (1/5)(100) = 20.
The chi-square goodness-of-fit statistic is

\( \chi^2 = \sum \frac{(O - E)^2}{E} \)

Day   O    E    (O − E)^2 / E
Mon   22   20   (22 − 20)^2 / 20 = 0.20
Tue   22   20   (22 − 20)^2 / 20 = 0.20
Wed   20   20   (20 − 20)^2 / 20 = 0
Thu   19   20   (19 − 20)^2 / 20 = 0.05
Fri   17   20   (17 − 20)^2 / 20 = 0.45

\( \chi^2 = 0.20 + 0.20 + 0 + 0.05 + 0.45 = 0.90 \)

The test statistic = 0.90

(c) Determine the p-value.
Show all work; writing the correct critical value, without supporting work, will receive no credit.
With 5 categories there are 5 − 1 = 4 degrees of freedom. The p-value is the area to the right of the test statistic under the chi-square distribution with 4 degrees of freedom:
p-value = \( P(\chi^2_4 > 0.90) = 0.92456 \)

(d) Is there sufficient evidence to support the claim that such injuries and illnesses occur with equal frequency on the different days of the week? Justify your answer. (15 pts)
Since the p-value 0.92456 is greater than 0.05, we do not reject the null hypothesis. There is not sufficient evidence to reject the claim that the injuries and illnesses occur with equal frequency on the different days of the week.
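
The chi-square arithmetic in Question 25 can be checked with a short script. This is a minimal sketch in Python using only the standard library; the function names `chi_square_gof` and `chi_square_sf_even_df` are illustrative and not part of the exam. The p-value uses a closed-form tail formula that is valid only when the degrees of freedom are even, which holds here (df = 4).

```python
import math


def chi_square_gof(observed, expected):
    """Return the goodness-of-fit statistic sum((O - E)^2 / E)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Uses exp(-x/2) * sum_{k < df/2} (x/2)^k / k!, a closed form that
    applies only when df is a positive even integer (an assumption that
    fits Question 25, where df = 4).
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    terms = (half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * sum(terms)


# Counts for Mon..Fri from Question 25.
observed = [22, 22, 20, 19, 17]
total = sum(observed)
# Equal expected count per day under the null hypothesis.
expected = [total / len(observed)] * len(observed)

statistic = chi_square_gof(observed, expected)
df = len(observed) - 1
p_value = chi_square_sf_even_df(statistic, df)

print(f"chi-square = {statistic:.2f}, df = {df}, p-value = {p_value:.4f}")
```

For odd degrees of freedom the closed form does not apply, and a statistics library or a chi-square table would be needed instead.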