Stat 200 HOMEWORK 7
This 8 page Study Guide was uploaded by kimwood Notetaker on Friday November 6, 2015. The Study Guide belongs to a course at a university taught by a professor in Fall. Since its upload, it has received 242 views.
Date Created: 11/06/15
Lane Chap. 14

2. The formula for a regression equation is Y’ = 2X + 9.

a. What would be the predicted score for a person scoring 6 on X?

The predicted score for that person is Y’ = 2*6 + 9 = 21.

b. If someone’s predicted score was 14, what was this person’s score on X?

Here Y’ = 14, so 14 = 2X + 9 and X = (14 - 9)/2 = 2.5. Thus the score on X was 2.5.

6. For the X, Y data below, compute:

a. r and determine if it is significantly different from zero.

The Minitab output is given below.

Correlation: X, Y
Pearson correlation of X and Y = 0.849
P-Value = 0.032

The correlation coefficient between X and Y is 0.849, with a P value of 0.032. Because the P value is smaller than the 0.05 significance level, we conclude that the correlation coefficient is significantly different from zero.

b. the slope of the regression line and test if it differs significantly from zero.

The output from the Excel Data Analysis ToolPak is given below.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.8492
R Square             0.7211
Adjusted R Square    0.6514
Standard Error       3.5028
Observations         6

ANOVA
             df    SS          MS          F          Significance F
Regression   1     126.9207    126.9207    10.3441    0.0324
Residual     4     49.0793     12.2698
Total        5     176

             Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept    3.1231         3.1085           1.0047    0.3719    -5.5075     11.7537
X            1.1332         0.3523           3.2162    0.0324    0.1550      2.1115

The slope of the regression line is 1.1332, with a P value of 0.0324. Because the P value is smaller than the 0.05 significance level, we reject the null hypothesis that the slope is zero and conclude that the slope differs significantly from zero.

c. the 95% confidence interval for the slope.

The 95% confidence interval for the slope is given in the output above: (0.1550, 2.1115).

X     Y
4     6
3     7
5     12
11    17
10    9
14    21

Lane Chap. 17

5.
At a school pep rally, a group of sophomore students organized a free raffle for prizes. They claim that they put the names of all of the students in the school in the basket and that they randomly drew 36 names out of this basket. Of the prize winners, 6 were freshmen, 14 were sophomores, 9 were juniors, and 7 were seniors. The results do not seem that random to you. You think it is a little fishy that sophomores organized the raffle and also won the most prizes. Your school is composed of 30% freshmen, 25% sophomores, 25% juniors, and 20% seniors.

a. What are the expected frequencies of winners from each class?

The expected frequencies of winners from each class are:

Class        Expected Frequency
Freshmen     36*0.30 = 10.80
Sophomores   36*0.25 = 9
Juniors      36*0.25 = 9
Seniors      36*0.20 = 7.20

b. Conduct a significance test to determine whether the winners of the prizes were distributed throughout the classes as would be expected based on the percentage of students in each group. Report your Chi Square and p values.

The chi-square value is calculated using the formula

\[ \chi^2 = \sum \frac{(\text{Observed Frequency} - \text{Expected Frequency})^2}{\text{Expected Frequency}} \]

\[ \chi^2 = \frac{(6 - 10.8)^2}{10.8} + \frac{(14 - 9)^2}{9} + \frac{(9 - 9)^2}{9} + \frac{(7 - 7.20)^2}{7.20} = 4.917 \]

Df = number of groups - 1 = 4 - 1 = 3

So, p-value = P(Chi-sq(3) > 4.917) = 0.1780

c. What do you conclude?

The P value of 0.1780 is larger than the 0.05 significance level, so we fail to reject the null hypothesis. There is not sufficient evidence to conclude that the winners were distributed differently from what a random draw would produce.

14. A geologist collects hand-specimen sized pieces of limestone from a particular area. A qualitative assessment of both texture and color is made with the following results. Is there evidence of association between color and texture for these limestones? Explain your answer.

           COLOUR
Texture    Light   Medium   Dark
Fine       4       20       8
Medium     5       23       12
Coarse     21      23       4

Here I’m using Minitab to perform the hypothesis test.
The Minitab output is given below (rows 1, 2, and 3 are the Fine, Medium, and Coarse textures).

Chi-Square Test for Association: Worksheet rows, Worksheet columns

Rows: Worksheet rows   Columns: Worksheet columns

        Light    Medium   Dark    All
1       4        20       8       32
        8.00     17.60    6.40
2       5        23       12      40
        10.00    22.00    8.00
3       21       23       4       48
        12.00    26.40    9.60
All     30       66       24      120

Cell Contents: Count
               Expected count

Pearson Chi-Square = 17.727, DF = 4, P-Value = 0.001
Likelihood Ratio Chi-Square = 18.141, DF = 4, P-Value = 0.001

The P value of this test is 0.001, which is smaller than the 0.05 significance level. Thus we reject the null hypothesis of no association and conclude that there is a significant association between color and texture for these limestones at the 0.05 significance level.

Illowsky Chap. 11

Decide whether the following statements are true or false.

70. The standard deviation of the chi-square distribution is twice the mean.

For a chi-square distribution the mean equals the degrees of freedom and the variance equals 2 times the degrees of freedom, so the standard deviation is the square root of twice the mean, not twice the mean. The statement is false.

Use the following information to answer the next exercise: Suppose an airline claims that its flights are consistently on time with an average delay of at most 15 minutes. It claims that the average delay is so consistent that the variance is no more than 150 minutes. Doubting the consistency part of the claim, a disgruntled traveler calculates the delays for his next 25 flights. The average delay for those 25 flights is 22 minutes with a standard deviation of 15 minutes.

113. df = 24

Illowsky Chap. 12

66. Can a coefficient of determination be negative? Why or why not?

The coefficient of determination cannot be negative because it is the ratio of two sums of squares. Each sum of squares is a sum of squared terms and so cannot be negative, which means their ratio cannot be negative either.

The cost of a leading liquid laundry detergent in different sizes is given.
Size (Ounces)   Cost ($)   Cost per ounce
16              3.99
32              4.99
64              5.99
200             10.99

82.

a. Using “size” as the independent variable and “cost” as the dependent variable, draw a scatter plot.

The scatter plot is given below.

[Figure: Scatterplot of Cost ($) vs Size (Ounces)]

b. Does it appear from inspection that there is a relationship between the variables? Why or why not?

In the scatter plot, cost rises steadily as size increases, so there appears to be a positive relationship between the two variables.

c. Calculate the least-squares line. Put the equation in the form of: ŷ = a + bx

The Minitab output is given below.

Regression Analysis: Cost ($) versus Size (Ounces)

Analysis of Variance
Source            DF   Adj SS    Adj MS    F-Value   P-Value
Regression        1    28.9163   28.9163   691.36    0.001
  Size (Ounces)   1    28.9163   28.9163   691.36    0.001
Error             2    0.0837    0.0418
Total             3    29.0000

Model Summary
S          R-sq     R-sq(adj)   R-sq(pred)
0.204512   99.71%   99.57%      98.23%

Coefficients
Term            Coef      SE Coef   T-Value   P-Value   VIF
Constant        3.598     0.150     23.96     0.002
Size (Ounces)   0.03707   0.00141   26.29     0.001     1.00

Regression Equation
Cost ($) = 3.598 + 0.03707 Size (Ounces)

So the regression equation is Cost ($) = 3.598 + 0.03707 Size (Ounces).

d. Find the correlation coefficient. Is it significant?

The output in this case is:

Correlation: Size (Ounces), Cost ($)
Pearson correlation of Size (Ounces) and Cost ($) = 0.999
P-Value = 0.001

Because the P value of 0.001 is smaller than the 0.05 significance level, the correlation is significant.

e. If the laundry detergent were sold in a 40-ounce size, find the estimated cost.

Estimated cost = 3.598 + 0.03707*40 = $5.0808, or about $5.08.

f. If the laundry detergent were sold in a 90-ounce size, find the estimated cost.

Estimated cost = 3.598 + 0.03707*90 = $6.9343, or about $6.93.

g. Does it appear that a line is the best way to fit the data? Why or why not?
The scatter plot shows a clear linear relationship between the variables, so it appears that a line is the best way to fit the data.

h. Are there any outliers in the given data?

No, there are no outliers in the given data. One value (the 200-ounce size) is far from the other values, but because it lies on the regression line it cannot be treated as an outlier.

i. Is the least-squares line valid for predicting what a 300-ounce size of the laundry detergent would cost? Why or why not?

No. A size of 300 ounces is outside the range of sizes (16 to 200 ounces) used to fit the regression line, so using this line to predict the cost of a 300-ounce size is not valid.

j. What is the slope of the least-squares (best-fit) line? Interpret the slope.

The slope of the best-fit line is 0.03707. This means that each 1-ounce increase in size is associated with an increase of about $0.03707 in cost.
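
The least-squares results in problem 82 can be checked without Minitab. The sketch below is a minimal pure-Python version and is not the method used in the solution above: the function names `least_squares` and `predict` are made up for illustration, and the extrapolation flag simply tests whether a size falls inside the observed range of 16 to 200 ounces.

```python
import math

# Detergent data from problem 82: size in ounces, cost in dollars.
sizes = [16, 32, 64, 200]
costs = [3.99, 4.99, 5.99, 10.99]


def least_squares(x, y):
    """Return (intercept a, slope b, correlation r) for y-hat = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


def predict(a, b, size, x_min, x_max):
    """Return (estimated cost, True if size is outside the fitted range)."""
    return a + b * size, not (x_min <= size <= x_max)


a, b, r = least_squares(sizes, costs)
print(f"cost = {a:.3f} + {b:.5f} * size, r = {r:.3f}")

for size in (40, 90, 300):
    estimate, extrapolated = predict(a, b, size, min(sizes), max(sizes))
    note = " (extrapolation, not reliable)" if extrapolated else ""
    print(f"{size} oz: ${estimate:.2f}{note}")
```

Run as a script, this prints the fitted equation, r, and the estimates for 40, 90, and 300 ounces, marking the 300-ounce estimate as an extrapolation in line with the answer to part i.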
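
The chi-square statistics reported for Lane Chap. 17, problems 5 and 14, can also be recomputed by hand. The following is a rough sketch under stated assumptions rather than a reproduction of the Minitab procedure: the helper names are hypothetical, and `chi_square_sf` uses a closed-form expression for the upper-tail probability that holds only for positive integer degrees of freedom.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[rt * ct / total for ct in col_totals] for rt in row_totals]


def chi_square_sf(x, df):
    """P(chi-square with df degrees of freedom > x), for positive integer df only."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term = math.exp(-half)
        total = term
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return total
    # Odd df: start from the df = 1 tail and add one term per step of 2 in df.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)
    for k in range(3, df + 1, 2):
        total += term
        term *= x / k
    return total


# Problem 5: raffle winners vs. school composition (goodness of fit).
winners = [6, 14, 9, 7]
shares = [0.30, 0.25, 0.25, 0.20]
raffle_expected = [sum(winners) * p for p in shares]
raffle_chi2 = chi_square_statistic(winners, raffle_expected)
raffle_df = len(winners) - 1
print(f"raffle: chi2 = {raffle_chi2:.3f}, df = {raffle_df}, "
      f"p = {chi_square_sf(raffle_chi2, raffle_df):.4f}")

# Problem 14: limestone texture (rows) by colour (columns), test of association.
limestone = [[4, 20, 8], [5, 23, 12], [21, 23, 4]]
lime_expected = expected_counts(limestone)
lime_chi2 = sum(chi_square_statistic(o, e) for o, e in zip(limestone, lime_expected))
lime_df = (len(limestone) - 1) * (len(limestone[0]) - 1)
print(f"limestone: chi2 = {lime_chi2:.3f}, df = {lime_df}, "
      f"p = {chi_square_sf(lime_chi2, lime_df):.4f}")
```

The senior share is taken as 0.20, matching the stated school composition, which gives the expected count of 7.20 used in the raffle calculation.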