UM - STATS 250 - Study Guide - Midterm


Statistics 350 Fall 2007 Exam 2 Explanations

1.

a. p̂ = 214/400 = 0.535

b. s.e.(p̂) = √(p̂(1 − p̂)/n) = √(0.535(1 − 0.535)/400) = 0.0249

c. p̂ ± z* × s.e.(p̂) = 0.535 ± (1.645)(0.0249) = 0.535 ± 0.041 = (0.494, 0.576). To find z* = 1.645, look for 0.95 inside Table A.1, or use the infinite row of Table A.2 at the 90% level.

d. Fail to Reject H0, because the value of 0.50 is in the confidence interval.

e. Both ii, iii are relevant. Neither i nor iv make sense since our response is not a quantitative continuous variable.
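The arithmetic in parts (a) through (d) can be checked with a short script. This is a minimal sketch using only Python's standard library; the helper name `proportion_ci` is made up for illustration, and z* is computed from the normal quantile function rather than read from Table A.1.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, level):
    """Large-sample z interval for a population proportion (hypothetical helper)."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)  # standard error of p-hat
    # Two-sided multiplier: about 1.645 when level = 0.90
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * se
    return p_hat, se, (p_hat - margin, p_hat + margin)


p_hat, se, (lower, upper) = proportion_ci(214, 400, 0.90)
print(f"p-hat = {p_hat:.3f}, s.e. = {se:.4f}")
print(f"90% CI = ({lower:.3f}, {upper:.3f})")
# Part (d): 0.50 inside the interval means we fail to reject H0: p = 0.50
print("0.50 inside the interval:", lower <= 0.50 <= upper)
```

Rounded to the same number of places, the script should agree with the hand calculation above.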

2.

a. H0: µ= 1.4 Ha: µ < 1.4

b. The histogram does not look approximately normal; rather, it is somewhat skewed to the left. However, the t-test result can still be considered valid due to the Central Limit Theorem (CLT): with a large sample size (n = 36), the sample mean has approximately a normal model.

c. Our test statistic follows the t(35) distribution (“t-distribution with 35 degrees of freedom”), if the null hypothesis is true.

d. Test Statistic Value: t = (1.33 − 1.4)/0.021 = −3.33, p-value = 0.002/2 = 0.001

We divide the given p-value (‘Sig. (2-tailed)’) by 2 since our alternative hypothesis is one-sided to the left and our test statistic is negative.

e. i (since 0.001 < 0.05 or even 0.001 < 0.01)
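The test statistic and the halving of the two-sided p-value can be sketched as follows. The helper `one_sided_p` is a hypothetical name, not an SPSS feature. The inputs are the rounded values from the One-Sample Statistics table, so the result is about −3.33, while the One-Sample Test table lists −3.361; the small gap presumably reflects rounding in the displayed mean and standard error.

```python
def one_sided_p(two_sided_p, t_stat, alternative):
    """Convert a two-sided p-value to a one-sided one (hypothetical helper).

    alternative is "less" or "greater". Halving is only valid when the sign
    of the test statistic agrees with the direction of Ha.
    """
    agrees = (t_stat < 0) if alternative == "less" else (t_stat > 0)
    half = two_sided_p / 2
    return half if agrees else 1 - half


# Rounded summary values from the SPSS output
mean, mu0, se_mean = 1.330, 1.40, 0.021
t_stat = (mean - mu0) / se_mean
p_value = one_sided_p(0.002, t_stat, "less")

print(f"t = {t_stat:.2f}, one-sided p-value = {p_value:.3f}")
```

If the test statistic had been positive, the same one-sided p-value would be 1 − 0.001 = 0.999, which is why the sign check matters.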

3. This is a paired design, with each pair formed by a son and his father. Observe from the Paired Samples Test output that the differences are computed as “Son’s height” – “Father’s height”.

Hypotheses- H0: µd = 0 Ha: µd > 0

Computations- Test-statistic, t = 2.665 p-value, 0.009/2 = 0.0045

Decision- reject H0

Conclusion- There is sufficient evidence to say that in the population of father-son pairs represented by this sample, sons are taller than their fathers, on average.

The assumption is that the population of differences is normally distributed.

Circle the third plot, on the far right since it is the qq plot of the differences.
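As a rough check, the paired t statistic can be rebuilt from the summary numbers in the Paired Samples Test output. This sketch assumes n = 76 pairs, inferred from df = 75; the variable names are illustrative only.

```python
from math import sqrt

n_pairs = 76        # assumption: df = 75 in the output implies 76 father-son pairs
mean_diff = 1.556   # mean of (son's height - father's height)
sd_diff = 5.090     # standard deviation of the differences

se_diff = sd_diff / sqrt(n_pairs)   # standard error of the mean difference
t_stat = mean_diff / se_diff        # tests H0: mu_d = 0
p_one_sided = 0.009 / 2             # Ha: mu_d > 0 and t > 0, so halve Sig. (2-tailed)

print(f"s.e. = {se_diff:.3f}, t = {t_stat:.3f}, p-value = {p_one_sided:.4f}")
print("Reject H0 at the 5% level:", p_one_sided < 0.05)
```

Only the differences enter the calculation, which is why the normality assumption is checked on the plot of the differences rather than on the sons' or fathers' heights separately.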


4. H0: The person does not have arthritis (no arthritis) versus Ha: The person has arthritis.

a. See right.

b. Cell B

c. Cell C

d. Doctor does not administer a treatment that is necessary for the patient. Making a type 2 error means the doctor fails to detect the patient’s arthritis.

5.

a. Symbol: p

Description: The parameter of interest is the population proportion of all students at this university that are opposed to the plan.

b. False. The sample size is too small: np0 = 18(0.70) = 12.6 > 10, but n(1 – p0) = 18(0.30) = 5.4 < 10.

c. Test statistic: X = 17. The p-value is 0.0142. Here’s how to compute the p-value: P(X ≥ 17) = P(X = 17) + P(X = 18) = 0.0126 + 0.0016 = 0.0142, where

P(X = 17) = C(18, 17) (0.7)^17 (0.3)^1 = 0.0126 and P(X = 18) = C(18, 18) (0.7)^18 (0.3)^0 = 0.0016

d. Reject H0 since the p-value of 0.0142 < 0.05

e. There is sufficient evidence to support the claim that over 70% of all students at this university are opposed to the plan.

f. Power is the name given to the probability of rejecting H0 when Ha is true.

g. The news channel could sample more students to get a test with higher power. Repeating the study does not affect the power, and using a lower significance level (1%) would decrease the power.

Want to review type 1 error, type 2 error, and the power of the test?

Check out: http://www.intuitor.com/statistics/T1T2Errors.html
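The exact binomial calculation for problem 5 can be sketched with the standard library. The helper `binom_pmf` is a hypothetical name; the script checks the normal-approximation conditions from part (b) and then sums the upper tail P(X = 17) + P(X = 18) under H0: p = 0.70.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p) (hypothetical helper)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p0, observed = 18, 0.70, 17

# Part (b): both expected counts must be at least 10 for the normal approximation
normal_ok = n * p0 >= 10 and n * (1 - p0) >= 10

# Part (c): Ha is p > 0.70, so the p-value is the upper tail starting at 17
p_value = sum(binom_pmf(k, n, p0) for k in range(observed, n + 1))

print("Normal approximation appropriate:", normal_ok)
print(f"P(X = 17) = {binom_pmf(17, n, p0):.4f}")
print(f"P(X = 18) = {binom_pmf(18, n, p0):.4f}")
print(f"p-value   = {p_value:.4f}")
```

Because the expected count of students not opposed is only 5.4, the exact binomial tail is used in place of a z statistic.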

6. 5% of 100 = 5 ‘statistically significant’ results.
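A small simulation can illustrate why about 5 of the 100 tests come out significant. This is a sketch under two stated assumptions: each p-value is uniform on (0, 1) when its null hypothesis is true (which holds for a continuous test statistic whose conditions are met), and the 100 tests are independent. The seed and the number of repetitions are arbitrary choices.

```python
import random


def count_false_positives(num_tests, alpha, rng):
    """Count tests with p-value < alpha when every H0 is true.

    Assumption: under a true H0 the p-value is uniform on (0, 1),
    so each one is simulated with rng.random().
    """
    return sum(rng.random() < alpha for _ in range(num_tests))


rng = random.Random(350)   # arbitrary seed so the run is repeatable
reps = 2000
counts = [count_false_positives(100, 0.05, rng) for _ in range(reps)]

expected = 100 * 0.05              # the answer to the exam question
average = sum(counts) / reps       # long-run average from the simulation

print(f"Expected number of significant results: {expected:.0f}")
print(f"Average over {reps} simulated studies: {average:.2f}")
```

Any single set of 100 tests can produce more or fewer than 5 significant results; the value 5 is the long-run average.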


7.

a. The value 163/718 is a sample proportion.

b. Ha: p1 > p2 since it is suspected that males are more prone to binge drinking.

c. p̂ = (163 + 169)/(718 + 991) = 0.1943

d. Both iii and iv are correct and should be circled.

i) Incorrect. The null hypothesis is either true or false regardless of our repeating the study.

ii) Incorrect. This is only true under the assumption that the null hypothesis is true.

iii) Correct. This is a correct interpretation of the p-value.

iv) Correct. 0.002 < 0.01
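The pooled proportion from part (c) and the z statistic quoted in part (d) can be reproduced as below. This is a minimal sketch of the usual two-proportion z test with a pooled standard error; the variable names are illustrative, and the p-value comes from the standard library's normal distribution rather than a table.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 163, 718   # males who binge drink, males surveyed
x2, n2 = 169, 991   # females who binge drink, females surveyed

p1_hat = x1 / n1
p2_hat = x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)   # common proportion estimate under H0: p1 = p2

# Standard error of (p1_hat - p2_hat) computed under the null hypothesis
se_null = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se_null

# Ha: p1 > p2, so the p-value is the upper-tail area beyond z
p_value = 1 - NormalDist().cdf(z)

print(f"pooled p-hat = {p_pooled:.4f}")
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print("Significant at the 1% level:", p_value < 0.01)
```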

8.

a. You should report the results from the pooled test …

Because … The two sample standard deviations are similar, namely 0.880 is close to 0.915.

Because … Levene’s test gave a p-value of 0.868, which is much larger than the usual significance level for Levene’s test of 0.10, which means we cannot refute the statement that the population variances are equal.

b. 0.894 The pooled estimate must be in between the standard deviations for each sample. Only 0.894 is between 0.880 and 0.915.

c. i. False – a confidence level is not a probability that a specific confidence interval is correct.

ii. False – as a parameter, (µ1 - µ2) is a fixed value, so it is either in this interval or not; it does not vary from repetition to repetition.

Want to review the confidence level idea?

Visit http://onlinestatbook.com/stat_sim/conf_interval/index.html
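For part (b) of problem 8, the pooled standard deviation can also be computed directly from the Group Statistics table. This sketch uses the standard pooled-variance formula with the rounded sample standard deviations; the variable names are illustrative.

```python
from math import sqrt

n1, s1 = 17, 0.880   # region 1: sample size and standard deviation
n2, s2 = 12, 0.915   # region 2: sample size and standard deviation

# Pooled variance: the sample variances weighted by their degrees of freedom
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
s_pooled = sqrt(pooled_var)

# Standard error of the difference in sample means, pooled version
se_diff = s_pooled * sqrt(1 / n1 + 1 / n2)

print(f"pooled standard deviation = {s_pooled:.3f}")
print(f"pooled s.e. of the difference = {se_diff:.3f}")
print("Between the two sample SDs:", s1 < s_pooled < s2)
```

The standard error matches the .337 listed in the "Equal variances assumed" row, with df = 17 + 12 − 2 = 27.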

9.

a. CI, µ We want to estimate a continuous variable. See “interested in estimating how long …”

b. HT, µd Two continuous variables (times spent) are measured on each student in the sample making this paired, and we want to decide which is greater. See “interested in determining whether the time spent …”

c. HT, p1 – p2 We have a categorical variable (relief/no relief) and two populations (one assigned to each remedy), and want to decide which receives more relief. See “interested in assessing if …”

d. HT, µ We have a single continuous variable which we wish to compare to a fixed value. See “to check this claim”

e. HT, p The firm has a single categorical variable (yes/no) and would like to decide if there is a majority. See “would like to assess if…”

f. CI, µ1 – µ2 The company would like to estimate the difference of a single continuous variable measured on each of two populations (men/women). See “to estimate how much …”

Want more practice identifying the correct statistical procedure?

Check out: Name That Scenario on the left-hand side of Ctools>Stats 250 (lecture site).


Statistics 350 Fall 2007 Exam 2

1. A research paper in gerontology suggests that about half of all elderly people prefer to live in an apartment rather than in a one-family home. In a recent sample survey of retired persons, 214 out of 400 stated that they preferred an apartment.

a. What is the sample estimate of the population proportion of all retired people who prefer an apartment? [2]

Final answer: __________________

b. Compute the standard error of the estimate in part (a).

[2]

Final answer: __________________

c. Using the above information, obtain a 90% confidence interval for the population proportion of all retired people who prefer an apartment.

[2]

Final answer: ( ___________, ___________ )

d. We wish to assess if the population proportion of all retired people who prefer an apartment differs from half. Use your confidence interval above and give your decision.

[2]

Decision: (circle one) Reject H0 Fail to reject H0

Briefly explain your reason.

e. Which of the following is an assumption for the inference above to be valid?

[2] Circle all relevant items.

i. Original response is normally distributed.

ii. The sample is selected randomly from the population of interest

iii. The sample size is sufficiently large.

iv. Histogram of sample values is symmetric and unimodal.


2. Bjork Larsen needed to determine whether to use a new racing wax for the Swedish Nordic Ski Team. He thinks that the new wax will be worth the price if it would lead to a faster race time, on average. In the past, the average time for a standard length race was 1.40 hours. He selects a random sample of 36 Swedish skiers and has each use the new wax system for the upcoming standard length race.

a. Let μ = the average race time for all Swedish skiers using the new wax system.

[2] State the appropriate hypotheses to be tested.

H0:________________ Ha:_________________

Some SPSS output based on the race times for the 36 skiers is provided at the right and below.

[Figure: histogram of Race Time (hours); horizontal axis from 1.00 to 1.50, vertical axis Frequency from 0 to 10]

b. Complete the sentence.

The histogram does not look approximately normal, rather, it is somewhat skewed to the ____________. However, the t-test result can still be considered valid due to what main result?

[2]

One-Sample Statistics

                     N    Mean    Std. Deviation   Std. Error Mean
Race Time (hours)    36   1.330   .124             .021

One-Sample Test (Test Value = 1.40)

                                                                       95% Confidence Interval
                                                                       of the Difference
                     t        df   Sig. (2-tailed)   Mean Difference   Lower     Upper
Race Time (hours)    -3.361   35   .002              -.070             -.112     -.028

c. What is the distribution of the test statistic if the new wax system does not lead to faster race times on average?

[2] Final answer: _________________________

d. The value of the test statistic is missing in the SPSS output. Provide its value and the corresponding p-value for testing the hypotheses in part (a).

[2] Test Statistic Value t = _______________ p-value = _________________________________

e. Based on the test results, which of the following is the appropriate real-world conclusion? [1] Clearly circle your answer.

i. The new wax system significantly decreased the average race time supporting its use by the Swedish Nordic Ski Team.

ii. There was insufficient evidence to demonstrate the new wax system decreased the average race time.


3. Data was collected on the height of male college students and their fathers. Suppose we are interested in testing the hypothesis that sons are taller than their fathers, on average using a 5% significance level. The researcher’s assistant had not taken Stat 350 and did not know which test to perform, so he generated SPSS output for both the paired and the independent samples t-tests. You will need to determine which output is appropriate to use as you conduct the test below.

Paired Samples Test

Paired Differences (95% Confidence Interval of the Difference shown as Lower, Upper)

                          Mean    Std. Deviation   Std. Error Mean   Lower   Upper   t       df   Sig. (2-tailed)
Pair 1   Sons - Fathers   1.556   5.090            .584              .393    2.719   2.665   75   .009

Independent Samples Test (1 = Sons, 2 = Fathers)

Levene's Test for Equality of Variances: F = 4.138, Sig. = .044

t-test for Equality of Means (95% Confidence Interval of the Difference shown as Lower, Upper)

height                        t       df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower   Upper
Equal variances assumed       2.120   150       .036              1.556             .734                    .106    3.006
Equal variances not assumed   2.120   116.202   .036              1.556             .734                    .102    3.010

[8]

State the hypothesis using the appropriate statistical notation.

H0:_______________________________ Ha:_______________________________ Give the value of the test statistic and the corresponding p-value.

Test Statistic = _______________________ p-value = ________________________ Based on the p-value, your decision would be (circle one): reject H0 do not reject H0.

Based on the decision, the conclusion would be: There is (circle one) sufficient insufficient evidence that sons are taller than their fathers, on average.

Here are several plots that could be used to check one of the assumptions required for this test to be valid. Clearly state that assumption and circle the plot(s) that should be examined to check it. The assumption is (be specific):


4. A medical doctor uses a diagnostic test to determine if her patient has rheumatoid arthritis. The doctor will prescribe treatment only if she thinks the patient has arthritis based on the test results. In a sense, the doctor is using a null and an alternative hypothesis to decide whether or not to administer treatment. The hypotheses might be stated as:

H0: The person does not have arthritis (no arthritis) versus Ha: The person has arthritis.

a. The grid below presents the four possible combinations of the doctor's decision and the "true" situation. There are two statements in each cell, one about the doctor's decision and one about the patient's actual condition. For each statement, clearly circle the word phrase in parentheses that makes the statements match the doctor's decision and the true state of the patient. There are eight statements all together, so be sure to make a selection for each one.

[2]

b. Identify the cell that represents a Type 1 Error (circle your answer below).

[1] Cell A Cell B Cell C Cell D

c. Identify the cell that represents a Type 2 Error (circle your answer below).

[1] Cell A Cell B Cell C Cell D

d. Circle the real-world consequence of the doctor committing a Type 2 Error.

[1]

• Doctor administers an unnecessary treatment to the patient.

• Doctor administers a necessary treatment to the patient.

• Doctor does not administer a treatment that is necessary for the patient.

• Doctor does not administer a treatment that is not necessary for the patient.


5. Regents of a large state university proposed a plan to increase student fees in order to build new parking facilities. A news channel claims that over 70% of the students are opposed to the plan. We wish to test this claim, that is, test H0: p = 0.70 versus Ha: p > 0.70. A random sample of 18 students is taken and 17 of them are opposed to the plan.

a. What is the population parameter of interest here? Give the statistical symbol and then describe in one brief sentence what it represents in this situation.

[2]

Symbol: ________ Description: __________________________________________ _____________________________________________________________________________

b. True or False: “In this example we can use normal approximation to binomial distribution.” [2] Circle one: True False

Give specific support for your selected answer:

c. Provide the test statistic and p-value for testing the hypotheses. Show all work. [4]

Test statistic: _________________ p-value: ________________________________

d. What is your decision at a 5% significance level? Circle one: Reject H0 Fail to reject H0 [1]

e. Give a one sentence conclusion in the context of this problem.

[2]

Therefore …

f. Suppose the alternative hypothesis is really true. Then the probability of rejecting the null hypothesis would be called …

[1] Circle one: Alpha Type 1 error Beta Type 2 error Level of significance Power

g. What could the news channel do to produce a test with higher power? Circle all correct answers. [1] Sample more students Repeat the study many times Use a 1% significance level


6. A researcher will compare men and women on 100 different quantitative variables. He will use a 5% significance level to carry out the 100 independent samples t-tests (one for each variable). If, for each test, the null hypothesis is actually true, about how many "statistically significant" results will be produced?

[2]

Final answer: ______________

7. A large study was conducted about binge drinking behavior of college students. Out of the 718 males surveyed, 163 revealed that they indulged in binge drinking, while 169 of the 991 women surveyed responded affirmatively. We wish to compare the two genders with respect to the population proportion that indulge in binge drinking.

a. The value 163/718 = 0.227 is a …

[2] Circle all that apply:

Normal distribution Parameter Population proportion Sample Sample Proportion

b. We wish to test the null hypothesis that there is no difference between men and women college students with respect to the population proportion that indulge in binge drinking against the alternative that men are more prone to binge drinking. Let 1=the population of all male college students and 2=the population of all female college students.

[1] Finish the hypotheses to be tested. H0: p1 = p2 versus Ha: p1 _______ p2

c. Under the above null hypothesis, the two population proportions are the same.

Compute the estimate of this common population proportion. Show all work.

[2]

Final answer: __________________

d. The test statistic is z = 2.91 and the p-value is 0.002. Consider the following statements and clearly circle all that are correct.

[2]

i. If this study were repeated many times, the null hypothesis would be true only 0.2% of the time.

ii. If this study were repeated many times, we would see a z test statistic of 2.91 or more extreme in about 0.2% of the repetitions.

iii. If this study were repeated many times and if there is no difference between men and women college students with respect to the population proportion that indulge in binge drinking, we would see a z test statistic of 2.91 or more extreme in about 0.2% of the repetitions.

iv. The results are statistically significant at the 1% level.


8. A sample of 29 wines from a large distributor have been randomly selected for a wine tasting event. Three responses that are commonly measured, each on a quantitative scale, are aroma, flavor, and quality. The 29 wines were also classified by the region where each was produced. You are asked to provide a 95% confidence interval estimate for the difference in the population mean flavor rating for the two regions. SPSS output is provided. You may assume all conditions for a two independent samples t interval are met.

Group Statistics

Flavor     N    Mean    Std. Deviation   Std. Error Mean
Region 1   17   4.376   .880             .213
Region 2   12   5.617   .915             .264

Independent Samples Test

Levene's Test for Equality of Variances: F = .028, Sig. = .868

t-test for Equality of Means (95% Confidence Interval of the Difference shown as Lower, Upper)

Flavor                        t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower    Upper
Equal variances assumed       -3.68   27       .0010             -1.240            .337                    -1.932   -.548
Equal variances not assumed   -3.65   23.236   .0013             -1.240            .340                    -1.942   -.538

a. Based on the output, give two reasons why you should report the pooled version of the two independent samples t interval for these data. Be specific, that is, include values in each of your explanations. [4]

Because …

Because …

b. Which of the following values must be the pooled estimate of the common population standard deviation? [1] Circle one: 0.028 0.868 0.337 0.894 0.799

c. The pooled 95% confidence interval for the difference in the population mean flavor rating for region 1 versus region 2 is given by (-1.932, -0.548). [2 points each]

i. The probability that (μ1 – μ2) lies between –1.932 and –0.548 equals 0.95. True False

ii. If this procedure were repeated many times, 95% of the possible (μ1 – μ2) values will fall between –1.932 and –0.548. True False


9. For each scenario below, determine if the most appropriate statistical analysis technique is conducting a hypothesis test (HT) or constructing a confidence interval (CI). Then also give the symbol for the corresponding parameter that is of interest.

[2 points each]

a. The supervisor of a tourist information desk at a local airport is interested in estimating how long it takes an employee to serve a customer on average. Using a stopwatch, he measures the amount of time (in minutes) it takes for each of 10 randomly selected customers to be served.

Circle one: HT CI Parameter of interest: _______________

b. In a recent survey, college students were asked the amount of time they spend watching television on a typical day and the amount of time they spend surfing on the Internet on a typical day. The researchers were interested in determining whether the time spent surfing the Internet was higher than the time spent watching television, on average.

Circle one: HT CI Parameter of interest: _______________

c. A study was done by randomly assigning 200 volunteers with sore throats to either drink a cup of herbal tea or use a throat lozenge to ease their pain. Each subject reported whether or not they experienced any relief. The researchers were interested in assessing if the tea was more effective as compared to the throat lozenge.

Circle one: HT CI Parameter of interest: _______________

d. The state lottery office claims that the average household income of those people playing the lottery is greater than $37,000. A sample of 25 households will be obtained and the incomes recorded to check this claim.

Circle one: HT CI Parameter of interest: _______________

e. A firm conducts a survey on a sample of employees. Management would like to assess if a majority of employees use the on-site exercise studio during the lunch hour.

Circle one: HT CI Parameter of interest: _______________

f. Men and women shop at a retail clothing store. The manager would like to estimate how much more (or less), on average, a woman spends on a typical purchase occasion than a man.

Circle one: HT CI Parameter of interest: _______________
