A computer was used to generate four random numbers from a normal distribution with a set mean and variance: 1.1650, .6268, .0751, .3516. Five more random normal numbers with the same variance but perhaps a different mean were then generated (the mean may or may not actually be different): .3035, 2.6961, 1.0591, 2.7971, 1.2641. a. What do you think the means of the random normal number generators were? What do you think the difference of the means was? b. What do you think the variance of the random number generator was? c. What is the estimated standard error of your estimate of the difference of the means? d. Form a 90% confidence interval for the difference of the means of the random number generators. e. In this situation, is it more appropriate to use a one-sided test or a two-sided test of the equality of the means? f. What is the p-value of a two-sided test of the null hypothesis of equal means? g. Would the hypothesis that the means were the same versus a two-sided alternative be rejected at the significance level \(\alpha=.1\) ? h. Suppose you know that the variance of the normal distribution was \(\sigma^2=1\). How would your answers to the preceding questions change?
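As an illustration of parts (a) through (d), the pooled two-sample calculation can be sketched in Python using the nine values listed in the problem. This is a minimal sketch, not a full solution: the variable names are illustrative, the equal-variance assumption is taken from the problem statement, and the critical value 1.895 is the tabulated 0.95 quantile of the t distribution with 7 degrees of freedom, hard-coded here rather than computed.

```python
from math import sqrt
from statistics import mean, variance

# Data exactly as listed in the problem statement.
x = [1.1650, 0.6268, 0.0751, 0.3516]
y = [0.3035, 2.6961, 1.0591, 2.7971, 1.2641]
n, m = len(x), len(y)

# (a) The sample means estimate the generator means; their difference
#     estimates the difference of the means.
diff = mean(y) - mean(x)

# (b) Pooled estimate of the common variance (statistics.variance divides by n - 1).
pooled_var = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)

# (c) Estimated standard error of the difference of the sample means.
se = sqrt(pooled_var * (1 / n + 1 / m))

# (d) 90% confidence interval, using the tabulated t quantile for 7 df.
T_CRIT_90_DF7 = 1.895  # assumption: value read from a t table, not computed here
ci = (diff - T_CRIT_90_DF7 * se, diff + T_CRIT_90_DF7 * se)

# t statistic for the two-sided test of equal means.
t_stat = diff / se

print(f"difference of means: {diff:.4f}")
print(f"pooled variance:     {pooled_var:.4f}")
print(f"standard error:      {se:.4f}")
print(f"90% CI:              ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"t statistic:         {t_stat:.4f}")
```

Comparing `abs(t_stat)` with `T_CRIT_90_DF7` gives the two-sided decision at \(\alpha=.1\), which matches checking whether the interval contains zero.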
Textbook Solutions for Mathematical Statistics and Data Analysis
Question
This and the next two problems are based on discussions and data in Le Cam and Neyman (1967), which is devoted to the analysis of weather modification experiments. The examples illustrate some ways in which principles of experimental design have been used in this field. During the summers of 1957 through 1960, a series of randomized cloud-seeding experiments were carried out in the mountains of Arizona. Of each pair of successive days, one day was randomly selected for seeding to be done. The seeding was done during a two-hour to four-hour period starting at midday, and rainfall during the afternoon was measured by a network of 29 gauges. The data for the four years are given in the following table (in inches). Observations in this table are listed in chronological order.
a. Analyze the data for each year and for the years pooled together to see if there appears to be any effect due to seeding. You should use graphical descriptive methods to get a qualitative impression of the results and hypothesis tests to assess the significance of the results.
b. Why should the day on which seeding is to be done be chosen at random rather than just alternating seeded and unseeded days? Why should the days be paired at all, rather than just deciding randomly which days to seed?
Solution
The first step in solving Chapter 11, Problem 45 is to refer to the textbook question above.
From the textbook chapter Comparing Two Samples you will find a few key concepts needed to solve this.
Chapter 11 textbook questions
Chapter 11: Problem 1 Mathematical Statistics and Data Analysis 3
Chapter 11: Problem 2 Mathematical Statistics and Data Analysis 3
The difference of the means of two normal distributions with equal variance is to be estimated by sampling an equal number of observations from each distribution. If it were possible, would it be better to halve the standard deviations of the populations or double the sample sizes?
Chapter 11: Problem 3 Mathematical Statistics and Data Analysis 3
In Section 11.2.1, we considered two methods of estimating \(\operatorname{Var}(\bar{X}-\bar{Y})\). Under the assumption that the two population variances were equal, we estimated this quantity by \(s_p^2\left(\frac{1}{n}+\frac{1}{m}\right)\) and without this assumption by \(\frac{s_X^2}{n}+\frac{s_Y^2}{m}\) Show that these two estimates are identical if m = n.
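One way to see the identity is to substitute m = n into the pooled variance. The sketch below assumes the usual definition \(s_p^2=\frac{(n-1) s_X^2+(m-1) s_Y^2}{n+m-2}\), which is not restated in the problem:

\[
s_p^2\left(\frac{1}{n}+\frac{1}{n}\right)
=\frac{(n-1) s_X^2+(n-1) s_Y^2}{2 n-2} \cdot \frac{2}{n}
=\frac{s_X^2+s_Y^2}{2} \cdot \frac{2}{n}
=\frac{s_X^2}{n}+\frac{s_Y^2}{n}
\]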
Chapter 11: Problem 4 Mathematical Statistics and Data Analysis 3
Respond to the following: Using the t distribution is absolutely ridiculous—another example of deliberate mystification! It’s valid when the populations are normal and have equal variance. If the sample sizes were so small that the t distribution were practically different from the normal distribution, you would be unable to check these assumptions.
Chapter 11: Problem 5 Mathematical Statistics and Data Analysis 3
Respond to the following: Here is another example of deliberate mystification - the idea of formulating and testing a null hypothesis. Let's take Example A of Section 11.2.1. It seems to me that it is inconceivable that the expected values of any two methods of measurement could be exactly equal. It is certain that there will be subtle differences at the very least. What is the sense, then, in testing \(H_0: \mu_X=\mu_Y\)?
Chapter 11: Problem 6 Mathematical Statistics and Data Analysis 3
Respond to the following: I have two batches of numbers and I have a corresponding \(\bar{x}\) and \(\bar{y}\). Why should I test whether they are equal when I can just see whether they are or not?
Chapter 11: Problem 7 Mathematical Statistics and Data Analysis 3
In the development of Section 11.2.1, where are the following assumptions used? (1) \(X_1, X_2, \ldots, X_n\) are independent random variables; (2) \(Y_1, Y_2, \ldots, Y_n\) are independent random variables; (3) the X's and Y's are independent.
Chapter 11: Problem 8 Mathematical Statistics and Data Analysis 3
An experiment to determine the efficacy of a drug for reducing high blood pressure is performed using four subjects in the following way: two of the subjects are chosen at random for the control group and two for the treatment group. During the course of treatment with the drug, the blood pressure of each of the subjects in the treatment group is measured for ten consecutive days as is the blood pressure of each of the subjects in the control group. a. In order to test whether the treatment has an effect, do you think it is appropriate to use the two-sample t test with n = m = 20? b. Do you think it is appropriate to use the Mann-Whitney test with n = m = 20?
Chapter 11: Problem 9 Mathematical Statistics and Data Analysis 3
Referring to the data in Section 11.2.1.1, compare iron retention at concentrations of 10.2 and .3 millimolar using graphical procedures and parametric and nonparametric tests. Write a brief summary of your conclusions.
Chapter 11: Problem 10 Mathematical Statistics and Data Analysis 3
Verify that the two-sample t test at level \(\alpha\) of \(H_0: \mu_X=\mu_Y\) versus \(H_A: \mu_X \neq \mu_Y\) rejects if and only if the confidence interval for \(\mu_X-\mu_Y\) does not contain zero.
Chapter 11: Problem 11 Mathematical Statistics and Data Analysis 3
Explain how to modify the t test of Section 11.2.1 to test \(H_0: \mu_X=\mu_Y+\Delta\) versus \(H_A: \mu_X \neq \mu_Y+\Delta\) where \(\Delta\) is specified.
Chapter 11: Problem 12 Mathematical Statistics and Data Analysis 3
An equivalence between hypothesis tests and confidence intervals was demonstrated in Chapter 9. In Chapter 10, a nonparametric confidence interval for the median, \(\eta\), was derived. Explain how to use this confidence interval to test the hypothesis \(H_0: \eta=\eta_0\). In the case where \(\eta_0=0\), show that using this approach on a sample of differences from a paired experiment is equivalent to the sign test. The sign test counts the number of positive differences and uses the fact that in the case that the null hypothesis is true, the distribution of the number of positive differences is binomial with (n, .5). Apply the sign test to the data from the measurement of mercury levels, listed in Section 11.3.3.
Chapter 11: Problem 13 Mathematical Statistics and Data Analysis 3
Let \(X_1, \ldots, X_{25}\) be i.i.d. N(.3,1). Consider testing the null hypothesis \(H_0: \mu=0\) versus \(H_A: \mu>0\) at significance level \(\alpha=.05\). Compare the power of the sign test and the power of the test based on normal theory assuming that \(\sigma\) is known.
Chapter 11: Problem 14 Mathematical Statistics and Data Analysis 3
Suppose that \(X_1, \ldots, X_n\) are i.i.d. \(N\left(\mu, \sigma^2\right)\). To test the null hypothesis \(H_0: \mu=\) \(\mu_0\), the t test is often used: \(t=\frac{\bar{X}-\mu_0}{s_{\bar{X}}}\) Under \(H_0, t\) follows a t distribution with n - 1 df. Show that the likelihood ratio test of this \(H_0\) is equivalent to the t test.
Chapter 11: Problem 15 Mathematical Statistics and Data Analysis 3
Suppose that n measurements are to be taken under a treatment condition and another n measurements are to be taken independently under a control condition. It is thought that the standard deviation of a single observation is about 10 under both conditions. How large should n be so that a 95% confidence interval for \(\mu_X-\mu_Y\) has a width of 2? Use the normal distribution rather than the \(t\) distribution, since n will turn out to be rather large.
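The sample-size calculation can be sketched as follows. It assumes that "width" means the full length of the interval, so that the width is \(2 z \sigma \sqrt{2/n}\) with \(z\) the 0.975 standard normal quantile. The variable names are illustrative.

```python
from math import ceil
from statistics import NormalDist

sigma = 10.0  # standard deviation of a single observation (from the problem)
width = 2.0   # target full width of the confidence interval (from the problem)
conf = 0.95   # confidence level (from the problem)

# Two-sided standard normal quantile, about 1.96 for 95% confidence.
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)

# Width = 2 * z * sigma * sqrt(2 / n); solving for n gives:
n_exact = 2 * (2 * z * sigma / width) ** 2

# Round up so the interval is no wider than the target.
n_required = ceil(n_exact)

print(f"z = {z:.4f}, exact n = {n_exact:.2f}, required n = {n_required}")
```

Rounding up rather than to the nearest integer keeps the achieved width at or below the target.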
Chapter 11: Problem 16 Mathematical Statistics and Data Analysis 3
Referring to Problem 15, how large should n be so that the test of \(H_0: \mu_X=\mu_Y\) against the one-sided alternative \(H_A: \mu_X>\mu_Y\) has a power of .5 if \(\mu_X-\mu_Y=2\) and \(\alpha=.10\) ?
Chapter 11: Problem 17 Mathematical Statistics and Data Analysis 3
Consider conducting a two-sided test of the null hypothesis \(H_0: \mu_X=\mu_Y\) as described in Problem 16. Sketch power curves for (a) \(\alpha=.05, n=20 ;\) (b) \(\alpha=.10, n=20\); (c) \(\alpha=.05, n=40 ;\) (d) \(\alpha=.10, n=40\). Compare the curves.
Chapter 11: Problem 18 Mathematical Statistics and Data Analysis 3
Two independent samples are to be compared to see if there is a difference in the population means. If a total of m subjects are available for the experiment, how should this total be allocated between the two samples in order to (a) provide the shortest confidence interval for \(\mu_X-\mu_Y\) and (b) make the test of \(H_0: \mu_X=\mu_Y\) as powerful as possible? Assume that the observations in the two samples are normally distributed with the same variance.
Chapter 11: Problem 19 Mathematical Statistics and Data Analysis 3
An experiment is planned to compare the mean of a control group to the mean of an independent sample of a group given a treatment. Suppose that there are to be 25 samples in each group. Suppose that the observations are approximately normally distributed and that the standard deviation of a single measurement in either group is \(\sigma=5\). a. What will the standard error of \(\bar{Y}-\bar{X}\) be? b. With a significance level \(\alpha=.05\), what is the rejection region of the test of the null hypothesis \(H_0: \mu_Y=\mu_X\) versus the alternative \(H_A: \mu_Y>\mu_X\) ? c. What is the power of the test if \(\mu_Y=\mu_X+1\) ? d. Suppose that the p-value of the test turns out to be 0.07. Would the test reject at significance level \(\alpha=.10\) ? e. What is the rejection region if the alternative is \(H_A: \mu_Y \neq \mu_X\) ? What is the power if \(\mu_Y=\mu_X+1\) ?
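Because \(\sigma\) is treated as known, the quantities in parts (a), (b), (c), and (e) follow from the standard normal distribution. The following is a sketch under that assumption, with illustrative variable names. The power expressions use the fact that \(\bar{Y}-\bar{X}\) is normal with mean \(\mu_Y-\mu_X\) and the standard error from part (a).

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 5.0, 25  # per-observation SD and per-group sample size (from the problem)
alpha = 0.05        # significance level (from the problem)
delta = 1.0         # mu_Y - mu_X under the alternative in parts (c) and (e)
std_normal = NormalDist()

# (a) Standard error of Ybar - Xbar with known, equal sigma.
se = sigma * sqrt(2 / n)

# (b) One-sided test: reject when Ybar - Xbar exceeds this cutoff.
z_one = std_normal.inv_cdf(1 - alpha)
cutoff_one = z_one * se

# (c) Power of the one-sided test when mu_Y = mu_X + delta.
power_one = 1 - std_normal.cdf(z_one - delta / se)

# (e) Two-sided test: reject when |Ybar - Xbar| exceeds this cutoff.
z_two = std_normal.inv_cdf(1 - alpha / 2)
cutoff_two = z_two * se
power_two = (1 - std_normal.cdf(z_two - delta / se)) + std_normal.cdf(-z_two - delta / se)

print(f"(a) SE = {se:.4f}")
print(f"(b) reject if Ybar - Xbar > {cutoff_one:.4f}")
print(f"(c) one-sided power = {power_one:.4f}")
print(f"(e) reject if |Ybar - Xbar| > {cutoff_two:.4f}, power = {power_two:.4f}")
```

For part (d), a test rejects at level \(\alpha\) exactly when the p-value is below \(\alpha\), so the comparison is 0.07 against 0.10.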
Chapter 11: Problem 20 Mathematical Statistics and Data Analysis 3
Consider Example A of Section 11.3.1 using a Bayesian model. As in the example, use a normal model for the differences and also use an improper prior for the expected difference and the precision (as in the case of unknown mean and variance in Section 8.6). Find the posterior probability that the expected difference is positive. Find a 90% posterior credibility interval for the expected difference.
Chapter 11: Problem 21 Mathematical Statistics and Data Analysis 3
A study was done to compare the performances of engine bearings made of different compounds (McCool 1979). Ten bearings of each type were tested. The following table gives the times until failure (in units of millions of cycles): a. Use normal theory to test the hypothesis that there is no difference between the two types of bearings. b. Test the same hypothesis using a nonparametric method. c. Which of the methods - that of part (a) or that of part (b) - do you think is better in this case? d. Estimate \(\pi\), the probability that a type I bearing will outlast a type II bearing. e. Use the bootstrap to estimate the sampling distribution of \(\hat{\pi}\) and its standard error. f. Use the bootstrap to find an approximate 90% confidence interval for \(\pi\).
Chapter 11: Problem 22 Mathematical Statistics and Data Analysis 3
An experiment was done to compare two methods of measuring the calcium content of animal feeds. The standard method uses calcium oxalate precipitation followed by titration and is quite time-consuming. A new method using flame photometry is faster. Measurements of the percent calcium content made by each method of 118 routine feed samples (Heckman 1960) are contained in the file calcium. Analyze the data to see if there is any systematic difference between the two methods. Use both parametric and nonparametric tests and graphical methods.
Chapter 11: Problem 23 Mathematical Statistics and Data Analysis 3
Let \(X_1, \ldots, X_n\) be i.i.d. with cdf F, and let \(Y_1, \ldots, Y_m\) be i.i.d. with cdf G. The hypothesis to be tested is that F = G. Suppose for simplicity that m + n is even so that in the combined sample of X's and Y's, (m+n) / 2 observations are less than the median and (m+n) / 2 are greater. a. As a test statistic, consider T, the number of X's less than the median of the combined sample. Show that T follows a hypergeometric distribution under the null hypothesis: \(P(T=t)=\frac{\left(\begin{array}{c}(m+n) / 2 \\t\end{array}\right)\left(\begin{array}{c}(m+n) / 2 \\n-t\end{array}\right)}{\left(\begin{array}{c}m+n \\n\end{array}\right)}\) Explain how to form a rejection region for this test. b. Show how to find a confidence interval for the difference between the median of F and the median of \(G\) under the shift model, \(G(x)=F(x-\Delta)\). (Hint: Use the order statistics.) c. Apply the results (a) and (b) to the data of Problem 21.
Chapter 11: Problem 24 Mathematical Statistics and Data Analysis 3
Find the exact null distribution of the Mann-Whitney statistic, \(U_Y\), in the case where \(m=3\) and n = 2.
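The exact null distribution can be checked by brute-force enumeration. The sketch below assumes there are no ties, that the Y sample has m observations, and that \(U_Y\) counts the pairs \((X_i, Y_j)\) with \(X_i<Y_j\), which equals the rank sum of the Y's minus \(m(m+1)/2\). Under the null hypothesis, every choice of m ranks for the Y's is equally likely. The function name is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def mann_whitney_null(m, n):
    """Exact null distribution of U_Y for sample sizes m (Y's) and n (X's), no ties."""
    ranks = range(1, m + n + 1)
    counts = Counter()
    # Each subset of m ranks assigned to the Y's is equally likely under H0.
    for y_ranks in combinations(ranks, m):
        u_y = sum(y_ranks) - m * (m + 1) // 2
        counts[u_y] += 1
    total = sum(counts.values())
    return {u: Fraction(c, total) for u, c in sorted(counts.items())}


dist = mann_whitney_null(3, 2)
for u, p in dist.items():
    print(f"P(U_Y = {u}) = {p}")
```

With m = 3 and n = 2 there are only 10 equally likely rank assignments, so the enumeration is easy to verify by hand.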
Chapter 11: Problem 25 Mathematical Statistics and Data Analysis 3
Referring to Example A in Section 11.2.1, (a) if the smallest observation for method B (79.94) is made arbitrarily small, will the t test still reject? (b) If the largest observation for method B (80.03) is made arbitrarily large, will the t test still reject? (c) Answer the same questions for the Mann-Whitney test.
Chapter 11: Problem 26 Mathematical Statistics and Data Analysis 3
Let \(X_1, \ldots, X_n\) be a sample from an N(0,1) distribution and let \(Y_1, \ldots, Y_n\) be an independent sample from an N(1,1) distribution. a. Determine the expected rank sum of the X's. b. Determine the variance of the rank sum of the X's.
Chapter 11: Problem 27 Mathematical Statistics and Data Analysis 3
Find the exact null distribution of \(W_+\) in the case where n = 4.
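A brute-force enumeration gives the distribution directly. The sketch assumes no ties and no zero differences, so that under the null hypothesis each of the ranks 1 through n independently carries a positive sign with probability 1/2, and \(W_+\) is the sum of the positively signed ranks. The function name is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def signed_rank_null(n):
    """Exact null distribution of W+ for n untied, nonzero differences."""
    counts = Counter()
    # Each of the 2**n sign patterns is equally likely under H0.
    for signs in product((0, 1), repeat=n):
        w_plus = sum(rank for rank, positive in zip(range(1, n + 1), signs) if positive)
        counts[w_plus] += 1
    total = 2 ** n
    return {w: Fraction(c, total) for w, c in sorted(counts.items())}


dist = signed_rank_null(4)
for w, p in dist.items():
    print(f"P(W+ = {w}) = {p}")
```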
Chapter 11: Problem 28 Mathematical Statistics and Data Analysis 3
For n = 10, 20, and 30, find the .05 and .01 critical values for a two-sided signed-rank test from the tables and then by using the normal approximation. Compare the values.
Chapter 11: Problem 29 Mathematical Statistics and Data Analysis 3
(Permutation Test for Means) Here is another view on hypothesis testing that we will illustrate with Example A of Section 11.2.1. We ask whether the measurements produced by methods A and B are identical or exchangeable in the following sense. There are 13 + 8 = 21 measurements in all and there are \(\left(\begin{array}{c}21 \\ 8\end{array}\right)\), or about \(2 \times 10^5\), ways that 8 of these could be assigned to method \(\mathbf{B}\). Is the particular assignment we have observed unusual among these in the sense that the means of the two samples are unusually different? a. It's not inconceivable, but it may be asking too much for you to generate all \(\left(\begin{array}{c}21 \\ 8\end{array}\right)\) partitions. So just choose a random sample of these partitions, say of size 1000, and make a histogram of the resulting values of \(\bar{X}_A-\bar{X}_B\). Where on this distribution does the value of \(\bar{X}_A-\bar{X}_B\) that was actually observed fall? Compare to the result of Example B of Section 11.2.1. b. In what way is this procedure similar to the Mann-Whitney test?
Chapter 11: Problem 30 Mathematical Statistics and Data Analysis 3
Use the bootstrap to estimate the standard error of and a confidence interval for \(\bar{X}_A-\bar{X}_B\) and compare to the result of Example A of Section 11.2.1.
Chapter 11: Problem 31 Mathematical Statistics and Data Analysis 3
In Section 11.2.3, if F = G, what are \(E(\hat{\pi})\) and \(\operatorname{Var}(\hat{\pi})\) ? Would there be any advantage in using equal sample sizes m = n in estimating \(\pi\) or does it make no difference?
Chapter 11: Problem 32 Mathematical Statistics and Data Analysis 3
If \(X \sim N\left(\mu_X, \sigma_X^2\right)\) and Y is independent \(N\left(\mu_Y, \sigma_Y^2\right)\), what is \(\pi=P(X<Y)\) in terms of \(\mu_X, \mu_Y, \sigma_X\), and \(\sigma_Y\) ?
Chapter 11: Problem 33 Mathematical Statistics and Data Analysis 3
To compare two variances in the normal case, let \(X_1, \ldots, X_n\) be i.i.d. \(N\left(\mu_X, \sigma_X^2\right)\), and let \(Y_1, \ldots, Y_m\) be i.i.d. \(N\left(\mu_Y, \sigma_Y^2\right)\), where the X's and Y's are independent samples. Argue that under \(H_0: \sigma_X=\sigma_Y\), \(\frac{s_X^2}{s_Y^2} \sim F_{n-1, m-1}\) a. Construct rejection regions for one- and two-sided tests of \(H_0\). b. Construct a confidence interval for the ratio \(\sigma_X^2 / \sigma_Y^2\). c. Apply the results of parts (a) and (b) to Example A in Section 11.2.1. (Caution: This test and confidence interval are not robust against violations of the assumption of normality.)
Chapter 11: Problem 34 Mathematical Statistics and Data Analysis 3
This problem contrasts the power functions of paired and unpaired designs. Graph and compare the power curves for testing \(H_0: \mu_X=\mu_Y\) for the following two designs. a. Paired: \(\operatorname{Cov}\left(X_i, Y_i\right)=50, \sigma_X=\sigma_Y=10, i=1, \ldots, 25\). b. Unpaired: \(X_1, \ldots, X_{25}\) and \(Y_1, \ldots, Y_{25}\) are independent with variance as in part (a).
Chapter 11: Problem 35 Mathematical Statistics and Data Analysis 3
An experiment was done to measure the effects of ozone, a component of smog. A group of 22 seventy-day-old rats were kept in an environment containing ozone for 7 days, and their weight gains were recorded. Another group of 23 rats of a similar age were kept in an ozone-free environment for a similar time, and their weight gains were recorded. The data (in grams) are given below. Analyze the data to determine the effect of ozone. Write a summary of your conclusions. [This problem is from Doksum and Sievers (1976) who provide an interesting analysis.]
Chapter 11: Problem 36 Mathematical Statistics and Data Analysis 3
Lin, Sutton, and Qurashi (1979) compared microbiological and hydroxylamine methods for the analysis of ampicillin dosages. In one series of experiments, pairs of tablets were analyzed by the two methods. The data in the following table give the percentages of claimed amount of ampicillin found by the two methods in several pairs of tablets. What are \(\bar{X}-\bar{Y}\) and \(s_{\bar{X}-\bar{Y}}\) ? If the pairing had been erroneously ignored and it had been assumed that the two samples were independent, what would have been the estimate of the standard deviation of \(\bar{X}-\bar{Y}\) ? Analyze the data to determine if there is a systematic difference between the two methods.
Chapter 11: Problem 37 Mathematical Statistics and Data Analysis 3
Stanley and Walton (1961) ran a controlled clinical trial to investigate the effect of the drug stelazine on chronic schizophrenics. The trials were conducted on chronic schizophrenics in two closed wards. In each of the wards, the patients were divided into two groups matched for age, length of time in the hospital, and score on a behavior rating sheet. One member of each pair was given stelazine, and the other a placebo. Only the hospital pharmacist knew which member of each pair received the actual drug. The following table gives the behavioral rating scores for the patients at the beginning of the trial and after 3 mo. High scores are good. a. For each of the wards, test whether stelazine is associated with improvement in the patients’ scores. b. Test if there is any difference in improvement between the wards. [These data are also presented in Lehmann (1975), who discusses methods of combining the data from the wards.]
Chapter 11: Problem 38 Mathematical Statistics and Data Analysis 3
Bailey, Cox, and Springer (1978) used high-pressure liquid chromatography to measure the amounts of various intermediates and by-products in food dyes. The following table gives the percentages added and found for two substances in the dye FD&C Yellow No. 5. Is there any evidence that the amounts found differ systematically from the amounts added?
Chapter 11: Problem 39 Mathematical Statistics and Data Analysis 3
An experiment was done to test a method for reducing faults on telephone lines (Welch 1987). Fourteen matched pairs of areas were used. The following table shows the fault rates for the control areas and for the test areas: a. Plot the differences versus the control rate and summarize what you see. b. Calculate the mean difference, its standard deviation, and a confidence interval. c. Calculate the median difference and a confidence interval and compare to the previous result. d. Do you think it is more appropriate to use a t test or a nonparametric method to test whether the apparent difference between test and control could be due to chance? Why? Carry out both tests and compare.
Chapter 11: Problem 40 Mathematical Statistics and Data Analysis 3
Biological effects of magnetic fields are a matter of current concern and research. In an early study of the effects of a strong magnetic field on the development of mice (Barnothy 1964), 10 cages, each containing three 30-day-old albino female mice, were subjected for a period of 12 days to a field with an average strength of 80 Oe/cm. Thirty other mice housed in 10 similar cages were not placed in a magnetic field and served as controls. The following table shows the weight gains, in grams, for each of the cages. a. Display the data graphically with parallel dot plots. (Draw two parallel number lines and put dots on one corresponding to the weight gains of the controls and on the other at points corresponding to the gains of the treatment group.) b. Find a 95% confidence interval for the difference of the mean weight gains. c. Use a t test to assess the statistical significance of the observed difference. What is the p-value of the test? d. Repeat using a nonparametric test. e. What is the difference of the median weight gains? f. Use the bootstrap to estimate the standard error of the difference of median weight gains. g. Form a confidence interval for the difference of median weight gains based on the bootstrap approximation to the sampling distribution.
Chapter 11: Problem 41 Mathematical Statistics and Data Analysis 3
The Hodges-Lehmann shift estimate is defined to be \(\hat{\Delta}=\operatorname{median}\left(X_i-Y_j\right)\), where \(X_1, X_2, \ldots, X_n\) are independent observations from a distribution F and \(Y_1, Y_2, \ldots, Y_m\) are independent observations from a distribution \(G\) and are independent of the \(X_i\). a. Show that if F and G are normal distributions, then \(E(\hat{\Delta})=\mu_X-\mu_Y\). b. Why is \(\hat{\Delta}\) robust to outliers? c. What is \(\hat{\Delta}\) for the previous problem and how does it compare to the differences of the means and of the medians? d. Use the bootstrap to approximate the sampling distribution and the standard error of \(\hat{\Delta}\). e. From the bootstrap approximation to the sampling distribution, form an approximate 90% confidence interval for \(\hat{\Delta}\).
Chapter 11: Problem 42 Mathematical Statistics and Data Analysis 3
Use the data of Problem 40 of Chapter 10. a. Estimate \(\pi\), the probability that more rain will fall from a randomly selected seeded cloud than from a randomly selected unseeded cloud. b. Use the bootstrap to estimate the standard error of \(\hat{\pi}\). c. Use the bootstrap to form an approximate confidence interval for \(\pi\).
Chapter 11: Problem 43 Mathematical Statistics and Data Analysis 3
Suppose that \(X_1, X_2, \ldots, X_n\) and \(Y_1, Y_2, \ldots, Y_m\) are two independent samples. As a measure of the difference in location of the two samples, the difference of the 20% trimmed means is used. Explain how the bootstrap could be used to estimate the standard error of this difference.
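One possible bootstrap procedure is sketched below: resample each sample independently with replacement, recompute the difference of trimmed means for each resample, and take the standard deviation of those replicates as the standard error estimate. The function names, the number of replicates, the trimming convention (dropping the lowest and highest 20% of the sorted values, rounded down), and the demonstration data are all assumptions made for illustration. The problem supplies no data.

```python
import random
from statistics import mean, stdev


def trimmed_mean(values, trim=0.2):
    """Mean after dropping the lowest and highest `trim` fraction of the sorted values."""
    ordered = sorted(values)
    k = int(trim * len(ordered))  # number of points dropped from each end
    return mean(ordered[k:len(ordered) - k])


def bootstrap_se_trimmed_diff(x, y, trim=0.2, n_boot=2000, seed=0):
    """Bootstrap standard error of trimmed_mean(x) - trimmed_mean(y)."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        # Resample each sample independently, with replacement, at its original size.
        x_star = rng.choices(x, k=len(x))
        y_star = rng.choices(y, k=len(y))
        replicates.append(trimmed_mean(x_star, trim) - trimmed_mean(y_star, trim))
    return stdev(replicates)


# Hypothetical data, generated only to exercise the functions.
demo_rng = random.Random(1)
x_demo = [demo_rng.gauss(1.0, 1.0) for _ in range(30)]
y_demo = [demo_rng.gauss(0.0, 1.0) for _ in range(25)]

se_hat = bootstrap_se_trimmed_diff(x_demo, y_demo)
print(f"bootstrap SE of the trimmed-mean difference: {se_hat:.4f}")
```

The two samples are resampled separately because they are independent. Pooling them before resampling would instead impose the null hypothesis that both come from the same distribution.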
Chapter 11: Problem 44 Mathematical Statistics and Data Analysis 3
Interest in the role of vitamin C in mental illness in general and schizophrenia in particular was spurred by a paper of Linus Pauling in 1968. This exercise takes its data from a study of plasma levels and urinary vitamin C excretion in schizophrenic patients (Subotičanec et al. 1986). Twenty schizophrenic patients and 15 controls with a diagnosis of neurosis of different origin who had been patients at the same hospital for a minimum of 2 months were selected for the study. Before the experiment, all the subjects were on the same basic hospital diet. A sample of 2 ml of venous blood for vitamin C determination was drawn from each subject before breakfast and after the subjects had emptied their bladders. Each subject was then given 1 g ascorbic acid dissolved in water. No foods containing ascorbic acid were available during the test. For the next 6 h all urine was collected from the subjects for assay of vitamin C. A second blood sample was also drawn 2 h after the dose of vitamin C. The following two tables show the plasma concentrations (mg/dl). a. Graphically compare the two groups at the two times and for the difference in concentration at the two times. b. Use the t test to assess the strength of the evidence for differences between the two groups at 0 h, at 2 h, and the difference 2 h − 0 h. c. Use the Mann-Whitney test to test the hypotheses of (b). The following tables show the amounts of urinary vitamin C, both total and milligrams per kilogram of body weight, for the two groups: d. Use descriptive statistics and graphical presentations to compare the two groups with respect to total excretion and mg/kg body weight. Do the data look normally distributed? e. Use a t test to compare the two groups on both variables. Is the normality assumption reasonable? f. Use the Mann-Whitney test to compare the two groups. How do the results compare with those obtained in part (e)?
The lower levels of plasma vitamin C in the schizophrenics before administration of ascorbic acid could be attributed to several factors. Interindividual differences in the intake of meals cannot be excluded, despite the fact that all patients were offered the same food. A more interesting possibility is that the differences are the result of poorer resorption or of higher ascorbic acid utilization in schizophrenics. In order to answer this question, another experiment was run on 15 schizophrenics and 15 controls. All subjects were given 70 mg of ascorbic acid daily for 4 weeks before the ascorbic acid loading test. The following table shows the concentration of plasma vitamin C (mg/dl) and the 6-h urinary excretion (mg) after administration of 1 g ascorbic acid. g. Use graphical methods and descriptive statistics to compare the two groups with respect to plasma concentrations and urinary excretion. h. Use the t test to compare the two groups on the two variables. Does the normality assumption look reasonable? i. Compare the two groups using the Mann-Whitney test.
Chapter 11: Problem 45 Mathematical Statistics and Data Analysis 3
This and the next two problems are based on discussions and data in Le Cam and Neyman (1967), which is devoted to the analysis of weather modification experiments. The examples illustrate some ways in which principles of experimental design have been used in this field. During the summers of 1957 through 1960, a series of randomized cloud-seeding experiments were carried out in the mountains of Arizona. Of each pair of successive days, one day was randomly selected for seeding to be done. The seeding was done during a two-hour to four-hour period starting at midday, and rainfall during the afternoon was measured by a network of 29 gauges. The data for the four years are given in the following table (in inches). Observations in this table are listed in chronological order. a. Analyze the data for each year and for the years pooled together to see if there appears to be any effect due to seeding. You should use graphical descriptive methods to get a qualitative impression of the results and hypothesis tests to assess the significance of the results. b. Why should the day on which seeding is to be done be chosen at random rather than just alternating seeded and unseeded days? Why should the days be paired at all, rather than just deciding randomly which days to seed?
Chapter 11: Problem 46 Mathematical Statistics and Data Analysis 3
The National Weather Bureau’s ACN cloud-seeding project was carried out in the states of Oregon and Washington. Cloud seeding was accomplished by dispersing dry ice from an aircraft; only clouds that were deemed “ripe” for seeding were candidates for seeding. On each occasion, a decision was made at random whether to seed, the probability of seeding being \(\frac{2}{3}\). This resulted in 22 seeded and 13 control cases. Three types of targets were considered, two of which are dealt with in this problem. Type I targets were large geographical areas downwind from the seeding; type II targets were sections of type I targets located so as to have, theoretically, the greatest sensitivity to cloud seeding. The following table gives the average target rainfalls (in inches) for the seeded and control cases, listed in chronological order. Is there evidence that seeding has an effect on either type of target? In what ways is the design of this experiment different from that of the one in Problem 45?
Chapter 11: Problem 47 Mathematical Statistics and Data Analysis 3
During 1963 and 1964, an experiment was carried out in France; its design differed somewhat from those of the previous two problems. A 1500-km target area was selected, and an adjacent area of about the same size was designated as the control area; 33 ground generators were used to produce silver iodide to seed the target area. Precipitation was measured by a network of gauges for each suitable “rainy period,” which was defined as a sequence of periods of continuous precipitation between dry spells of a specified length. When a forecaster determined that the situation was favorable for seeding, he telephoned an order to a service agent, who then opened a sealed envelope that contained an order to actually seed or not. The envelopes had been prepared in advance, using a table of random numbers. The following table gives precipitation (in inches) in the target and control areas for the seeded and unseeded periods. a. Analyze the data, which are listed in chronological order, to see if there is an effect of seeding. b. The analysis done by the French investigators used the square root transformation in order to make normal theory more applicable. Do you think that taking the square root was an effective transformation for this purpose? c. Reflect on the nature of this design. In particular, what advantage is there to using the control area? Why not just compare seeded and unseeded periods on the target area?
Chapter 11: Problem 48 Mathematical Statistics and Data Analysis 3
Proteinuria, the presence of excess protein in urine, is a symptom of renal (kidney) distress among diabetics. Taguma et al. (1985) studied the effects of captopril for treating proteinuria in diabetics. Urinary protein was measured for 12 patients before and after eight weeks of captopril therapy. The amounts of urinary protein (in g/24 hrs) before and after therapy are shown in the following table. What can you conclude about the effect of captopril? Consider using parametric or nonparametric methods and analyzing the data on the original scale or on a log scale.
Chapter 11: Problem 49 Mathematical Statistics and Data Analysis 3
Egyptian researchers, Kamal et al. (1991), took a sample of 126 police officers subject to inhalation of vehicle exhaust in downtown Cairo and found an average blood level concentration of lead equal to \(29.2 \ \mu \mathrm {g/dl}\) with a standard deviation of \(7.5 \ \mu \mathrm {g/dl}\). A sample of 50 policemen from a suburb, Abbasia, had an average concentration of \(18.2 \ \mu \mathrm {g/dl}\) and a standard deviation of \(5.8 \ \mu \mathrm {g/dl}\). Form a confidence interval for the population difference and test the null hypothesis that there is no difference in the populations.
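One way to set up the calculation from the summary statistics above is sketched below. It assumes a large-sample normal approximation with unpooled variances and a 95% confidence level; neither choice is specified in the problem. The function name `two_sample_z` is hypothetical, and only the Python standard library is used.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, level=0.95):
    """Large-sample CI and two-sided z test for a difference of means.

    Assumes independent samples and unpooled variances (an assumption,
    not something the problem statement prescribes).
    """
    diff = mean1 - mean2
    # Standard error of the difference of two independent sample means.
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    # Normal critical value for the requested confidence level.
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    # Test statistic and two-sided p-value for H0: no difference.
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, se, ci, z, p_value


# Downtown Cairo: n = 126, mean 29.2, SD 7.5; Abbasia: n = 50, mean 18.2, SD 5.8.
diff, se, ci, z, p_value = two_sample_z(29.2, 7.5, 126, 18.2, 5.8, 50)
print(f"difference = {diff:.1f}, SE = {se:.3f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), z = {z:.2f}, p = {p_value:.3g}")
```

With samples of 126 and 50, the normal approximation should be close to a t-based interval; a pooled-variance version would replace the standard error formula.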
Chapter 11: Problem 50 Mathematical Statistics and Data Analysis 3
The file bodytemp contains normal body temperature readings (degrees Fahrenheit) and heart rates (beats per minute) of 65 males (coded by 1) and 65 females (coded by 2) from Shoemaker (1996). a. Using normal theory, form a 95% confidence interval for the difference of mean body temperatures between males and females. Is the use of the normal approximation reasonable? b. Using normal theory, form a 95% confidence interval for the difference of mean heart rates between males and females. Is the use of the normal approximation reasonable? c. Use both parametric and nonparametric tests to compare the body temperatures and heart rates. What do you conclude?
Chapter 11: Problem 51 Mathematical Statistics and Data Analysis 3
A common symptom of otitis-media (inflammation of the middle ear) in young children is the prolonged presence of fluid in the middle ear, called middle-ear effusion. It is hypothesized that breast-fed babies tend to have less prolonged effusions than do bottle-fed babies. Rosner (2006) presents the results of a study of 24 pairs of infants who were matched according to sex, socioeconomic status, and type of medication taken. One member of each pair was bottle-fed and the other was breast-fed. The file ears gives the durations (in days) of middle-ear effusions after the first episode of otitis-media. a. Examine the data using graphical methods and summarize your conclusions. b. In order to test the hypothesis of no difference, do you think it is more appropriate to use a parametric or a nonparametric test? Carry out a test. What do you conclude?
Chapter 11: Problem 52 Mathematical Statistics and Data Analysis 3
The media often present short reports of the results of experiments. To the critical reader or listener, such reports often raise more questions than they answer. Comment on possible pitfalls in the interpretation of each of the following. a. It is reported that patients whose hospital rooms have a window recover faster than those whose rooms do not. b. Nonsmoking wives whose husbands smoke have a cancer rate twice that of wives whose husbands do not smoke. c. A 2-year study in North Carolina found that 75% of all industrial accidents in the state happened to workers who had skipped breakfast. d. A school integration program involved busing children from minority schools to majority (primarily white) schools. Participation in the program was voluntary. It was found that the students who were bused scored lower on standardized tests than did their peers who chose not to be bused. e. When a group of students were asked to match pictures of newborns with pictures of their mothers, they were correct 36% of the time. f. A survey found that those who drank a moderate amount of beer were healthier than those who totally abstained from alcohol. g. A 15-year study of more than 45,000 Swedish soldiers revealed that heavy users of marijuana were six times more likely than nonusers to develop schizophrenia. h. A University of Wisconsin study showed that within 10 years of the wedding, 38% of those who had lived together before marriage had split up, compared to 27% of those who had married without a “trial period.” i. A study of nearly 4,000 elderly North Carolinians has found that those who attended religious services every week were 46% less likely to die over a six-year period than people who attended less often or not at all, according to researchers at Duke University Medical Center.
Chapter 11: Problem 53 Mathematical Statistics and Data Analysis 3
Explain why in Levine’s experiment (Example A in Section 11.3.1) subjects also smoked cigarettes made of lettuce leaves and unlit cigarettes.
Chapter 11: Problem 54 Mathematical Statistics and Data Analysis 3
This example is taken from an interesting article by Joiner (1981) and from data in Ryan, Joiner, and Ryan (1976). The National Institute of Standards and Technology supplies standard materials of many varieties to manufacturers and other parties, who use these materials to calibrate their own testing equipment. Great pains are taken to make these reference materials as homogeneous as possible. In an experiment, a long homogeneous steel rod was cut into 4-inch lengths, 20 of which were randomly selected and tested for oxygen content. Two measurements were made on each piece. The 40 measurements were made over a period of 5 days, with eight measurements per day. In order to avoid possible bias from time-related trends, the sequence of measurements was randomized. The file steelrods contains the measurements. There is an unexpected systematic source of variability in these data. Can you find it by making an appropriate plot? Would this effect have been detectable if the measurements had not been randomized over time?