This 10 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015.

Date Created: 02/06/15

CHAPTER 9: Analysis and Inference for Two-Way Tables

Two-way tables compare two categorical variables measured on a set of cases.

Examples:
• Gender versus major
• Political party versus voting status

Sometimes one or both variables are quantitative, but we classify them into categories for data collection and/or analysis. For example, suppose our variables are years of college education and income. We decide to group years of education into four classes: none, some college, Bachelor's degree, and postgraduate. We also decide to classify income (in dollars) into four classes: <10,000, 10,000-30,000, 30,001-50,000, and >50,000.

Two-Way Table
• Describes the relationship between two categorical variables
• Represents a table of counts

Example 1: A survey on the severity of rodent problems in commercial poultry houses studied a random sample of poultry operations. Each operation was classified by type (egg or turkey production) and by the extent of the rodent problems. Here are the results:

                      Type
Rodent Problem    Egg    Turkey
Mild               34      22
Moderate           33      22
Severe              7       4

Note:
• There are two variables in this study: type of poultry operation and extent of rodent problem. Both of these variables are categorical.
• Each poultry operation represents a case. Each case fits into one type of poultry operation and is ranked as mild, moderate, or severe in its rodent problems. Consequently, each case fits in one and only one cell of the body of the table.

The Joint Distribution of the Categorical Variables

If we want the proportion of cases associated with any cell in the table, we divide the count for that cell by the grand total (the total number of cases in the entire table). If we do this for each cell, we will have the joint distribution of our two categorical variables.

1. Find the joint distribution for the example above.

Marginal Distributions of Categorical Variables

The marginal distributions of each categorical variable are obtained from row and column totals. Basically, we are examining the distribution of a single variable in
the two-way table. Marginal distributions allow us to compare the relative frequencies among the levels of a single categorical variable.

2. Find the marginal distribution of type of poultry operation for the example above.
3. Find the marginal distribution of severity of rodent infestation for the example above.

Conditional Distributions of Categorical Variables

In conditional distributions, we find the distribution of one categorical variable among only those cases that share a common level of the other categorical variable.

4. For the example above, find the conditional distribution of severity of rodent problems among turkey operations.
5. For the example above, find the conditional distribution of type of operation for poultry operations that have mild rodent problems.

Often we are interested in comparing conditional distributions.

6. Compare the severity of rodent infestation problems for each type of poultry operation.

Simpson's Paradox

Simpson's Paradox refers to the reversal of the direction of a comparison or an association when data from several groups are combined to form a single group.

The Whickham study on smoking. (Also read pages 588-590.)

Inference for Two-Way Tables

In this chapter, a test that compares more than two groups is presented. The new test starts by presenting the data as a two-way table. Two-way tables are also used to describe relationships between any two categorical variables.

Example 1 (continued): Suppose we wanted to do the following test:

$H_0$: There is no relationship between degree of rodent infestation and type of poultry operation.
$H_a$: There is a relationship between degree of rodent infestation and type of poultry operation.

Note: The alternative hypothesis is no longer one-sided or two-sided. It is many-sided.

To do this hypothesis test, we use the chi-square statistic. We go through the same steps as in the other hypothesis tests that we have covered.

Chi-Square Test
• State the null and alternative hypotheses.
• Find the test statistic. The chi-square statistic is a measure of how far the
observed counts in the two-way table are from the expected counts. The formula for the statistic is

  $$X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

  The sum is over all $r \times c$ cells in the table. SPSS will calculate this for us.
• Calculate the P-value. SPSS will do this for us.
• Compare the P-value to the $\alpha$ level. If P-value $\le \alpha$, then we reject $H_0$. If P-value $> \alpha$, then we fail to reject $H_0$.
• State conclusions in terms of the problem.

Before we work through our example, we will look at the chi-square distribution.

The Chi-Square Distribution

The chi-square distributions are a family of distributions that take only positive values and are skewed to the right. A specific chi-square distribution is specified by one parameter, called the degrees of freedom.

The chi-square test for a two-way table with r rows and c columns uses P-values from the chi-square distribution with $(r-1)(c-1)$ degrees of freedom. The P-value is the area to the right of $X^2$ under the chi-square density curve.

Note: As the degrees of freedom increase, the density curve of the $\chi^2$ distribution becomes less skewed and larger values become more probable.

Note: The chi-square test is an overall test. A small P-value tells us that there is some relationship between the row variable and the column variable. It tells us nothing about the nature of the relationship.

Because the $X^2$ statistic tells us nothing about the nature of the relationship, we should always accompany the chi-square test by a description of what the data show, such as:
• Calculate and compare appropriate percents.
• Look at the cells that contribute the most to the $X^2$ statistic, i.e., look at the standardized residuals, $(\text{observed} - \text{expected})/\sqrt{\text{expected}}$, in each cell.
• Look at bar graphs of the data.

We can expand this table to show the following: expected values and percentages for row, column, and total.

Using SPSS:
• > Analyze > Descriptive Statistics > Crosstabs, then pull problem into the rows box and pull type into the columns box.
• Click on the Cells button and check expected values and percentages for row, column, and
total. You will get the chart below.

problem * type Crosstabulation

                                            type
                                        egg      turkey    Total
problem   mild       Count               34        22        56
                     Expected Count      34.0      22.0      56.0
                     % within problem    60.7%     39.3%    100.0%
                     % within type       45.9%     45.8%     45.9%
                     % of Total          27.9%     18.0%     45.9%
          moderate   Count               33        22        55
                     Expected Count      33.4      21.6      55.0
                     % within problem    60.0%     40.0%    100.0%
                     % within type       44.6%     45.8%     45.1%
                     % of Total          27.0%     18.0%     45.1%
          severe     Count                7         4        11
                     Expected Count       6.7       4.3      11.0
                     % within problem    63.6%     36.4%    100.0%
                     % within type        9.5%      8.3%      9.0%
                     % of Total           5.7%      3.3%      9.0%
Total                Count               74        48       122
                     Expected Count      74.0      48.0     122.0
                     % within problem    60.7%     39.3%    100.0%
                     % within type      100.0%    100.0%    100.0%
                     % of Total          60.7%     39.3%    100.0%

Using SPSS: Lastly, to get the chi-square statistic, you will need to do the following:
• > Analyze > Descriptive Statistics > Crosstabs, then pull problem into the rows box and pull type into the columns box.
• Click on the Statistics button and check Chi-square.

Below is the output.

Chi-Square Tests

                       Value      df    Asymp. Sig. (2-sided)
Pearson Chi-Square     .051(a)     2    .975
Likelihood Ratio       .051        2    .975
N of Valid Cases       122

a. 1 cells (16.7%) have expected count less than 5. The minimum expected count is 4.33.

Now let's go through the hypothesis test.
• $H_0$: There is no relationship between degree of rodent infestation and type of poultry operation.
  $H_a$: There is a relationship between degree of rodent infestation and type of poultry operation.
• $X^2$ =
• P-value =
• Conclusions:

Cell Counts Required for the Chi-Square Test

You can safely use the chi-square test when no more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater. In particular, all four expected counts in a 2 x 2 table should be 5 or greater.

Now let's check the cell counts for our example. Can we trust our results based on the above rule?

Example 2 (from the 4th edition of Moore and McCabe, with variations): Psychological factors and social factors can influence the survival of patients with serious diseases. One study examined the relationship between survival of patients with coronary heart disease and pet ownership. Each of 92 patients was classified as having a pet or
not and by whether they survived for one year. The researchers suspected that having a pet might be connected to the patient status. Here are the data:

                    Pet Ownership
Patient status     No     Yes
Alive              28      50
Dead               11       3
Total              39      53

a) Assuming the patient is still alive, what is the probability that the patient owns a pet? Is this a joint, marginal, or conditional probability?
b) What is the probability that a patient owns a pet and is still alive? Is this a joint, marginal, or conditional probability?
c) What is the probability that a patient owns a pet? Is this a joint, marginal, or conditional probability?
d) State the hypotheses for a $X^2$ test for this problem. Find the $X^2$ test statistic, its degrees of freedom, and the P-value for this problem. State your conclusions in terms of the original problem.

Chi-Square Tests

                            Value       df    Asymp. Sig.    Exact Sig.    Exact Sig.
                                              (2-sided)      (2-sided)     (1-sided)
Pearson Chi-Square          8.851(b)     1    .003
Continuity Correction(a)    7.190        1    .007
Likelihood Ratio            9.011        1    .003
Fisher's Exact Test                                          .006          .004
N of Valid Cases            92

a. Computed only for a 2x2 table.
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 5.93.

e) Find the sample proportion of pet owners who are alive and the proportion of non-pet owners who are alive.
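As a way to check hand calculations for questions 1 through 5 of Example 1, the joint, marginal, and conditional distributions of the rodent-problem table can be sketched in a few lines of Python. This is only an illustration and not part of the SPSS workflow used in these notes. All variable names (`counts`, `joint`, `severity_given_turkey`, and so on) are made up for this sketch.

```python
# Counts from Example 1 (rows: rodent problem, columns: type of operation).
counts = {
    "mild":     {"egg": 34, "turkey": 22},
    "moderate": {"egg": 33, "turkey": 22},
    "severe":   {"egg": 7,  "turkey": 4},
}

# Row totals, column totals, and the grand total.
row_totals = {problem: sum(row.values()) for problem, row in counts.items()}
col_totals = {}
for row in counts.values():
    for kind, n in row.items():
        col_totals[kind] = col_totals.get(kind, 0) + n
grand_total = sum(row_totals.values())

# Joint distribution: each cell count divided by the grand total.
joint = {
    (problem, kind): n / grand_total
    for problem, row in counts.items()
    for kind, n in row.items()
}

# Marginal distributions: row or column totals divided by the grand total.
marginal_problem = {p: t / grand_total for p, t in row_totals.items()}
marginal_type = {k: t / grand_total for k, t in col_totals.items()}

# Conditional distribution of severity among turkey operations:
# divide each turkey count by the turkey column total.
severity_given_turkey = {
    p: counts[p]["turkey"] / col_totals["turkey"] for p in counts
}

# Conditional distribution of type among operations with mild problems:
# divide each count in the mild row by the mild row total.
type_given_mild = {k: n / row_totals["mild"] for k, n in counts["mild"].items()}

for name, dist in [
    ("joint", joint),
    ("marginal (problem)", marginal_problem),
    ("marginal (type)", marginal_type),
    ("severity | turkey", severity_given_turkey),
    ("type | mild", type_given_mild),
]:
    print(name, {key: round(value, 3) for key, value in dist.items()})
```

Each distribution sums to 1, which is a quick sanity check. The rounded values can be compared with the "% of Total", "% within type", and "% within problem" rows of the SPSS crosstabulation.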

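The chi-square statistic formula from the notes can also be sketched directly, as a check on the SPSS output for Examples 1 and 2. Two assumptions go beyond the notes. First, the expected count for a cell is computed as (row total × column total) / grand total, which is the standard formula but is not stated above. Second, the hypothetical helper `p_value` uses closed-form right-tail areas that are valid only for 1 or 2 degrees of freedom, which happen to be the two cases in these examples. All function names are made up for illustration.

```python
import math


def expected_counts(table):
    """Expected counts under H0: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all r x c cells."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


def degrees_of_freedom(table):
    """(r - 1)(c - 1) for a table with r rows and c columns."""
    return (len(table) - 1) * (len(table[0]) - 1)


def p_value(x2, df):
    """Area to the right of x2; sketch limited to df = 1 or df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(x2 / 2))
    if df == 2:
        return math.exp(-x2 / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Example 1: rows mild/moderate/severe, columns egg/turkey.
rodent = [[34, 22], [33, 22], [7, 4]]
# Example 2: rows alive/dead, columns no pet/pet.
pets = [[28, 50], [11, 3]]

for name, table in [("rodent", rodent), ("pets", pets)]:
    x2 = chi_square_statistic(table)
    df = degrees_of_freedom(table)
    print(f"{name}: X2 = {x2:.3f}, df = {df}, P-value = {p_value(x2, df):.3f}")
```

For larger tables a statistics package is the practical choice. The point of the sketch is only to show that the Pearson Chi-Square rows in the SPSS output follow from the formula.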

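The cell-count rule for the chi-square test can be written as a small check. The sketch below again assumes the standard expected-count formula (row total × column total / grand total), and the function name `cell_counts_ok` is hypothetical. The third table is invented purely to show a failing case and is not data from the notes.

```python
def expected_counts(table):
    """Expected counts: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def cell_counts_ok(table):
    """Apply the rule of thumb for trusting the chi-square test."""
    expected = [e for row in expected_counts(table) for e in row]
    # Special case from the notes: a 2 x 2 table needs all four
    # expected counts to be 5 or greater.
    if len(table) == 2 and len(table[0]) == 2:
        return all(e >= 5 for e in expected)
    # General case: no more than 20% of expected counts below 5,
    # and every expected count at least 1.
    share_below_5 = sum(e < 5 for e in expected) / len(expected)
    return share_below_5 <= 0.20 and all(e >= 1 for e in expected)


rodent = [[34, 22], [33, 22], [7, 4]]   # Example 1
pets = [[28, 50], [11, 3]]              # Example 2
tiny = [[3, 1], [2, 4]]                 # made-up table with small counts

for name, table in [("rodent", rodent), ("pets", pets), ("tiny", tiny)]:
    print(name, cell_counts_ok(table))
```

The check is applied to expected counts, not observed counts, which is why the severe/turkey cell (observed count 4, expected count about 4.33) is the one that matters in Example 1.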