# Chi-Square (Chapter 20) Notes PSYCH-UA 10 - 001

The entire Chi-Square lecture
Date Created: 05/03/16
Chi-square tests are non-parametric. Non-parametric tests:

- Work with categorical (nominal/ordinal) data and use frequencies per category
- Make few assumptions about the population distribution
- Are less powerful than parametric tests

The chi-square distribution ($\chi^2$):

- Is always positively skewed
- Has a shape that depends on the degrees of freedom

Example: N = 90

| | Tortilla chips | Potato chips | Cheese Doodles |
|---|---|---|---|
| $f_o$ (observed frequency) | 38 | 28 | 24 |
| $f_e$ (expected frequency) | 30 | 30 | 30 |

The question: is one preferred more than the others?

1. Fill the $f_e$ cells with 30, assuming there is an equal preference.
2. Calculate $\chi^2$ using the formula (the goodness-of-fit test):

   $$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{(38 - 30)^2}{30} + \frac{(28 - 30)^2}{30} + \frac{(24 - 30)^2}{30} = 3.47$$

3. Test for significance, which needs a critical value: $df = k - 1 = 2$, so $\chi^2_{cv} = 5.99$.
   - Note: as df increases, the critical value increases, because $\chi^2$ sums a non-negative term for every category.
   - Values that fall below the critical value say that the observed and expected frequencies are relatively similar.
   - Values that fall past the critical value say that the observed and expected frequencies are different.
4. In this case, we don't have significance (3.47 < 5.99). There is no significant preference for any one snack food.

Problem: chi-square can tell you that there is a mismatch, but it doesn't tell you where. You would do more chi-square tests with a Bonferroni-adjusted alpha.

## Ways to get $f_e$

1. Divide N by k (theory of "no preference")
2. Representation in the population: take the sample size and multiply it by each population proportion

Example: N = 50

| | Under 30 (30%) | 30-60 (40%) | Over 60 (30%) |
|---|---|---|---|
| $f_o$ | 5 | 5 | 40 |
| $f_e$ | 50(0.30) = 15 | 50(0.40) = 20 | 50(0.30) = 15 |

Plug into the chi-square formula: $\chi^2 = 59.58$, which is far past $\chi^2_{cv} = 5.99$ ($df = 2$).

## Varieties of a one-way $\chi^2$ test

1. Population proportions are known
   - Sometimes we have good estimates of population proportions
2. Expected frequencies are hypothesized to be equal
   - Games of chance
3. Shape of a distribution is being tested
   - Ex: Dealing with height. Let's say we reject the null:
     1. Was height normally distributed?
     2. Was my sample randomly selected?
   - These questions help figure out why there is significance
4. Theoretical model is being tested
   - Hoping that observed and expected are similar (no significance)

## Two-Variable Contingency Tables

| Academic performance | High self-esteem | Medium self-esteem | Low self-esteem | Row sum |
|---|---|---|---|---|
| High | 17 | 32 | 11 | 60 |
| Low | 13 | 43 | 34 | 90 |
| Column sums | 30 | 75 | 45 | N = 150 |

Pearson's chi-square test of association (independence):

- $H_0$: there is no association or correlation between self-esteem and academic performance; the way one of the variables is distributed into categories does not change at different levels of the second variable
- $H_A$: negation of the null; the variables are not independent

Note: the process to find expected frequencies is different:

1. For each cell, $f_e = \frac{(\text{row sum})(\text{column sum})}{N}$. For example, 60(30) / 150 = 12 and 60(75) / 150 = 30.
2. Once you have filled in the expected frequencies, plug them into the chi-square formula: $\chi^2 = 8.23$.
3. $df = (R - 1)(C - 1) = 2$, so $\chi^2_{cv} = 5.99$.
4. In this case, we reject the null. There is a relationship between self-esteem and academic performance.

An easier way to do 2x2 chi-square tables:

Example: looking at the relationship between age and altruism. Does it look like there's a difference among ages in the returned column? No, but let's test it:

| | Returned | Not returned |
|---|---|---|
| Younger than 35 | 19 (A) | 9 (B) |
| Older than 35 | 20 (C) | 8 (D) |

1. Plug into this formula:

   $$\chi^2 = \frac{N(AD - BC)^2}{(A + B)(C + D)(A + C)(B + D)} = \frac{56(152 - 180)^2}{(28)(28)(39)(17)} \approx 0.084$$

2. $df = (R - 1)(C - 1) = 1$, so $\chi^2_{cv} = 3.84$.
3. Fail to reject the null. There is no relationship between age and altruism.

## Assumptions of Chi-Square tests

- Mutually exclusive categories (categories don't overlap: you're either in one or the other)
- Exhaustive categories (testing against all possible options)
- Independence of observations (each subject is counted in only one category)
- Expected frequencies can't be too small! You can combine categories with low frequencies, but you can't combine categories just to get the answer you want.

Now we ask: if there's a relationship, how strong is it?

- If you have a 2x2 contingency table, the phi coefficient gives you a measure of strength of association.
  Looking back at the altruism example, plug $\chi^2$ and N = 56 into the formula:

  $$\phi = \sqrt{\frac{\chi^2}{N}} = 0.039$$

- If you have a larger contingency table, use Cramér's phi:

  $$\phi_C = \sqrt{\frac{\chi^2}{N(L - 1)}}$$

  In this formula, L = the number of either rows or columns, whichever is smaller.

## The Cross-Product Ratio (or Odds Ratio)

- Only used for 2x2 contingency tables
- Cross-product ratio = $\frac{ad}{bc}$
- 1.0 represents a total lack of association between the variables
- A value much larger OR much smaller than 1.0 indicates a STRONG degree of association
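
The arithmetic in these notes can be double-checked with a short script. What follows is only a sketch in plain Python, not anything from the lecture: the function names (`chi_square`, `expected_from_margins`, `independence_test`, `cramers_phi`, `cross_product_ratio`) are made up for illustration, and the critical-value table simply copies the two values quoted in the notes, which correspond to an alpha of .05 (an assumption, since the notes never state alpha).

```python
from math import sqrt

# Critical values quoted in the notes, keyed by df.
# Assumption: these are the alpha = .05 values; the notes do not state alpha.
CRITICAL_VALUES = {1: 3.84, 2: 5.99}


def chi_square(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over all categories (or cells)."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def expected_from_margins(table):
    """Expected frequency for each cell: (row sum * column sum) / N."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    return [[r * c / n for c in col_sums] for r in row_sums]


def independence_test(table):
    """Return (chi-square, df) for an R x C table of observed counts."""
    expected = expected_from_margins(table)
    flat_obs = [fo for row in table for fo in row]
    flat_exp = [fe for row in expected for fe in row]
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square(flat_obs, flat_exp), df


def cramers_phi(chi2, n, n_rows, n_cols):
    """sqrt(chi2 / (N * (L - 1))), L = smaller of rows and columns.

    For a 2x2 table L - 1 = 1, so this reduces to the phi coefficient.
    """
    smaller = min(n_rows, n_cols)
    return sqrt(chi2 / (n * (smaller - 1)))


def cross_product_ratio(a, b, c, d):
    """Cross-product (odds) ratio ad / bc for a 2x2 table."""
    return (a * d) / (b * c)


# One-way test, equal expected frequencies (snack example, N = 90, k = 3).
snack_chi2 = chi_square([38, 28, 24], [30, 30, 30])

# One-way test, expected frequencies from population proportions (N = 50).
age_expected = [50 * p for p in (0.30, 0.40, 0.30)]
age_chi2 = chi_square([5, 5, 40], age_expected)

# Two-variable test: academic performance (rows) by self-esteem (columns).
esteem_chi2, esteem_df = independence_test([[17, 32, 11], [13, 43, 34]])

# 2x2 test: age (rows) by returned / not returned (columns).
altruism = [[19, 9], [20, 8]]
altruism_chi2, altruism_df = independence_test(altruism)
altruism_phi = cramers_phi(altruism_chi2, 56, 2, 2)
altruism_ratio = cross_product_ratio(19, 9, 20, 8)

for label, chi2, df in [
    ("snacks", snack_chi2, 2),
    ("age groups", age_chi2, 2),
    ("self-esteem", esteem_chi2, esteem_df),
    ("altruism", altruism_chi2, altruism_df),
]:
    verdict = "reject H0" if chi2 > CRITICAL_VALUES[df] else "fail to reject H0"
    print(f"{label}: chi2 = {chi2:.3f}, df = {df}, {verdict}")

print(f"altruism phi = {altruism_phi:.3f}, cross-product ratio = {altruism_ratio:.3f}")
```

For the 2x2 table, the general row-by-column procedure gives the same $\chi^2$ as the A, B, C, D shortcut formula, so the script uses a single code path for both table sizes. If SciPy is available, `scipy.stats.chisquare` and `scipy.stats.chi2_contingency` cover the same tests and also return p-values. Note that `chi2_contingency` applies a continuity correction to 2x2 tables by default, so its result will differ from the uncorrected value here unless `correction=False` is passed.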

