Statistical Concepts and Reasoning

by: Hilbert Denesik

STAT 100, Pennsylvania State University


Feb 18

Statistic for the day
Average annual beer consumption of American college students: almost 4 billion cans.
Estimated percentage of the freshman class nationwide that will drop out for alcohol-related reasons: 7%.

Assignment
Read Chapter 12, pp. 235-238. Exercises 9, 11, 15, 17.

Exercise 1
Follow the 4 steps and answer the research question: Was there a relationship between sex and ownership of cell phones among STAT 100 students in 2004?

Data
Rows: sex   Columns: cell phone

          no    yes    All
female    12    124    136
male      14     87    101
All       26    211    237

Counts and percents, Spring 2004
Rows: Sex   Columns: Cellphone

          No       Yes      All
Female    12       124      136
          8.82%    91.18%   100.00%
Male      14       87       101
          13.86%   86.14%   100.00%

So 91.18% of women in the sample say yes, but only 86.14% of men in the sample say yes. Are they statistically significantly different?

Step 1
We must compute what the skeptic expects.

          No    Yes
Women     A     B      136
Men       C     D      101
          26    211    237

A = (136 x 26) / 237 = 14.92. Repeat for B, C, D.

The strategy for determining statistical significance
First, figure out what you expect to see if there is no difference between females and males.
Second, figure out how far the data is from what is expected.
Third, decide if the distance in the second step is large.
Fourth, if large, then claim there is a statistically significant difference.

Step 1, cont'd
Second row of each pair (shown in red on the slide): expected counts if the skeptic is correct.

          No       Yes       All
Female    12       124       136
          14.92    121.08
Male      14       87        101
          11.08    89.92
Total     26       211       237

Step 2
Measure how far the observed counts are from the expected counts, cell by cell:

(12 - 14.92)^2 / 14.92     = 0.571
(124 - 121.08)^2 / 121.08  = 0.070
(14 - 11.08)^2 / 11.08     = 0.769
(87 - 89.92)^2 / 89.92     = 0.095

ChiSq = 0.571 + 0.070 + 0.769 + 0.095 = 1.506

Step 3
Accepted definition of "large" for scientific purposes: something is large when it is in the outer 5% tail of the appropriate distribution.
Here the appropriate distribution is the chi-squared distribution with 1 degree of freedom. If the chi-squared statistic is larger than 3.84, it is declared large and the research advocate wins.
Our chi-squared value: 1.506.

Step 4
No statistically significant difference.

          No       Yes      All
Female    12       124      136
          8.82%    91.18%   100.00%
Male      14       87       101
          13.86%   86.14%   100.00%

Hence the difference, 91.18% of women versus 86.14% of men, is not statistically significant in this case.
Note: sample size has been automatically considered.

Counts and percents, Fall 2001
Rows: sex   Columns: cellphone

          no       yes      All
female    26       51       77
          33.77%   66.23%   100.00%
male      19       16       35
          54.29%   45.71%   100.00%

So 66.23% of women in the sample say yes, but only 45.71% of men in the sample say yes. Are they statistically significantly different?

Fall 2001 results
Expected counts are below observed counts.

          no       yes      Total
Female    26       51       77
          30.94    46.06
Male      19       16       35
          14.06    20.94
Total     45       67       112

ChiSq = 0.788 + 0.529 + 1.734 + 1.164 = 4.215

Fall 2001: It is large this time
Chi-squared distribution with 1 degree of freedom: 95% of the distribution lies below 3.84 and 5% lies above it. If chi-squared is in that upper 5% tail, it is declared large and the research advocate wins.
Our chi-squared is 4.215, so the research advocate wins. There was a statistically significant difference in 2001.

Change over time
Cell phone ownership for samples of STAT 100 students [figure]

Spring 2005: A cautionary tale
Rows: Sex   Columns: Cellphone

          No      Yes       All
Female    3       114       117
          4.79    112.21    117.00
Male      6       97        103
          4.21    98.79     103.00
All       9       211       220

Note that two of the expected counts are smaller than 5. This can make our results somewhat iffy.
The best approach in this case: report the result (no significant difference), but point out the small expected counts of 4.79 and 4.21.

Why 1 degree of freedom?

          No     Yes
Women     [ ]           136
Men                     101
          26     211    237

Note that the gray box (marked [ ] above) is the ONLY one we can fill in arbitrarily. Once that box is filled, all others are determined by the margins.

How many degrees of freedom here? (hypothetical 2 x 3 table)
Degrees of freedom (df) always equal (number of rows - 1) x (number of columns - 1).

Exercise 2
Follow the 4 steps and answer the research question: Is there a statistically significant difference in calories between small and large sandwiches?

Data
Response: Calories   Explanatory: Size

          Low    High
Small     5      2      7
Large     2      5      7
          7      7      14

Solution
Expected counts are below observed counts.

          low     high    Total
small     5       2       7
          3.50    3.50
large     2       5       7
          3.50    3.50
Total     7       7       14

ChiSq = 0.643 + 0.643 + 0.643 + 0.643 = 2.571

In this case the skeptic wins and the research advocate loses: the advocate cannot claim that there is a relationship between size and calories. But note the small expected counts.
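
The four steps used in the exercises above can be sketched in a few lines of Python. This is only an illustrative sketch, not the software used to produce the slides: the function names and the constant name CUTOFF_1DF are made up for this example, and the 3.84 cutoff applies only to tables with 1 degree of freedom (2 x 2 tables).

```python
# Illustrative sketch of the four-step chi-squared procedure for a 2 x 2 table.
# All names here are hypothetical; they do not come from the course materials.

CUTOFF_1DF = 3.84  # 5% tail cutoff for chi-squared with 1 degree of freedom


def expected_counts(observed):
    """Step 1: counts the skeptic expects, (row total x column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared(observed):
    """Step 2: sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def is_significant(observed, cutoff=CUTOFF_1DF):
    """Steps 3 and 4: declare the distance large if it exceeds the cutoff."""
    return chi_squared(observed) > cutoff


# Observed counts from the slides: rows are groups, columns are (no, yes) or (low, high).
tables = {
    "Spring 2004 cell phones": [[12, 124], [14, 87]],
    "Fall 2001 cell phones": [[26, 51], [19, 16]],
    "Sandwich calories": [[5, 2], [2, 5]],
}

for name, table in tables.items():
    stat = chi_squared(table)
    print(f"{name}: chi-squared = {stat:.3f}, significant = {is_significant(table)}")
```

Only the Fall 2001 table exceeds the 3.84 cutoff, which matches the conclusions stated for the three examples.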

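The degrees-of-freedom rule and the caution about small expected counts can also be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, and the threshold of 5 is taken from the Spring 2005 slide's remark about expected counts smaller than 5.

```python
# Illustrative sketch: degrees of freedom and a small-expected-count check.
# Function names are hypothetical and not part of the course materials.

def degrees_of_freedom(n_rows, n_cols):
    """df = (number of rows - 1) x (number of columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def small_expected_counts(observed, threshold=5):
    """Return the expected counts that fall below the threshold."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    expected = [r * c / grand_total for r in row_totals for c in col_totals]
    return [e for e in expected if e < threshold]


print(degrees_of_freedom(2, 2))  # the 2 x 2 cell phone tables
print(degrees_of_freedom(2, 3))  # the hypothetical 2 x 3 table

spring_2005 = [[3, 114], [6, 97]]  # rows: female, male; columns: no, yes
print([round(e, 2) for e in small_expected_counts(spring_2005)])
```

When the returned list is not empty, the slides' advice applies: report the result, but point out the small expected counts alongside it.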

