Introduction to Probability and Statistics
This 3-page set of class notes was uploaded by Florian Watsica on Thursday, October 15, 2015. The class notes belong to MAT 135 at Murray State University, taught by Edward Thome in Fall.
Date Created: 10/15/15
Simpson's Paradox: When Combined Data Seems to Render Impossible Results

Derek Jeter first appeared in Major League Baseball in 1995, playing for the New York Yankees. He quickly made a name for himself. By 1996 he had won the Rookie of the Year Award and led the Yankees to their first World Series title since 1978. Their opponents were the defending champions, the Atlanta Braves. The Yankees won 4 of the 6 games played, even though the Braves outscored the Yankees 26 to 18. Even today, his batting average ranks among the highest in the league.

So what if I told you that another player, during the same two years (1995 and 1996), actually had a better batting average than Derek Jeter when their individual seasons were compared? Impressive, yes, but possible? Of course.

David Justice debuted for the Atlanta Braves in 1989 and continued to play for them until an injury in May of 1996. Like Derek Jeter, David Justice showed amazing prowess in the sport. In 1995 he hit a crucial home run in Game 6 of the World Series, helping bring the Braves from the bottom of the league in 1989 to a World Championship in just 7 years. Both players left their mark on the sport: each won his team a World Series championship, and each received the Rookie of the Year award. Both players had spectacular batting averages and their fair share of last-minute, game-changing moments.

But the question remains: during the years 1995 and 1996, who was the better batter? After reading this far, you might assume the answer is clearly and easily David Justice, since he had the better batting average in each of the two individual years. This is where Simpson's Paradox would kindly like to tell you, "Good guess, but you're out!" Confused? Good; you're beginning to grasp the idea.

To explain this phenomenon, we will briefly cover the two players' batting averages (hits / at-bats) for the two years:

                 1995              1996              Combined
Derek Jeter      12/48   = .250    183/582 = .314    195/630 = .310
David Justice    104/411 = .253    45/140  = .321    149/551 = .270

Figure 1.1 (source: http://en.wikipedia.org/wiki/Simpson's_paradox)

As you can see, Derek Jeter had a lower batting average than David Justice in each individual season. Yet when these ratios are combined, Jeter has a combined batting average of .310, compared to .270 for Justice. So what just happened? The batting averages for 1995 and 1996 would lead almost anyone to expect a higher combined average for David Justice, and yet the exact opposite is true.

A batting average is calculated by dividing a batter's number of safe hits by his official number of times at bat, so all of the data is correct. David Justice had the better batting average in both individual seasons, and Derek Jeter had the better combined batting average. Everything should be screaming "impossible" at this point. It's almost like saying that someone with better grades during their freshman and sophomore years had a lower GPA than a student with worse grades.

The key is that a combined average weights each season by its number of at-bats. Most of Jeter's at-bats (582 of 630) came in 1996, when both players hit above .310. Most of Justice's (411 of 551) came in 1995, when both hit only about .250. The difference, then, lies in how we combine the ratios and how we make sense of both the partitioned (separated) and the aggregated (combined) data.

So why does Simpson's Paradox matter? It shows how important it is to decide which sets of data we should rely on when trying to paint an accurate picture. This is especially important when the success or failure rates of medical procedures are addressed. For example, one treatment may have a better success rate on paper, yet it may not be the better treatment in each individual circumstance. When actual data is collected and represented, there are many more factors than we can take into account when creating a table with terms as ambiguous as "pass" or "fail."

In our case, we are deciding whether the separated or the combined data more accurately shows which of the two baseball players had the better batting average over 1995 and 1996. For this particular instance, it seems best to use the separated data. On that basis, David Justice was in fact the better batter during those two seasons, even though Derek Jeter's combined batting average was higher. It is important to note that the situation, and not the data, should define which ratio is better to use. In today's society, mass media can easily spread flawed interpretations of data to the public, even when those interpretations appear mathematically correct to the untrained eye.

Put most simply, Simpson's Paradox occurs when a trend that appears in every subpopulation of a data set reverses once those subpopulations are combined. So even when improvement takes place across the board, the combined result can end up showing the opposite.

Another interesting example of Simpson's Paradox is the vote on the Civil Rights Act of 1964. One might assume that a higher proportion of Democrats than Republicans voted in favor of the act. The table below shows the percentage (and count) of each group voting yes:

House        Democrat          Republican
Northern     94% (145/154)     85% (138/162)
Southern      7% (7/94)         0% (0/10)
Both         61% (152/248)     80% (138/172)

Senate       Democrat          Republican
Northern     98% (45/46)       84% (27/32)
Southern      5% (1/21)         0% (0/1)
Both         69% (46/67)       82% (27/33)

Figure 1.2 (source: http://en.wikipedia.org/wiki/Simpson's_paradox)

Once again, we can compare Northern and Southern members within the House and within the Senate. In every one of these subgroups, a higher percentage of Democrats voted in favor of the act. We can also see that, just as before, a reversal appears when the ratios are combined. A lower percentage of Republicans voted for the act in both the North and the South, and yet, overall, a higher percentage of Republicans than Democrats appear to have voted in favor of it.

Even if we combine the House and Senate votes for each party, we still arrive at a similar and confusing result. 165 of 205 Republicans voted yes, a ratio of about 80%, while 198 of 315 Democrats voted yes, a lower ratio of only about 63%. The reason is again one of weighting. Southern members, who overwhelmingly voted no, made up a large share of the Democrats (94 of 248 in the House) but almost none of the Republicans (10 of 172). In analyzing this data, we can safely say that political party had a weaker correlation with the pattern of voting than region did. That aside, within each subpopulation a higher percentage of Democrats did vote for the act.

Overall, Simpson's Paradox should teach us to be wary of how we interpret the data presented to us. Mass media can easily skew or misrepresent data to fit a specific theory or explanation. You can avoid becoming trapped within this mind-blowing paradox in two ways. First, look at the individual variables and ratios. Second, determine whether the aggregate or the partitioned data better explains the story behind the table.
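As a rough check on the arithmetic in the batting-average and Civil Rights Act examples, the Python sketch below recomputes both tables from the raw counts in Figures 1.1 and 1.2. It then tests whether each table shows a reversal. The helper names (`rate`, `combined_rate`, `simpson_reversal`, `report`) are hypothetical and chosen only for illustration. The sketch assumes that one group "wins" a subgroup when its ratio is strictly higher. It uses exact fractions so that rounding cannot hide or create a reversal.

```python
from fractions import Fraction

# (successes, trials) per subgroup, copied from Figure 1.1:
# safe hits and official at-bats for each season.
batting = {
    "Derek Jeter":   {"1995": (12, 48),   "1996": (183, 582)},
    "David Justice": {"1995": (104, 411), "1996": (45, 140)},
}

# (yes votes, total votes) per subgroup, copied from Figure 1.2.
votes = {
    "Democrat":   {"House North": (145, 154), "House South": (7, 94),
                   "Senate North": (45, 46),  "Senate South": (1, 21)},
    "Republican": {"House North": (138, 162), "House South": (0, 10),
                   "Senate North": (27, 32),  "Senate South": (0, 1)},
}


def rate(successes, trials):
    """Exact success ratio for a single subgroup."""
    return Fraction(successes, trials)


def combined_rate(groups):
    """Pool the raw counts across all subgroups, then divide once.

    This matches how a combined batting average is computed: total hits
    over total at-bats, not the mean of the yearly averages.
    """
    successes = sum(s for s, _ in groups.values())
    trials = sum(t for _, t in groups.values())
    return Fraction(successes, trials)


def simpson_reversal(data, a, b):
    """True if `a` has the higher ratio in every subgroup but the lower pooled ratio."""
    wins_everywhere = all(rate(*data[a][g]) > rate(*data[b][g]) for g in data[a])
    return wins_everywhere and combined_rate(data[a]) < combined_rate(data[b])


def report(data):
    """Print each subgroup ratio and the pooled ratio, rounded to 3 places."""
    for name, groups in data.items():
        parts = [f"{g}: {float(rate(*c)):.3f}" for g, c in groups.items()]
        pooled = float(combined_rate(groups))
        print(f"{name:<14} " + "  ".join(parts) + f"  combined: {pooled:.3f}")


if __name__ == "__main__":
    report(batting)
    print("Reversal:", simpson_reversal(batting, "David Justice", "Derek Jeter"))
    report(votes)
    print("Reversal:", simpson_reversal(votes, "Democrat", "Republican"))
```

Both checks should report a reversal. Pooling counts is what produces the paradox. If you instead averaged the two yearly averages, Justice would stay ahead (about .287 to .282). This is one more illustration that the answer depends on how the ratios are combined.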