Consider the 2×2×2 table below, with game result as the response variable (Y), opponent’s region as the explanatory variable (X) and opponent’s ranking as the control variable (Z).
Table 1: Game result of a team by opponent’s region and opponent’s ranking
When the opponent’s ranking is high, the team’s win percentage against non-European opponents is higher than its win percentage against European opponents. The same holds when the opponent’s ranking is low: the win percentage against non-European opponents is again higher. However, in the marginal table, i.e. the “Total” row of the above table, the direction reverses: the team’s win percentage against European opponents is higher than its win percentage against non-European opponents.
This phenomenon, in which the marginal association has a different direction from each conditional association, is known as Simpson’s paradox. The example above involves categorical data, but the paradox can also occur in continuous and time-to-event data. Simpson’s paradox occurs due to:
i) An ignored confounding variable that has a strong effect on the response variable. For example, other variables such as home or away ground, or the presence or absence of a star player, might affect the result.
ii) An uneven distribution of the confounding variable among the groups being compared.
For example, the team played 9 high-ranked European opponents, 7 low-ranked European opponents, 32 high-ranked non-European opponents and only 6 low-ranked non-European opponents. Low-ranked opponents therefore make up 7 of 16 European games (about 44%) but only 6 of 38 non-European games (about 16%).
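The reversal can be reproduced with a few lines of Python. This is only a sketch: the game counts are the ones quoted above, but the win counts are hypothetical values chosen for illustration (they are not the figures from Table 1), and the helper `win_pct` is a made-up name.

```python
# Games played per (opponent region, opponent ranking), as quoted in the text.
games = {
    ("European", "high"): 9,
    ("European", "low"): 7,
    ("non-European", "high"): 32,
    ("non-European", "low"): 6,
}

# HYPOTHETICAL win counts, picked only to illustrate the reversal.
wins = {
    ("European", "high"): 2,
    ("European", "low"): 5,
    ("non-European", "high"): 8,
    ("non-European", "low"): 5,
}


def win_pct(region, ranking=None):
    """Win percentage for a region; ranking=None pools both rankings (marginal)."""
    keys = [
        k for k in games
        if k[0] == region and (ranking is None or k[1] == ranking)
    ]
    return 100 * sum(wins[k] for k in keys) / sum(games[k] for k in keys)


for ranking in ("high", "low", None):
    label = f"Ranking {ranking}" if ranking else "Total (marginal)"
    eu = win_pct("European", ranking)
    non_eu = win_pct("non-European", ranking)
    print(f"{label}: European {eu:.1f}%, non-European {non_eu:.1f}%")
```

With these assumed counts, the non-European win percentage is higher within each ranking (25.0% vs 22.2% for high-ranked opponents, 83.3% vs 71.4% for low-ranked ones), yet the pooled total favours European opponents (43.8% vs 34.2%). The pooled European figure is pulled up because a much larger share of the European games were against low-ranked, easier opponents.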