Averages of Averages
My friend recently graduated from high school. After the ceremony, he was comparing his transcript with a fellow student's and noticed something strange.
Below are the results he saw (names changed):
Both Bob and Eve struggled in the first semester but Bob came out with a 2 percent lead. In the second semester, they stepped up their game and scored in the 90s. Again, Bob beat Eve. From these results, one would expect Bob to have the higher overall average for the year, no questions asked. Here are the final marks:
What!? He had a higher average than her in both the first and second semesters, yet her overall average for the year was higher than his. How could this possibly happen?? I’ll give you a minute to speculate…
Here are the marks for the individual courses Bob and Eve completed:
The bold marks are first semester and the non-bold marks are second semester. As you can see, Eve took fewer courses in the first semester than in the second. Hence, her poor initial performance was not weighted as heavily in her overall average. Bob, on the other hand, took the same number of courses in each semester, so his poor first-semester performance had a greater negative influence on his final average. This is called Simpson’s Paradox[1], and it is exactly what my friend experienced. He “won” both the first and second semester individually. However, he “lost” in the aggregate: the overall year average.
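To see the effect in numbers, here is a minimal Python sketch. The individual marks and the course counts are hypothetical: they were chosen only so that Eve's semester averages come out to 81 and 92.8 and Bob stays slightly ahead of her in each semester. They are not the real transcripts.

```python
from statistics import mean

# Hypothetical marks, invented for illustration only.
# Bob takes 4 courses per semester; Eve takes 3, then 5.
bob = {"sem1": [80, 82, 84, 86], "sem2": [91, 92, 94, 95]}
eve = {"sem1": [78, 81, 84], "sem2": [90, 92, 93, 94, 95]}


def semester_average(marks, semester):
    """Average of the marks in a single semester."""
    return mean(marks[semester])


def year_average(marks):
    """Average over every course of the year, pooled together."""
    return mean(marks["sem1"] + marks["sem2"])


for name, marks in (("Bob", bob), ("Eve", eve)):
    print(
        name,
        semester_average(marks, "sem1"),
        semester_average(marks, "sem2"),
        year_average(marks),
    )
```

With these made-up marks, Bob averages 83 and 93 against Eve's 81 and 92.8, so he leads in both semesters. Over the whole year, though, he averages 88 against her 88.375, because five of her eight courses fall in the stronger semester.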
This paradox teaches a valuable lesson: you cannot blindly average averages. Eve cannot calculate her final mark by averaging 81 and 92.8 to get 86.9, because the two semesters contain different numbers of courses. Instead, she must go back to her individual marks and calculate her average from the original data.
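If the individual marks are not at hand, the same result can be recovered by weighting each semester average by the number of courses behind it. The sketch below assumes Eve took 3 courses in the first semester and 5 in the second. Those counts are hypothetical, and the function names are illustrative.

```python
def naive_average(averages):
    """Plain mean of the averages, ignoring how many courses each covers."""
    return sum(averages) / len(averages)


def weighted_average(averages, course_counts):
    """Mean of the averages, weighted by the number of courses in each."""
    # average * count reconstructs the sum of marks for that semester.
    total_marks = sum(a * n for a, n in zip(averages, course_counts))
    return total_marks / sum(course_counts)


eve_averages = [81, 92.8]   # semester averages from the post
eve_course_counts = [3, 5]  # hypothetical course counts

naive = naive_average(eve_averages)
weighted = weighted_average(eve_averages, eve_course_counts)

print(f"naive:    {naive:.1f}")
print(f"weighted: {weighted:.3f}")
```

Under these assumed counts the naive figure is 86.9, while the weighted one is about 88.4. When every semester has the same number of courses, as in Bob's case, the two calculations agree, which is why averaging averages often appears to work.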