# Averages of Averages


My friend recently graduated from high school. After the ceremony, he was comparing his transcript with a fellow student's and noticed something strange.

Below are the results he saw (names changed):

|       | Bob | Eve  |
|-------|-----|------|
| Sem 1 | 83  | 81   |
| Sem 2 | 93  | 92.8 |

Both Bob and Eve struggled in the first semester, but Bob came out with a 2-point lead. In the second semester, they stepped up their game and scored in the 90s. Again, Bob beat Eve. From these results, one would expect Bob to have the higher overall average for the year, no questions asked. Here are the final marks:

|               | Bob | Eve  |
|---------------|-----|------|
| Final Average | 88  | 89.8 |

What!? He had a higher average than her in both the first and second semesters, yet her overall average for the year was higher than his. How could this possibly happen? I’ll give you a minute to speculate…

Here are the marks for the individual courses Bob and Eve completed:

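| Course      | Bob    | Eve    |
|-------------|--------|--------|
| English     | **83** | **80** |
| Biology     | **85** | **83** |
| History     | **81** | **80** |
| Chemistry   | **80** | 99     |
| Physics     | **86** | 94     |
| Math        | **83** | 99     |
| Band        | 100    | 90     |
| Gym         | 93     | 93     |
| Art         | 91     | 91     |
| IT          | 94     | 81     |
| Programming | 95     | 95     |
| Design      | 85     | 93     |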

The bold marks are from the first semester and the non-bold marks are from the second. As you can see, Eve took fewer courses in the first semester: three, against nine in the second. Hence her poor initial performance was not weighted as heavily in her overall average. Bob, on the other hand, took six courses in each semester, so his poor first-semester performance had a greater negative influence on his final average. This is called Simpson’s Paradox[1], and it is just what my friend experienced: he “won” both the first and second semesters individually, but he “lost” in the aggregate, the overall year average.
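To make the weighting explicit, here is a worked check of the two final marks. The course counts (six and six for Bob, three and nine for Eve) are inferred from the marks table as the split that reproduces the reported semester averages; Eve's second-semester average of 92.8 is 835/9 before rounding.

$$
\text{Bob} = \frac{6 \cdot 83 + 6 \cdot 93}{12} = \frac{498 + 558}{12} = 88
\qquad
\text{Eve} = \frac{3 \cdot 81 + 9 \cdot \tfrac{835}{9}}{12} = \frac{243 + 835}{12} \approx 89.8
$$

Eve's weak semester counts for only a quarter of her year, while Bob's counts for half of his.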

This paradox teaches a valuable lesson: you cannot blindly average averages. Eve cannot calculate her final mark by averaging 81 and 92.8 to get 86.9. Instead, she must go back to her individual marks and calculate her average from the original data.
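As a quick sanity check, the difference between the two calculations can be sketched in a few lines of Python. The function names `average_of_averages` and `pooled_average` are made up for this illustration, and the grouping of marks by semester is an assumption inferred from the semester averages reported above.

```python
from statistics import mean

def average_of_averages(groups):
    """Naive approach: average each group, then average those results."""
    return mean(mean(group) for group in groups)

def pooled_average(groups):
    """Average taken over every individual mark, ignoring group boundaries."""
    return mean(mark for group in groups for mark in group)

# Marks grouped as [first semester, second semester].
# Assumption: this split is inferred from the reported semester averages.
bob = [[83, 85, 81, 80, 86, 83], [100, 93, 91, 94, 95, 85]]
eve = [[80, 83, 80], [99, 94, 99, 90, 93, 91, 81, 95, 93]]

for name, marks in (("Bob", bob), ("Eve", eve)):
    naive = round(average_of_averages(marks), 1)
    pooled = round(pooled_average(marks), 1)
    print(f"{name}: average of averages = {naive}, pooled average = {pooled}")
```

For Bob the two numbers agree (88 either way) because his semesters are the same size. For Eve they differ: 86.9 from averaging the averages versus 89.8 from the original marks.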