In this edition of Fun With Statistics we’ll tackle a statistical nuance that has bugged me for a long time: percentiles of populations that change over time. This has a direct application to the product line aging versus profitability calculations that many companies use for life cycle and discontinuation analyses, where errors are common and expensive. Leave it to our friend Russell Roberts at Cafe Hayek to provide an understandable example of the problem: income percentiles.
I apologize in advance for quoting so much of his post, but that’s the only way to do it justice.
As I have written here before, looking at slices of the population over time is a very misleading indicator of what happens to particular families over time, particularly when family composition is changing. Arnold Kling makes the same point and does it superbly:
In his new book Unequal Democracy, Larry Bartels writes (p.7),
families at the 20th percentile experienced declining real incomes in 20 of the 58 years…by comparison, families at the 95th percentile have experienced only one decline of 3% or more in their real incomes since 1951.
I have a nit to pick, which is that Census department percentiles are not families.
Now he dives into explaining the statistical fallacy.
Suppose that we start out with 20 families, and the 4th-lowest family (the 20th percentile) has an income of $10,000, while the 3rd family has an income of $9500. Next year, suppose that everyone’s family income rises by 2 percent, but we add a new family at the bottom of the income distribution, with an income of $6000. As a result, the new 20th percentile is now somewhere between the income of the original 3rd family (now the 4th family out of 21) and the original 4th family (now the 5th family). The income of the 20th percentile goes down, even though the income of every family has gone up.
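The arithmetic in that example is easy to check for yourself. Here is a minimal sketch, not anything from the original posts: only the $9,500, $10,000, and $6,000 incomes and the 2 percent raise come from the example. The other incomes are made-up filler, and the `percentile` helper is my own assumption, a simple rank-based interpolation (one of several common conventions) chosen so that the 4th of 20 families sits exactly at the 20th percentile.

```python
def percentile(incomes, p):
    """Value at 1-indexed rank p% * n, linearly interpolated between neighbours.

    Assumed convention for this sketch; statistical packages offer several others.
    """
    xs = sorted(incomes)
    rank = p * len(xs) / 100           # 20% of 20 families -> rank 4.0
    if rank < 1:
        return xs[0]
    lo = int(rank)                     # family at or just below the rank
    hi = min(lo + 1, len(xs))          # next family up
    frac = rank - lo
    return xs[lo - 1] + frac * (xs[hi - 1] - xs[lo - 1])

# Year 1: 20 families. The 3rd earns $9,500 and the 4th earns $10,000 (from the
# example); every other income is hypothetical filler.
year1 = [8_000, 9_000, 9_500, 10_000] + [12_000 + 2_000 * i for i in range(16)]

# Year 2: every existing family gets a 2% raise, and one new family arrives
# at the bottom with $6,000.
raised = [round(x * 1.02) for x in year1]
year2 = raised + [6_000]

p20_before = percentile(year1, 20)
p20_after = percentile(year2, 20)

print("every original family is better off:", all(r > x for x, r in zip(year1, raised)))
print("20th percentile, year 1:", p20_before)
print("20th percentile, year 2:", p20_after)   # about 9,792 under this convention
```

Under these assumptions the 20th percentile drops from $10,000 to roughly $9,792, between the original 3rd and 4th families just as described, even though no family took a pay cut.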
Next, consider what happens when you have millions of families, and you add lots of new families each year. Because new families (immigrants and young families) tend to join the income escalator at the bottom, it should be no surprise that the bottom percentile shows declines more frequently than the top percentile.
Also take into account the flood of immigrants, who primarily move into low-paying agricultural jobs, and the effect that has on the bottom 10 percent of the distribution. Would you expect to see upward movement? Roberts then points out one more anomaly that can create radical shifts in income.
Another issue that people raise with Census data is that the basic unit is the household. If a household breaks into two households, due to divorce, average household income plunges by 50 percent, even though nobody’s income has changed. Trends in household income tend to look worse than trends in income per person.
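The household split works the same way. A quick sketch with made-up numbers (the $60,000 and $40,000 earners are purely hypothetical, as is the `averages` helper): one two-earner household becomes two one-earner households, and nobody’s paycheck changes.

```python
def averages(households):
    """Return (average income per household, average income per person).

    `households` is a list of lists, one inner list of individual incomes
    per household.
    """
    total = sum(sum(h) for h in households)
    people = sum(len(h) for h in households)
    return total / len(households), total / people

# Hypothetical couple: one household before the divorce, two after.
before = [[60_000, 40_000]]
after = [[60_000], [40_000]]

avg_hh_before, per_person_before = averages(before)
avg_hh_after, per_person_after = averages(after)

print("average household income:", avg_hh_before, "->", avg_hh_after)
print("average income per person:", per_person_before, "->", per_person_after)
```

Average household income is cut in half while income per person does not move, which is why a rising household count drags the household numbers down all by itself.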
After which he provides considerable statistics on income by percentile, population changes, and the change in the number of households over the past several decades. The bottom line?
So over the last half-century, the number of households has increased at a much faster rate than the number of people, mainly because of divorce. That totally contaminates the comparison of percentiles over time and makes it appear that people are falling behind or standing still when, in fact, particular families are seeing their standard of living rise. Arnold calls it a nitpick. I call it a massive structural flaw.
"Massive structural flaw" indeed. It also makes for a lot of misinformation, easily shaped to fit whichever political bias you subscribe to.
The lesson here applies to the business world as well. Think about your individual products and product families, and their contribution margin to the overall bottom line. Presumably newer high-margin products are always entering the picture, creating the inverse of the population/income scenario above. As a product or product family ages, its margin goes down, but what about total cash contribution, which is driven by volume? Are you analyzing the percentiles correctly?
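To make the product version concrete, here is a rough sketch with an entirely hypothetical catalog. The product names, margins, and revenues are invented for illustration, and `percentile_rank` is my own simple definition (the percent of products with a strictly lower margin). It shows an older product sliding down the margin percentiles only because new high-margin products joined the population, while it remains the largest cash contributor.

```python
def percentile_rank(value, population):
    """Percent of the population strictly below `value` (assumed definition)."""
    below = sum(1 for v in population if v < value)
    return 100 * below / len(population)

# Hypothetical catalog: product -> (margin in percent, annual revenue in dollars)
year1 = {
    "legacy": (25, 4_000_000),
    "p2": (20, 1_000_000),
    "p3": (30, 500_000),
    "p4": (35, 400_000),
}
# Year 2: nothing changes for the existing products, but two new
# high-margin, low-volume products launch.
year2 = dict(year1, new1=(45, 200_000), new2=(50, 150_000))

def margins(catalog):
    return [margin for margin, _ in catalog.values()]

def cash_contribution(catalog):
    return {name: margin * revenue // 100 for name, (margin, revenue) in catalog.items()}

legacy_margin = year1["legacy"][0]
rank_y1 = percentile_rank(legacy_margin, margins(year1))
rank_y2 = percentile_rank(legacy_margin, margins(year2))

cash_y2 = cash_contribution(year2)
top_cash_product = max(cash_y2, key=cash_y2.get)

print("legacy margin percentile:", rank_y1, "->", round(rank_y2, 1))
print("largest cash contributor in year 2:", top_cash_product, cash_y2[top_cash_product])
```

In this made-up catalog the legacy product’s margin percentile falls from 25 to about 17 without its margin changing at all, and it still throws off more cash than every other product. A discontinuation rule keyed to margin percentile alone would flag exactly the wrong product.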