15 Days of Economics Day 5: What’s in a name

Cyrus Singer
Jun 16, 2020

There have been numerous reports about the effect of someone’s name on their life, including the chances of getting hired for a job, the chance of incarceration, and many other outcomes. These findings are intriguing to read about.

Many of these findings are well-founded and reliable. One example is the effect of an ethnic name on resume evaluation, which was shown statistically to be linked to rates of implicit association on the basis of ethnicity. Another example relates to incarceration. Some ethnicities have different naming patterns as well as different incarceration rates. Here the name is not the direct cause but an effect of ethnicity, which is itself related to incarceration rates. Findings like these are important to know and understand in order to progress toward a more equal society.

However, many reported statistical relationships seem peculiar, such as boys with girls’ names being more likely to be suspended. Results like this may make it seem that a person’s name has a more pronounced effect than it actually does. They are most likely an effect of flawed statistics.

As technology has progressed, a huge amount of data has been collected on many individuals. This has enabled more powerful and automatic searching for statistical relationships. In the case of names, birth registries have recently been digitized, along with certain criminal records. Furthermore, information about people’s lifestyles and occupations has become more available through platforms such as LinkedIn and Facebook. As a result, it is now much easier for researchers to look for the effect of names in large datasets.

This means that the data used for finding statistical relationships is high-dimensional: there is a large number of different pieces of information for each person. While this may seem great at first, it can lead to a statistical model overfitting the data. As you add more variables, you will find more statistical relationships, and some of these accidental relationships will be strong. This is not because there is some hidden causal relationship; it is a mathematical artifact of making so many comparisons.
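To make this concrete, here is a minimal sketch in Python using only made-up random numbers, not any real name or registry data. It generates an outcome for 30 imaginary people, then checks it against more and more unrelated random variables and reports the strongest correlation found by chance. The function names (`pearson`, `strongest_chance_correlation`) and the sample sizes are assumptions chosen purely for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def strongest_chance_correlation(n_people, n_features, seed=0):
    """Largest |correlation| between a random outcome and unrelated random features."""
    rng = random.Random(seed)
    # The outcome is pure noise, so any correlation found below is accidental.
    outcome = [rng.gauss(0, 1) for _ in range(n_people)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_people)]
        best = max(best, abs(pearson(feature, outcome)))
    return best


# Same 30 imaginary people each time; only the number of variables searched grows.
for n_features in (1, 10, 100, 1000):
    strongest = strongest_chance_correlation(30, n_features)
    print(f"{n_features:>4} variables searched: strongest |r| = {strongest:.2f}")
```

Because the seed is fixed, each larger search includes the variables from the smaller one, so the strongest correlation can only stay the same or grow as more variables are examined. Nothing in the data is related to anything else, yet a wide enough search will still turn up a relationship that looks meaningful.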

David Leinweber, a fintech researcher, made this abundantly clear by tasking computers to find relationships in data and producing a very strong statistical relationship between the S&P 500 index price and butter production in Bangladesh. This relationship remained highly correlated from 1981 to 1993. Just to be clear, there is no causal relationship (hidden or not) between the S&P 500 and butter production in Bangladesh. This is an effect of dealing with high-dimensional data.
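The same kind of search can be imitated with random numbers. The sketch below is not Leinweber's data or method; it is a hypothetical illustration that assumes a noise "target" series and 1,000 unrelated noise series, picks the candidate that best matches the target over a 13-period window, and then measures how well that same candidate matches over the following 13 periods. The name `mine_then_recheck` and all parameters are assumptions for this example.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mine_then_recheck(n_periods=13, n_candidates=1000, seed=1):
    """Pick the best-matching noise series in one window, then re-test it in the next."""
    rng = random.Random(seed)
    total = 2 * n_periods
    # Stand-in for the index being "explained"; it is pure noise here.
    target = [rng.gauss(0, 1) for _ in range(total)]
    best_fit, best_series = 0.0, None
    for _ in range(n_candidates):
        series = [rng.gauss(0, 1) for _ in range(total)]
        # Only the first window is used when searching for a match.
        fit = abs(pearson(series[:n_periods], target[:n_periods]))
        if fit > best_fit:
            best_fit, best_series = fit, series
    # Re-check the winning series on periods that played no part in the search.
    later_fit = abs(pearson(best_series[n_periods:], target[n_periods:]))
    return best_fit, later_fit


in_window, after_window = mine_then_recheck()
print(f"|r| inside the searched window: {in_window:.2f}")
print(f"|r| in the following window:    {after_window:.2f}")
```

The winning series is chosen precisely because it fits the searched window, so a high correlation there says little on its own. A relationship found this way would usually be expected to weaken once it is checked against periods that were not part of the search.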

If this were a real relationship, we would expect other things that are correlated with Bangladeshi butter production, such as gross butter sales, to also be correlated with the S&P. This does not happen for butter, but it does happen for ethnicity. Large datasets often lead to false relationships, but purpose-designed studies seldom do.
