What are the chances that…
A while back, I had a story mentioning a Cornell law professor named Michael C. Dorf. Part of the discussion revolved around correctly identifying the person from his name. I mean, maybe there is more than one Michael Dorf, but more than one Michael C. Dorf? And even if there were multiple Michael C. Dorfs, surely there wouldn't be two attorneys with that name. And in the hugely unlikely event that two attorneys share that name, it's unthinkable that they could both be linked to Barack Obama.
Only, there are two of them. One is the Cornell law professor who wrote a paper on presidential eligibility, and the other is a Chicago attorney who actually represented Barack Obama.
I wrote the preceding as if this were an amazing coincidence, but I don't think it is all that amazing. Dorf is an unusual surname: it ranks number 35,938 in the 1990 census (US Census tabulation). Michael, however, is quite common, and C is a common initial. There are lots of attorneys too: 1,225,452, according to the American Bar Association. What perhaps does make this instance unusual is the connection to Obama, but even that connection is tenuous. The Cornell professor isn't really connected to Obama, except that he wrote an article about presidential eligibility, specifically the possibility of a president achieving a third term by being elected vice president after having served as president. The Chicago attorney's association is more direct, but it dates from years earlier, when Obama was a state senator.
There are three errors of thinking we make in spotting remarkable coincidences (or are they?). The first is failing to realize that in a population the size of the United States, some 300 million people, a lot of individually infrequent coincidences are statistically likely. I remember doing quality assurance on a large statewide database, checking for duplicates, and being struck by the number of people with the same name born on the same day, and this wasn't even a large state.
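To put rough numbers on that duplicate-name experience, here is a minimal birthday-problem sketch in Python. It assumes, purely for illustration, that everyone sharing a name has a birth date drawn uniformly from about 80 years of days (ignoring leap days and real birth-rate patterns). The group size of 500 is a hypothetical figure, not a count from that database.

```python
import math


def expected_shared_pairs(n: int, d: int) -> float:
    """Expected number of pairs among n people with the same birth date,
    assuming each birth date is uniform over d equally likely dates."""
    return n * (n - 1) / 2 / d


def prob_any_shared(n: int, d: int) -> float:
    """Probability that at least two of n people share a birth date
    (the classic birthday problem), under the same uniform assumption."""
    if n > d:
        return 1.0  # pigeonhole: a repeat is guaranteed
    # Sum logs of (1 - i/d) to get the probability that all dates differ.
    log_all_distinct = sum(math.log1p(-i / d) for i in range(n))
    return 1.0 - math.exp(log_all_distinct)


if __name__ == "__main__":
    days = 80 * 365  # hypothetical span of plausible birth dates
    people = 500     # hypothetical number of records sharing one common name
    print(f"expected same-birthday pairs: {expected_shared_pairs(people, days):.2f}")
    print(f"chance of at least one pair:  {prob_any_shared(people, days):.3f}")
```

With these made-up inputs it prints about 4.27 expected pairs and roughly a 98.6% chance of at least one, even though any particular pair of people has only a 1 in 29,200 chance of matching.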
The second error is to fail to consider how encompassing the criteria are, and whether the criteria are being manipulated to include a coincidence. In the Dorf example, the category of connection to Obama was expanded, and if that hadn’t worked out, perhaps the criteria would have been “lawyers from Illinois” or “Democrats” or “went to the same law school” or something else. It is one thing to ask “what is the chance that …?” before the fact and quite another to ask “what is the chance that we can find some connection given all the possible connections we could look for?” after the fact.
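The gap between the before-the-fact and after-the-fact questions can be sketched numerically. Assume, hypothetically, that each candidate connection (same state, same party, same law school, and so on) holds with a 1% chance and that the candidates are independent; neither number comes from the Dorf case. The chance of finding at least one connection among k candidates is then 1 - (1 - p)^k.

```python
def prob_some_connection(p: float, k: int) -> float:
    """Chance that at least one of k candidate connections holds,
    assuming each holds with probability p, independently of the others."""
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    p = 0.01  # hypothetical chance that any single connection holds
    for k in (1, 10, 50, 100):
        print(f"{k:>3} criteria tried: {prob_some_connection(p, k):.3f}")
```

With one criterion fixed in advance the chance stays at 1%; allowing a hundred after-the-fact criteria pushes it to roughly 63%.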
The third error is to look at any particular unusual event and to assign significance to it. Say that we conclude accurately that we are looking at a one in ten thousand event. But if there are a million people spending hundreds of millions of hours searching for unusual events linked to Barack Obama, chances are that quite a few unlikely (on their own) events will be found.
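The arithmetic is simple if we treat the searching as a number of independent checks, each with a one-in-ten-thousand chance of turning up something unusual. Counting a million people's effort as exactly one million checks is a simplifying assumption for illustration; if anything it is an undercount.

```python
import math


def expected_hits(n_checks: int, p: float) -> float:
    """Expected number of unusual events found in n independent checks."""
    return n_checks * p


def prob_no_hits(n_checks: int, p: float) -> float:
    """Probability that none of n independent checks finds an unusual event,
    i.e. (1 - p) ** n, computed in log space to avoid precision loss."""
    return math.exp(n_checks * math.log1p(-p))


if __name__ == "__main__":
    n, p = 1_000_000, 1 / 10_000  # hypothetical effort; one-in-ten-thousand events
    print(f"expected unusual finds: {expected_hits(n, p):.0f}")
    print(f"chance of finding none: {prob_no_hits(n, p):.1e}")
```

Under these assumptions about 100 "one in ten thousand" events are expected, and the chance of finding none at all is on the order of 1e-44.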
When a large number of unlikely events is presented in a list, it seems extremely unlikely that they all happened, but such lists are not shown alongside the other list, many orders of magnitude longer, of things that are not unusual at all.
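One way to see the missing context is to compare two numbers: the naive chance that a specific list of k unlikely events all happen, and the chance that at least k unlikely events turn up somewhere in a much larger pool of things examined. This sketch assumes, hypothetically, 10,000 things examined, each independently unusual with probability 1%, and a list of 5 hits.

```python
import math


def naive_joint_prob(p: float, k: int) -> float:
    """Chance that k specified independent events, each with probability p,
    all happen. This is how a curated list tends to be read."""
    return p ** k


def prob_at_least_k(n: int, p: float, k: int) -> float:
    """Binomial tail: chance that at least k of n independent candidates
    are unusual, each with probability p."""
    below = sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


if __name__ == "__main__":
    n, p, k = 10_000, 0.01, 5  # hypothetical pool size, rarity, list length
    print(f"naive chance of the list:        {naive_joint_prob(p, k):.1e}")
    print(f"chance of finding {k} somewhere: {prob_at_least_k(n, p, k):.6f}")
```

With these inputs the curated list reads like a one-in-ten-billion event, while finding five such hits somewhere in the pool is essentially certain.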
We humans are well-adapted to recognize and assign significance to unusual occurrences. We are not, however, well-adapted to dealing with large numbers and the wealth of information available on the Internet. What looks unusual may not be.