Pennsylvania Rabbit Hole Part VIII


Just When You Think
It Can’t Get Worse—More Anomalies

As we continued to explore the data, we found more anomalies. For example, looking at groups of people who shared an address in the November 7 data set, we noticed that sometimes two female occupants shared a date of birth and a first name, but had different last names, and different voter IDs.

It seemed as if these voters changed their last names due to marriage but were added as new voters rather than having their existing records updated. But when we checked the other data sets, many of these apparent duplicates no longer existed.
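
The pattern described above can be expressed as a simple grouping check. The following is a minimal sketch, not the actual query used for this analysis: the record layout, the sample rows, and the function name `possible_name_change_pairs` are all hypothetical, and the sketch does not filter on gender.

```python
from collections import defaultdict

# Hypothetical records: (voter_id, first_name, last_name, dob, address)
voters = [
    ("V001", "MARY", "SMITH", "1975-03-14", "12 OAK ST"),
    ("V002", "MARY", "JONES", "1975-03-14", "12 OAK ST"),
    ("V003", "JOHN", "JONES", "1973-08-02", "12 OAK ST"),
    ("V004", "ANNA", "LEE", "1980-01-01", "9 ELM AVE"),
]

def possible_name_change_pairs(records):
    """Return pairs of voter IDs that share an address, first name, and
    date of birth but have different last names."""
    groups = defaultdict(list)
    for voter_id, first, last, dob, address in records:
        groups[(address, first, dob)].append((voter_id, last))

    pairs = []
    for members in groups.values():
        # Compare every pair of records inside the same group.
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                (id_a, last_a), (id_b, last_b) = members[i], members[j]
                if id_a != id_b and last_a != last_b:
                    pairs.append((id_a, id_b))
    return pairs

pairs = possible_name_change_pairs(voters)
print(pairs)
```

Running the same check against each snapshot would show whether a flagged pair persists from one data set to the next.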

Same Households, Same Birthdays

We went further, looking at people who share both a household and a birth date. In a large population, there will be some legitimate examples of this, such as adult twins rooming together. It's also not unheard of for couples to share a birth date.

However, in our November 7 data we found over 20,000 such households. But it gets worse.

> In over 2,500 of these, the two occupants with the same birth date also had identical names.

> In over 10,000 more of these households, the “birthday twins” had different last names, making it less likely that they were young adult twins living at home.
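
One way to produce counts like these is to group records by address and birth date, then sort the matching households into categories. This is a sketch under assumptions: the field order, the sample rows, and the name `classify_birthday_twin_households` are hypothetical, and real address data would need normalizing before grouping.

```python
from collections import defaultdict

# Hypothetical records: (voter_id, first_name, last_name, dob, address)
household_voters = [
    ("V101", "JANE", "DOE", "1990-05-05", "1 A ST"),
    ("V102", "JANE", "DOE", "1990-05-05", "1 A ST"),  # identical name
    ("V103", "TOM", "RAY", "1988-02-02", "2 B ST"),
    ("V104", "TIM", "RAY", "1988-02-02", "2 B ST"),   # plausible twins
    ("V105", "SUE", "KIM", "1970-07-07", "3 C ST"),
    ("V106", "BOB", "FOX", "1970-07-07", "3 C ST"),   # different last names
    ("V107", "AL", "ONE", "1960-01-01", "4 D ST"),    # lives alone
]

def classify_birthday_twin_households(records):
    """Count households where two or more occupants share a birth date,
    and how many of those involve identical or differing names."""
    by_household_dob = defaultdict(list)
    for voter_id, first, last, dob, address in records:
        by_household_dob[(address, dob)].append((first, last))

    found = {"households": set(), "identical_names": set(),
             "different_last_names": set()}
    for (address, dob), names in by_household_dob.items():
        if len(names) < 2:
            continue  # nobody else at this address shares this birth date
        found["households"].add(address)
        if len(set(names)) < len(names):
            found["identical_names"].add(address)
        if len({last for _, last in names}) > 1:
            found["different_last_names"].add(address)
    return {label: len(addresses) for label, addresses in found.items()}

summary = classify_birthday_twin_households(household_voters)
print(summary)
```

In this toy data, the household at "2 B ST" counts as a birthday-twin household but falls into neither flagged category, which is what legitimate twins with a shared surname would look like.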

This made us curious about how many duplicates in the November data lived at different addresses.

Anomalies: what are the odds?

Of course, it’s relatively normal for people in a very large population to share a name and birth date, even a first name, last name, and birth date—especially in the case of more common names. But we wondered whether we would find more of these “name and birthday” twins in the November data than in other data sets.

To find out, we created more one-dimensional arrays from each of our data sets. These arrays are lists of first name + DOB, last name + DOB, and first name + last name + DOB for each voter in each set of registration data.

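We then counted the unique values in each list and subtracted that number from the list's total length. The difference is the number of records whose combination already appears elsewhere in the list.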

As an illustration, let's say you have a group of 100 people, and in the group there are two people named Jennifer who were born September 1, 1960. In that group there would be 99 unique first name/birth date combinations. Subtracting 99 from 100 shows us that one person in the set has the same first name and birth date as another person.
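
The Jennifer illustration can be reproduced in a few lines. This is a sketch of the counting step only; the function name `surplus_count` and the placeholder names in the group are made up for the example.

```python
def surplus_count(keys):
    """Total number of entries minus the number of distinct entries."""
    keys = list(keys)
    return len(keys) - len(set(keys))

# Hypothetical group of 100 people as (first_name, dob) pairs:
# 98 people with distinct names, plus two Jennifers born the same day.
group = [(f"Name{i}", "1950-01-01") for i in range(98)]
group += [("Jennifer", "1960-09-01"), ("Jennifer", "1960-09-01")]

total = len(group)
unique = len(set(group))
shared = surplus_count(group)
print(total, unique, shared)  # 100 99 1
```

Note that this counts surplus records rather than people involved: if three voters shared one combination, the result would be 2, not 3.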

Digging Deeper

In a very large set of people (8.75 million in the case of the November 7 set of registered voters), you would expect to find many people who share a birth date and name. But we wanted to know whether there were more such voters in the November 7 data than in our other data sets.

And indeed, there were. Of particular interest are those who share a first name, last name, and date of birth. In the latest data snapshot, from July 31 of this year, there were just under 7,500 voters with identical name and birthday combinations. In the November data there were over 22,500, roughly three times July's total. While some of the names on the "duplicates" list are common, many are not.

Also striking was the huge increase in first name and birth date combinations. Nearly 175,000 more voters in the November data set shared a first name and date of birth than in the July 31, 2017 data set.
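
A comparison like this amounts to running the same count on each snapshot and taking the difference. The sketch below assumes each snapshot is a list of records with `first`, `last`, and `dob` fields; those field names, the snapshot labels, and the tiny sample data are hypothetical and stand in for the real files.

```python
def surplus(keys):
    """Total number of entries minus the number of distinct entries."""
    keys = list(keys)
    return len(keys) - len(set(keys))

# The three combinations compared in the text.
KEY_BUILDERS = {
    "first+dob": lambda r: (r["first"], r["dob"]),
    "last+dob": lambda r: (r["last"], r["dob"]),
    "first+last+dob": lambda r: (r["first"], r["last"], r["dob"]),
}

def surplus_by_key(records):
    """Surplus count for each combination in one snapshot."""
    return {name: surplus(build(r) for r in records)
            for name, build in KEY_BUILDERS.items()}

# Hypothetical miniature snapshots.
snapshots = {
    "earlier": [
        {"first": "ANN", "last": "LEE", "dob": "1970-01-01"},
        {"first": "ANN", "last": "LEE", "dob": "1970-01-01"},
        {"first": "ANN", "last": "RAY", "dob": "1970-01-01"},
        {"first": "BOB", "last": "RAY", "dob": "1982-06-30"},
    ],
    "later": [
        {"first": "ANN", "last": "LEE", "dob": "1970-01-01"},
        {"first": "ANN", "last": "RAY", "dob": "1970-01-01"},
        {"first": "BOB", "last": "RAY", "dob": "1982-06-30"},
    ],
}

counts = {label: surplus_by_key(recs) for label, recs in snapshots.items()}
difference = {name: counts["earlier"][name] - counts["later"][name]
              for name in KEY_BUILDERS}
print(counts)
print(difference)
```

Because snapshots differ in size, comparing the surplus as a share of total records alongside the raw difference would make the comparison fairer.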

Again, the data appeared to defy common sense. And we weren’t done looking.

