Can Social Security Numbers Be Hacked?

Apparently so. And that should give everyone pause, since the SSN has become the de facto national identification system. From PNAS:

We demonstrate that it is possible to predict, entirely from public data, narrow ranges of values wherein individual SSNs are likely to fall. Unless mitigating strategies are implemented, the predictability of SSNs exposes them to risks of identity theft on mass scales.

Any third party with internet access and some statistical knowledge can exploit such predictability in 2 steps: first, by analyzing publicly available records in the SSA Death Master File (DMF) to detect statistical patterns in the SSN assignment for individuals whose deaths have been reported to the SSA; thereafter, by interpolating an alive person's state and date of birth with the patterns detected across deceased individuals' SSNs, to predict a range of values likely to include his or her SSN. Birth data, in turn, can be inferred from several offline and online sources, including data brokers, voter registration lists, online white pages, or the profiles that millions of individuals publish on social networking sites (10). Using this method, we identified with a single attempt the first 5 digits for 44% of DMF records of deceased individuals born in the U.S. from 1989 to 2003 and the complete SSNs with <1,000 attempts (making SSNs akin to 3-digit financial PINs) for 8.5% of those records. Extrapolating to the U.S. living population, this would imply the potential identification of millions of SSNs for individuals whose birth data were available. Such findings highlight the hidden privacy costs of widespread information dissemination and the complex interactions among multiple data sources in modern information economies, underscoring the role of public records as breeder documents of more sensitive data.

What's worse, many information-gathering entities don't require a perfect match, so that they can tolerate typing mistakes:

In practical applications, SSNs are often used as authenticators in inquiries processed by credit reporting agencies (CRAs). Because consumer credit reports contain errors and inconsistencies, CRAs are known to accept as valid even inquiries where just 7 of 9 SSN digits are actually correct. This implies that, for some practical purposes, the prediction accuracies we reported may be conservative by 2 orders of magnitude: With just 10 or fewer attempts per target, the inquiries associated with 9.2% of all SSNs issued after 1988 could be accepted as valid by CRAs and 29.1% of those issued in the 25 states with fewer births.
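To get a feel for how much a "7 of 9 digits" rule widens the target, here is a minimal sketch of the arithmetic. It assumes that "7 of 9 correct" means agreement in at least 7 of the 9 digit positions; the quoted passage does not spell out the exact matching rule, and the function names are made up for illustration. It only counts how many 9-digit strings would pass for a single number. It does not reproduce the paper's accuracy estimates.

```python
from math import comb


def accepted_variants(total_digits: int = 9, min_correct: int = 7) -> int:
    """Count digit strings agreeing with a target in >= min_correct positions."""
    max_wrong = total_digits - min_correct
    # For k wrong positions: choose which positions differ,
    # then pick one of the 9 other digits for each of them.
    return sum(comb(total_digits, k) * 9 ** k for k in range(max_wrong + 1))


def matches_loosely(guess: str, target: str, min_correct: int = 7) -> bool:
    """Assumed matching rule: same length, >= min_correct positions agree."""
    if len(guess) != len(target):
        return False
    return sum(g == t for g, t in zip(guess, target)) >= min_correct


if __name__ == "__main__":
    # 1 exact match + 81 one-digit misses + 2,916 two-digit misses
    print(accepted_variants())  # 2998
```

Under that assumption, a single number is matched by 2,998 different 9-digit strings instead of just one, so each guess covers far more ground than an exact-match check would allow.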

This is disturbing.

Cited article: Acquisti, A. & R. Gross. 2009. Predicting Social Security numbers from public data. PNAS 106: 10975-10980. doi: 10.1073/pnas.0904891106

The only surprise is that it's taken this long.

Prior to the late 1980s, parents did not need to provide social security numbers for children they were claiming as dependents. Then the law changed, because (not surprisingly) too many people had been claiming dependents that didn't exist. So while a substantial fraction of people over 25 got SSNs in states other than their birth state (I'm in this category; we moved several times between when I was born and when, around junior high, I finally got an SSN), almost everybody under 21 who is not an immigrant has an SSN issued in their state of birth. It's long been known that you can identify the state of issue of an SSN from the first three digits, and the remaining digits are assigned by a publicly known algorithm. So it's no surprise that somebody could reverse engineer the SSN of somebody born after the IRS started requiring dependents' SSNs merely by knowing that person's date and place of birth.

By Eric Lund (not verified) on 09 Jul 2009 #permalink

Incidentally, the SSN, as you say, is the de facto standard in national identification.

Only...

My father was born in 1947. His social security card is the original, which is kind of surprising now that I think of it, because it's not even laminated. (My boating license is a solid piece of plastic issued in 1992, and it's almost dissolved.) Directly below the number, the card says, "It is unlawful to use this number for the purposes of identification."

The trouble isn't that social security numbers are guessable, but that they are simultaneously treated like OMG Super Secret passwords and demanded by virtually everybody, for virtually any purpose. We ought to be working from the assumption that SSNs are public, not treating them like passwords. Fat chance, of course.

(In that vein, the notion of "identity theft" really annoys me. Calling "bank fraud", caused by the bank's horribly weak verification mechanisms, "identity theft" is really just a way to pin responsibility on you rather than them.)

Everybody who wants to work with social security numbers should have to read up on some very basic security principles. One of them is that a social security number is an identifier, not a secret. The two are very different things used for different purposes.

Of course it's easy to guess SSNs. They weren't designed to be secret. They were designed to be informative. It's like being surprised that the serial number on your computer encodes information the manufacturer can use to trace it back to the time and date of manufacture.

By Troublesome Frog (not verified) on 10 Jul 2009 #permalink