Apparently so. And that should give everyone pause, since the SSN has become the de facto national identification system. From PNAS:
We demonstrate that it is possible to predict, entirely from public data, narrow ranges of values wherein individual SSNs are likely to fall. Unless mitigating strategies are implemented, the predictability of SSNs exposes them to risks of identity theft on mass scales. Any third party with internet access and some statistical knowledge can exploit such predictability in 2 steps: first, by analyzing publicly available records in the SSA Death Master File (DMF) to detect statistical patterns in the SSN assignment for individuals whose deaths have been reported to the SSA; thereafter, by interpolating an alive person's state and date of birth with the patterns detected across deceased individuals' SSNs, to predict a range of values likely to include his or her SSN. Birth data, in turn, can be inferred from several offline and online sources, including data brokers, voter registration lists, online white pages, or the profiles that millions of individuals publish on social networking sites (10). Using this method, we identified with a single attempt the first 5 digits for 44% of DMF records of deceased individuals born in the U.S. from 1989 to 2003 and the complete SSNs with <1,000 attempts (making SSNs akin to 3-digit financial PINs) for 8.5% of those records. Extrapolating to the U.S. living population, this would imply the potential identification of millions of SSNs for individuals whose birth data were available. Such findings highlight the hidden privacy costs of widespread information dissemination and the complex interactions among multiple data sources in modern information economies, underscoring the role of public records as breeder documents of more sensitive data.
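To make the two steps in the quote concrete, here is a toy sketch. It is not the authors' statistical model: it swaps in a simple majority vote among records from the same state with nearby birth dates, and every name in it (`TOY_RECORDS`, `predict_prefix`, `window_days`) is hypothetical. The records are made up and use placeholder states and impossible SSN values.

```python
from collections import Counter
from datetime import date

# Made-up stand-ins for DMF entries: (state, date of birth, 9-digit SSN).
# None of these are real; "XX"/"YY" are placeholder states and the
# "000" area number is never issued.
TOY_RECORDS = [
    ("XX", date(1990, 3, 1), "000110001"),
    ("XX", date(1990, 3, 2), "000110042"),
    ("XX", date(1990, 3, 4), "000110107"),
    ("XX", date(1990, 3, 9), "000130015"),
    ("YY", date(1990, 3, 3), "000550003"),
]

def predict_prefix(records, state, dob, window_days=3):
    """Guess the first five digits for someone born in `state` on `dob`.

    Step 1 (pattern detection) is reduced here to collecting the prefixes
    of records from the same state born within `window_days` of `dob`.
    Step 2 (interpolation) is reduced to taking the most common prefix.
    Returns None when no nearby record exists.
    """
    nearby = [
        ssn[:5]
        for rec_state, rec_dob, ssn in records
        if rec_state == state and abs((rec_dob - dob).days) <= window_days
    ]
    if not nearby:
        return None
    return Counter(nearby).most_common(1)[0][0]

guess = predict_prefix(TOY_RECORDS, "XX", date(1990, 3, 3))
```

With the toy data above, the three "XX" records born within three days of the target share one prefix, so that prefix is the guess; a state with no nearby records yields `None`.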
What's worse, many information-gathering entities allow for typing mistakes and so don't require perfect matches:
In practical applications, SSNs are often used as authenticators in inquiries processed by credit reporting agencies (CRAs). Because consumer credit reports contain errors and inconsistencies, CRAs are known to accept as valid even inquiries where just 7 of 9 SSN digits are actually correct. This implies that, for some practical purposes, the prediction accuracies we reported may be conservative by 2 orders of magnitude: With just 10 or fewer attempts per target, the inquiries associated with 9.2% of all SSNs issued after 1988 could be accepted as valid by CRAs and 29.1% of those issued in the 25 states with fewer births.
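A minimal sketch of what a "7 of 9 digits" tolerance means in practice. It assumes digits are compared position by position, which the quote does not spell out, and the function names are made up for illustration. It also counts how many distinct 9-digit strings would pass such a check for any one number on file.

```python
from math import comb

def matching_digits(a: str, b: str) -> int:
    """Count positions at which two equal-length digit strings agree."""
    return sum(x == y for x, y in zip(a, b))

def accepted(submitted: str, on_file: str, required: int = 7) -> bool:
    """Lenient check: pass if at least `required` positions agree.
    (Assumption: position-wise comparison; real CRA logic is not public here.)"""
    return matching_digits(submitted, on_file) >= required

def accepted_count(length: int = 9, required: int = 7) -> int:
    """Number of distinct digit strings that pass the lenient check
    against one fixed target: choose which k positions are wrong,
    then pick one of 9 wrong digits for each of them."""
    max_wrong = length - required
    return sum(comb(length, k) * 9**k for k in range(max_wrong + 1))
```

Under these assumptions, `accepted("123456700", "123456789")` is true, and `accepted_count()` gives 1 + 81 + 2,916 = 2,998 passing strings per target instead of exactly one.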
This is disturbing.
Cited article: Acquisti, A. & R. Gross. 2009. Predicting Social Security numbers from public data. PNAS 106: 10975-10980. doi: 10.1073/pnas.0904891106
The only surprise is that it's taken this long.
Prior to the late 1980s, parents did not need to provide social security numbers for children they were claiming as dependents. Then the law changed, because (not surprisingly) too many people had been claiming dependents that didn't exist. So while a substantial fraction of people over 25 got SSNs in states other than their birth state (I'm in this category; we moved several times between when I was born and when, around junior high, I finally got an SSN), almost everybody under 21 who is not an immigrant has an SSN issued in their state of birth. It's long been known that you can identify the state of issue of an SSN from the first three digits, and the remaining digits are assigned by a publicly known algorithm. So it's no surprise that somebody could reverse engineer the SSN of somebody born after the IRS started requiring dependents' SSNs merely by knowing that person's date and place of birth.
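For readers unfamiliar with the layout mentioned above, a short sketch of how a nine-digit SSN splits into its three fields: the area number (first three digits), the group number (next two), and the serial number (last four). The helper names are hypothetical, and the sample value is an impossible placeholder.

```python
from typing import NamedTuple

class SSNParts(NamedTuple):
    area: str    # first three digits, historically tied to the issuing state
    group: str   # middle two digits
    serial: str  # last four digits

def split_ssn(ssn: str) -> SSNParts:
    """Split an SSN written as AAA-GG-SSSS or AAAGGSSSS into its fields."""
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected nine digits")
    return SSNParts(digits[:3], digits[3:5], digits[5:])

parts = split_ssn("000-12-3456")  # placeholder value, not a real SSN
```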
Incidentally, the SSN, as you say, is the de facto standard in national identification.
Only...
My father was born in 1947. His social security card says, directly below the number, "It is unlawful to use this number for the purposes of identification." (The card is the original, which is kind of surprising now that I think of it, because it's not even laminated. My boating license is a solid piece of plastic issued in 1992, and it's almost dissolved.)
The trouble isn't that social security numbers are guessable, but that they are (simultaneously) treated like OMG Super Secret passwords and demanded by virtually everybody, for virtually any purpose. We ought to be working from the assumption that SSNs are public, not treating them like passwords. Fat chance, of course.
(In that vein, the notion of "identity theft" really annoys me. Calling "bank fraud", caused by the bank's horribly weak verification mechanisms, "identity theft" is really just a way to pin responsibility on you rather than them.)
Everybody who wants to work with social security numbers should have to read up on some very basic security principles. One of them is that a social security number is an identifier, not a secret. The two are very different things used for different purposes.
Of course it's easy to guess SSNs. They weren't designed to be secret. They were designed to be informative. It's like being surprised that the serial number on your computer encodes information the manufacturer can use to trace it back to the time and date of manufacture.