There's many a slip 'twixt spit and SNP: errors in personal genomics data

Peter Aldhous has a great piece of detective work in New Scientist, which has revealed a bizarre and sporadic glitch in the online software provided by personal genomics company deCODEme to allow customers to view their genetic data.

The glitch appears to be restricted to the display of data from the mitochondrial genome (a piece of DNA with a special fascination for genetic genealogists, since it is inherited almost exclusively along the maternal line). On several separate occasions the deCODEme browser presented Aldhous with a mitochondrial profile that was spectacularly wrong, differing from the profile in his raw data at 44 out of 93 positions.
Aldhous was kind enough to email me the raw data and some screenshots to illustrate the problem. It's clear that the error wasn't the result of Aldhous being presented with someone else's data: the profile is unlike any ever seen in a human being (genetic genealogist and blogger Blaine Bettinger is quoted in Aldhous' article asking whether it was certain that the sequence was from Homo sapiens). Nor is it the result of inaccuracies in the raw data - Aldhous' profile from deCODEme competitor 23andMe agreed with his raw deCODEme data at every site called by both companies.
Instead, it appears as though some problem in the code that translates a customer's raw data into the viewable format of the browser was doing something very strange: calling Aldhous' genotype at each mitochondrial position seemingly at random (44 errors out of 93 sites is compatible with pure chance). Even more bizarrely, whenever Aldhous saw the incorrect profile, it was always the same incorrect profile - on other occasions the browser presented his genotypes completely accurately.
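For readers who want to check the "compatible with pure chance" remark themselves, here is a rough sketch. It assumes (my simplification, not anything deCODEme has confirmed) that a random caller would pick the wrong one of two possible alleles at each site with probability 0.5, independently across sites; the function name is purely illustrative.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: the total probability of all
    outcomes that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return sum(x for x in pmf if x <= threshold)

# Figures from the article: 44 wrong calls out of 93 mitochondrial positions.
errors, sites = 44, 93

# ASSUMPTION: a 50% error rate per site is what "random" calling would give.
p_value = binomial_two_sided_p(errors, sites, p=0.5)

print(f"observed error rate: {errors / sites:.3f}")
print(f"two-sided p-value under a 50% error rate: {p_value:.3f}")
```

A large p-value here only says that the error count looks like coin-flipping under that assumption. It says nothing about why the same wrong profile kept reappearing.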
Aldhous says in his article that deCODE is "still investigating" the source of the bug, but I understand that following the publication of his article the company's programmers have tracked down the source of the error and corrected it.
Lessons for personal genomics customers
Now, it's important to emphasise that this error is actually pretty benign: it's unlikely that it would ever have even been spotted by most customers, and Aldhous goes to great pains to emphasise that it didn't affect the risk profiles generated by deCODEme for various common diseases. It's also worth keeping in mind that the genotyping methods used by personal genomics companies are generally extremely accurate: comparisons between data on the same person generated by 23andMe and deCODEme, for instance, typically show discrepancies at fewer than one in 10,000 sites.
However, this incident serves as a canary in the personal genomics coal-mine - a warning of the challenges that lie ahead for companies in ensuring that massive, complex genetic data-sets are presented accurately to consumers.
It's also a useful reminder to personal genomics consumers to not take their results for granted. The process between spitting into a cup and viewing your genetic results online involves multiple steps where things can go wrong, ranging from errors in sample tracking (the most pernicious and difficult to correct), through genotyping problems (usually much easier to spot), to errors in data analysis and display. 
In general the odds of a given genetic data-point being wrong are very low, but they're sufficiently far above zero to warrant caution in making too much out of any single result - mind you, given the extremely small effect sizes of most of the variants currently assayed by personal genomics companies, that's good advice anyway. Certainly it would be a good idea for customers to seek independent validation of any result if they intend to use it to guide serious health or lifestyle decisions.
But the most important piece of advice for personal genomics customers is to engage with your data. Aldhous only detected these anomalies because he was exploring his own genetic data in multiple ways, cross-checking it against both other data and his own (informed) expectations, and was persistent enough to follow up on the strange results he found. 
That's a good example for other personal genomics customers to follow: rather than being a passive recipient of genetic forecasts, dig into your data and see if it makes sense, and keep asking questions until it does. In addition to making it more likely that you'll pick up any errors in your results, you'll also develop a much deeper understanding both of the nature of genetics and of your own genome.
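As a starting point for that kind of cross-checking, here is a minimal sketch of comparing genotype calls from two sources. It assumes you have already loaded each company's raw data into a dictionary mapping SNP identifier to genotype string. The SNP names, genotypes and the "--" no-call marker below are made up for illustration, and the function names are mine rather than part of any company's tools. It also ignores strand differences between platforms, which real comparisons need to handle.

```python
def normalise(genotype):
    """Sort the alleles so that 'GA' and 'AG' compare as equal."""
    return "".join(sorted(genotype.upper()))

def compare_calls(calls_a, calls_b, no_call="--"):
    """Compare genotypes at SNPs called in both data sets.

    Returns the number of sites compared and a list of
    (snp_id, genotype_a, genotype_b) tuples where the calls disagree.
    """
    compared = 0
    mismatches = []
    for snp_id in sorted(calls_a.keys() & calls_b.keys()):
        a, b = calls_a[snp_id], calls_b[snp_id]
        if no_call in (a, b):
            continue  # skip sites where either source made no call
        compared += 1
        if normalise(a) != normalise(b):
            mismatches.append((snp_id, a, b))
    return compared, mismatches

# Hypothetical example data, not real genotypes.
company_a = {"rs0001": "AG", "rs0002": "CC", "rs0003": "TT", "rs0004": "--"}
company_b = {"rs0001": "GA", "rs0002": "CT", "rs0003": "TT", "rs0005": "AA"}

compared, mismatches = compare_calls(company_a, company_b)
print(f"{compared} shared sites compared, {len(mismatches)} discrepancies")
for snp_id, a, b in mismatches:
    print(f"  {snp_id}: {a} vs {b}")
```

Any discrepancy this turns up is a prompt to look more closely, not proof of which source is wrong.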


Canary? Forme Fruste? As more customers come online, more problems will happen. We need complete transparency here and less manipulative marketing.....

-Steve

That's an amazing story, and the situation is going to get far more complex as gene-gene, gene-environment, gene-env-gene and other interactions are factored into algorithms which feed other algorithms, and then others, to come up with an interpretation of a person's gene and biodata. Tiny bugs in the algorithms or software could cause chaos. Have a look at: Penders et al, A question of style: method, integrity and the meaning of proper science. Endeavour. 2009 Aug 6. PMID: 19665231

Errors in displaying data visualization? This isn't that big a deal. They found the bug and fixed it.

By anomalous (not verified) on 27 Aug 2009

Every complex piece of software will contain bugs, because it's written by people and people make mistakes, just as doctors can make mistakes. That's acceptable as long as the mistakes are admitted and acted upon to prevent them in the future. It's not only the DTC market that needs more transparency in this respect, but the whole medical world.