The Thing About "Raw Data"
Many direct-to-consumer DNA testing services let you download your raw data for free. Seems like a pretty great deal! You now have all your DNA at your fingertips. Sort of. Before you start hunting for hidden secrets within your genome, make sure you understand what raw data is, and what it isn’t.
What is Raw Data?
Raw data is the information that is spit out by the machine used to do your testing. Every genetic test result starts with raw data, which is then interpreted to produce your final results. This interpretation is usually done by a combination of software and trained professionals, who help determine what the results actually mean.
Raw Data is Not Your Whole Genome
Each genetic test is designed to look at specific parts of a person’s DNA. Just because you put all of your DNA into the machine doesn’t mean that the machine is reading all of it. Some tests look at only a set of known mutations that cause a single condition, while others may read entire lengths of many genes. Most direct-to-consumer tests are SNP-based tests. SNPs (pronounced “snips”), or Single Nucleotide Polymorphisms, are single-letter changes at different spots across the genome. They aren’t bad changes, just spots in the DNA that we notice are often different between different people. Humans have about 3.2 billion letters of DNA, so even if the test looks at a million SNPs, you’re still only looking at about 0.03% of your entire genome. You’ll never be able to read your whole genome, or even just a single gene, from a list of SNPs.
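To see where a figure of roughly 0.03% comes from, here is a quick back-of-the-envelope calculation. It uses only the two round numbers from the paragraph above (about 3.2 billion letters and one million SNPs). The function name `percent_of_genome_covered` is made up for this illustration, and real tests vary in how many SNPs they read.

```python
# Approximate number of letters in the human genome (round figure from the text).
GENOME_LENGTH = 3_200_000_000

# Illustrative number of SNPs read by a direct-to-consumer test (an assumption).
SNPS_TESTED = 1_000_000


def percent_of_genome_covered(snps: int, genome_length: int = GENOME_LENGTH) -> float:
    """Return the share of the genome, in percent, covered by `snps` single-letter sites."""
    return snps / genome_length * 100


if __name__ == "__main__":
    percent = percent_of_genome_covered(SNPS_TESTED)
    print(f"{percent:.3f}% of the genome")  # prints "0.031% of the genome"
```

Even doubling the number of SNPs would leave more than 99.9% of the genome unread.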
Raw Data is Not Validated
Raw data is messy, and no machine is accurate 100% of the time. That’s why clinical laboratories spend a lot of time and resources validating their test results. They take samples that are known to have certain results, and run them on their own machine to make sure they get those results too. They do this many times over (sometimes hundreds of times) to be sure their results are accurate. And if anything about the test changes (they switch brands of chemicals, they move the machine to the other side of the room, or they start testing blood collected in a different type of tube), then they have to validate all over again!
In this validation process, the lab also figures out where the limitations of their tests lie. They determine how much of their data is real and how much is artifact or noise. If they get a result they aren’t confident in, they repeat it, or check the sample using a different method. They use software and experts to put those results into context and ensure they are meaningful. The final test result that the lab releases looks very different from the original data that came out of the machine.
Raw Data Can Be Inaccurate
In a genetic lab, a lot of work happens between obtaining raw data and turning it into a test report that a laboratory can stand behind. Part of that process involves excluding inaccuracies. Looking at your own raw data would be a bit like reading the first draft of a novel, long before it’s been edited and made ready to publish. One study found that as many as 40% of results in the raw data of direct-to-consumer tests are false positives. This goes to show how much work and expertise it takes to create an accurate genetic test result.
While downloading your raw data from a genetic testing service may seem like gaining access to a goldmine of health information, remember that this data hasn’t been carefully combed through by experts who know the ins and outs of that particular test. Anything you find should be discussed with a medical professional and re-tested in a clinical lab before you act on it.