Researchers at Microsoft and the University of Washington (UW) said they’ve broken a world record by storing 200 MB of data on synthetic DNA strands. The most impressive part about reaching the milestone isn’t just how much data they were able to encode onto synthetic DNA and then decode, but how much space they were able to store it in.
Movies, images, emails, and other digital data from more than 600 smartphones can be stored in the faint pink smear of DNA at the end of this test tube. Image source: University of Washington.
Since DNA makes up every cell of your body while housing an unfathomable amount of information, harnessing such capabilities for the next generation of digital data storage has been a hot topic of study. Although we’re getting better at shrinking the physical size of data storage devices while simultaneously increasing storage capacity, much more data is produced each year than our current technology can keep up with. It’s predicted that the world’s total data storage will reach 44 trillion GB by 2020.
Because the best of our current range of devices are relatively short-term solutions to the problem, such as hard drives and optical storage, scientists are increasingly looking to nature’s hard drive, DNA, as a potential solution to both the capacity and longevity issues. DNA is a dense storage medium, potentially squeezing in 5.5 petabits (125,000 GB) of information per cubic millimeter. According to University of Washington professor, Luis Ceze, all 700 exabytes of today’s accessible Internet would fit into a space the size of a shoebox. You could then hide that shoebox away in a vault for thousands of years, and the DNA-stored data would remain intact.
The Microsoft and UW team were able to store many documents, including the Universal Declaration of Human Rights in over 100 languages, the top 100 books of Project Gutenberg, the Crop Trust’s seed database, and an HD music video. Totaling to 200 MB, the data took up less physical space than a pencil point.
UW associate professor Luis Ceze and research scientist Lee Organick prepare DNA containing digital data for sequencing. Image source: University of Washington.
By collaborating with biotechnology company Twist Bioscience, the team was able to encode the data onto the DNA strands by taking advantage of the similarities between DNA’s natural code and the binary language of computer code. “Interestingly, DNA already has a digital 'flavor,' as it has four bases and molecules that 'stick' to each other in a very programmable way,” said Ceze. “So the first step in storing digital data into DNA is to map strings of 1s and 0s into strings of As, Cs, Gs, and Ts.”
Using Polymerase Chain Reaction techniques, the team assigned addresses to the sequences to help them find the desired data later. Then, the DNA sequences are chemically manufactured, using a silicon-based DNA synthesis substrate that’s able to make several sequences simultaneously. Once complete, the DNA is put into a test tube and dehydrated, where it can potentially remain for thousands of years, as long as it’s kept away from light and heat.
Reading the data requires a DNA sequencer, which reads the sequence of As, Cs, Gs, and Ts, and algorithms which translate that information back into the original digital data. Because some of that data can be lost in translation, the researchers applied error correction schemes used in computer memory to bypass that hurdle.
According to Ceze, there are still many challenges in making DNA storage mainstream. “We will continue to focus on developing an end-to-end system and work with our Microsoft and Twist Bioscience collaborators to reduce the cost and increase the speed of writing and reading DNA,” he said.
Sources: Microsoft, University of Washington
Learn more about Electronic Products Magazine