How we can store digital data in DNA | Dina Zielinski
TED Talks Daily

Full episode transcript

0:00

This TED Talk features bioinformatics scientist Dina Zielinski, recorded live at TEDx Vienna 2017. Hey, parents, have you been looking for ways to engage your kids with entertaining and educational content? Or maybe you're just trying to limit their screen time. Check out Pinna. It's the only audio streaming service for kids 3 to 12 with original podcasts, music and audiobooks, all without ads. Pinna is a Parents' Choice Foundation Gold Award winner and kidSAFE certified. Visit pinna.fm/ted to start your 30-day free trial today. That's P-I-N-N-A dot F-M slash TED.

0:49

I could fit all movies ever made inside of this tube. If you can't see it, that's kind of the point. Before we understand how this is possible, it's important to understand the value of this feat. All of our thoughts and actions these days, through photos and videos, even our fitness activities, are stored as digital data. Aside from running out of space on our phones, we rarely think about our digital footprint. But humanity has collectively generated more data in the last few years than in all of preceding human history. Big data has become a big problem. Digital storage is really expensive, and none of the devices that we have really stand the test of time. There's this nonprofit website called the Internet Archive. In addition to free books and movies, you can access web pages as far back as 1996.

Now, this is very tempting, so I decided to go back and look at the TED website. Very humble beginnings. As you can see, it's changed quite a bit in the last 30 years. So this led me to the first-ever TED, back in 1984. And it just so happened to feature a Sony executive explaining how a compact disc works. Now, it's really incredible to be able to go back in time and access this moment. It's also really fascinating that 30 years after that first TED, we're still talking about digital storage. Now, if we look back another 30 years, IBM released the first-ever hard drive back in 1956. Here it is being loaded for shipping in front of a small audience. It held the equivalent of one MP3 song and weighed over one ton. At $10,000 a megabyte, I don't think anyone in this room would be interested in buying this thing, except maybe as a collector's item. But it was the best that we could do at the time.

We've come such a long way since. Data storage devices have evolved dramatically, but all media eventually wear out or become obsolete. If someone handed you a floppy disk today to back up your presentation, you'd probably look at them kind of strange, maybe laugh, but you'd have no way to use the damn thing. These devices can no longer meet our storage needs, although some of them can be repurposed. All technology eventually dies or is lost, along with our data, all of our memories. There's this illusion that the storage problem has been solved, but really, we all just externalize it. We don't worry about storing our emails and our photos.

They're just in the cloud. But behind the scenes, storage is problematic. After all, the cloud is just a lot of hard drives. Now, most digital data, we could argue, is not really critical. Surely we could just delete it, but how can we really know what's important today? We've learned so much about human history from drawings and writings in caves and from stone tablets. We've deciphered languages from the Rosetta Stone. We'll never really have the whole story, though. Our data is our story, even more so today. We won't have our record recorded on stone tablets, but we don't have to choose what is important now. There's a way to store it all.

It turns out that there's a solution that's been around for a few billion years, and it's actually in this tube. DNA is nature's oldest storage device. After all, it contains all the information necessary to build and maintain a human being. But what makes DNA so great? Well, let's take our own genome as an example. If we were to print out all three billion A's, T's, C's and G's in a standard font and format, and then stack all of those papers, the stack would be about 130 meters high, somewhere between the Statue of Liberty and the Washington Monument.

Now, if we converted all those A's, T's, C's and G's to digital data, to zeros and ones, it would total a few gigs, and that's in each cell of our body. We have more than 30 trillion cells. You get the idea: DNA can store a ton of information in a minuscule space. DNA is also very durable, and it doesn't even require electricity to store it. We know this because scientists have recovered DNA from ancient humans that lived hundreds of thousands of years ago. One of those is Ötzi the Iceman. It turns out he's Austrian. He was found well preserved, high in the mountains between Italy and Austria, and it turns out that he has living genetic relatives here in Austria today. So one of you could be a cousin of Ötzi.
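The "few gigs" figure is easy to check. A minimal sketch, assuming the talk's round figure of three billion bases, two bits per base (each position is one of four letters), and, as an illustrative assumption, two genome copies per cell:

```python
import math

# Round figure from the talk: about three billion bases in one human genome.
BASES_PER_GENOME = 3_000_000_000
# Each position is one of four letters (A, C, G, T): log2(4) = 2 bits per base.
BITS_PER_BASE = math.log2(4)
# Assumption for illustration: a typical cell carries two copies of the genome.
COPIES_PER_CELL = 2


def to_gigabytes(n_bits: float) -> float:
    """Convert a bit count to decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_bits / 8 / 1e9


one_copy_gb = to_gigabytes(BASES_PER_GENOME * BITS_PER_BASE)
per_cell_gb = one_copy_gb * COPIES_PER_CELL
# Stored naively as text instead, at one byte (8 bits) per letter.
as_text_gb = to_gigabytes(BASES_PER_GENOME * 8)

print(f"one copy, 2 bits per base: {one_copy_gb:.2f} GB")  # 0.75 GB
print(f"per cell, two copies:      {per_cell_gb:.2f} GB")  # 1.50 GB
print(f"one copy as plain text:    {as_text_gb:.2f} GB")   # 3.00 GB
```

Depending on how you count, the answer lands between just under one and a few gigabytes, in line with the talk's rough figure; the true genome size differs slightly from the round number.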

The point is that we have a better chance of recovering information from an ancient human than we do from an old phone. It's also much less likely that we'll lose the ability to read DNA than to read any single man-made device. Every single new storage format requires a new way to read it. We'll always be able to read DNA; if we can no longer sequence, we have bigger problems than worrying about data storage. Storing data in DNA is not new. Nature has been doing it for several billion years; in fact, every living thing is a DNA storage device. But how do we store data in DNA? This is Photo 51. It's the first-ever photo of DNA, taken about 60 years ago. This is around the time that that same hard drive was released by IBM. So really, our understanding of digital storage and of DNA have co-evolved. We first learned to sequence, or read, DNA, and very soon after, how to write it, or synthesize it.

This is much like how we learn a new language, and now we have the ability to read, write and copy DNA. We do it in the lab all the time. So anything, really anything, that can be stored as zeros and ones can be stored in DNA. To store something digitally, like this photo, we convert it to bits, or binary digits. Each pixel in a black-and-white photo is simply a zero or a one. And we can write DNA much like an inkjet printer can print letters on a page. We just have to convert our data, all of those zeros and ones, to A's, T's, C's and G's, and then we send this to a synthesis company so they can write it. We can store it, and when we want to recover our data, we just sequence it.

Now, the fun part of all of this is deciding what files to include. We're serious scientists, so we had to include a manuscript for posterity. We also included a $50 Amazon gift card. Don't get too excited; it's already been spent. Someone decoded it. We also included an operating system, one of the first movies ever made, and the Pioneer plaque. Some of you might have seen this. It has a depiction of a typical, apparently, male and female, and our approximate location in the solar system, in case the Pioneer spacecraft ever encounters extraterrestrials. So once we decide what files we want to encode, we package up the data, convert those zeros and ones to A's, T's, C's and G's, and then we just send this file off to a synthesis company, and this is what we got back. Our files were in this tube; all we had to do was sequence it.

This all sounds pretty straightforward, but the difference between a really cool, fun idea and something we can actually use is overcoming the practical challenges. Now, while DNA is more robust than any man-made device, it's not perfect. It does have some weaknesses. We recover our message by sequencing the DNA, and every time data is retrieved, we lose the DNA. That's just part of the sequencing process. We don't want to run out of data, but luckily there's a way to copy the DNA that's even cheaper and easier than sequencing or synthesizing it. We actually tested a way to make 200 trillion copies of our files, and we recovered all the data without error.
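The write-and-read round trip described in the talk can be sketched in a few lines. This is a minimal illustration, not the team's actual encoding: it assumes a simple fixed mapping of two bits per base (00 to A, 01 to C, 10 to G, 11 to T) and, because synthesized DNA in practice comes as many short strands rather than one long molecule, it splits the data into short fragments of a hypothetical size, each prefixed with an index so the pieces can be put back in order after sequencing. The names `encode_to_oligos`, `decode_from_oligos`, `INDEX_BYTES` and `PAYLOAD_BYTES` are illustrative only.

```python
from __future__ import annotations

import random

# Assumed mapping, for illustration only: two bits per base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

INDEX_BYTES = 2    # hypothetical: a 2-byte index allows up to 65,536 fragments
PAYLOAD_BYTES = 8  # hypothetical: 8 data bytes (32 bases) per fragment


def bytes_to_bases(data: bytes) -> str:
    """Write each byte as 8 bits, then map each pair of bits to one base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def bases_to_bytes(seq: str) -> bytes:
    """Map bases back to bit pairs, then read every 8 bits as one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def encode_to_oligos(data: bytes) -> list[str]:
    """Split data into chunks and prefix each chunk with its index."""
    oligos = []
    for n, start in enumerate(range(0, len(data), PAYLOAD_BYTES)):
        chunk = data[start:start + PAYLOAD_BYTES]
        oligos.append(bytes_to_bases(n.to_bytes(INDEX_BYTES, "big") + chunk))
    return oligos


def decode_from_oligos(oligos: list[str]) -> bytes:
    """Accept fragments in any order (and duplicates), then rejoin by index."""
    chunks = {}
    for oligo in oligos:
        raw = bases_to_bytes(oligo)
        index = int.from_bytes(raw[:INDEX_BYTES], "big")
        chunks[index] = raw[INDEX_BYTES:]
    return b"".join(chunks[i] for i in sorted(chunks))


message = b"Hello from a tube of DNA!"
oligos = encode_to_oligos(message)
random.shuffle(oligos)  # a tube keeps no order, so scramble before "sequencing"
print(len(oligos), "fragments")    # 4 fragments
print(decode_from_oligos(oligos))  # b'Hello from a tube of DNA!'
```

Because every copy of a fragment decodes to the same index, duplicate reads, like the many copies made when DNA is amplified, are harmless here. What this toy does not handle is a fragment that never comes back, or comes back with errors; that is the problem the talk turns to next.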

So sequencing also introduces errors into our DNA, into the A's, T's, C's and G's. Nature has a way to deal with this in our cells, but our data is stored in synthetic DNA in a tube, so we had to find our own way. To overcome this problem, we decided to use an algorithm that was used for streaming video. When you're streaming a video, you're essentially trying to recover the original file. When we're trying to recover our original files, we're simply sequencing. But really, both of these processes are about recovering enough zeros and ones to put our data back together. And because of our coding strategy, we were able to package up all of our data in a way that allowed us to make millions and trillions of copies and still always recover all of our files.
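The talk doesn't name the streaming algorithm. Codes built for this job are known as fountain codes, and a well-known family is Luby transform (LT) codes. The sketch below is a toy LT-style code, not the team's implementation: each "droplet" is the XOR of a randomly chosen subset of data chunks, tagged with the seed that chose the subset, and the decoder repeatedly "peels" droplets that cover exactly one unknown chunk. It illustrates the idea from the talk: it doesn't matter which droplets come back, only that enough of them do. The uniform degree of 1 to 3, the chunk size and the (seed, payload) droplet format are simplifying assumptions; real LT codes use a soliton-style degree distribution.

```python
from __future__ import annotations

import random
from functools import reduce
from typing import Optional

# Toy LT-style fountain code, for illustration only (see assumptions above).


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def droplet_indices(seed: int, n_chunks: int) -> set[int]:
    """Rebuild a droplet's chunk subset from its seed alone."""
    rng = random.Random(seed)
    degree = rng.randint(1, min(3, n_chunks))  # toy degree distribution
    return set(rng.sample(range(n_chunks), degree))


def make_droplet(chunks: list[bytes], seed: int) -> tuple[int, bytes]:
    """A droplet is its seed plus the XOR of the chunks the seed selects."""
    picked = droplet_indices(seed, len(chunks))
    return seed, reduce(xor_bytes, (chunks[i] for i in picked))


def decode(droplets: list[tuple[int, bytes]], n_chunks: int) -> Optional[list[bytes]]:
    """Peeling decoder: return the chunks in order, or None if stuck."""
    recovered: dict[int, bytes] = {}
    # Each pending entry: [chunk indices still unknown, payload with knowns removed].
    pending = [[droplet_indices(seed, n_chunks), payload] for seed, payload in droplets]
    progress = True
    while progress and len(recovered) < n_chunks:
        progress = False
        for entry in pending:
            unknown, payload = entry
            for i in list(unknown):
                if i in recovered:
                    # XOR an already-known chunk back out of this droplet.
                    payload = xor_bytes(payload, recovered[i])
                    unknown.discard(i)
            entry[1] = payload
            if len(unknown) == 1:
                # One unknown chunk left: the remaining payload is that chunk.
                recovered[unknown.pop()] = payload
                progress = True
    if len(recovered) < n_chunks:
        return None
    return [recovered[i] for i in range(n_chunks)]


message = b"Every living thing is a DNA storage device."
CHUNK_BYTES = 4  # hypothetical chunk size
padded = message + b"\0" * (-len(message) % CHUNK_BYTES)
chunks = [padded[i:i + CHUNK_BYTES] for i in range(0, len(padded), CHUNK_BYTES)]

# Spray out many droplets, then "recover" an arbitrary subset of them.
droplets = [make_droplet(chunks, seed) for seed in range(1000)]
received = random.Random(42).sample(droplets, 150)

result = decode(received, len(chunks))
print("decoded:", b"".join(result)[:len(message)] if result else "need more droplets")
```

A real DNA pipeline would also need a way to detect and discard droplets corrupted during synthesis or sequencing (for example, a per-droplet checksum or error-correcting code); this toy assumes every received droplet is intact.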

This is the movie we encoded. It's one of the first movies ever made, and now the first to be copied more than 200 trillion times in DNA. Soon after our work was published, we participated in an Ask Me Anything on the website Reddit. If you're a fellow nerd, you're very familiar with this website. Most questions were thoughtful; some were comical. For example, one user wanted to know whether we could have a literal thumb drive, with data stored in our own DNA. No. The thing is, our DNA already stores everything needed to make us who we are. It's a lot safer to store data in synthetic DNA in a tube. Writing and reading data from DNA is obviously a lot more time-consuming than just saving all your files on a hard drive, for now. So initially we should focus on long-term storage. Most data are ephemeral.

It's really hard to grasp what's important today or what will be important for future generations. But the point is, we don't have to decide today. There's this great program by UNESCO called the Memory of the World Programme. It was created to preserve historical materials that are considered of value to all of humanity. Items were nominated to be added to the collection, including that film that we encoded. It's a wonderful way to preserve human heritage, but it doesn't have to be a choice. Instead of asking the current generation, us, what might be important in the future, we could store everything in DNA. Storage is not just about how many bytes, but how well we can actually store the data and recover it. There's always been this tension between how much data we can generate, how much we can recover and how much we can store. Every advance in writing data has required a new way to read it. We can no longer read old media. How many of you even have a disc drive in your laptop?

Never mind a floppy drive. This will never be the case with DNA. As long as we're around, DNA is around, and we'll find a way to sequence it. Archiving the world around us is part of human nature. This is the progress we've made in digital storage in 60 years, at a time when we were only beginning to understand DNA. Yet we've made similar progress in half that time with DNA sequencers. And as long as we're around, DNA will never be obsolete. Thank you.

13:27

For more TED Talks, go to TED.com. PRX.
