The Future of DNA Storage in Revolutionizing Biological Data Management
Abstract
Compared with traditional storage media, biological data storage has advanced more rapidly in capacity, diversity, and lifespan, and it also enables continuous data retention. The DNA molecule, as nature's own archival medium, offers unparalleled density, longevity, and passive durability, which makes it a compelling foundation for the next generation of "cold" and "deep-cold" archives. Since 2019, progress across the stack (codes that tolerate insertions, deletions, and missing strands; large-scale random access; enzymatic writing; nanopore-native retrieval; and chemically robust expansion) has moved DNA storage beyond provocative demonstrations toward a maturing technology with early end-to-end prototypes. In this perspective, we argue that biological data management (BDM) is the first area in which DNA storage can have practical impact, with candidate workloads including raw sequencing archives, microscopy images, clinical omics data, and compliance-driven retention of de-identified records. We synthesize the current state of the art, highlight what actually works in today's laboratories, distinguish general challenges from true bottlenecks (particularly write cost, write latency, and standardization), and propose a 2025–2030 roadmap with concrete milestones for coding, writing, access, and media preservation. Finally, we propose guidelines for integrating DNA archives into biological research institutes, biobanks, and hospital systems as a layer that complements tape and object storage, and we outline a research agenda covering privacy, chain of custody, and sustainability.
Keywords
DNA storage; biological data management
Cite This Article
TY  - JOUR
AU  - Chen, Rongrong
AU  - Li, Xue
AU  - Cao, Ben
PY  - 2025
DA  - 2025/09/26
TI  - The Future of DNA Storage in Revolutionizing Biological Data Management
JO  - Journal of Artificial Intelligence in Bioinformatics
T2  - Journal of Artificial Intelligence in Bioinformatics
JF  - Journal of Artificial Intelligence in Bioinformatics
VL  - 1
IS  - 2
SP  - 51
EP  - 57
DO  - 10.62762/JAIB.2025.924847
UR  - https://www.icck.org/article/abs/JAIB.2025.924847
KW  - DNA storage
KW  - biological data management
SN  - 3068-7535
PB  - Institute of Central Computation and Knowledge
LA  - English
ER  - 
@article{Chen2025The,
author = {Rongrong Chen and Xue Li and Ben Cao},
title = {The Future of DNA Storage in Revolutionizing Biological Data Management},
journal = {Journal of Artificial Intelligence in Bioinformatics},
year = {2025},
volume = {1},
number = {2},
pages = {51-57},
doi = {10.62762/JAIB.2025.924847},
url = {https://www.icck.org/article/abs/JAIB.2025.924847},
keywords = {DNA storage, biological data management},
issn = {3068-7535},
publisher = {Institute of Central Computation and Knowledge}
}
Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and Permissions
Copyright © 2025 by the Author(s). Published by Institute of Central Computation and Knowledge. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.