Store anything. In DNA.
Compress terabytes into molecules no larger than a grain of salt. A production-ready framework for synthetic DNA data storage.
Four steps from binary
to molecule.
Each step activates as you scroll. The framework handles codec selection, error correction, and synthesis manifests — you handle the data.
Binary → Base Pairs
The ADS Codex v3 converts binary data into ternary, then maps each trit to a nucleotide using context-aware lookup tables. Homopolymer runs are eliminated. GC content is balanced to 50±3%.
Reed-Solomon + Insertion Guard
Synthesis introduces insertion and deletion errors at ~1.2% rate — fundamentally different from digital bit-flips. Encode applies a dual-layer ECC: outer Reed-Solomon codes plus inner CIGAR-alignment guards, achieving 99.97% fidelity.
Strand Manifest → Oligo Order
Encode generates a synthesis manifest compatible with Twist Bioscience, IDT, and Integrated DNA Technologies APIs. Each 200-nt oligonucleotide is addressable by index prefix for random-access retrieval.
Sequencing → Decoded Binary
Sequencing reads are fed back into the Encode decoder. PCR amplification by index prefix enables targeted random access — retrieve a single file from petabytes without reading the entire pool.
Numbers that make
silicon look temporary.
Benchmarked on 4.7 GB genomics archive. All metrics from production encoding sessions. No theoretical limits — measured results.
A football field of server racks replaced by a storage unit the size of a football. Source: IARPA MIST Program, 2025.
Post-silicon preservation
for records that must survive centuries.
Silicon supply runs out in 2040. DNA doesn't degrade, doesn't require power, and copies for negligible cost. This is the long game.
Density beyond silicon
One gram of DNA stores 215 petabytes. The entire Library of Congress — 74 terabytes — fits in a structure the size of a poppy seed. 6,000 times over.
10,000-year durability
DNA encapsulated in salt remains stable for decades at room temperature. Under controlled conditions: millennia. No maintenance, no degradation, no refresh cycles.
GC-balanced encoding
Stringent randomization eliminates homopolymer runs and maintains 50±3% GC content — critical for accurate synthesis and sequencing downstream.
Dual-layer error correction
Synthesis errors aren't bit-flips — they're insertions and deletions. Standard ECC fails. Encode uses Reed-Solomon outer codes plus CIGAR-alignment inner guards, tuned for indel error profiles. Fidelity: 99.97%.
DDSA Alliance compatible
Encode outputs synthesis manifests compatible with Twist Bioscience, IDT, Illumina, and Western Digital APIs. Built to the DNA Data Storage Alliance open standard.
Random access retrieval
Each oligonucleotide carries a 20-bp index prefix. PCR amplification targets specific files without sequencing the entire pool — like seeking a file on disk.
Used in production by researchers,
builders, and archivists.
"We encoded 340 GB of climate simulation outputs from our 2025 Arctic survey. The Wukong-3 codec handled the binary-heavy dataset better than anything we'd benchmarked — 847:1 compression, 99.94% decode fidelity on first read. This is the archival infrastructure we've been waiting for."

"Our pipeline prototyping went from 6 weeks to 4 days. Encode's manifest export plugs directly into our Twist order queue. We're encoding 50 MB test datasets in the sandbox before committing to full synthesis — exactly the iteration speed we needed."

"We're evaluating post-silicon archival for 400 years of municipal records — birth certificates, land deeds, legal filings. Encode is the first framework that gave our legal team confidence in the error-correction audit trail. The CIGAR alignment report is exactly what compliance needs."
