Key advances: global gene catalogs, genome-resolved metagenomics, functional biogeography
Tara Oceans applied large‑scale, high‑throughput DNA and RNA sequencing to reveal the genetic and functional diversity of marine plankton. Multi‑marker DNA metabarcoding provided a first global view of taxonomic diversity and relative abundance of prokaryotic and eukaryotic lineages (e.g., Tara Oceans publication numbers 22, 24, 29, 36, 41, 47, 48, 61, 89, 111, 124, 156). Metagenomics and metatranscriptomics revealed gene content, metabolic potential, and in situ gene expression across viruses, bacteria, archaea, and eukaryotes, enabling gene‑centric and genome‑resolved analyses at unprecedented scale (e.g., Tara Oceans publication numbers 23, 56, 62, 86, 94, 99, 106, 108, 122, 129, 132, 140).
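Metabarcoding produces read counts per lineage and sample. "Relative abundance" usually means each lineage's share of the reads assigned in that sample. The sketch below illustrates only that normalization step. The lineage names and counts are hypothetical, not Tara Oceans data. Read proportions are not cell proportions, because marker-gene copy number (for example, 18S rDNA copies per genome) varies widely among eukaryotes.

```python
def relative_abundance(read_counts: dict[str, int]) -> dict[str, float]:
    """Convert per-lineage read counts from one sample into proportions summing to 1."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return {lineage: count / total for lineage, count in read_counts.items()}


# Hypothetical 18S metabarcoding read counts for a single sample (illustrative only).
sample = {"Dinophyceae": 4200, "Collodaria": 2100, "Bacillariophyta": 700}
print(relative_abundance(sample))
```

Real pipelines typically add steps before this one, such as quality filtering, clustering reads into OTUs or ASVs, and taxonomic assignment.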
To address the historical under‑representation of marine eukaryotes in reference databases, Tara Oceans generated novel genomic resources, including single‑cell genomes from uncultured protists (281 single cells) and reference transcriptomes from 78 cultured protists (Tara Oceans publication numbers 58, 63, 91, 104; Table 1). Protocols were specifically optimized for low‑biomass open‑ocean samples to minimize technical bias and ensure comparability across samples (Tara Oceans publication number 53). To date, the project has produced more than 50 terabases of raw sequencing data from over 4,300 size‑fractionated plankton communities, constituting the largest homogeneous multi‑omics dataset for any ecosystem (Table 1).
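The terabase figure and the read counts in Table 1 are linked by simple arithmetic. Each read pair contributes two reads, so total bases = read pairs × 2 × read length. Table 1 does not state a read length. The 100 bp value below is an assumption chosen for illustration. Under it, the Table 1 total of 252 billion paired reads gives roughly 50 terabases. Longer reads would give a proportionally larger total.

```python
def paired_reads_to_terabases(read_pairs: float, read_length_bp: int) -> float:
    """Total sequenced bases, in terabases (1 Tb = 1e12 bases), for paired-end data.

    Each read pair contributes two reads of `read_length_bp` bases.
    """
    bases = read_pairs * 2 * read_length_bp
    return bases / 1e12


TOTAL_READ_PAIRS = 252e9        # Table 1 total, in read pairs
ASSUMED_READ_LENGTH_BP = 100    # ASSUMPTION: not stated in Table 1
print(f"{paired_reads_to_terabases(TOTAL_READ_PAIRS, ASSUMED_READ_LENGTH_BP):.1f} Tb")
```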
Kingdom‑specific gene catalogs were reconstructed for viruses, prokaryotes, and eukaryotes using tailored bioinformatics pipelines, enabling comparative analyses of genome evolution, metabolic innovation, and biogeography. More recent efforts have focused on genome‑resolved metagenomics, including the reconstruction of thousands of metagenome‑assembled genomes (MAGs) and biosynthetic gene clusters (BGCs), opening new perspectives on microbial adaptation, chemical ecology, and functional diversity in the ocean (Tara Oceans publication numbers 86, 99, 118, 130, 132).
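A gene catalog is, at its core, a non-redundant set of representative gene sequences. Near-identical genes predicted from many samples are collapsed into clusters. The sketch below shows greedy incremental clustering, the strategy used by tools such as CD-HIT. Sequences are processed longest first, and each one either joins the first representative it matches above an identity threshold or founds a new cluster. It is not the Tara Oceans pipeline. `difflib.SequenceMatcher.ratio()` is a crude stand-in for alignment-based identity, the 0.95 threshold is an illustrative assumption, and the gene names are hypothetical.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    # Crude proxy for pairwise identity; real pipelines use alignment-based identity.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(genes: dict[str, str], min_identity: float = 0.95) -> dict[str, list[str]]:
    """Greedy incremental clustering: longer sequences are considered first and
    become cluster representatives when no existing representative matches."""
    clusters: dict[str, list[str]] = {}
    for gene_id in sorted(genes, key=lambda g: len(genes[g]), reverse=True):
        seq = genes[gene_id]
        for rep_id in clusters:
            if identity(seq, genes[rep_id]) >= min_identity:
                clusters[rep_id].append(gene_id)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[gene_id] = [gene_id]
    return clusters


# Hypothetical toy genes: geneA_variant differs from geneA at a single position.
genes = {
    "geneA": "ATGC" * 10,
    "geneA_variant": "ATGC" * 5 + "TTGC" + "ATGC" * 4,
    "geneC": "GGCC" * 10,
}
print(greedy_cluster(genes))
```

The catalog would then keep one representative per cluster (here `geneA` and `geneC`). Reads from each sample can be mapped back to these representatives to build gene-abundance profiles.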
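Table 1. Summary of Tara Oceans ‘omics’ datasets generated from size-fractionated plankton samples and single cells collected across the world's oceans, and from cultured protists. metaG, metagenomics; metaT, metatranscriptomics; metaB, metabarcoding; SAGs, single amplified genomes; inf, no upper size limit.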
| Size fractions (μm) | Target groups | Omics type | No. of plankton communities (or cultures/cells) analysed | Mean reads per sample (millions of paired reads) | Total reads (billions of paired reads) |
|---|---|---|---|---|---|
| < 0.2 | phages | metaG | 112 | 86 | 9.6 |
| 0.2-1.6; 0.1-0.2; 0.45-0.8; 0.2-0.45 | giruses (giant DNA viruses) | metaG | 73 | 111 | 9 |
| 0.2-1.6; 0.2-3 | giruses and prokaryotes | metaT (random priming) | 153 | 160 | 33 |
| 0.2-1.6; 0.2-3 | giruses and prokaryotes | metaG | 243 | 117 | 25 |
| 0.2-1.6; 0.2-3 | giruses and prokaryotes | 16S metaB | 1,142 | 0.5 | 0.44 |
| 0.8-inf; 3-inf; 0.8-5 (0.8-3); 5-20 (3-20); 20-180; 180-2000 | protists and metazoa | 16S metaB | 968 | 0.5 | 0.44 |
| 0.8-inf; 3-inf; 0.8-5 (0.8-3); 5-20 (3-20); 20-180; 180-2000 | protists and metazoa | 18S metaB | 850 | 1.9 | 1.7 |
| 0.8-inf; 3-inf; 0.8-5 (0.8-3); 5-20 (3-20); 20-180; 180-2000 | protists and metazoa | metaG | 401 | 160 | 83 |
| 0.8-inf; 3-inf; 0.8-5 (0.8-3); 5-20 (3-20); 20-180; 180-2000 | protists and metazoa | metaT (polyA RNA) | 441 | 160 | 86 |
| Transcriptomes | protists | De novo sequencing | 78 cultured organisms | 30 | 2.1 |
| SAGs | protists | De novo sequencing | 281 single cells | 33 | 11 |
| TOTAL | | | 4,383 communities | | 252 |
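
The community counts in Table 1 can be checked against the TOTAL row. Summing the nine size-fractionated plankton rows reproduces 4,383 exactly. This suggests the total excludes the 78 cultured organisms and the 281 single cells. If the same community was sequenced with several methods, it is counted once per method, so 4,383 counts community datasets rather than distinct water samples. The row labels below are shorthand introduced for this sketch.

```python
# (target group, omics type, number analysed) for the size-fractionated rows of Table 1.
PLANKTON_ROWS = [
    ("phages", "metaG", 112),
    ("giruses", "metaG", 73),
    ("giruses and prokaryotes", "metaT", 153),
    ("giruses and prokaryotes", "metaG", 243),
    ("giruses and prokaryotes", "16S metaB", 1142),
    ("protists and metazoa", "16S metaB", 968),
    ("protists and metazoa", "18S metaB", 850),
    ("protists and metazoa", "metaG", 401),
    ("protists and metazoa", "metaT", 441),
]

# Reference resources: cultures and single cells rather than plankton communities.
REFERENCE_ROWS = [
    ("protists", "transcriptomes", 78),
    ("protists", "SAGs", 281),
]


def total_analysed(rows: list[tuple[str, str, int]]) -> int:
    """Sum the 'number analysed' column over the given table rows."""
    return sum(n for _, _, n in rows)


print(total_analysed(PLANKTON_ROWS))                   # 4383, matches the TOTAL row
print(total_analysed(PLANKTON_ROWS + REFERENCE_ROWS))  # 4742 if cultures and cells were included
```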
