Key advances: global gene catalogs, genome-resolved metagenomics, functional biogeography

Tara Oceans applied large‑scale, high‑throughput DNA and RNA sequencing to reveal the genetic and functional diversity of marine plankton. Multi‑marker DNA metabarcoding provided a first global view of the taxonomic diversity and relative abundance of prokaryotic and eukaryotic lineages (e.g., Tara Oceans publication numbers 22, 24, 29, 36, 41, 47, 48, 61, 89, 111, 124, 156). Metagenomics and metatranscriptomics revealed gene content, metabolic potential, and in situ gene expression across viruses, bacteria, archaea, and eukaryotes, enabling gene‑centric and genome‑resolved analyses at unprecedented scale (e.g., Tara Oceans publication numbers 23, 56, 62, 86, 94, 99, 106, 108, 122, 129, 132, 140).

To address the historical under‑representation of marine eukaryotes in reference databases, Tara Oceans generated novel genomic resources, including single‑cell genomes from uncultured protists and reference transcriptomes from multiple species (Tara Oceans publication numbers 58, 63, 91, 104). Protocols were specifically optimized for low‑biomass open‑ocean samples to minimize technical bias and ensure cross‑comparability (Tara Oceans publication number 53). To date, the project has produced more than 50 terabases of raw sequencing data from over 4,300 size‑fractionated plankton communities, constituting the largest homogeneous multi‑omics dataset for any ecosystem (Table 1).

Kingdom‑specific gene catalogs were reconstructed for viruses, prokaryotes, and eukaryotes using tailored bioinformatics pipelines, enabling comparative analyses of genome evolution, metabolic innovation, and biogeography. More recent efforts have focused on genome‑resolved metagenomics, including the reconstruction of thousands of metagenome‑assembled genomes (MAGs) and biosynthetic gene clusters (BGCs), opening new perspectives on microbial adaptation, chemical ecology, and functional diversity in the ocean (Tara Oceans publication numbers 86, 99, 118, 130, 132).

Table 1. Summary of Tara Oceans ‘omics’ datasets generated from size-fractionated plankton samples and from single cells collected across the world's oceans.

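| Size fractions (μm) | Target groups | Omics type | # of plankton communities analysed | Mean # of reads/sample (millions of paired reads) | Total # of reads (billions of paired reads) |
| --- | --- | --- | --- | --- | --- |
| < 0.2 | phages | metaG | 112 | 86 | 9.6 |
| 0.2-1.6; 0.1-0.2; 0.45-0.8; 0.2-0.45 | giruses (giant DNA viruses) | metaG | 73 | 111 | 9 |
| 0.2-1.6; 0.2-3 | giruses and prokaryotes | metaT (random priming) | 153 | 160 | 33 |
| 0.2-1.6; 0.2-3 | giruses and prokaryotes | metaG | 243 | 117 | 25 |
| 0.2-1.6; 0.2-3 | giruses and prokaryotes | 16S metaB | 1142 | 0.5 | 0.44 |
| 0.8-inf; 3-inf; 0.8-5; (0.8-3); 5-20 (3-20); 20-180; 180-2000 | protists and metazoa | 16S metaB | 968 | 0.5 | 0.44 |
| 0.8-inf; 3-inf; 0.8-5; (0.8-3); 5-20 (3-20); 20-180; 180-2000 | protists and metazoa | 18S metaB | 850 | 1.9 | 1.7 |
| 0.8-inf; 3-inf; 0.8-5; (0.8-3); 5-20 (3-20); 20-180; 180-2000 | protists and metazoa | metaG | 401 | 160 | 83 |
| 0.8-inf; 3-inf; 0.8-5; (0.8-3); 5-20 (3-20); 20-180; 180-2000 | protists and metazoa | metaT (polyA RNA) | 441 | 160 | 86 |
| Transcriptomes | protists | De novo sequencing | 78 cultured organisms | 30 | 2.1 |
| SAG samples | protists | De novo sequencing | 281 single cells | 33 | 11 |
| TOTAL | | | 4,383 communities | | 252 billion paired reads |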
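
As a quick consistency check on Table 1, the per-row community counts can be summed and compared with the reported total of 4,383 communities. The sketch below is illustrative only: the counts are copied from the table, while the dictionary keys and variable names are shorthand labels introduced here, not identifiers used by the Tara Oceans project.

```python
# Community counts copied from Table 1, one entry per target group and omics type.
# The key names are hypothetical shorthand labels introduced for this sketch.
communities = {
    ("phages", "metaG"): 112,
    ("giruses", "metaG"): 73,
    ("giruses and prokaryotes", "metaT"): 153,
    ("giruses and prokaryotes", "metaG"): 243,
    ("giruses and prokaryotes", "16S metaB"): 1142,
    ("protists and metazoa", "16S metaB"): 968,
    ("protists and metazoa", "18S metaB"): 850,
    ("protists and metazoa", "metaG"): 401,
    ("protists and metazoa", "metaT"): 441,
}

# Sum the community counts across all rows.
total_communities = sum(communities.values())
print(total_communities)
```

The transcriptome and SAG rows are left out of the sum because they count cultured organisms and single cells rather than plankton communities.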