Omics – tara-oceans-science.org

Key advances: global gene catalogs, genome-resolved metagenomics, functional biogeography

Tara Oceans applied large‑scale, high‑throughput DNA and RNA sequencing to reveal the genetic and functional diversity of marine plankton. Multi‑marker DNA metabarcoding provided a first global view of taxonomic diversity and relative abundance of prokaryotic and eukaryotic lineages (e.g., Tara Oceans publication numbers 22, 24, 29, 36, 41, 47, 48, 61, 89, 111, 124, 156). Metagenomics and metatranscriptomics revealed gene content, metabolic potential, and in situ gene expression across viruses, bacteria, archaea, and eukaryotes, enabling gene‑centric and genome‑resolved analyses at unprecedented scale (e.g., Tara Oceans publication numbers 23, 56, 62, 86, 94, 99, 106, 108, 122, 129, 132, 140).

To address the historical under‑representation of marine eukaryotes in reference databases, Tara Oceans generated novel genomic resources, including single‑cell genomes from uncultured protists and reference transcriptomes from multiple species (Tara Oceans publication numbers 58, 63, 91, 104). Protocols were specifically optimized for low‑biomass open‑ocean samples to minimize technical bias and ensure cross‑comparability (Tara Oceans publication number 53). To date, the project has produced more than 50 terabases of raw sequencing data from over 4,300 size‑fractionated plankton communities, constituting the largest homogeneous multi‑omics dataset for any ecosystem (Table 1).

Kingdom‑specific gene catalogs were reconstructed for viruses, prokaryotes, and eukaryotes using tailored bioinformatics pipelines, enabling comparative analyses of genome evolution, metabolic innovation, and biogeography. More recent efforts have focused on genome‑resolved metagenomics, including the reconstruction of thousands of metagenome‑assembled genomes (MAGs) and biosynthetic gene clusters (BGCs), opening new perspectives on microbial adaptation, chemical ecology, and functional diversity in the ocean (Tara Oceans publication numbers 86, 99, 118, 130, 132).

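Table 1. Summary of Tara Oceans ‘omics’ datasets generated from size-fractionated plankton samples and single cells collected across the world's oceans. Abbreviations: metaG, metagenomics; metaT, metatranscriptomics; metaB, metabarcoding; NCLDV, nucleocytoplasmic large DNA viruses; SAGs, single-amplified genomes.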

Size fractions (μm)	Target groups	Omics type	Plankton communities analysed	Mean reads per sample (millions of paired reads)	Total reads (billions of paired reads)
< 0.2	phages	metaG	112	86	9.6
0.2-1.6; 0.1-0.2; 0.45-0.8; 0.2-0.45	giant viruses (DNA viruses or NCLDV)	metaG	73	111	9
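0.2-1.6; 0.2-3	giant viruses (or NCLDV) and prokaryotes	metaT (random priming)	153	160	33
0.2-1.6; 0.2-3	giant viruses (or NCLDV) and prokaryotes	metaG	243	117	25
0.2-1.6; 0.2-3	giant viruses (or NCLDV) and prokaryotes	16S metaB	1142	0.5	0.44
0.8-inf; 3-inf; 0.8-5; (0.8-3); 5-20 (3-20); 20-180; 180-2000	protists and metazoa	16S metaB	968	0.5	0.44
0.8-inf; 3-inf; 0.8-5; (0.8-3); 5-20 (3-20); 20-180; 180-2000	protists and metazoa	18S metaB	850	1.9	1.7
0.8-inf; 3-inf; 0.8-5; (0.8-3); 5-20 (3-20); 20-180; 180-2000	protists and metazoa	metaG	401	160	83
0.8-inf; 3-inf; 0.8-5; (0.8-3); 5-20 (3-20); 20-180; 180-2000	protists and metazoa	metaT (polyA RNA)	441	160	86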
Transcriptomes	protists	De novo sequencing	78 cultured organisms	30	2.1
SAGs samples	protists	De novo sequencing	281 single cells	33	11
TOTAL			4,383 communities		252 billion paired reads
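
The community total in Table 1 can be checked with a short tally. The sketch below is illustrative only: it copies the community counts from the table and assumes that, where a cell lists several values, they appear in the same order as the omics types in that row. The dictionary keys and variable names are hypothetical labels chosen for this example. Transcriptomes (78 cultured organisms) and SAGs (281 single cells) are left out because the table counts them as organisms and cells rather than communities.

```python
from collections import defaultdict

# Community counts per (target group, omics type), copied from Table 1.
# Assumption: multi-valued cells list their values in the same order as
# the omics types named in the same row.
communities = {
    ("phages", "metaG"): 112,
    ("giant viruses", "metaG"): 73,
    ("giant viruses and prokaryotes", "metaT"): 153,
    ("giant viruses and prokaryotes", "metaG"): 243,
    ("giant viruses and prokaryotes", "16S metaB"): 1142,
    ("protists and metazoa", "16S metaB"): 968,
    ("protists and metazoa", "18S metaB"): 850,
    ("protists and metazoa", "metaG"): 401,
    ("protists and metazoa", "metaT"): 441,
}

# Grand total across all size fractions and omics types.
total_communities = sum(communities.values())

# Subtotals per omics type, summed across target groups.
by_type = defaultdict(int)
for (_group, omics_type), count in communities.items():
    by_type[omics_type] += count

print(total_communities)  # 4383
for omics_type, count in sorted(by_type.items()):
    print(omics_type, count)
```

Under that ordering assumption, the counts sum to the 4,383 communities given in the TOTAL row, which is consistent with the "over 4,300" figure quoted in the text.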