Marine prokaryotes (bacteria and archaea) are the most abundant cellular organisms in the ocean and strongly influence global biogeochemical and energy cycling. A comprehensive picture of prokaryotic biodiversity and biogeography, one that integrates ecological and evolutionary concepts, is critical for assessing the Earth system. In Tara Oceans, we generated the largest ocean microbial sequencing data set available to date and developed new bioinformatics approaches (Sunagawa et al., Nature Methods, 2013; Logares et al., Env. Microb., 2014; Salazar et al., 2017 in prep) (Deliverables E2, 3) to explore the frontiers of global ocean microbial diversity. We found a lower bound of 35,000 prokaryotic ‘species’ in the pelagic realm, whose community structure in the surface ocean appears to be driven primarily by seawater temperature (Sunagawa et al., Science, 2015). Analysis of the >200 metagenomes resulted in the first Ocean Microbial Reference Gene Catalogue (OM-RGC; Sunagawa et al., Science, 2015), comprising >40 million non-redundant genes, which now serves as a reference resource for biologists and marine scientists (Fig. 4) (Deliverables F3, 5). The OM-RGC improved our understanding of biogeography and microbial functional capabilities across oceanic regions and ecosystems (Sunagawa et al., Mol Syst Biol 2015). A notable derivative was the inference of the first world ocean “interactome”, which uncovered organismal interactions across all domains of life, including viruses, and showed the predominance of biotic relations in shaping the global plankton network (Lima-Mendez et al., Science, 2015). We further used the power of Tara Oceans’ multi-omics and morphological data to unveil the functional and ecological significance of critical nitrogen-fixing cyanobacteria-haptophyte symbioses in the euphotic zone worldwide (Cabello et al., ISME, 2015; Cornejo-Castillo et al., Nature Com., 2016).
We also gained a better understanding of the genetic capacity for mixotrophy and of the factors controlling the biogeographic distribution of the two most abundant and widespread phototrophs on Earth, Prochlorococcus and Synechococcus (Yelton et al., ISME, 2016; Farrant et al., PNAS, 2016). Finally, integrating the first large-scale deep-ocean metagenomic dataset, from the Malaspina circum-global expedition, into the Tara Oceans gene repertoire allows a critical comparison of microbial diversity across the depth of the ocean (Acinas et al. 2017, Nature, in prep).

Figure 4. Upper panel: Numerical breakdown of the Ocean Microbial Reference Gene Catalogue. The OM-RGC contains >40 million non-redundant genes from marine viruses, archaea, bacteria and picoeukaryotes, derived from 243 Tara Oceans (TO) metagenomes generated from a 1,000 m depth layer of the world oceans. Lower panel: Use of TO prokaryotic ‘omics’ data to unveil environmental drivers of surface microbial community composition. Left: the principal coordinate (PC) analysis shows that plankton communities are not clearly grouped by geographic origin (top) but are instead separated by local temperature (bottom: strong correlation between the first PC and temperature). Right: correlations [green lines: geographic distance–corrected Mantel tests] of plankton taxonomic [two independent methods: miTags and mOTUs] and functional [biochemical KEGG modules] composition with key environmental parameters. The environmental parameters are also compared with one another, with a colour gradient denoting pairwise Spearman’s correlation coefficients. Edge width corresponds to the Mantel’s r statistic for the corresponding distance correlations, and edge colour denotes statistical significance based on 9,999 permutations (Sunagawa et al. 2015).
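For readers unfamiliar with the statistic named in the caption, the following is a minimal sketch of a geographic distance-corrected (partial) Mantel test with a permutation-based p-value. It is an illustration under stated assumptions, not the code used in Sunagawa et al. (2015): the function names, the toy data, and the choice to correct for geography by regressing each distance vector on geographic distance are all assumptions made here for clarity.

```python
import numpy as np


def upper(m):
    """Return the upper-triangle entries (diagonal excluded) as a vector."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]


def residuals(y, x):
    """Residuals of a simple linear regression of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)


def partial_mantel(a, b, c, permutations=9999, seed=0):
    """Correlate distance matrices a and b while controlling for c.

    Hypothetical helper: a = community distances, b = environmental
    distances, c = geographic distances (all square and symmetric).
    Returns (r, p), where p is a one-sided permutation p-value.
    """
    rng = np.random.default_rng(seed)
    n = a.shape[0]
    cv = upper(c)
    rb = residuals(upper(b), cv)

    def stat(mat):
        ra = residuals(upper(mat), cv)
        return np.corrcoef(ra, rb)[0, 1]

    observed = stat(a)
    hits = 0
    for _ in range(permutations):
        # Permute samples (rows and columns together) of one matrix only.
        perm = rng.permutation(n)
        if stat(a[np.ix_(perm, perm)]) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy data (assumed, not from Tara Oceans): 20 samples whose community
# distances are driven by temperature differences plus small noise.
rng = np.random.default_rng(42)
n = 20
temp = rng.uniform(0.0, 30.0, n)
coords = rng.uniform(0.0, 100.0, (n, 2))

temp_dist = np.abs(temp[:, None] - temp[None, :])
geo_dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
noise = rng.normal(0.0, 0.5, (n, n))
noise = (noise + noise.T) / 2.0
np.fill_diagonal(noise, 0.0)
comm_dist = temp_dist + noise

r, p = partial_mantel(comm_dist, temp_dist, geo_dist, permutations=999)
print(f"partial Mantel r = {r:.3f}, p = {p:.4f}")
```

In this sketch, edge width in a plot like the right-hand panel would map to `r` and edge colour to `p`; the toy run uses 999 permutations for speed, whereas the caption reports 9,999.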