Prokaryotes are essential to the stability and thriving of ecosystems. For example, they are a necessary part of soil formation and stabilization through the breakdown of organic matter and the development of biofilms. One gram of soil contains up to 10 billion microorganisms, most of them prokaryotic, belonging to about 1, species.
Many species of bacteria use substances released from plant roots, such as acids and carbohydrates, as nutrients. In salty lakes such as the Dead Sea (Figure 1), salt-loving halobacteria decompose dead brine shrimp and nourish young brine shrimp and flies with the products of bacterial metabolism. In addition to living in the ground and the water, prokaryotic microorganisms are abundant in the air, even high in the atmosphere.
There may be up to 2, different kinds of bacteria in the air, similar to their diversity in the soil. Prokaryotes can be found everywhere on Earth because they are extremely resilient and adaptable. They are often metabolically flexible, which means that they can readily switch from one energy source to another, depending on availability, or from one metabolic pathway to another.
For example, certain prokaryotic cyanobacteria can switch from a conventional type of lipid metabolism, which includes production of fatty aldehydes, to a different type of lipid metabolism that generates biofuel, such as fatty acids and wax esters. Groundwater bacteria store complex high-energy carbohydrates when grown in pure groundwater, but they metabolize these molecules when the groundwater is enriched with phosphates.
Some bacteria get their energy by reducing sulfates into sulfides, but can switch to a different metabolic pathway when necessary, producing acids and free hydrogen ions. Organisms such as animals require organic carbon to grow, but, unlike prokaryotes, they are unable to use inorganic carbon sources like carbon dioxide.
Thus, animals rely on prokaryotes to convert carbon dioxide into organic carbon products that they can use. This process of converting carbon dioxide to organic carbon products is called carbon fixation. Plants and animals also rely heavily on prokaryotes for nitrogen fixation, the conversion of atmospheric nitrogen into ammonia, a compound that some plants can use to form many different biomolecules necessary to their survival.
Bacteria in the genus Rhizobium, for example, are nitrogen-fixing bacteria; they live in the roots of legume plants such as clover, alfalfa, and peas (Figure 2). Ammonia produced by Rhizobium helps these plants to survive by enabling them to make building blocks of nucleic acids.
In turn, these plants may be eaten by animals—sustaining their growth and survival—or they may die, in which case the products of nitrogen fixation will enrich the soil and be used by other plants. Figure 2. The bacteroids are visible as darker ovals within the larger plant cell. Another positive function of prokaryotes is in cleaning up the environment. Recently, some researchers focused on the diversity and functions of prokaryotes in manmade environments.
They found that some bacteria play a unique role in degrading toxic chemicals that pollute water and soil. Despite all of the positive and helpful roles prokaryotes play, some are human pathogens that may cause illness or infection when they enter the body.
In addition, some bacteria can contaminate food, causing spoilage or foodborne illness, which makes them subjects of concern in food preparation and safety. Besides pathogens, which have a direct impact on human health, prokaryotes also affect humans in many indirect ways.
For example, prokaryotes are now thought to be key players in the processes of climate change. Carbon trapped in the permafrost is gradually released and metabolized by prokaryotes. This produces massive amounts of carbon dioxide and methane, greenhouse gases that escape into the atmosphere and contribute to the greenhouse effect. As we have learned, prokaryotic microorganisms can associate with plants and animals.
Often, this association results in unique relationships between organisms. For example, bacteria living on the roots or leaves of a plant get nutrients from the plant and, in return, produce substances that protect the plant from pathogens. On the other hand, some bacteria are plant pathogens that use mechanisms of infection similar to bacterial pathogens of animals and humans. Prokaryotes live in a community, or a group of interacting populations of organisms.
A population is a group of individual organisms belonging to the same biological species and limited to a certain geographic area. Populations can have cooperative interactions, which benefit the populations, or competitive interactions, in which one population competes with another for resources. The study of these interactions between populations is called microbial ecology. Any interaction between different species within a community is called symbiosis.
Such interactions fall along a continuum between opposition and cooperation. Interactions in a symbiotic relationship may be beneficial or harmful, or have no effect on one or both of the species involved. Table 1 summarizes the main types of symbiotic interactions among prokaryotes. When two species benefit from each other, the symbiosis is called mutualism, syntrophy, or cross-feeding.
For example, humans have a mutualistic relationship with the bacterium Bacteroides thetaiotaomicron, which lives in the intestinal tract. Humans also have a mutualistic relationship with certain strains of Escherichia coli, another bacterium found in the gut.
This is only true for some strains of E. coli. Other strains are pathogenic and do not have a mutualistic relationship with humans.
A type of symbiosis in which one population harms another but remains unaffected itself is called amensalism. In the case of bacteria, some amensalist species produce bactericidal substances that kill other species of bacteria. For example, the bacterium Lucilia sericata produces a protein that destroys Staphylococcus aureus, a bacterium commonly found on the surface of the human skin.
Too much handwashing can affect this relationship and lead to S. In another type of symbiosis, called commensalism, one organism benefits while the other is unaffected.
This occurs when the bacterium Staphylococcus epidermidis uses the dead cells of the human skin as nutrients. Billions of these bacteria live on our skin, but in most cases (especially when our immune system is healthy), we do not react to them in any way. If neither of the symbiotic organisms is affected in any way, we call this type of symbiosis neutralism. An example of neutralism is the coexistence of metabolically active vegetating bacteria and endospores (dormant, metabolically passive bacteria).
For example, the bacterium Bacillus anthracis typically forms endospores in soil when conditions are unfavorable. If the soil is warmed and enriched with nutrients, some endospores germinate and remain in symbiosis with other endospores that have not germinated.
A type of symbiosis in which one organism benefits while harming the other is called parasitism. The relationship between humans and many pathogenic prokaryotes can be characterized as parasitic because these organisms invade the body, producing toxic substances or infectious diseases that cause harm.

The global diversity of Bacteria and Archaea, the most ancient and most widespread forms of life on Earth, is a subject of intense controversy.
This controversy stems largely from the fact that existing estimates are entirely based on theoretical models or extrapolations from small and biased data sets.
Here, in an attempt to census the bulk of Earth's bacterial and archaeal "prokaryotic" clades and to estimate their overall global richness, we analyzed over 1.
Using several statistical approaches, we estimate that there exist globally about 0. The distribution of relative OTU abundances is consistent with a log-normal model commonly observed in larger organisms; the total number of OTUs predicted by this model is also consistent with our global richness estimates. By combining our estimates with the ratio of full-length versus partial-length V4 sequence diversity in the SILVA sequence database, we further estimate that there exist about 2.
When restricting our analysis to the Americas, while controlling for the number of studies, we obtain similar richness estimates as for the global data set, suggesting that most OTUs are globally distributed. Our estimates constrain the extent of a poorly quantified rare microbial biosphere and refute recent predictions that there exist trillions of prokaryotic OTUs.
The global diversity of Bacteria and Archaea ("prokaryotes"), the most ancient and most widespread forms of life on Earth, is subject to high uncertainty. Here, to estimate the global diversity of prokaryotes, we analyzed a large number of 16S ribosomal RNA gene sequences, found in all prokaryotes and commonly used to catalogue prokaryotic diversity.
Sequences were obtained from a multitude of environments across thousands of geographic locations worldwide. From this data set, we recovered , prokaryotic operational taxonomic units (OTUs), i. Using several statistical approaches and through comparison with existing databases and previous independent surveys, we estimate that there exist globally between 0.
When restricting our analysis to the Americas, while controlling for the number of studies, we obtain similar estimates as for the global data set, suggesting that most OTUs are not restricted to a single continent but are instead globally distributed.
Our estimates constrain the extent of a commonly hypothesized but poorly quantified rare prokaryotic biosphere and refute recent predictions that there exist trillions of prokaryotic OTUs. Our findings also indicate that, contrary to common speculation, extinctions may strongly influence global prokaryotic diversity.
PLoS Biol 17(2): e Academic Editor: Janet K. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: All raw sequencing data used are available on public repositories; sample descriptions and accession numbers are provided as S1 Data.
FM was supported by a Banting postdoctoral fellowship. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist. Microorganisms are the most ancient and the most widespread form of life on Earth, inhabiting virtually every ecosystem and driving the bulk of global biogeochemical cycles. Culture-independent methods such as amplicon sequencing of 16S ribosomal RNA genes revealed the existence of a potentially vast undescribed microbial diversity, the full extent of which, however, remains highly controversial [ 1 — 9 ].
Determining the extent of this diversity remains an important but challenging task in our overall understanding of life, with major implications for ecological and evolutionary theory, environmental sciences and industry. Notably, a global census of microbial phylogenetic diversity, or at least knowledge of its full extent, is essential for reconstructing microbial evolution over geological time [ 10 ]. Estimates of global microbial diversity are also needed for scrutinizing proposed biodiversity scaling laws and macroecological theories [ 2 , 6 , 11 ].
Finally, undiscovered microorganisms may exhibit a large breadth of metabolic capabilities of particular interest to industry and medicine.
An efficient exploration of this potential and a realistic assessment of the feasibility of such an endeavor require knowledge of the gaps in existing diversity databases [ 12 – 14 ]. The extent of global microbial diversity remains subject to intense controversy and widely diverging speculations [ 1 – 9 ]. For example, Mora and colleagues [ 3 ] used the subset of currently named prokaryotic species to estimate that there exist approximately 10, bacterial species worldwide; this is clearly a strong underestimate, given that the SILVA sequence database [ 14 ] alone now contains hundreds of thousands of bacterial operational taxonomic units (OTUs), i.
Yarza and colleagues [ 4 ] and Schloss and colleagues [ 5 ] estimated that there exist a few million bacterial and archaeal ("prokaryotic") OTUs based on sequence discovery statistics in SILVA; however, environmental and taxonomic biases in SILVA [ 15 ] compromise the reliability of these estimates [ 16 ]. Larsen and colleagues [ 9 ] estimated that there exist billions of host-associated bacterial OTUs based on a heuristic and mathematically flawed extrapolation of bacterial OTU counts in typical insect species to all animal species (see the "Implications" section below for a detailed discussion).
Locey's estimate has fueled discussions about a potentially immense undiscovered microbial diversity and its uncertain ecological roles [ 16 — 20 ]. Locey's extrapolation of empirical scaling laws from local to global scales and across several orders of magnitude has been criticized and remains controversial [ 8 , 21 ].
Here, to address the above shortcomings, we attempted to explicitly census a large fraction of extant prokaryotic clades and used our census to estimate and chart total global prokaryotic OTU richness.
For this census, we compiled massive publicly available raw Illumina 16S amplicon sequencing data from 34, samples across studies, covering a wide range of environments from over 2, distinct geographical locations worldwide (S1 Fig). Environments covered include the surface and deep ocean, oxygen minimum zones, freshwater and hypersaline lakes, rivers, groundwater, marine surface and deep subsurface sediments, agricultural and forest soils, peats, permafrost, deserts, animal hosts and feces, plant leaves and rhizospheres, salt marshes, bioreactors, processed food, methane seeps, mine drainages, sewages, hydrothermal vents, and hot springs (overview in S1 Data).
Particular effort was put into representing soils (14, samples across studies), sediments (3, samples across 37 studies), and animal guts (8, samples across 52 studies), which likely harbor a large fraction of Earth's prokaryotic diversity [ 22 ].
Sequences in this composite data set cover at least basepairs in the V4 hypervariable region of the 16S gene, a commonly targeted region in microbial ecology [ 22 – 24 ]. Based on the recovered OTUs, henceforth referred to as the Global Prokaryotic Census (GPC), and through comparisons to previous surveys and existing databases, we estimate global prokaryotic OTU richness and highlight major implications for microbial ecology and evolution.
We emphasize that our main objective was to estimate global prokaryotic richness using as deep a census and covering as many environments and geographic locations as possible; as a trade-off, our data set offers neither the same level of experimental standardization across samples nor the amount of metadata included in projects such as the Earth Microbiome Project (EMP) [ 22 ].
That said, we point out that clusters of the 16S gene—regardless of similarity threshold and even if completely free of sequencing errors—only provide an approximate "species" analog to sexually reproducing organisms. Indeed, even strains with identical 16S sequences may exhibit different genomic content and ecological strategies; hence, the 16S gene is not always sufficient for distinguishing ecologically differentiated organisms, even when considering exact sequence variants [ 29 — 30 ].
Whether and how prokaryotic "species" can—or even need to—ever be reasonably defined remains highly debated [ 30 — 33 ]. To date, the 16S gene remains an important and the most popular marker for cataloguing prokaryotic diversity and for describing evolutionary relationships in a well-defined and reproducible manner [ 4 , 27 ].
We stress that prokaryotic 16S diversity detected and estimated based on amplicon sequences, as in this and most previous studies, is limited to clades detectable by the PCR primers used. As discussed below, the GPC partly resolves the issue of limited primer scope by using multiple alternative primers; however, it is in principle still possible that some clades are completely missed.
To ensure maximal phylogenetic coverage, the raw sequencing data from each study was considered as input to our analyses. To avoid spurious i. While this additional quality filter may also remove some biological OTUs, aggressive filtering is necessary for eliminating spurious OTUs, a common and serious problem in amplicon sequencing studies [ 34 — 37 ].
The resulting GPC comprises , prokaryotic OTUs (, bacterial and 49, archaeal), accounting for 1,,, reads. Accumulation curves of bacterial and archaeal OTUs discovered by the GPC, as a function of studies included, clearly show a deceleration with increasing number of studies (Fig 1A and 1B) and provide an estimate of how many novel OTUs would be discovered in subsequent studies.
As we show below, this estimate is consistent with the fractions of other independent data sets and databases covered (rediscovered) by the GPC. Based on the fraction of reads matched to the rarest OTUs i. This probability is sometimes referred to as "Good's coverage" and corresponds to the proportion of living or recently deceased prokaryotic cells, detectable by current 16S amplicon sequencing techniques, that is represented by OTUs in the GPC.
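As a rough illustration of the idea behind Good's coverage, the sketch below uses the textbook form of the estimator, one minus the fraction of reads that belong to OTUs seen exactly once. This is an assumption: the exact definition of "rarest OTUs" used for the GPC is truncated in the text above, and the function name and toy read counts are hypothetical.

```python
def goods_coverage(read_counts):
    """Textbook Good's coverage: 1 - (reads in singleton OTUs) / (total reads).

    read_counts: iterable of per-OTU read counts (hypothetical input).
    """
    counts = list(read_counts)
    total_reads = sum(counts)
    if total_reads == 0:
        raise ValueError("at least one read is required")
    # Each singleton OTU contributes exactly one read.
    singleton_reads = sum(1 for c in counts if c == 1)
    return 1.0 - singleton_reads / total_reads


# Toy example: 100 reads spread over six OTUs, two of which are singletons.
toy_counts = [50, 30, 15, 3, 1, 1]
print(goods_coverage(toy_counts))
```

A value close to 1 means that a randomly drawn read is very likely to belong to an OTU that is already in the census; it says nothing directly about how many OTUs remain undiscovered.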
We emphasize that Good's coverage should not be interpreted as the fraction of global OTU richness represented by the GPC; indeed, estimation of the latter requires additional statistical reasoning, as presented below. (A, B) Accumulation curves showing the number of bacterial (A) and archaeal (B) OTUs discovered, depending on the number of distinct studies included.
Curves are averaged over random subsamplings, and whiskers show corresponding standard deviations. Continuous curves were calculated using all studies worldwide, while blue dashed curves were calculated using solely studies performed in the Americas or near American coasts.
Whiskers indicate standard errors, estimated from the underlying models; most standard errors are likely underestimated by the models, so the variability between models is probably a more honest assessment of uncertainty. (E) The iChao2split richness estimator is based on the numbers of OTUs discovered once, twice, thrice, or four times when studies are randomly split into four complementary "sampling units" (shaded circles).
Average estimates were obtained by repeating the random split multiple times. To estimate the total number of extant prokaryotic OTUs globally (discovered plus undiscovered), we used statistical approaches based on the number of OTUs that have been discovered in exactly one study (Q_1), the number of OTUs discovered in exactly two studies (Q_2), and so on.
Indeed, the recommended and only statistically admissible way to estimate OTU richness is by modeling the incidence frequency counts (Q_i) in order to predict the number of unobserved OTUs (Q_0) [ 21 , 39 – 41 ]. These methods date back to mathematical theorems for cryptographic analyses during World War II and have been used for microbial as well as macrobial richness estimates [ 40 , 42 – 44 ].
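To make the role of the incidence frequency counts concrete, here is a minimal sketch that tabulates Q_1, Q_2, ... from per-study OTU lists and plugs them into the bias-corrected Chao2 estimator. Chao2 is used here only as a simple, well-known representative of this family; it is not claimed to be one of the estimators applied to the GPC, and the input format and toy data are hypothetical.

```python
from collections import Counter


def incidence_frequency_counts(study_otus):
    """Return a Counter Q where Q[i] is the number of OTUs found in exactly i studies."""
    incidence = Counter()
    for otus in study_otus:
        incidence.update(set(otus))  # count each OTU at most once per study
    return Counter(incidence.values())


def chao2(study_otus):
    """Bias-corrected Chao2: S_obs + ((T - 1) / T) * Q1 * (Q1 - 1) / (2 * (Q2 + 1))."""
    n_studies = len(study_otus)
    q = incidence_frequency_counts(study_otus)
    s_obs = sum(q.values())
    q1, q2 = q.get(1, 0), q.get(2, 0)
    q0_hat = ((n_studies - 1) / n_studies) * q1 * (q1 - 1) / (2 * (q2 + 1))
    return s_obs + q0_hat


# Toy data: four studies, eight observed OTUs, four of them seen in one study only.
toy_studies = [{"a", "b", "c", "f"}, {"b", "c", "d", "g"}, {"c", "d", "h"}, {"a", "e"}]
print(incidence_frequency_counts(toy_studies))
print(chao2(toy_studies))
```

The estimate exceeds the observed richness only through Q_1 and Q_2, which is why rarely detected OTUs carry most of the information about undetected ones.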
Intuitively, widely distributed and abundant OTUs—which are almost certain to be detected—contain very little information about undetected OTUs, while rarely detected OTUs e. All of the above estimators have been designed to account for heterogeneities in detection frequencies among OTUs i. We note that the majority of existing richness estimators, including the ones described above, are based on models in which individual sampling units are assumed to be equivalent e.
To check whether our estimates are sensitive to this caveat, we also deployed an estimation approach whereby we randomly assigned studies to four complementary and equally sized groups representing four statistically equivalent global "sampling units" and used the iChao2 estimator based on the number of OTUs found in exactly one, two, three, or four sampling units ("iChao2split," illustration in Fig 1E).
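The splitting step can be sketched as follows, under stated assumptions: studies are dealt round-robin into four groups after a random shuffle (so group sizes differ by at most one), each group is pooled into one sampling unit, and the resulting counts are averaged over repeated splits. For simplicity the sketch applies bias-corrected Chao2 to the counts as a stand-in; the iChao2 estimator named above adds a correction term based on the counts for three and four units, which is not reproduced here. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def split_into_units(study_otus, n_units=4, seed=0):
    """Randomly deal studies into n_units groups and pool the OTUs of each group."""
    rng = random.Random(seed)
    order = list(range(len(study_otus)))
    rng.shuffle(order)
    units = [set() for _ in range(n_units)]
    for position, idx in enumerate(order):
        units[position % n_units] |= set(study_otus[idx])
    return units


def unit_frequency_counts(units):
    """Counter Q where Q[i] is the number of OTUs present in exactly i sampling units."""
    incidence = Counter()
    for unit in units:
        incidence.update(unit)
    return Counter(incidence.values())


def chao2_from_counts(q, n_units):
    """Bias-corrected Chao2, used here as a stand-in for iChao2."""
    s_obs = sum(q.values())
    q1, q2 = q.get(1, 0), q.get(2, 0)
    return s_obs + ((n_units - 1) / n_units) * q1 * (q1 - 1) / (2 * (q2 + 1))


def split_estimate(study_otus, n_units=4, n_repeats=50, seed=0):
    """Average the richness estimate over repeated random splits."""
    estimates = []
    for r in range(n_repeats):
        units = split_into_units(study_otus, n_units, seed + r)
        estimates.append(chao2_from_counts(unit_frequency_counts(units), n_units))
    return sum(estimates) / len(estimates)


toy_studies = [{"a", "b"}, {"b", "c"}, {"c", "d"}, {"a", "e"},
               {"f"}, {"b", "g"}, {"a", "c"}, {"h", "d"}]
print(split_estimate(toy_studies))
```

Pooling studies into a few large units makes the units closer to statistically equivalent, at the cost of reducing the number of incidence classes to at most four.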
The majority of prokaryotic OTUs are estimated to be bacterial, with bacterial richness (Fig 1C) being roughly 10 times greater than archaeal richness (Fig 1D). To further scrutinize our estimates of global OTU richness and to verify whether a substantial fraction of that richness is indeed covered by the GPC, we determined the fraction of 16S sequences from previous global surveys or existing databases that was rediscovered ("recaptured") by the GPC.
While our statistical richness estimators Fig 1C and 1D were designed to account for variable detection probabilities among OTUs, the potential risk of neglecting a large number of extremely rare OTUs cannot be overemphasized. To further assess this risk, we also explicitly investigated the global distribution of relative OTU abundances.
Specifically, for each OTU, we estimated its relative abundance in each sample using the Good-Turing formula [ 38 ] and then took the average across all samples to obtain its mean relative abundance (MRA). This model accounted for our quality filtering and finite sequencing depths and was calibrated by comparing OTU discovery rates in the GPC with those in a rarefied variant of the GPC i. Following recommendations by Shoemaker and colleagues [ 11 ], we then fitted a log-normal model to the reconstructed distribution of MRAs of extant OTUs.
We found that the latter was well described by the log-normal model (blue dashed curve in Fig 2A), resembling analogous observations commonly made for larger organisms. We point out that the log-normal model is largely phenomenological, although it is sometimes derived from certain stochastic population models [ 50 ]. Hence, we make no assertion as to which mechanisms could possibly lead to the observed log-normal-like distribution of MRAs and as to whether other potentially yet to be discovered models may be even more suitable.
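The two steps described above, averaging per-sample relative abundances into MRAs and fitting a log-normal model, can be sketched in a deliberately simplified form. The assumptions are substantial: the per-sample estimate below is a Good-Turing-style discount (raw proportion scaled by one minus the singleton fraction) rather than the exact formula from [ 38 ], and the fit is a naive moment fit on log-transformed MRAs that ignores the correction for undetected OTUs described in the text. Function names and toy samples are hypothetical.

```python
import math
from statistics import mean, stdev


def discounted_relative_abundances(read_counts):
    """Per-sample relative abundances, scaled by the estimated observed probability mass.

    read_counts: dict mapping OTU id -> read count in one sample (hypothetical input).
    """
    total = sum(read_counts.values())
    if total == 0:
        return {}
    singletons = sum(1 for c in read_counts.values() if c == 1)
    observed_mass = 1.0 - singletons / total  # simplified Good-Turing-style discount
    return {otu: observed_mass * c / total for otu, c in read_counts.items()}


def mean_relative_abundances(samples):
    """Average each OTU's relative abundance over all samples (absent = 0)."""
    totals = {}
    for sample in samples:
        for otu, p in discounted_relative_abundances(sample).items():
            totals[otu] = totals.get(otu, 0.0) + p
    return {otu: t / len(samples) for otu, t in totals.items()}


def fit_lognormal(values):
    """Return (mu, sigma) of ln(values); needs at least two positive values."""
    logs = [math.log(v) for v in values if v > 0]
    return mean(logs), stdev(logs)


toy_samples = [{"a": 90, "b": 9, "c": 1}, {"a": 50, "b": 50}]
mra = mean_relative_abundances(toy_samples)
mu, sigma = fit_lognormal(mra.values())
print(mra, mu, sigma)
```

With real data, the fitted parameters would describe only the detected OTUs unless the detection probability of rare OTUs is modeled explicitly.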
This conclusion contrasts with previous speculations that there exists a vast number of extremely rare and largely undetected OTUs, sometimes referred to as the "rare microbial biosphere" [ 6 , 17 , 51 ]. According to the fitted log-normal model, there exist only approximately , prokaryotic OTUs across the entire range of MRAs, further supporting our other estimates. The blue dashed curve shows a log-normal distribution model [ 11 ] fitted to the estimated MRA distribution of extant OTUs.
Since OTUs are inevitably taxonomically identified through comparison with reference databases (here, SILVA was used to identify OTUs at the kingdom level), censuses such as the GPC may in principle miss clades lacking a close relative in the databases.
This suggests that our taxonomic identification algorithm did not miss a substantial number of biological sequences at larger phylogenetic distances (omitted sequences at greater distances are likely spurious, see Methods for details). Primer "blind spots," i. To investigate this caveat and to check whether a large fraction of diversity may have been missed by the GPC due to primer blind spots, we calculated the fraction of 16S sequences recovered from a multitude of environments using primer-independent metagenomics-based methods that were rediscovered by the GPC.
These recapture fractions are comparable to the fraction recovered from the EMP, suggesting that the fraction of OTUs missed by the GPC due to primer blind spots is small. One reason may be that the GPC comprises sequences obtained using a multitude of alternative primers optimized for different clades, therefore partly alleviating the problem of primer nonuniversality.
In particular, 16S sequences currently not detectable by any primers may only represent a minority of prokaryotic diversity, even if any given primer set has limited sensitivity scope. It is thus improbable that primer-independent methods will reveal a prokaryotic richness much i.
When we repeated our analyses using only studies from the Americas or near American coasts (studies across 14 countries, see map in S1 Fig) instead of the full GPC, OTU discovery rates for any given number of studies remained almost unchanged (Fig 1A and 1B).
Hence, for the same "sampling effort," the same OTU richness is recovered from the Americas as from the full GPC, and importantly, the restriction to the Americas does not cause a stronger deceleration of OTU discovery rates.
This suggests that the majority of global prokaryotic OTUs could have been censused from a single hemisphere, if sufficient samples had been available.
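The comparison of discovery rates at equal sampling effort can be illustrated with a small sketch that builds accumulation curves averaged over random orderings of studies, once for all studies and once for a regional subset. The input format (one set of OTU identifiers per study), the function name, and the toy data are hypothetical; nothing here reproduces the actual GPC pipeline.

```python
import random
from statistics import mean


def accumulation_curve(study_otus, n_permutations=200, seed=0):
    """Mean number of distinct OTUs discovered after including k = 1..K studies."""
    rng = random.Random(seed)
    n_studies = len(study_otus)
    discovered = [[] for _ in range(n_studies)]
    for _ in range(n_permutations):
        order = list(range(n_studies))
        rng.shuffle(order)  # random order in which studies are added
        seen = set()
        for k, idx in enumerate(order):
            seen |= study_otus[idx]
            discovered[k].append(len(seen))
    return [mean(counts) for counts in discovered]


# Toy data: six studies worldwide, the first three forming a regional subset.
all_studies = [{"a", "b", "c"}, {"b", "c", "d"}, {"c", "d"},
               {"a", "e"}, {"b", "e", "f"}, {"a", "f"}]
regional_studies = all_studies[:3]

global_curve = accumulation_curve(all_studies)
regional_curve = accumulation_curve(regional_studies)
for k, regional_value in enumerate(regional_curve):
    print(k + 1, round(global_curve[k], 2), round(regional_value, 2))
```

If the regional curve tracks the global curve at each number of studies, restricting the census to that region does not slow OTU discovery, which is the pattern reported for the Americas.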
Consistent with this conclusion, when controlling for the number of studies included and using the same methods as above, we found that prokaryotic OTU richness estimated for the Americas was very similar to estimates based on an equal number of studies randomly chosen from across the world 0. Our findings extend previous observations that for any given number of samples, similar prokaryotic OTU richness is recovered from soil in New York Central Park as from distinct soil samples worldwide [ 57 ].
Most prokaryotic OTUs thus appear to exhibit low geographic endemism and global dispersal ranges at geological time scales, i. A global distribution of prokaryotic OTUs has long been a central but controversial hypothesis [ 60 , 61 ]. Our finding provides strong support for this hypothesis and is also consistent with previous findings that most marine bacterial OTUs can be recovered from a single location in the ocean with sufficiently deep sequencing [ 62 , 63 ] and with findings that salt-marsh Nitrosomonadales OTUs are globally distributed [ 64 ].
That said, we point out that a global distribution of OTUs does not rule out geographic endemism at finer phylogenetic resolutions since younger clades, e. The uneven representation of various taxonomic groups is generally more pronounced at lower taxonomic levels, with some phyla being strongly overrepresented compared to others Fig 3B.
This indicates that some phyla are not represented in SILVA at all, consistent with conclusions from metagenomic studies [ 56 , 68 ]. Phyla are sorted by decreasing estimated OTU richness; only the 25 richest phyla are shown. For additional phyla not shown here, see S6 Fig. Our estimates also highlight strong differences in the OTU richness specific to different phyla, with Proteobacteria (mostly Gammaproteobacteria and Deltaproteobacteria) clearly dominating global richness, followed by the Firmicutes (mostly Clostridia), Bacteroidetes (mostly Bacteroidia), Nanoarchaeota (mostly Woesearchaeia), Patescibacteria, and Planctomycetes (mostly Planctomycetacia) (Figs 3A and S11A).
Hence, the large representation of Proteobacteria in reference databases and among cultured species [ 69 ] is not just the result of a biased discovery rate e. Similarly, the large richness of Firmicutes may be explained by their ability to colonize a wide range of animal hosts [ 56 ].
Interestingly, the Nanoarchaeota are known as a deeply branching and poorly characterized ancient clade [ 71 ], which has been suggested to comprise a largely underestimated diversity [ 72 ]. The few isolated Nanoarchaeota indicate that they share a common history of adaptation to ectosymbiosis [ 73 ], and this may have contributed to the difficulty of isolating representatives.
In contrast, while the Actinobacteria phylum contains the second largest number of cultured strains [ 69 , 74 ], it only ranks eighth in terms of estimated total OTU richness (Fig 3A), suggesting a strong culturing bias for this phylum, consistent with previous findings [ 69 ]. We point out that extant prokaryotic diversity is the result of diversification and extinction processes operating over billions of years and throughout geological transitions [ 15 ].
It is thus possible that the relative richness of various taxa varied strongly over time. Our work suggests that global prokaryotic OTU richness is about six orders of magnitude lower than previously predicted via extrapolation of diversity scaling laws and OTU abundance distributions fitted to individual microbial communities [ 6 , 8 ].
While we find support for a log-normal distribution of mean relative OTU abundances consistent with assumptions made by Locey and colleagues [ 6 ], at least two aspects differentiate our approach from Locey and colleagues. First, we fitted the log-normal model to a global data set comprising thousands of samples across hundreds of environments rather than to individual local communities, thus obtaining a description of relative abundances that is more suitable for global richness estimates.
Second, we did not assume or extrapolate any phenomenological scaling relationships between different parameters of the model, thus relying on fewer questionable assumptions. The discrepancy between our estimates and those by Locey and colleagues [ 6 ] suggests that phenomenological scaling relationships of microbial diversity cannot be extrapolated to global scales when these relationships were fitted solely to individual communities.
This conclusion also supports arguments by [ 21 ] that the extrapolations performed by Locey and colleagues [ 6 ] have no predictive power and are statistically unsound. Our estimates also contrast with extrapolations by Larsen and colleagues [ 9 ], who argued that there exist billions of animal-associated bacterial OTUs based on the number of OTUs typically found in individual insect species and the estimated total number of animal species.
One reason for this discrepancy may be that Larsen's extrapolation did not properly account for the overlap of microbiomes between animal taxa (detailed discussion in S1 Text). Our much lower bacterial richness estimates suggest that many symbiotic OTUs are found in multiple host species that may or may not be closely related, potentially due to host trait convergences, consistent with recent observations [ 75 – 77 ].
Since the microbiome of only a minuscule fraction of animal species has been examined so far, it is quite possible that many allegedly "host-specific" bacteria are shared by a broader spectrum of host species than currently known.
This could explain why overall bacterial richness at the OTU level appears to have been largely unaffected by past mass animal extinctions, as recently suggested based on phylogenetic analyses [ 15 ].
Given the long evolutionary history and ubiquity of prokaryotes, a richness of only approximately 0. To put this finding into perspective, we considered a steady state null model, in which global prokaryotic cell counts N are constant over time, in which cells are replaced randomly and regardless of phylogenetic relationships via births and deaths, and in which the 16S-V4 region evolves neutrally [ 59 ] at some constant drift rate r , measured in mutations per site per generation and independently at each site.
Note that one important and potentially wrong assumption of this model is that cell turnover is statistically independent of phylogeny. A similar model was recently proposed by Straub and colleagues [ 78 ] as a null model for 16S phylogenies. The discrepancy also persists even if currently estimated 16S mutation rates r or global cell counts N were off by 10 orders of magnitude or even if global cell counts varied drastically e.
One explanation for this discrepancy could be that the evolution of the 16S-V4 region along a lineage is subject to strong constraints that favor some mutations or sequence variants more than others, thus effectively reducing the "permissible" sequence space [ 82 — 84 ].
Alternatively, some processes not captured by the model may eliminate all but just a small fraction of 16S sequence variants emerging over time.
Phylogenetically correlated turnover, i. This would imply that extinction plays a central role in prokaryotic diversification, as recently suggested [ 15 ], in contrast to common speculations that prokaryotic OTUs are unlikely to go extinct [ 1, 86–88 ]. For example, at coarser phylogenetic resolutions e. Reciprocally, when we analyzed a subset of our data approximately 0. This suggests that the global richness of exact sequence variants is at most an order of magnitude larger than the number of OTUs.
The sequence length considered may also affect global richness measures. When combined with our V4-based richness estimates, this suggests that there exist 2. Unfortunately, while full-length sequencing undoubtedly improves phylogenetic resolution, technical complications and a higher cost currently prevent the wide adoption of full-length 16S sequencing in microbial community surveys.
Finally, we stress that 16S diversity only provides a coarse surrogate for prokaryotic genomic and phenotypic diversity [ 29, 30 ], and it is probable that the global number of prokaryote ecotypes greatly exceeds the number of OTUs. Cataloguing the phenotypic and genomic diversity of prokaryotes will undoubtedly be an important but much more challenging future task.
Curtis and colleagues [ 2 ] hypothesized that experimental approaches to directly enumerating extant prokaryotic diversity will remain fruitless due to logistical challenges.
Our composite data set, covering a multitude of environments worldwide, enabled us to strongly constrain global prokaryotic OTU richness. Indeed, our global richness estimates are similar across a multitude of statistical estimators (Fig 1C and 1D), all of which are based on different models of OTU detection probabilities and, in most cases, use a different set of OTU incidence frequency counts. The high fraction of 16S sequences from other amplicon- and metagenomic-sequencing surveys e.
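To make concrete what an estimator based on OTU incidence frequency counts looks like, here is a minimal sketch of the bias-corrected Chao2 estimator, a widely used incidence-based richness estimator. This section does not say which estimators appear in Fig 1C and 1D, so Chao2 is shown purely as an illustrative assumption; the function name and the toy samples are hypothetical.

```python
from collections import Counter


def chao2_bias_corrected(samples):
    """Bias-corrected Chao2 richness estimate from presence/absence data.

    samples: iterable of collections, each holding the OTU labels detected in
    one sample. Q1 and Q2 are the numbers of OTUs found in exactly one and
    exactly two samples, respectively, and m is the number of samples.
    """
    samples = [set(s) for s in samples]
    m = len(samples)
    if m == 0:
        raise ValueError("at least one sample is required")
    incidence = Counter()
    for sample in samples:
        incidence.update(sample)  # each OTU counted once per sample
    s_obs = len(incidence)
    q1 = sum(1 for count in incidence.values() if count == 1)
    q2 = sum(1 for count in incidence.values() if count == 2)
    # Estimated number of undetected OTUs is added to the observed richness.
    return s_obs + ((m - 1) / m) * q1 * (q1 - 1) / (2 * (q2 + 1))


if __name__ == "__main__":
    toy_samples = [{"otu_a", "otu_b", "otu_c"}, {"otu_a", "otu_b"}, {"otu_a", "otu_d"}]
    print(chao2_bias_corrected(toy_samples))
```

The estimate exceeds the observed richness only when some OTUs occur in a single sample, which reflects the intuition that many rare OTUs imply many more that were missed.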
While no particular 16S similarity threshold provides an ideal species analog, OTUs provide an operational and clearly defined measure of richness that can be compared across studies, environments, and geological time [ 15 ].
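As an illustration of why an OTU is an operational unit that depends on the chosen similarity threshold, the sketch below clusters pre-aligned, equal-length sequences greedily around centroids. It is a simplified assumption for illustration, not the clustering procedure used for the data set discussed here; the function names and the default threshold of 0.97 (a conventional choice) are hypothetical.

```python
def identity(seq_a, seq_b):
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def greedy_otus(sequences, threshold=0.97):
    """Assign each sequence to the first centroid it matches at >= threshold.

    A sequence that matches no existing centroid founds a new OTU.
    Returns (centroids, assignments), where assignments[i] is the OTU index
    of sequences[i].
    """
    centroids = []
    assignments = []
    for seq in sequences:
        for index, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                assignments.append(index)
                break
        else:
            centroids.append(seq)
            assignments.append(len(centroids) - 1)
    return centroids, assignments


if __name__ == "__main__":
    toy = ["A" * 100, "A" * 98 + "CC", "A" * 90 + "C" * 10]
    print(len(greedy_otus(toy, threshold=0.97)[0]))
    print(len(greedy_otus(toy, threshold=0.99)[0]))
```

The same three toy sequences yield a different OTU count at each threshold, which is why richness figures are only comparable across studies when the threshold is held fixed.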
We reiterate that the goal of the GPC was to enable a more robust estimate of total extant prokaryotic richness than previous studies. Indeed, our estimates are based on an unprecedentedly large and environmentally broad composite sequencing data set, assembled from hundreds of studies utilizing alternative primers and alternative sampling techniques, and using a wide array of alternative statistical estimation methods for increased robustness.
The GPC can thus facilitate future efforts to catalogue and phenotypically describe Earth's extant prokaryotes. The GPC also opens up new avenues for reconstructing prokaryotic evolution over geological time using massive phylogenetic trees and for refining macroecological theories.
While long considered an unseen majority [ 79 ], thanks to ongoing technological revolutions, prokaryotes could one day become one of the most exhaustively characterized and best understood forms of life.
Only Illumina sequences were downloaded to ensure sequence qualities on par with current standards and because Illumina-based studies typically achieve much deeper sequencing than studies using previous-generation e.
We only considered sequences covering the V4 hypervariable region for three reasons.
Earl et al. Divergence was observed in genes that encode proteins involved in the uptake and breakdown of carbohydrates e. The observed variability among these loci, and others like them, indicates that certain metabolic and environmental-monitoring capabilities might not be required for the life of B. Resulting clusters (or clades, when the trees are taken as phylogenies) are evaluated on the basis of their separation from one another and their agreement with species or subspecies recognized in traditional phenotype-based (or minimally SSU rRNA tree-based) classifications.
Although this seems a far safer classification procedure than taking microdiverse clustering of single marker gene sequences as indicators of natural organismal groupings, results are mixed.
Hanage et al. Similarly, there is between-species exchange among Streptococcus pneumoniae, S. In Papke et al. Whether or not phylogroups should be called species is not decidable by further experimentation: what is at issue is the species definition and how much fuzziness we want to accept within it. As another environmental example, Figure 3 shows a graphic summary of a phylogenetic analysis of marine cyanobacterial data.
Although a robust tree can be made from the collective signal from up to 19 Prochlorococcus and Synechococcus genomes (O. Zhaxybayeva, F. Doolittle, T. Papke, and P. Gogarten, in prep.). This conflicting signal is not noise, but rather evidence of an important evolutionary process: the exchange of genetic information between clustered populations with interpopulation barriers of varying strength.
For this group, dominant in our oceans, there is mounting evidence that phages play a key role in shuttling such information back and forth, and evidence for local adaptation (to physical conditions and to nutrient and light availability) mediated by gene acquisition and loss (Kettler et al.).
The genome core is not immune to such exchange, and so any strain or species phylogeny constructed from the concatenated sequences of shared genes must be considered a useful fiction, an oversimplification of a much more complex evolutionary history.
Each point in a triangle simplex represents a set of orthologous genes that contains at least four analyzed genomes and as many as 19 genomes from this group.
The position of each point in the barycentric coordinate system (the triangle) depends on the bootstrap support values for the three possible tree topologies, one of which is associated with each vertex. The closer a point is to a vertex, the higher its bootstrap support for that tree topology.
Poorly resolved relationships result in points located closer to the center of the triangle. For a full description of the methodology used to analyze embedded quartets, see Zhaxybayeva and Gogarten and Zhaxybayeva et al. Genomes are designated by their strain names. Bold Genomes of marine Synechococcus spp. Full analyses of the phylogenetic relationships within this group as well as details on the selection of sets of orthologous genes and phylogenetic analyses performed will be presented elsewhere O.
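The placement rule described in the caption above can be sketched in a few lines. This is a minimal illustration, not the cited authors' code: it assumes the three bootstrap support values are normalized by their sum to give barycentric weights, and it places the vertices on a unit equilateral triangle. The function name and vertex layout are hypothetical choices.

```python
import math

# Assumed layout: one vertex per tree topology on a unit equilateral triangle.
VERTICES = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))


def barycentric_point(support_1, support_2, support_3):
    """Map three bootstrap support values to (x, y) inside the triangle.

    The supports are normalized to weights that sum to 1, and the point is
    the weighted average of the three vertices.
    """
    supports = (support_1, support_2, support_3)
    if any(s < 0 for s in supports):
        raise ValueError("support values must be non-negative")
    total = sum(supports)
    if total == 0:
        raise ValueError("at least one support value must be positive")
    weights = [s / total for s in supports]
    x = sum(w * vx for w, (vx, _) in zip(weights, VERTICES))
    y = sum(w * vy for w, (_, vy) in zip(weights, VERTICES))
    return x, y


if __name__ == "__main__":
    print(barycentric_point(100, 0, 0))   # full support for topology 1
    print(barycentric_point(34, 33, 33))  # poorly resolved quartet
```

Full support for one topology lands exactly on its vertex, while equal support for all three lands on the centroid, which matches the caption's statement that poorly resolved relationships fall near the center.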
There must be some degree of fuzziness that is too extreme for us to permit in groups we want to call species, but without prior agreement on this, even species nominalism is unworkable as scientific discourse. The ecotype and BSC concepts might each under some conditions produce clusters so tight that most microbiologists would call them species, but it is not sufficient to show that this is sometimes possible.
Thus, species monism is not tenable. If we abandon monism and accept as pluralists that some prokaryotes form species by periodic selection and some by following the tenets of the BSC (and perhaps some by other, still unknown, mechanisms), then it is very hard not to admit that some prokaryotes may not form species at all.
That is, species pluralism robs the category species of its claim on reality. This is not to say that specific groups of organisms (for instance, Helicobacter pylori or Sulfolobus solfataricus) cannot be taken as real. Even for groups that are fuzzier than allowed by whatever criteria are collectively accepted as a definition for example, Stackebrandt et al.
We remain confident that there are such taxa as Homo sapiens and Canis familiaris. There is a strongly felt need for a robust prokaryotic species ontology. Koeppel et al. This would be a disappointment perhaps, but it is no excuse for forcing a conceptual straitjacket on unruly data.
And such force may not be necessary. In the absence of species definitions or concepts we can still probe prokaryotic diversity at the sequence level (Huber et al.). We can still document recombination between genomes in a complex population (Allen et al.). Indeed, some do not use the word. We anticipate that as metagenomics and the sophisticated computational environment needed to understand and represent metagenomic data evolve, the word will disappear from the scientific literature.
On the origin of prokaryotic species W.
Abbott, R. Speciation in plants and animals: Pattern and process. B Biol.
Achtman, M.
Acinas, S. Nature.
Allen, E.
Aras, R. Nucleic Acids Res.
Atwood, K.
Baldo, L.
Berg, O.
Brochet, M.
Cameron, A.
Cohan, F.
Coyne, J.
Darwin, C.
Denamur, E. Cell.
Deurenberg, R. Genetics.
Dorrell, N.
Dupre, J.
Dykhuizen, D.
Earl, A. Trends Microbiol.
Ereshefsky, M.
Falush, D.
Franklin, L.
Fraser, C. Science.
Gerrish, P. Genetica.
Gevers, D.
Giovannoni, S.
Glockner, F.
Hallam, S.
Hanage, W. BMC Biol.
Hao, W. Genome Res. BMC Genomics 9.
Hotopp, J. Microbiology.
Huber, J. Science: 97.
Hunt, D.
Jaspers, E.
Johnson, Z.
Kettler, G. PLoS Genet.
Kobayashi, I.
Koeppel, A.
Konstantinidis, K.
Lan, R.
Lawrence, J.
Maharjan, R.
Maiden, M.
Majewski, J.
Mallet, J. Heredity 95.
Marri, P. BMC Evol.
Martin, A.
Massana, R.
Mau, B. Genome Biol.
Mayr, E.
Medini, D.
Meibom, K.
Mes, T.