Genomes of all known members of a Plasmodium subgenus reveal paths to virulent human malaria
Plasmodium falciparum, the most virulent agent of human malaria, shares a recent common ancestor with the gorilla parasite Plasmodium praefalciparum. Little is known about the other gorilla- and chimpanzee-infecting species in the same (Laverania) subgenus as P. falciparum, but none of them is capable of establishing repeated infection and transmission in humans. To elucidate underlying mechanisms and the evolutionary history of this subgenus, we have generated multiple genomes from all known Laverania species. The completeness of our dataset allows us to conclude that interspecific gene transfers, as well as convergent evolution, were important in the evolution of these species. Striking copy number and structural variations were observed within gene families and one, stevor, shows a host-specific sequence pattern. The complete genome sequence of the closest ancestor of P. falciparum enables us to estimate the timing of the beginning of speciation to be 40,000–60,000 years ago, followed by a population bottleneck around 4,000–6,000 years ago. Our data also allow us to search in detail for the features of P. falciparum that made it the only member of the Laverania able to infect and spread in humans.
Fig. 1 Overview of the dating of the evolution of the Laverania. a, Maximum likelihood tree of the Laverania on the basis of the “Lav12sp” set of orthologues. All bootstrap values are 100. Coalescence-based estimates of the timing of speciation events are displayed on nodes, on the basis of intergenic and genic alignments. The number of genomes obtained per ape Laverania species is provided in parentheses. b, Multiple sequentially Markovian coalescent estimates of the effective population size (Ne) in the P. falciparum and P. praefalciparum populations. Assuming our estimate of the number of mitotic events per year, a bottleneck occurred in P. falciparum 4,000–6,000 years ago. The y axis shows the natural logarithm (Ln) of Ne. Bootstrapping (pale lines) was performed by randomly resampling segregating sites from the input 50 times. Ma, million years ago.
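The bootstrapping mentioned for panel b can be pictured as resampling with replacement. The following is a minimal sketch only, assuming segregating sites are held as a list of integer positions; the function name, the seed parameter and the input layout are hypothetical and are not taken from the authors' pipeline.

```python
import random

def bootstrap_segregating_sites(sites, n_replicates=50, seed=0):
    """Return n_replicates resampled site lists, each the same size as the input.

    Sites are drawn with replacement, so a position may appear more than once
    in a replicate. The seed only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    n = len(sites)
    return [sorted(rng.choices(sites, k=n)) for _ in range(n_replicates)]

# Hypothetical positions of segregating sites along one chromosome.
example_sites = [120, 450, 980, 1500, 2210, 3070]
replicates = bootstrap_segregating_sites(example_sites)
```

Each replicate would then be passed through the same population size inference as the original data, and the spread across replicates gives the pale lines in the plot.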
Fig. 2 Overview of the analyses of core genes over all Laverania genomes. a, Summary of evolution of core genes. From outer to inner track: scatterplot of branch-site test for each genome (see Supplementary Table 4 for P. falciparum data); per-species dN/dS values (0.5 < dN/dS < 2); orthologues are represented by vertical black lines under the chromosome track, with dots representing P. falciparum 3D7 var genes on the forward (blue) or reverse strands (red), or var pseudogenes (black); average of the relative polymorphism (π) across species, with the underlying π for each species calculated from multiple strains (“Lav15st” dataset) and normalized by the average for that species; signatures of convergent evolution on the basis of host-specific fixed differences analysis with the chromosome 4 region that includes the Rh5 locus highlighted (black box). b, Magnified view of the Rh5 region that is enriched with host-specific fixed differences. Convergent evolution analysis was performed using orthologues conserved across the Laverania. Filled circles represent the subset of differences that were fixed within all the isolates available (“Lav15st” set) and for which we could reject neutral evolution (for the gene list see Supplementary Table 5).
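The relative polymorphism track can be illustrated with a small calculation: π is computed per gene from several strains of one species, then divided by that species' mean so that species with very different diversity levels become comparable. The sketch below assumes gap-free, equal-length aligned sequences; the function names and toy data are hypothetical and it is not the authors' implementation.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi) for aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)

def relative_pi(pi_by_gene):
    """Divide each gene's pi by the mean pi over all genes of the same species."""
    mean_pi = sum(pi_by_gene.values()) / len(pi_by_gene)
    if mean_pi == 0:
        # No polymorphism at all in this species: relative values are undefined.
        return {gene: 0.0 for gene in pi_by_gene}
    return {gene: value / mean_pi for gene, value in pi_by_gene.items()}

# Toy alignment of one gene in three strains of a single species.
pi_gene = nucleotide_diversity(["AAAA", "AAAT", "AATT"])
# Toy per-gene values for one species (hypothetical gene names).
normalised = relative_pi({"geneA": 0.01, "geneB": 0.03})
```

Averaging the normalized values for one orthologue across species would then give a single relative π per gene, which is the quantity the caption describes.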
Fig. 3 Clustering of Pir (Rifin and Stevor) protein families. Graphical representation of similarity between all pir proteins >250 amino acids, coloured by species. A BLAST cutoff of 45% global identity was used (see Methods). More connected genes are more similar. Black circles highlight clade A rifin proteins that cluster with clade B rifin proteins.
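One way to read such a similarity graph is that two proteins are linked when their pairwise identity reaches the cutoff, and clusters are the resulting connected groups. The sketch below assumes pairwise identities are already available as (name, name, percent identity) tuples; the function name, the input format and the use of union-find are illustrative assumptions, not the method described in the paper's Methods.

```python
from collections import defaultdict

def cluster_by_identity(names, hits, cutoff=45.0):
    """Group sequences into connected components of the identity graph.

    names: all sequence identifiers.
    hits: iterable of (name_a, name_b, percent_identity) tuples.
    An edge is kept only when percent_identity >= cutoff.
    """
    parent = {name: name for name in names}

    def find(x):
        # Walk to the root of x's set, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity in hits:
        if identity >= cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = defaultdict(list)
    for name in names:
        groups[find(name)].append(name)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical proteins and pairwise identities.
proteins = ["rifin_1", "rifin_2", "rifin_3", "stevor_1"]
pairwise = [("rifin_1", "rifin_2", 60.0),
            ("rifin_2", "rifin_3", 47.5),
            ("rifin_3", "stevor_1", 30.0)]
clusters = cluster_by_identity(proteins, pairwise)
```

Here rifin_1 and rifin_3 end up in the same group without a direct edge, because both are linked to rifin_2. This is also why, in a graph of this kind, a protein from one clade can sit inside a cluster dominated by another.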
Fig. 4 Evolution of var gene domains in the Laverania. a, Heatmap of numbers of var gene domains in each Laverania species. Duffy represents regions closest to the Pfam Duffy binding domain. CIDRn is a new domain discovered in this study in clade A. Only domains from var genes longer than 2.5 kb were considered. Heatmap colours blue-yellow-white indicate decreasing copy numbers. b, Graphical representation of similarity between domains, using domains from var genes longer than 2.5 kb. Domains are coloured by species and clustered by a minimum BLAST cutoff of 45% global identity. Larger circles denote var genes in the opposite orientation. c, Maximum likelihood trees of the ATS. Apparent ATS sequences from clade A that cluster with clade B are indicated (**).
Fig. 5 Overview of the genomic evolution of the Laverania subgenus. Polymorphism (π) within each species is indicated by triangles of different sizes at the ends of the tree branches, and the bottleneck in P. falciparum ~5,000 years ago is indicated by a constricted branch width. Also shown are the gene transfers that occurred between certain clade A and B species and the large genomic differences that accumulated in clade B after the divergence from P. blacklocki.
- Thomas D. Otto, Aude Gilabert, Thomas Crellen, Céline Arnathau, Mandy Sanders, Samuel O. Oyola, Alain Prince Okouga, Larson Boundenga, Eric Willaume, Barthélémy Ngoubangoye, Nancy Diamella Moukodoum, Christophe Paupy, Patrick Durand, Virginie Rougeron, Benjamin Ollomo, François Renaud, Chris Newbold, Matthew Berriman, Franck Prugnolle
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK - Laboratoire MIVEGEC, UMR 5290-224 CNRS 5290-IRD224-UM, Montpellier, France - Department of Infectious Disease Epidemiology, Imperial College London, London, UK - International Livestock Research Institute, Nairobi, Kenya - Centre International de Recherches Médicales de Franceville, Franceville, Gabon - Sodepal, Parc of la Lékédi, Bakoumba, Gabon - Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, UK - Present address: Centre of Immunobiology, Institute of Infection, Immunity & Inflammation, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, UK
- This work was funded by ANR ORIGIN JCJC 2012, LMI ZOFAC, CNRS, CIRMF, IRD, and the Wellcome Trust (grants WT 098051 and WT 206194 to the Sanger Institute, grant 104792/Z/14/Z to C.N.). T.C. holds an MRC DTP Studentship. We thank G. Rutledge for performing the sWGA and J. Rayner and F. J. Ayala for helpful discussion. We thank the PlasmoDB team for promptly making these data available.
Nat Microbiol 3, 687–697 (2018)