Malaria protection due to sickle haemoglobin depends on parasite genotype - Nature.com

Abstract

Host genetic factors can confer resistance against malaria1, raising the question of whether this has led to evolutionary adaptation of parasite populations. Here we searched for association between candidate host and parasite genetic variants in 3,346 Gambian and Kenyan children with severe malaria caused by Plasmodium falciparum. We identified a strong association between sickle haemoglobin (HbS) in the host and three regions of the parasite genome, which is not explained by population structure or other covariates, and which is replicated in additional samples. The HbS-associated alleles include nonsynonymous variants in the gene for the acyl-CoA synthetase family member2,3,4 PfACS8 on chromosome 2, in a second region of chromosome 2, and in a region containing structural variation on chromosome 11. The alleles are in strong linkage disequilibrium and have frequencies that covary with the frequency of HbS across populations, in particular being much more common in Africa than other parts of the world. The estimated protective effect of HbS against severe malaria, as determined by comparison of cases with population controls, varies greatly according to the parasite genotype at these three loci. These findings open up a new avenue of enquiry into the biological and epidemiological significance of the HbS-associated polymorphisms in the parasite genome and the evolutionary forces that have led to their high frequency and strong linkage disequilibrium in African P. falciparum populations.

Main

Malaria can be viewed as an evolutionary arms race between the host and parasite populations. Human populations in Africa have acquired a high frequency of HbS and other erythrocyte polymorphisms that provide protection against the severe symptoms of P. falciparum infection1,5, while P. falciparum populations have evolved a complex repertoire of genetic variation to evade the human immune system and to resist antimalarial drugs6,7. This raises the basic question: are there genetic forms of P. falciparum that can overcome the human variants that confer resistance to this parasite?

To address this question, we analysed both host and parasite genome variation in samples from 5,096 children from Gambia and Kenya with severe malaria caused by P. falciparum (Extended Data Fig. 1, Supplementary Fig. 1, Methods). The samples were collected over the period 1995–2009 as part of a genome-wide association study (GWAS) of human resistance to severe malaria5,8,9. In brief, we sequenced the P. falciparum genome using the Illumina X Ten platform using two approaches based on sequencing whole DNA and selective whole-genome amplification10. We used an established pipeline11 to identify and call genotypes at more than two million single nucleotide polymorphisms (SNPs) and short insertion and deletion variants across the P. falciparum genome in these samples (Methods), although the majority of these occurred at low frequency. Our analysis is based on the 4,171 samples that had high quality data for both parasite and human genotypes, of which a subset of 3,346 had human genome-wide genotyping available and were used for discovery analysis. We focused on a set of 51,225 biallelic variants in the P. falciparum genome that passed all quality control filters and were observed in at least 25 infections in this subset (Methods). Our analyses exclude mixed-genotype calls that arise in malaria when a host is infected with multiple parasite lineages. Full details of our sequencing and data processing can be found in Supplementary Methods.

We used a logistic-regression approach to test for pairwise association between these P. falciparum variants and four categories of human variants that are plausibly associated with malaria resistance: (1) known autosomal protective mutations, including HbS (in HBB), the common mutation that determines the O blood group (in ABO), regulatory variation associated with protection at ATP2B4 (refs. 5,8,12) and the structural variant DUP4, which encodes the Dantu blood-group phenotype13; (2) variants that showed suggestive but not conclusive evidence for association with severe malaria in our previous GWAS8; (3) human leukocyte antigen (HLA) alleles and additional glycophorin structural variants that we previously imputed in these samples8,13; and (4) variants near genes that encode human blood-group antigens, which we tested against the subset of P. falciparum variants lying near genes that encode proteins important for the merozoite stage14,15, as these might conceivably interact during host cell invasion by the parasite. Several factors could in principle confound this analysis, notably incidental association between human and parasite population structure. However, the distribution of test statistics suggested that our test was not affected by systematic confounding after including only an indicator of country as a covariate (Supplementary Fig. 2), and we used this approach for our main analysis. The full set of results is summarized in Fig. 1a and Supplementary Table 1.

Fig. 1: Three regions of the P. falciparum genome are associated with HbS.

a, Points show the evidence for association between each P. falciparum variant and human genotypes (top row) or between each included human variant and P. falciparum genotypes (bottom row). Association evidence is summarized by averaging the evidence for pairwise association (Bayes factor (BF) for test in n = 3,346 samples) between each variant (points) and all variants in the other organism against which it was tested (log10 (BFavg)). P. falciparum variants are shown grouped by chromosome, and human variants are grouped by inclusion category as described in text and Methods. Dashed lines and variant annotations reflect pairwise tests with BF > 10^6; only the top signal in each association region pair is annotated (Methods). b, Detail of the association with HbS in the Pfsa1, Pfsa2 and Pfsa3 regions of the P. falciparum genome. Points show evidence for association with HbS (log10 (BFHbS)) for each regional variant. Variants that alter protein coding sequence are denoted by plus, and other variants are denoted by circles. Results are computed by logistic regression including an indicator of country as a covariate and assuming an additive model of association, with HbS genotypes based on imputation from genome-wide genotypes as previously described8. Mixed and missing P. falciparum genotype calls were excluded from the computation. Below, regional genes are annotated, with gene symbols given where the gene has an ascribed name in the PlasmoDB annotation (after removing 'PF3D7_' from the name where relevant); the three genes containing the most-associated variants are shown in red. A corresponding plot using directly typed HbS genotypes is presented in Extended Data Fig. 2.

Three P. falciparum loci are associated with HbS

The most prominent finding to arise from this joint analysis of host and parasite variation was a strong association between the sickle haemoglobin allele HbS and three separate regions in the P. falciparum genome (Fig. 1b). Additional associations with marginal levels of evidence were observed at a number of other loci, including a potential association between GCNT2 in the host and PfMSP4 in the parasite and associations involving HLA alleles (detailed in Supplementary Methods, Supplementary Table 1), but here we focus on the association with HbS.

The statistical evidence for association at the HbS-associated loci can be described as follows, focussing on the variant with the strongest association in each region and assuming an additive model of effect of the host allele on parasite genotype on the log-odds scale (Supplementary Table 1). The chr2: 631,190 T>A variant, which lies in PfACS8, was associated with HbS with a Bayes factor (BFHbS) of 1.1 × 10^15 (computed under a log F(2,2) prior; Methods) and P value of 4.8 × 10^−13 (computed using a Wald test; Supplementary Methods). At a second region on chromosome 2, the chr2: 814,288 C>T variant, which lies in PF3D7_0220300, was associated with BFHbS = 2.4 × 10^9 and P = 1.6 × 10^−10. At the chromosome 11 locus, the chr11: 1,058,035 T>A variant, which lies in PF3D7_1127000, was associated with BFHbS = 1.5 × 10^17 and P = 7.3 × 10^−12. For brevity, we refer to these HbS-associated loci as Pfsa1, Pfsa2 and Pfsa3, respectively (for P. falciparum sickle-associated), and we use + and − signs to refer to alleles that are positively and negatively correlated with HbS, respectively. For example, Pfsa1+ denotes the allele that is positively correlated with HbS at the Pfsa1 locus. All three of the lead variants are nonsynonymous mutations of their respective genes, as are additional associated variants in these regions (Fig. 1, Supplementary Table 1).
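
As a rough illustration of how a two-sided Wald P value of the kind quoted above is obtained from an effect estimate and its standard error, the following minimal Python sketch may help. The function name and the input values are hypothetical; the exact estimator and standard error used by the authors are described in their Supplementary Methods and are not reproduced here.

```python
import math

def wald_p_value(beta_hat, std_err):
    """Two-sided P value for H0: beta = 0, using a normal approximation."""
    z = beta_hat / std_err
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) and remains accurate
    # for the very small P values typical of strong associations.
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical log-odds-ratio estimate and standard error (illustration only).
print(wald_p_value(2.5, 0.35))
```

Using the complementary error function rather than subtracting a normal CDF from 1 avoids loss of precision when the test statistic is large.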

The above results are based on HbS genotypes imputed from surrounding haplotype variation8, but we focus below on the larger set of 4,071 cases in which we have previously directly assayed HbS genotypes5 (Extended Data Fig. 1). This includes the majority of samples used in our discovery analysis. The Pfsa1 and Pfsa3 associations were clearly supported in both populations in this dataset, whereas Pfsa2+ appears rare in Gambia (Supplementary Tables 2, 3). We also observed convincing replication of the associations in the additional 825 samples that were not part of our discovery phase, with nominal replication of Pfsa3 in the Gambia (one-tailed P = 0.026, N = 163) and replication of all three loci in the larger sample from Kenya (P < 0.001, N > 540) (Supplementary Table 2). Across the full dataset there is thus very strong evidence of association with HbS at all three loci (BFHbS = 2.0 × 10^21 for Pfsa1, 3.7 × 10^12 for Pfsa2, and 1.4 × 10^24 for Pfsa3; Extended Data Fig. 2) with corresponding large effect size estimates (estimated odds ratio (OR) = 12.8 for Pfsa1+, 7.5 for Pfsa2+ and 21.7 for Pfsa3+). As described above, these estimates assume an additive relationship between HbS and the P. falciparum genotype at each locus, but we also noted that there is greatest evidence for a dominance effect (Supplementary Tables 2, 3).

We further examined the effect of adjusting for covariates in our data, including human and parasite principal components reflecting population structure, year of sampling, clinical type of severe malaria and technical features related to sequencing (Extended Data Fig. 3). Inclusion of these covariates did not substantially affect results with one exception: we found that parasite principal components computed across the whole P. falciparum genome included components that correlated with the Pfsa loci, and including these principal components reduced the association signal, particularly in Kenya. Altering the principal components by removing the Pfsa regions restored the association, indicating that this is not caused by a general population structure effect that is reflected in genotypes across the parasite genome, and we further discuss the reasons for this finding below. Finally, we analysed available data from a set of 32 uncomplicated infections of Malian children ascertained based on HbS genotypes16 (Methods); this provided further replication of the associations with Pfsa1 and Pfsa3 (Supplementary Table 2). Together, these data indicate that there are genuine differences in the distribution of parasite genotypes between infections of HbS and non-HbS genotype individuals.

HbS protection varies with parasite type

The level of protection afforded by HbS against severe malaria can be estimated by comparing its frequency between cases and population controls. As shown in Fig. 2, the vast majority of children with HbS genotype in our data were infected with parasites that carry Pfsa+ alleles. Corresponding to this, our data show little evidence of a protective effect of HbS against severe malaria with parasites of Pfsa1+, Pfsa2+ and Pfsa3+ genotype (estimated relative risk (RR) = 0.83, 95% confidence interval = 0.53–1.30). By contrast, HbS is strongly associated with reduced risk of disease caused by parasites of Pfsa1−, Pfsa2− and Pfsa3− genotype (RR = 0.01, 95% confidence interval = 0.007–0.03). These estimates should be interpreted with caution because they are based on just 49 cases of severe malaria that had an HbS genotype, because many of these samples were included in the initial discovery dataset, and because there is some variation evident between populations. However, it can be concluded that the protective effect of HbS is dependent on parasite genotype at the Pfsa loci.
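
The idea of estimating protection by comparing HbS frequency between cases and population controls can be sketched with a simple 2 × 2 calculation. This is a deliberate simplification: the estimates reported here come from a multinomial logistic regression (see the Fig. 2 legend), not from the formula below, and the counts used are hypothetical. The function name is likewise an assumption made for illustration.

```python
import math

def relative_risk_estimate(case_carrier, case_noncarrier, ctrl_carrier, ctrl_noncarrier):
    """Odds ratio of carrier status in cases versus population controls,
    with an approximate 95% interval from the standard error of the
    log odds ratio (Woolf's method)."""
    log_or = math.log((case_carrier * ctrl_noncarrier) / (case_noncarrier * ctrl_carrier))
    se = math.sqrt(1 / case_carrier + 1 / case_noncarrier
                   + 1 / ctrl_carrier + 1 / ctrl_noncarrier)
    lower = math.exp(log_or - 1.96 * se)
    upper = math.exp(log_or + 1.96 * se)
    return math.exp(log_or), lower, upper

# Hypothetical counts: HbS carriers and non-carriers among cases infected with
# one parasite genotype, and among population controls (illustration only).
print(relative_risk_estimate(10, 90, 20, 80))
```

Repeating the calculation separately for cases with each parasite genotype, against the same controls, gives one estimate per genotype, which is the comparison that Fig. 2b displays.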

Fig. 2: The estimated relative risk for HbS varies by Pfsa genotype.

a, Numbers of cases of severe malaria from the Gambia and Kenya with indicated HbS genotype (columns) and carrying the indicated alleles at the Pfsa1, Pfsa2 and Pfsa3 loci (rows; using n = 4,054 samples with directly typed HbS genotype and non-missing genotype at the three P. falciparum loci). Pfsa alleles positively associated with HbS are denoted + and those negatively associated with HbS are denoted − for the respective loci. Samples with mixed P. falciparum genotype calls for at least one of the loci are shown in the bottom row and further detailed in Extended Data Fig. 4. The first row indicates counts of HbS genotypes in population control samples from the same populations8. b, The estimated relative risk of HbS for severe malaria with Pfsa genotypes (rows) as indicated in a. Relative risks were estimated using a multinomial logistic regression model with controls as the baseline outcome and assuming complete dominance (that is, that HbAS and HbSS genotypes have the same association with parasite genotype) as described in Supplementary Methods; an indicator of country was included as a covariate. Circles reflect posterior mean estimates and horizontal lines reflect the corresponding 95% credible intervals (CI). Estimates based on fewer than 5 individuals with HbAS or HbSS genotypes are represented by smaller circles. To reduce overfitting we used Stan (ref. 46) to fit the model assuming a mild regularising Gaussian prior with mean zero and standard deviation of 2 on the log-odds scale (that is, with 95% of mass between 1/50 and 50 on the relative risk scale) for each parameter, and between-parameter correlations set to 0.5.

Population genetics of the Pfsa loci

The Pfsa1+, Pfsa2+ and Pfsa3+ alleles had similar frequencies in Kenya (approximately 10–20%) whereas in Gambia Pfsa2+ had a much lower allele frequency than Pfsa1+ or Pfsa3+ (below 3% in all years studied, versus 25–60% for the Pfsa1+ or Pfsa3+ alleles; Fig. 3a). To explore the population genetic features of these loci in more detail, we analysed the MalariaGEN Pf6 open resource, which provides P. falciparum genome variation data for 7,000 worldwide samples11 (Fig. 3b). This showed considerable variation in the frequency of these alleles across Africa, the maximum observed value being 61% for Pfsa3+ in the Democratic Republic of Congo, and indicated that these alleles are rare outside Africa. Moreover, we found that within Africa, population frequencies of the Pfsa+ alleles are strongly correlated with the frequency of HbS (Fig. 3c; estimated using data from the Malaria Atlas Project17).

Fig. 3: The relationship between Pfsa and HbS allele frequencies across populations.

a, Bars show the estimated frequency of each Pfsa+ allele in severe cases of malaria from each country. Details of allele frequencies and sample counts across years are presented in Extended Data Fig. 5. b, Estimated frequency of each Pfsa+ allele in worldwide populations from the MalariaGEN Pf6 resource11, which contains samples collected during the period 2008–2015. Only countries with at least 50 samples are shown (this excludes Colombia, Peru, Benin, Nigeria, Ethiopia, Madagascar and Uganda). c, Estimated population-level Pfsa+ allele frequency (as in a, b) against HbS allele frequency in populations from MalariaGEN Pf6 (coloured as in b; selected populations are also labelled). Pfsa+ allele frequencies were computed from the relevant genotypes, after excluding mixed or missing genotype calls. HbS allele frequencies were computed from frequency estimates previously published by the Malaria Atlas Project17 for each country, by averaging over the locations of MalariaGEN Pf6 sampling sites weighted by the sample size. DR, Democratic Republic; PNG, Papua New Guinea.

This analysis also revealed a further feature of the Pfsa+ alleles: although Pfsa1 and Pfsa2 are separated by 180 kb, and the Pfsa3 locus is on a different chromosome, they are in strong linkage disequilibrium (LD). This can be seen from the co-occurrence of these alleles in severe cases (Fig. 2), and from the fact that they covary over time in our sample (Extended Data Fig. 5) and geographically across populations (Fig. 3b). We computed LD metrics between the Pfsa+ alleles in each population (Supplementary Table 4) after excluding HbS-carrying individuals to avoid confounding with the association outlined above. Pfsa1+ and Pfsa2+ were strongly correlated in Kenyan severe cases (r = 0.75) and Pfsa1+ and Pfsa3+ were strongly correlated in both populations (r = 0.80 in Kenya; and r = 0.43 in severe cases from the Gambia). This high LD was not explained by population structure or other covariates in our data (Methods), and was also observed in multiple populations in MalariaGEN Pf6 (for example, r = 0.20 between Pfsa1+ and Pfsa3+ in the Gambia; r = 0.71 in Kenya; and r > 0.5 in all other African populations surveyed; Supplementary Table 4), showing that the LD is not purely an artefact of our sample of severe malaria cases.
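
The correlation metric r used here can be illustrated with a minimal Python sketch. It assumes alleles coded as 0 (−) or 1 (+), with `None` standing for a mixed or missing call; this coding and the function name are assumptions made for illustration, and any filtering (such as excluding HbS-carrying individuals) is taken to have been applied beforehand.

```python
import math

def genotype_correlation(a, b):
    """Pearson correlation between two allele vectors coded 0/1.
    Samples where either call is None (mixed or missing) are dropped."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    mean_a = sum(x for x, _ in pairs) / n
    mean_b = sum(y for _, y in pairs) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in pairs) / n
    var_a = sum((x - mean_a) ** 2 for x, _ in pairs) / n
    var_b = sum((y - mean_b) ** 2 for _, y in pairs) / n
    return cov / math.sqrt(var_a * var_b)

# Toy calls at two loci in eight infections (hypothetical data).
pfsa1 = [1, 1, 0, 0, 1, None, 0, 0]
pfsa3 = [1, 1, 0, 0, 0, 1, 0, None]
print(genotype_correlation(pfsa1, pfsa3))
```

For unlinked loci in a freely recombining population this value would be expected to lie close to zero, which is why values of 0.4 to 0.8 between chromosomes stand out.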

This observation of strong correlation between alleles at distant loci is unexpected, because the P. falciparum genome undergoes recombination in the mosquito vector and typically shows very low levels of LD in malaria-endemic regions11,18,19. To confirm that this is unusual, we compared LD between the Pfsa loci with the distribution computed from all common biallelic variants on different chromosomes (Fig. 4). In Kenyan samples, the Pfsa loci have the highest between-chromosome LD of any pair of variants in the genome. In Gambia, between-chromosome LD at these SNPs is also extreme, but another pair of extensive regions on chromosomes 6 and 7 also show strong LD. These regions contain the chloroquine resistance-linked genes PfCRT and PfAAT1 (refs. 20,21) and contain long stretches of DNA that are shared identical by descent, consistent with positive selection of antimalarial-resistant haplotypes22. Moreover, we noted that these signals are among a larger set of HbS-associated and drug-resistance loci that appear to have increased between-chromosome LD in these data (Supplementary Table 5).

Fig. 4: HbS-associated variants show extreme between-chromosome correlation in severe P. falciparum infections.

Empirical distribution of absolute genotype correlation (|r|) between pairs of variants on different P. falciparum chromosomes in the Gambia (top) and Kenya (bottom). To avoid capturing direct effects of the HbS association, correlation values are computed after excluding HbS-carrying individuals. All pairs of biallelic variants with estimated minor allele frequency at least 5% and at least 75% of samples having non-missing and non-mixed genotype call are shown (totalling 16,487 variants in the Gambia and 13,766 variants in Kenya). Colours indicate the subset of comparisons between HbS-associated variants in Pfsa regions relevant for the population (red) and between variants in LD with the CRT K76T mutation. Labelled points denote the variant pairs showing the highest and second-highest pairwise correlation in each population after grouping correlated variants into regions; for this purpose regions were defined to include all nearby pairs of correlated variants with minor allele frequency ≥5% and r2 > 0.05, such that no other such pair of variants within 10 kb of the given region boundaries is present (Methods). A longer list of regions showing increased between-chromosome LD is presented in Supplementary Table 5.

Combining these new findings with other population genetic evidence from multiple locations across Africa, including observations of frequency differentiation within and across P. falciparum populations11,23,24 and other metrics at these loci indicative of selection22,25,26, it appears likely that the allele frequencies and strong LD between Pfsa1, Pfsa2 and Pfsa3 are maintained by some form of natural selection. However, the mechanism for this is unclear. Given our findings, an obvious hypothesis is that the Pfsa1+, Pfsa2+ and Pfsa3+ alleles are positively selected in hosts with HbS, but since the frequency of HbS carriers5,17 is typically <20% it is not clear whether this alone is a sufficient explanation to account for the high population frequencies or the strong LD observed in non-HbS carriers. Equally, since the Pfsa+ alleles have not reached fixation (Fig. 3) and do not appear to be rapidly increasing in frequency (Extended Data Fig. 5), an opposing force may also be operating to maintain their frequency. However, the above data do not suggest strong fitness costs for Pfsa+-carrying parasites in HbAA individuals (Fig. 2), and the Pfsa2+ allele also appears to be present only in east Africa, further complicating these observations. It thus remains entirely possible that additional selective factors are involved, such as epistatic interactions between these loci, or further effects on fitness in the host or vector in addition to those observed here in relation to HbS.

The genomic context of the Pfsa variants

The biological function of these parasite loci is an area of considerable interest for future investigation. At the Pfsa1 locus, the signal of association includes non-synonymous changes in the PfACS8 gene, which encodes an acyl-CoA synthetase3 that belongs to a gene family that has expanded in the Laverania relative to other Plasmodium species4 and lies close to a paralogue PfACS9 on chromosome 2. PfACS8 has been predicted to localize to the apicoplast27, but it also contains a Plasmodium export element (PEXEL)-like motif28,29,30, which may instead indicate export to the host cytosol where other acyl-CoA synthetase family members have been observed31. The functions of the proteins encoded by PF3D7_0220300 (an exported protein, at the Pfsa2 locus) and PF3D7_1127000 (a putative tyrosine phosphatase, at Pfsa3) are not known; however, the protein encoded by PF3D7_0220300 has been observed to localise to the host membrane and to colocalise with host stomatin32, whereas the protein encoded by PF3D7_1127000 has been observed in the food vacuole33. All three genes appear to be expressed at multiple parasite lifecycle stages (Supplementary Text) in 3D7 parasites and are not essential for in vitro growth34.

We noted two further features that may point to the functional role of the Pfsa+ alleles themselves. The associated variants at Pfsa2 and Pfsa3 each include SNPs immediately downstream of a PEXEL motif (detailed in Supplementary Information), which mediates export through a pathway that involves protein cleavage at the motif35. This process leaves the downstream amino acids at the N terminus of the mature protein, and it is therefore possible that these variants influence successful export36,37. However, another possibility is that the Pfsa+ alleles affect levels of transcription of the relevant genes. In this context, we noted a recent study16 that found that PF3D7_1127000 is among the most differentially over-expressed genes in trophozoite-stage infections of children with HbAS compared with those with HbAA (more than 32-fold increase in transcripts per million (TPM) at the trophozoite stage; n = 12; unadjusted P = 5.6 × 10^−22). We reanalysed these data in light of genotypes at the Pfsa loci (Supplementary Table 6), and found that the Pfsa3+ mutations plausibly explain this increased expression. In particular, read ratios at the second-most-associated Pfsa3 SNP (chr11:1,057,437 T > C) (Supplementary Table 1) appear especially strongly correlated with increased expression at trophozoite stage (Extended Data Fig. 6). Further support for this observation comes from an in vitro time-course experiment conducted in the same study16, in which the increased expression was also observed in HbAA erythrocytes infected with a Pfsa+-carrying isolate (Extended Data Figs. 7, 8, Methods). The mechanism of upregulation is not known, but a further relevant observation is that the Pfsa3+ alleles appear to be linked to a neighbouring copy number variant that includes duplication of the 5ʹ end of the small nuclear ribonucleoprotein gene SNRPF, upstream of PF3D7_1127000 (based on analysis of available genome assemblies of P. falciparum isolates38; Extended Data Fig. 9, Supplementary Fig. 3). We caution that these findings are tentative, and the manner in which Pfsa alleles affect genome function is a subject for future research. Understanding this functional role could provide important clues into how HbS protects against malaria and help to distinguish between the various proposed mechanisms, which include enhanced macrophage clearance of infected erythrocytes39, inhibition of intraerythrocytic growth dependent on oxygen levels40, altered cytoadherence of infected erythrocytes41 due to cytoskeleton remodelling42, and immune-mediated mechanisms43.

Discussion

A fundamental question in the biology of host–pathogen interactions is whether the genetic makeup of infections is determined by the genotype of the host. While there is some previous evidence of this in malaria—for example, allelic variants of the PfCSP gene have been associated with HLA type44 and HbS has itself previously been associated with MSP-1 alleles45 (described further in Supplementary Information)—our findings provide clear evidence of an interaction between genetic variants in the parasite and the host. Our central discovery is that among African children with severe malaria there is a strong association between HbS in the host and three loci in different regions of the parasite genome. Based on estimation of relative risk, HbS has no apparent protective effect against severe malaria in the presence of the Pfsa1+, Pfsa2+ and Pfsa3+ alleles. These alleles, which are much more common in Africa than elsewhere, are positively correlated with HbS allele frequencies across populations. However, they are also found in substantial numbers of individuals without HbS, reaching up to 60% allele frequency in some populations. The Pfsa1, Pfsa2 and Pfsa3 loci also show remarkably high levels of long-range between-locus LD relative to other loci in the P. falciparum genome, which is equally difficult to explain without postulating ongoing evolutionary selection. Although it seems clear that HbS has a key role in this selective process, there is a need for further population surveys (that include asymptomatic and uncomplicated cases of malaria) to gain a more detailed understanding of the genetic interactions between HbS and these parasite loci, and how they affect the overall protective effect of HbS against severe malaria.

Methods

Ethics and consent

Sample collection and design of our case-control study8 was approved by Oxford University Tropical Research Ethics committee (OXTREC), Oxford, United Kingdom (OXTREC 020-006). Informed consent was obtained from parents or guardians of patients with malaria, and from mothers for population controls. Local approving bodies were the MRC/Gambia Government Ethics Committee (SCC 1029v2 and SCC670/630) and the KEMRI Research Ethics Committee (SCC1192).

Building a combined dataset of human and P. falciparum genotypes for severe cases

We used Illumina sequencing to generate two datasets jointly reflecting human and P. falciparum genetic variation, using a sample of severe malaria cases from the Gambia and Kenya for which human genotypes have previously been reported5,8. A full description of our sequencing and data processing is given in Supplementary Methods and summarized in Extended Data Fig. 1. In brief, following a process of sequence data quality control and merging across platforms, we generated (1) a dataset of microarray and imputed human genotypes, and genome-wide P. falciparum genotypes, in 3,346 individuals previously identified as without close relationships8; and (2) a dataset of HbS genotypes directly typed on the Sequenom iPLEX Mass-Array platform (Agena Biosciences)5, and genome-wide P. falciparum genotypes, in 4,071 individuals without close relationships8. Parasite DNA was sequenced from whole DNA in samples with high parasitaemia, and using selective whole-genome amplification (SWGA) to amplify P. falciparum DNA in all samples. P. falciparum genotypes were called using an established pipeline11 based on GATK, which calls single nucleotide polymorphisms and short insertion–deletion variants relative to the Pf3D7 reference sequence. This pipeline deals with mixed infections by calling parasite variants as if the samples were diploid; in practice this means that variants with substantial numbers of reads covering reference and alternate alleles are called as heterozygous genotypes.

For the analyses presented in the main text, we used the 3,346 samples with imputed human genotypes for our initial discovery analysis, and the 4,071 individuals with directly typed HbS genotypes for all other analyses. The individuals in these two datasets substantially overlap (Extended Data Fig. 1), but a subset of 825 individuals were directly typed for HbS but were not in the discovery data, and we used these for replication.

Inference of genetic interaction from severe malaria cases

To describe our approach, we first consider a simplified model of infection in which parasites have a single definite (measurable) genotype, acquired at time of biting, that is relevant to disease outcome; that is, we neglect any effects of within-host mutation, co-infection and super-infection at the relevant genetic variants. We consider a population of individuals who are susceptible to being bitten by an infected mosquito. A subset of infections go on to cause severe disease. Among individuals who are bitten and infected with a particular parasite type I = x, the association of a human allele E = e with disease outcome can be measured by the relative risk,

$${\rm{RR}}=\frac{P({\rm{disease}}\mid E=e,I=x)}{P({\rm{disease}}\mid E=0,I=x)}\tag{1}$$

where we have used E = 0 to denote a chosen baseline human genotype against which risks are measured. If the strength of association further varies between parasite types (say between \(I=x\) and a chosen infection type \(I=0\)) then these relative risks will vary, and thus the ratio of relative risks (RRR) will differ from 1. If the host genotype e confers protection against severe malaria, the ratio of relative risks will therefore capture variation in the level of protection compared between different parasite types.
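
Written out explicitly in the notation above, the ratio of relative risks compares the relative risk for parasite type \(I=x\) with that for the baseline type \(I=0\). The subscripted symbols \({\rm{RR}}_{x}\) and \({\rm{RR}}_{0}\) are introduced here only for illustration and do not appear in the original text:

$${\rm{RRR}}=\frac{{\rm{RR}}_{x}}{{\rm{RR}}_{0}}=\frac{P({\rm{disease}}\mid E=e,I=x)\,P({\rm{disease}}\mid E=0,I=0)}{P({\rm{disease}}\mid E=0,I=x)\,P({\rm{disease}}\mid E=e,I=0)}$$

so that \({\rm{RRR}}=1\) exactly when host genotype \(e\) has the same relative risk for both parasite types.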

Although phrased above in terms of a relative risk for human genotypes, the RRR can be equivalently expressed as a ratio of relative risks for a given parasite genotype compared between two human genotypes (Supplementary Methods). It is thus conceptually symmetric with respect to human and parasite alleles, and would equally well capture variation in the level of pathogenicity conferred by a particular parasite type compared between different human genotypes.

The OR for specific human and parasite alleles computed in severe malaria cases is formally similar to the ratio of relative risks but with the roles of the genotypes and disease status interchanged. We show in Supplementary Methods that in fact

$${\rm{OR}}={\rm{RRR}}\times {\rm{OR}}^{{\rm{biting}}}\tag{2}$$

where \({{\rm{OR}}}^{{\rm{biting}}}\) is a term that reflects possible association of human and parasite genotypes at the time of mosquito biting. Thus, under this model and in the absence of confounding factors, \({\rm{OR}}\ne 1\) implies either that host and parasite genotypes are not independent at time of biting, or that there is an interaction between host and parasite genotypes in determining disease risk. The former possibility may be considered less plausible because it would seem to imply that relevant host and parasite genotypes can be detected by mosquitoes prior to or during biting, but we stress that this cannot be tested formally without data on mosquito-borne parasites. A further discussion of these assumptions can be found in Supplementary Methods.
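The factorisation in equation (2) can be checked numerically under the simplified model. The sketch below assumes a joint distribution of host and parasite genotypes at biting and a table of disease risks (all values invented), derives the genotype distribution among cases, and compares the case-only odds ratio with the product of the RRR and the biting odds ratio.

```python
import math

def cross_ratio(table):
    """(t_ex * t_00) / (t_e0 * t_0x) for a 2x2 table keyed by (host, parasite)."""
    return (table[("e", "x")] * table[("0", "0")]) / (
        table[("e", "0")] * table[("0", "x")]
    )

# Hypothetical joint genotype frequencies at the time of biting (sum to 1).
biting = {("0", "0"): 0.50, ("0", "x"): 0.30, ("e", "0"): 0.12, ("e", "x"): 0.08}

# Hypothetical risks P(disease | E, I) for each genotype combination.
risk = {("0", "0"): 0.020, ("0", "x"): 0.020, ("e", "0"): 0.002, ("e", "x"): 0.016}

# Genotype distribution among severe cases: P(E, I | disease) by Bayes' rule.
unnormalised = {key: biting[key] * risk[key] for key in biting}
total = sum(unnormalised.values())
cases = {key: value / total for key, value in unnormalised.items()}

or_cases = cross_ratio(cases)    # odds ratio computed in cases only
or_biting = cross_ratio(biting)  # association already present at biting
rrr = cross_ratio(risk)          # the same cross-ratio applied to risks is the RRR

print(or_cases, rrr * or_biting)
```

Setting the biting table so that host and parasite genotypes are independent makes the biting odds ratio equal to 1, in which case the case-only odds ratio equals the RRR.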

Testing for genome-to-genome correlation

We developed a C++ program (HPTEST) to efficiently estimate the odds ratio (equation (2)) across multiple human and parasite variants, similar in principle to approaches that have been developed for human-viral and human-bacterial GWAS47,48,49. HPTEST implements a logistic regression model in which genotypes from one file are included as the outcome variable and genotypes from a second file on the same samples are included as predictors. Measured covariates may also be included, and the model accounts for uncertainty in imputed predictor genotypes using the approach from SNPTEST50. The model is fitted using a modified Newton-Raphson method with line search. For our main analysis we applied HPTEST with the parasite genotype as outcome and the host genotype as predictor, assuming an additive effect of the host genotype on the log-odds scale, and treating parasite genotype as a binary outcome (after excluding mixed and missing genotype calls).
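To make the model structure concrete, the following is a minimal sketch of logistic regression fitted by Newton-Raphson with a simple step-halving line search. It is a generic illustration and not the HPTEST implementation: it omits the prior, the handling of imputation uncertainty, and whatever modifications HPTEST makes to the Newton step. The simulated data, the function names and the effect sizes are all assumptions.

```python
import numpy as np

def log_likelihood(beta, X, y):
    eta = X @ beta
    # log(1 + exp(eta)) evaluated stably via logaddexp
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

def gradient(beta, X, y):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X.T @ (y - p)

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    beta = np.zeros(X.shape[1])
    ll = log_likelihood(beta, X, y)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        information = (X * (p * (1.0 - p))[:, None]).T @ X  # negative Hessian
        step = np.linalg.solve(information, gradient(beta, X, y))
        # Line search: halve the Newton step until the log-likelihood does not drop.
        scale = 1.0
        candidate = beta + step
        ll_new = log_likelihood(candidate, X, y)
        while ll_new < ll and scale > 1e-8:
            scale *= 0.5
            candidate = beta + scale * step
            ll_new = log_likelihood(candidate, X, y)
        if ll_new < ll:
            break  # no improving step found; keep the current estimate
        improvement = ll_new - ll
        beta, ll = candidate, ll_new
        if improvement < tol:
            break
    return beta

# Simulated example: binary parasite genotype as outcome, additive host
# genotype dosage (0, 1 or 2) and a country indicator as predictors.
rng = np.random.default_rng(0)
n = 4000
host_dosage = rng.binomial(2, 0.3, size=n).astype(float)
country = rng.integers(0, 2, size=n).astype(float)
X = np.column_stack([np.ones(n), host_dosage, country])
true_beta = np.array([-1.0, 0.5, 0.3])  # assumed values for the simulation
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta_hat = fit_logistic(X, y)
print("estimated log odds ratio for host dosage:", beta_hat[1])
```

The coefficient on the host dosage column is the per-allele log odds ratio, corresponding to the additive effect on the log-odds scale described above.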

To mitigate effects of finite sample bias, we implemented regression regularised by a weakly informative log F(2,2) prior distribution51 on the effect of the host allele (similar to a Gaussian distribution with standard deviation 1.87; Supplementary Methods). Covariate effects were assigned a log F(0.08,0.08) prior, which has a similar 95% coverage interval to a Gaussian with zero mean and standard deviation of 40. We summarised the strength of evidence using a Bayes factor against the null model that the effect of the host allele is zero. A P-value can also be computed under an asymptotic approximation by comparing the maximum posterior estimate of effect size to its expected distribution under the null model (Supplementary Methods). For our main results we included only one covariate, an indicator of the country from which the case was ascertained (Gambia or Kenya); additional exploration of covariates is described below.
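The comparison between the log F(2,2) prior and a Gaussian can be reproduced with a short calculation. The sketch below assumes the usual parameterisation in which the log-F(m,m) density on the effect β is proportional to exp(mβ/2)/(1+exp(β))^m; for m = 2 this is the standard logistic distribution, whose quantiles have a closed form. It then finds the standard deviation of the zero-mean Gaussian with the same central 95% interval. The function names are illustrative.

```python
import math
from statistics import NormalDist

def log_f_log_density(beta, m):
    """Unnormalised log density of a log-F(m, m) prior on a log odds ratio."""
    return 0.5 * m * beta - m * math.log1p(math.exp(beta))

def gaussian_sd_matching_log_f22(coverage=0.95):
    """SD of the zero-mean Gaussian sharing the central interval of log F(2,2)."""
    upper = 0.5 + coverage / 2.0
    # log F(2,2) is the standard logistic distribution: quantile = logit(prob).
    log_f_quantile = math.log(upper / (1.0 - upper))
    gaussian_quantile = NormalDist().inv_cdf(upper)
    return log_f_quantile / gaussian_quantile

sd = gaussian_sd_matching_log_f22()
print(round(sd, 2))  # 1.87
```

In a penalised fit, `log_f_log_density` would be added to the log-likelihood for the host-allele effect, so that the maximum posterior estimate is pulled towards zero when the data are sparse. The log F(0.08,0.08) case has no closed-form quantile and is not covered by this sketch.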

Choice of genetic variants for testing

For our initial discovery analysis we concentrated on a set of 51,552 P. falciparum variants that were observed in at least 25 individuals in our discovery set, after excluding any mixed or missing genotype calls. These comprised: 51,453 variants that were called as biallelic and passed quality filters (detailed in Supplementary Methods; including the requirement to lie in the core genome52); an additional 98 biallelic variants in the region of PfEBL1 (which lies outside the core genome but otherwise appeared reliably callable); and an indicator of the PfEBA175 'F' segment, which we called based on sequence coverage as described in Supplementary Methods and Supplementary Fig. 6. We included PfEBL1 and PfEBA175 variation because these genes encode known or putative receptors for P. falciparum during invasion of erythrocytes15.

We concentrated on a set of human variants chosen as follows: we included the 94 autosomal variants from our previously reported list of variants with the most evidence for association with severe malaria8, which includes confirmed associations at HBB, ABO, ATP2B4 and the glycophorin locus. We also included three glycophorin structural variants13, and 132 HLA alleles (62 at 2-digit and 70 at 4-digit resolution) that were imputed with reasonable accuracy (determined as having minor allele frequency > 5% and IMPUTE info at least 0.8 in at least one of the two populations in our dataset). We tested these variants against all 51,552 P. falciparum variants described above. We also included all common, well-imputed human variants within 2 kb of a gene determining a blood-group antigen (defined as variants within 2 kb of a gene in the HUGO blood-group antigen family53 and having a minor allele frequency of at least 5% and an IMPUTE info score of at least 0.8 in at least one of the two populations in our dataset; this includes 39 autosomal genes and 4,613 variants in total). We tested these against all variants lying within 2 kb of P. falciparum genes previously identified as associated or involved in erythrocyte invasion14,15 (60 genes, 1,740 variants in total). In total we tested 19,830,288 distinct human-parasite variant pairs in the discovery dataset (Fig. 1a).
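The counts above can be tallied directly. The short calculation below multiplies out the two test sets and compares the sum with the reported number of distinct pairs. The interpretation of the difference is an assumption on our part, since the text does not itemise which pairs occur in both sets.

```python
# Human variants tested against all parasite variants:
# 94 previously associated variants, 3 glycophorin structural variants, 132 HLA alleles.
candidate_human_variants = 94 + 3 + 132
all_parasite_variants = 51_552

# Blood-group human variants tested against invasion-related parasite variants.
blood_group_human_variants = 4_613
invasion_parasite_variants = 1_740

pairs_candidate = candidate_human_variants * all_parasite_variants
pairs_blood_group = blood_group_human_variants * invasion_parasite_variants
naive_total = pairs_candidate + pairs_blood_group

reported_distinct_pairs = 19_830_288
shared_pairs = naive_total - reported_distinct_pairs

print(pairs_candidate, pairs_blood_group, naive_total, shared_pairs)
```

The simple sum exceeds the reported figure by 1,740 pairs. That would be consistent with a small overlap between the two test sets (for example, one human variant present in both lists), which is presumably why the total is described as counting distinct pairs.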

Definition of regions of pairwise association

We grouped all associated variant pairs (defined as pairs (v,w) having BF(v,w) > 100, where BF(v,w) is the association test Bayes factor for the variant pair) into regions using an iterative algorithm as follows. For each associated pair (v,w), we found the smallest enclosing regions (Rv, Rw) such that any other associated pair either lay within (Rv, Rw) or lay further than 10 kb from (Rv, Rw) in the host or parasite genomes, repeating until all associated pairs were assigned to regions. For each association region pair, we then recorded the region boundaries and the lead variants (defined as the regional variant pair with the highest Bayes factor), and we identified genes intersecting the region and the gene nearest to the lead variants using the NCBI refGene54 and PlasmoDB v4455 gene annotations. Because we tested a selected list of variant pairs as described above, in some cases these regions contain a single human or parasite variant. Supplementary Table 1 summarises these regions for variant pairs with BF > 1,000.
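One way to read the grouping procedure is as repeated region growing: start from an associated pair, absorb any other associated pair that lies within 10 kb of the current region in both the host and parasite genomes, and repeat until nothing more is absorbed. The sketch below implements that reading. The `Pair` class, the function name, the chromosome labels and the coordinates are hypothetical, and gene annotation is left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pair:
    host_chrom: str
    host_pos: int
    parasite_chrom: str
    parasite_pos: int
    bf: float  # association test Bayes factor

def group_into_regions(pairs, bf_threshold=100.0, margin=10_000):
    remaining = [p for p in pairs if p.bf > bf_threshold]
    regions = []
    while remaining:
        members = [remaining.pop(0)]
        grew = True
        while grew:  # keep expanding until no further pair is absorbed
            grew = False
            h_lo = min(m.host_pos for m in members) - margin
            h_hi = max(m.host_pos for m in members) + margin
            p_lo = min(m.parasite_pos for m in members) - margin
            p_hi = max(m.parasite_pos for m in members) + margin
            outside = []
            for p in remaining:
                near = (
                    p.host_chrom == members[0].host_chrom
                    and p.parasite_chrom == members[0].parasite_chrom
                    and h_lo <= p.host_pos <= h_hi
                    and p_lo <= p.parasite_pos <= p_hi
                )
                if near:
                    members.append(p)
                    grew = True
                else:
                    outside.append(p)
            remaining = outside
        regions.append({
            "host": (min(m.host_pos for m in members), max(m.host_pos for m in members)),
            "parasite": (min(m.parasite_pos for m in members),
                         max(m.parasite_pos for m in members)),
            "lead": max(members, key=lambda m: m.bf),
            "n_pairs": len(members),
        })
    return regions

# Made-up example pairs; positions and Bayes factors are illustrative only.
example = [
    Pair("chr11", 5_248_232, "Pf_chr2", 631_190, 1e6),
    Pair("chr11", 5_250_000, "Pf_chr2", 636_000, 500.0),
    Pair("chr11", 5_248_232, "Pf_chr11", 1_058_035, 2_000.0),
    Pair("chr11", 5_255_000, "Pf_chr2", 645_000, 150.0),
    Pair("chr4", 1_000_000, "Pf_chr2", 631_190, 50.0),  # below the BF threshold
]
regions = group_into_regions(example)
print(len(regions), regions[0]["parasite"])
```

In the example, the fourth pair is more than 10 kb from the first pair in the parasite genome but is absorbed once the second pair has extended the region, which is why the expansion has to be repeated.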

Interpretation of association test results

We compared association test P-values to the expectation under the null model of no association using a quantile-quantile plot, both before and after removing comparisons with HbS (Supplementary Fig. 2; HbS is encoded by the 'A' allele at rs334, chr11:5,248,232 T -> A). A simple way to interpret individual points on the QQ-plot is to compare each P-value to its expected distribution under the relevant order statistic (depicted by the grey area in Supplementary Fig. 2); for the lowest P-value this is similar to considering a Bonferron...
