gnomAD
Helix Insight uses two separate gnomAD datasets: variant-level population frequencies for assessing how common a variant is in healthy individuals, and gene-level constraint metrics for assessing how tolerant a gene is to different types of mutations. Both are loaded locally and queried without external API calls.
gnomAD Variant Frequencies
Population allele frequencies from 807,162 individuals across 8 genetic ancestry groups. This is the primary source for population frequency evidence in ACMG classification. A variant observed at high frequency in healthy individuals is unlikely to cause a rare genetic disorder.
ACMG Criteria Using Variant Frequencies
BA1: Global allele frequency > 5%. This criterion alone classifies the variant as Benign, regardless of all other evidence.
BS1: Allele frequency higher than expected for the disorder. Thresholds depend on inheritance: >= 0.1% for autosomal dominant (haploinsufficiency score = 3) and >= 5% for autosomal recessive.
BS2: Observed in more than 15 homozygous individuals in the healthy population, indicating that the homozygous state is tolerated.
PM2: Absent from controls or present at extremely low frequency (global AF < 0.01%). Frequency data must be present (non-NULL) for PM2 to trigger.
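The four frequency rules above can be expressed as a small decision function. This is a minimal sketch, not Helix Insight's implementation: the record fields `af` and `nhomalt` mirror gnomAD's naming but are assumed here, the inheritance labels `"AD"`/`"AR"` are hypothetical, and deciding which inheritance mode applies (for example, from a haploinsufficiency score of 3) is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrequencyRecord:
    """One variant's gnomAD frequency fields (field names are assumed)."""
    af: Optional[float]     # global allele frequency; None means no data (NULL)
    nhomalt: Optional[int]  # global homozygote count; None means no data


# BS1 thresholds from the text, keyed by hypothetical inheritance labels.
BS1_THRESHOLDS = {"AD": 0.001, "AR": 0.05}


def frequency_criteria(rec: FrequencyRecord, inheritance: Optional[str] = None) -> set[str]:
    """Return the frequency-based ACMG criteria met by one variant."""
    met = set()
    if rec.af is not None:  # a NULL frequency cannot satisfy any AF threshold
        if rec.af > 0.05:
            met.add("BA1")  # stand-alone benign
        threshold = BS1_THRESHOLDS.get(inheritance)
        if threshold is not None and rec.af >= threshold:
            met.add("BS1")
        if rec.af < 0.0001:
            met.add("PM2")  # absent (AF == 0) or extremely rare
    if rec.nhomalt is not None and rec.nhomalt > 15:
        met.add("BS2")
    return met


print(sorted(frequency_criteria(FrequencyRecord(af=0.0, nhomalt=0))))               # ['PM2']
print(sorted(frequency_criteria(FrequencyRecord(af=0.002, nhomalt=20), "AD")))      # ['BS1', 'BS2']
print(sorted(frequency_criteria(FrequencyRecord(af=None, nhomalt=None))))           # []
```

The last call shows the NULL case: with no frequency data, PM2 does not fire even though the variant was never observed.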
Frequency Columns (6)
Global allele frequency across all populations. Primary field for BA1 (>5%), BS1, and PM2 (<0.01%) criteria.
Global allele count. Number of times the alternate allele was observed across all samples.
Global allele number. Total number of alleles genotyped at this position. Used to assess coverage adequacy.
Global homozygote count. Number of individuals homozygous for the alternate allele. Used for BS2 (>15 homozygotes).
Maximum allele frequency across ancestry groups. Identifies population-specific enrichment that global AF might mask.
Population with the highest allele frequency. Reports which ancestry group shows the highest frequency for this variant.
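The relationship between these columns is AF = AC / AN, computed globally or per ancestry group, with the maximum-frequency group reported separately. The sketch below uses made-up (AC, AN) counts to show how a global AF can mask enrichment in one group. The function names are hypothetical, and gnomAD's own maximum-frequency calculation applies additional rules that this sketch does not reproduce.

```python
from typing import Optional


def allele_frequency(ac: int, an: int) -> Optional[float]:
    """AF = AC / AN; None when no alleles were genotyped (AN == 0)."""
    return ac / an if an > 0 else None


def popmax(pop_counts: dict[str, tuple[int, int]]) -> tuple[Optional[str], Optional[float]]:
    """Return (group, AF) for the ancestry group with the highest AF."""
    best_pop, best_af = None, None
    for pop, (ac, an) in pop_counts.items():
        af = allele_frequency(ac, an)
        if af is not None and (best_af is None or af > best_af):
            best_pop, best_af = pop, af
    return best_pop, best_af


# Hypothetical (AC, AN) counts for one variant in three ancestry groups.
counts = {"NFE": (10, 100_000), "AFR": (50, 10_000), "EAS": (0, 20_000)}
global_ac = sum(ac for ac, _ in counts.values())
global_an = sum(an for _, an in counts.values())
print(allele_frequency(global_ac, global_an))  # ~0.00046 (0.046%)
print(popmax(counts))                           # ('AFR', 0.005)
```

Here the global AF is below 0.05%, while the AFR frequency is 0.5%, roughly ten times higher.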
Ancestry Groups
gnomAD v4.1 categorizes individuals into 8 genetic ancestry groups. The popmax field reports which group shows the highest allele frequency for a given variant, which can be clinically relevant for population-specific disease prevalence.
| Code | Ancestry Group | Approximate Samples |
|---|---|---|
| AFR | African / African American | ~30,000 |
| AMR | Admixed American / Latino | ~16,000 |
| ASJ | Ashkenazi Jewish | ~5,000 |
| EAS | East Asian | ~10,000 |
| FIN | Finnish | ~13,000 |
| MID | Middle Eastern | ~3,000 |
| NFE | Non-Finnish European | ~450,000 |
| SAS | South Asian | ~19,000 |
Note: Non-Finnish European (NFE) represents the majority of the cohort. Variants enriched in underrepresented populations may have less precise frequency estimates.
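To see why smaller groups yield less precise estimates, compare confidence intervals for the same allele frequency at two sample sizes. This sketch uses the Wilson score interval, a standard binomial interval chosen purely for illustration (it is not a gnomAD field). It also assumes AN = 2 × the approximate sample counts from the table (diploid, full coverage), which real data will not match exactly.

```python
import math


def wilson_interval(ac: int, an: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for an allele frequency AC / AN."""
    p = ac / an
    denom = 1 + z**2 / an
    center = (p + z**2 / (2 * an)) / denom
    half = z * math.sqrt(p * (1 - p) / an + z**2 / (4 * an**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Same AF (0.1%) in a small and a large group; AN = 2 x individuals (assumed).
small = wilson_interval(ac=10, an=2 * 5_000)      # roughly ASJ-sized
large = wilson_interval(ac=900, an=2 * 450_000)   # roughly NFE-sized
print(small)  # roughly (0.00054, 0.0018)
print(large)  # roughly (0.00094, 0.0011)
```

Both groups have a point estimate of 0.1%, but the smaller group's interval is about ten times wider.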
gnomAD Gene Constraint
A separate gnomAD dataset provides gene-level constraint metrics derived from the same population data. While variant frequencies tell you how common a specific variant is, constraint metrics tell you how tolerant the gene itself is to different types of mutations. A gene with very few observed loss-of-function variants relative to expectation is likely essential for normal function, and loss-of-function variants in that gene are more likely to be pathogenic.
ACMG Criteria Using Gene Constraint
PVS1: Loss-of-function variant in a gene intolerant to LoF. Requires pLI > 0.9 OR LOEUF < 0.35. This is the strongest automated pathogenic criterion.
PP2: Missense variant in a gene constrained against missense variation. Requires pLI > 0.5; note that this threshold uses pLI, a loss-of-function metric, rather than the missense Z-score.
BP1: Missense variant in a gene tolerant to loss-of-function (pLI < 0.1). If LoF variants are tolerated, missense variants are even less likely to be pathogenic.
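The three constraint rules above can be sketched the same way. This is a minimal illustration, not Helix Insight's code: the consequence labels `"lof"` and `"missense"` are hypothetical, and `None` stands for a gene with no constraint data.

```python
from typing import Optional


def constraint_criteria(consequence: str, pli: Optional[float], loeuf: Optional[float]) -> set[str]:
    """Return the gene-constraint ACMG criteria met by one variant.

    consequence is a hypothetical label: "lof" or "missense".
    pli / loeuf are None when the gene has no constraint data.
    """
    met = set()
    if consequence == "lof":
        # PVS1 accepts either metric (OR), as described in the text.
        if (pli is not None and pli > 0.9) or (loeuf is not None and loeuf < 0.35):
            met.add("PVS1")
    elif consequence == "missense" and pli is not None:
        if pli > 0.5:
            met.add("PP2")
        elif pli < 0.1:
            met.add("BP1")
    return met


print(constraint_criteria("lof", pli=0.95, loeuf=0.60))       # {'PVS1'} via pLI
print(constraint_criteria("lof", pli=0.40, loeuf=0.20))       # {'PVS1'} via LOEUF
print(constraint_criteria("missense", pli=0.05, loeuf=None))  # {'BP1'}
```

The second call shows the effect of the OR rule: a gene that pLI alone would not flag still triggers PVS1 through LOEUF.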
Constraint Columns (4)
pLI (probability of loss-of-function intolerance). Ranges from 0 to 1. Values > 0.9 indicate that the gene is highly intolerant to loss-of-function variants. Used for PVS1 (> 0.9), PP2 (> 0.5), and BP1 (< 0.1).
LOEUF (loss-of-function observed/expected upper bound fraction). Lower values indicate stronger constraint. Values < 0.35 trigger PVS1 as an alternative to pLI. LOEUF is the preferred constraint metric in recent literature.
Loss-of-function observed/expected ratio. The point estimate of how many LoF variants are observed relative to the number expected. LOEUF is the upper bound of the 90% confidence interval around this ratio.
Missense Z-score. Positive values indicate the gene has fewer missense variants than expected (missense-constrained). Used in the Screening Service for gene relevance scoring.
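The relationship between the observed/expected point estimate and its upper bound can be illustrated with a simplified Poisson-likelihood sketch. It assumes the observed LoF count follows a Poisson distribution with mean `expected * r` and scans candidate ratios `r` on a grid; gnomAD's published method (including how expected counts are modelled) is more involved, so treat this as an illustration of the idea rather than a reimplementation. All counts are hypothetical.

```python
import math


def poisson_log_pmf(k: int, lam: float) -> float:
    """log P(K = k) for K ~ Poisson(lam), with the lam == 0 edge case handled."""
    if lam == 0:
        return 0.0 if k == 0 else float("-inf")
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def oe_and_upper_bound(observed: int, expected: float, upper_q: float = 0.95,
                       step: float = 0.001, max_ratio: float = 2.0) -> tuple[float, float]:
    """Return (o/e point estimate, upper bound of the o/e interval).

    Candidate ratios r in [0, max_ratio] are weighted by the Poisson
    likelihood of the observed count given expected * r. The upper bound is
    the ratio at which the normalized cumulative weight reaches upper_q
    (0.95 is the top of a two-sided 90% interval).
    """
    point = observed / expected
    grid = [i * step for i in range(int(round(max_ratio / step)) + 1)]
    log_lik = [poisson_log_pmf(observed, expected * r) for r in grid]
    peak = max(log_lik)
    weights = [math.exp(ll - peak) for ll in log_lik]  # rescale to avoid underflow
    total = sum(weights)
    cumulative = 0.0
    for r, w in zip(grid, weights):
        cumulative += w / total
        if cumulative >= upper_q:
            return point, r
    return point, grid[-1]


# Hypothetical gene: 2 LoF variants observed where 40 were expected.
print(oe_and_upper_bound(observed=2, expected=40.0))  # o/e = 0.05, upper bound ~0.16
```

The upper bound is deliberately conservative. With `observed=0` and `expected=5.0`, the o/e point estimate is 0, but the upper bound is about 0.6, above the 0.35 PVS1 cutoff, because too few variants were expected to make the absence informative.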
pLI vs. LOEUF
Both pLI and LOEUF measure gene intolerance to loss-of-function variants, but they use different statistical approaches. pLI is a probability (0 to 1) from a discrete classification model. LOEUF is the upper confidence bound of the observed/expected ratio and provides a continuous measure of constraint. Recent literature favors LOEUF because it better captures the spectrum of constraint rather than forcing genes into discrete categories. Helix Insight accepts either metric for PVS1 (pLI > 0.9 OR LOEUF < 0.35) to maximize sensitivity.
Limitations
gnomAD excludes individuals with known Mendelian disease diagnoses, but does not screen for carrier status or late-onset conditions.
Coverage varies across the genome. Low-coverage regions may have NULL frequency data, which prevents PM2 from triggering.
Structural variants and complex rearrangements are not represented in the SNV/indel dataset.
Population ancestry group assignment is based on principal component analysis, not self-reported ethnicity.
Rare variants in underrepresented populations may have inflated or absent frequency estimates due to smaller sample sizes.
Gene constraint metrics reflect population-level observations. A gene with low pLI may still harbor pathogenic variants in specific domains or functional regions.
Constraint metrics are gene-wide averages. They do not capture regional variation in constraint within a gene.
References
Chen S, et al. "A genomic mutational constraint map using variation in 76,156 human genomes." Nature. 2024;625:92-100. PMID: 38057664
Karczewski KJ, et al. "The mutational constraint spectrum quantified from variation in 141,456 humans." Nature. 2020;581:434-443. PMID: 32461654