Helix Insight

gnomAD

Helix Insight uses two separate gnomAD datasets: variant-level population frequencies for assessing how common a variant is in healthy individuals, and gene-level constraint metrics for assessing how tolerant a gene is to different types of mutations. Both are loaded locally and queried without external API calls.

gnomAD Variant Frequencies

Population allele frequencies from 807,162 individuals across 8 genetic ancestry groups. This is the primary source for population frequency evidence in ACMG classification. A variant observed at high frequency in healthy individuals is unlikely to cause a rare genetic disorder.

Version: v4.1.0
Records: ~759 million variants
Individuals: 807,162 across 8 ancestry groups
Genome Build: GRCh38
Match Key: chromosome + position + reference allele + alternate allele
Source: gnomad.broadinstitute.org
Producer: Broad Institute of MIT and Harvard
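
The four-part match key can be pictured as a tuple used for an exact lookup. The following is a minimal sketch, assuming an in-memory dictionary stands in for the locally loaded table. The `VariantKey` type, the `normalize_chrom` helper, the "chr"-prefix convention, and the example record are all hypothetical illustrations, not Helix Insight's actual schema or lookup code.

```python
from typing import NamedTuple, Optional


class VariantKey(NamedTuple):
    """Hypothetical four-part match key: chromosome, position, ref, alt."""
    chrom: str
    pos: int
    ref: str
    alt: str


def normalize_chrom(chrom: str) -> str:
    """Strip an optional 'chr' prefix so 'chr1' and '1' give the same key.

    This convention is an assumption made for the sketch.
    """
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def make_key(chrom: str, pos: int, ref: str, alt: str) -> VariantKey:
    """Build a key with consistent chromosome naming and allele casing."""
    return VariantKey(normalize_chrom(chrom), int(pos), ref.upper(), alt.upper())


# Illustrative stand-in for the local frequency table (values are made up).
FREQUENCIES = {
    make_key("1", 12345, "A", "G"): {"global_af": 0.0002},
}


def lookup(chrom: str, pos: int, ref: str, alt: str) -> Optional[dict]:
    """Return the frequency record for an exact key match, or None if absent."""
    return FREQUENCIES.get(make_key(chrom, pos, ref, alt))
```

Because all four parts must match, a different alternate allele at the same position is treated as a different variant and returns no record.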

ACMG Criteria Using Variant Frequencies

BA1: Stand-alone Benign

Global allele frequency > 5%. This single criterion classifies the variant as Benign regardless of all other evidence.

BS1: Strong Benign

Frequency higher than expected for the disorder. Inheritance-aware thresholds: >= 0.1% for autosomal dominant (haploinsufficiency score = 3), >= 5% for autosomal recessive.

BS2: Strong Benign

Observed in > 15 homozygous individuals in the healthy population. Indicates the homozygous state is tolerated.

PM2: Moderate Pathogenic

Absent from controls or at extremely low frequency (global AF < 0.01%). Frequency data must be present (non-NULL) to trigger.
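
The thresholds above can be sketched as a small rule function. This is an illustration under stated assumptions, not Helix Insight's implementation: the function name, its parameters, and the `dominant` flag (standing in for however the inheritance mode is derived, for example from the haploinsufficiency score) are hypothetical. Only the numeric cutoffs come from the text.

```python
from typing import List, Optional

# Cutoffs taken from the criteria described above, expressed as fractions.
BA1_AF = 0.05             # global AF > 5%
BS1_AF_DOMINANT = 0.001   # global AF >= 0.1% (autosomal dominant)
BS1_AF_RECESSIVE = 0.05   # global AF >= 5% (autosomal recessive)
BS2_HOM = 15              # more than 15 homozygotes
PM2_AF = 0.0001           # global AF < 0.01%


def frequency_criteria(global_af: Optional[float],
                       global_hom: Optional[int],
                       dominant: bool) -> List[str]:
    """Return the frequency-based criteria met by one variant.

    `dominant` is a hypothetical flag for the inheritance mode.
    None models a NULL database value.
    """
    met: List[str] = []

    if global_af is not None:
        if global_af > BA1_AF:
            met.append("BA1")
        bs1_cutoff = BS1_AF_DOMINANT if dominant else BS1_AF_RECESSIVE
        if global_af >= bs1_cutoff:
            met.append("BS1")

    if global_hom is not None and global_hom > BS2_HOM:
        met.append("BS2")

    # PM2 needs frequency data to be present; NULL never triggers it.
    if global_af is not None and global_af < PM2_AF:
        met.append("PM2")

    return met
```

For example, `frequency_criteria(0.002, 0, dominant=True)` reports BS1, while the same frequency with `dominant=False` reports nothing, and a NULL frequency never yields PM2.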

Frequency Columns (6)

global_af: FLOAT

Global allele frequency across all populations. Primary field for BA1 (>5%), BS1, and PM2 (<0.01%) criteria.

global_ac: INTEGER

Global allele count. Number of times the alternate allele was observed across all samples.

global_an: INTEGER

Global allele number. Total number of alleles genotyped at this position. Used to assess coverage adequacy.

global_hom: INTEGER

Global homozygote count. Number of individuals homozygous for the alternate allele. Used for BS2 (>15 homozygotes).

af_grpmax: FLOAT

Maximum allele frequency across ancestry groups. Identifies population-specific enrichment that global AF might mask.

popmax: VARCHAR

Ancestry group with the highest allele frequency for this variant, that is, the group whose frequency is reported in af_grpmax.
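
The count columns relate to the frequency column by simple arithmetic: allele frequency is allele count divided by allele number. The sketch below shows that relationship and one possible way to use the allele number as a coverage check. The `MAX_AN` constant assumes an autosomal site where every one of the 807,162 individuals contributes two alleles, and the `min_fraction` cutoff is a hypothetical value chosen for illustration, not a documented Helix Insight threshold.

```python
from typing import Optional

# Two alleles per individual at an autosomal site (assumption: sex
# chromosomes and sites not called in every sample will differ).
MAX_AN = 2 * 807_162


def allele_frequency(ac: Optional[int], an: Optional[int]) -> Optional[float]:
    """Allele frequency = allele count / allele number; None if undefined."""
    if ac is None or an is None or an == 0:
        return None
    return ac / an


def coverage_fraction(an: int) -> float:
    """Share of the maximum possible alleles actually genotyped at the site."""
    return an / MAX_AN


def is_adequately_covered(an: Optional[int], min_fraction: float = 0.5) -> bool:
    """Hypothetical adequacy check: enough alleles genotyped at this position."""
    return an is not None and coverage_fraction(an) >= min_fraction
```

A site with a low allele number yields a frequency estimate based on fewer individuals, so a very low or zero frequency there is weaker evidence of rarity than the same value at a well-covered site.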

Ancestry Groups

gnomAD v4.1 categorizes individuals into 8 genetic ancestry groups. The popmax field reports which group shows the highest allele frequency for a given variant, which can be clinically relevant for population-specific disease prevalence.

Code  Ancestry Group              Approximate Samples
AFR   African / African American  ~30,000
AMR   Admixed American / Latino   ~16,000
ASJ   Ashkenazi Jewish            ~5,000
EAS   East Asian                  ~10,000
FIN   Finnish                     ~13,000
MID   Middle Eastern              ~3,000
NFE   Non-Finnish European        ~450,000
SAS   South Asian                 ~19,000

Note: Non-Finnish European (NFE) represents the majority of the cohort. Variants enriched in underrepresented populations may have less precise frequency estimates.
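
To illustrate how a group maximum can reveal enrichment that the global frequency hides, the sketch below picks the group with the highest frequency from a set of per-group values. The function name, the input dictionary, and the example frequencies are hypothetical. gnomAD's published group-maximum values may follow additional rules, so this shows only the general idea.

```python
from typing import Dict, Optional, Tuple


def group_max(group_afs: Dict[str, Optional[float]]
              ) -> Tuple[Optional[str], Optional[float]]:
    """Return (group code, allele frequency) for the highest per-group AF.

    Groups with no data (None) are ignored. Returns (None, None) when no
    group has a frequency.
    """
    present = {g: af for g, af in group_afs.items() if af is not None}
    if not present:
        return None, None
    top = max(present, key=present.get)
    return top, present[top]


# Made-up example: rare overall, but noticeably more common in one group.
example = {"AFR": 0.012, "NFE": 0.0001, "EAS": 0.0, "MID": None}
popmax, af_grpmax = group_max(example)
```

In this made-up example the result is `("AFR", 0.012)`, even though a cohort dominated by NFE samples would pull the global frequency far below that value.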

gnomAD Gene Constraint

A separate gnomAD dataset provides gene-level constraint metrics derived from the same population data. While variant frequencies tell you how common a specific variant is, constraint metrics tell you how tolerant the gene itself is to different types of mutations. A gene with very few observed loss-of-function variants relative to expectation is likely essential for normal function, and loss-of-function variants in that gene are more likely to be pathogenic.

Version: v4.1.0
Records: ~18,200 genes
Match Key: gene_symbol
Source: gnomad.broadinstitute.org/downloads#v4-constraint

ACMG Criteria Using Gene Constraint

PVS1: Very Strong Pathogenic

Loss-of-function variant in a gene intolerant to LoF. Requires pLI > 0.9 OR LOEUF < 0.35. The strongest automated pathogenic criterion.

PP2: Supporting Pathogenic

Missense variant in a gene with constraint against missense variation. Requires pLI > 0.5.

BP1: Supporting Benign

Missense variant in a gene tolerant to loss-of-function (pLI < 0.1). If LoF variants are tolerated, missense variants are even less likely to be pathogenic.
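
The constraint-based rules above can be sketched in the same way. This is a minimal illustration, not Helix Insight's implementation: the function name and the `consequence` labels ("lof", "missense") are hypothetical, and only the cutoffs come from the text. Missing constraint values are modeled as None and never trigger a criterion.

```python
from typing import List, Optional


def constraint_criteria(consequence: str,
                        pli: Optional[float],
                        loeuf: Optional[float]) -> List[str]:
    """Return the gene-constraint criteria met for one variant.

    `consequence` uses hypothetical labels: "lof" or "missense".
    """
    met: List[str] = []

    if consequence == "lof":
        # Either metric is sufficient: pLI > 0.9 OR LOEUF < 0.35.
        pli_hit = pli is not None and pli > 0.9
        loeuf_hit = loeuf is not None and loeuf < 0.35
        if pli_hit or loeuf_hit:
            met.append("PVS1")

    elif consequence == "missense" and pli is not None:
        if pli > 0.5:
            met.append("PP2")   # gene treated as constrained
        elif pli < 0.1:
            met.append("BP1")   # gene treated as LoF-tolerant

    return met
```

A loss-of-function variant in a gene with pLI 0.2 and LOEUF 0.3 still meets PVS1 through the LOEUF branch, while a missense variant in a gene with pLI between 0.1 and 0.5 meets neither PP2 nor BP1.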

Constraint Columns (4)

pLI: FLOAT

Probability of Loss-of-function Intolerance. Ranges 0 to 1. Values > 0.9 indicate the gene is highly intolerant to loss-of-function variants. Used for PVS1 (> 0.9), PP2 (> 0.5), and BP1 (< 0.1).

LOEUF (oe_lof_upper): FLOAT

Loss-of-function Observed/Expected Upper bound Fraction. Lower values indicate stronger constraint. Values < 0.35 trigger PVS1 as an alternative to pLI. LOEUF is the preferred constraint metric in recent literature.

oe_lof: FLOAT

Loss-of-function observed/expected ratio. The point estimate of how many LoF variants are observed versus expected. LOEUF is the upper bound of the 90% confidence interval around this ratio.

mis_z: FLOAT

Missense Z-score. Positive values indicate the gene has fewer missense variants than expected (missense-constrained). Used in the Screening Service for gene relevance scoring.

pLI vs. LOEUF

Both pLI and LOEUF measure gene intolerance to loss-of-function variants, but they use different statistical approaches. pLI is a probability (0 to 1) from a discrete classification model. LOEUF is the upper confidence bound of the observed/expected ratio and provides a continuous measure of constraint. Recent literature favors LOEUF because it better captures the spectrum of constraint rather than forcing genes into discrete categories. Helix Insight accepts either metric for PVS1 (pLI > 0.9 OR LOEUF < 0.35) to maximize sensitivity.

Limitations

gnomAD excludes individuals with known Mendelian disease diagnoses, but does not screen for carrier status or late-onset conditions.

Coverage varies across the genome. Low-coverage regions may have NULL frequency data, which prevents PM2 from triggering.

Structural variants and complex rearrangements are not represented in the SNV/indel dataset.

Population ancestry group assignment is based on principal component analysis, not self-reported ethnicity.

Rare variants in underrepresented populations may have inflated or absent frequency estimates due to smaller sample sizes.

Gene constraint metrics reflect population-level observations. A gene with low pLI may still harbor pathogenic variants in specific domains or functional regions.

Constraint metrics are gene-wide averages. They do not capture regional variation in constraint within a gene.

References

Chen S, et al. "A genomic mutational constraint map using variation in 76,156 human genomes." Nature. 2024;625:92-100. PMID: 38057664

Karczewski KJ, et al. "The mutational constraint spectrum quantified from variation in 141,456 humans." Nature. 2020;581:434-443. PMID: 32461654