dbNSFP
dbNSFP (database of Non-synonymous Functional Predictions) provides precomputed pathogenicity predictions and conservation scores for all possible non-synonymous single nucleotide variants in the human genome. It serves as the source for both BayesDel_noAF (used in ACMG PP3/BP4 criteria) and individual predictor scores used in screening prioritization.
Database Details
Role in Classification and Screening
dbNSFP data serves two distinct roles in Helix Insight:
ACMG Classification
BayesDel_noAF (included in dbNSFP) is the primary computational predictor for the PP3 and BP4 criteria. Thresholds calibrated by the ClinGen Sequence Variant Interpretation (SVI) working group allow PP3 to reach Supporting, Moderate, or Strong evidence strength.
Screening Prioritization
Individual predictor scores (SIFT, AlphaMissense, MetaSVM, DANN) and conservation metrics (PhyloP, GERP) contribute to the weighted deleteriousness component of the screening score.
Duplicate Variant Handling
dbNSFP contains approximately 701,000 duplicate variant entries (0.87% of the dataset) because a single variant can be annotated against multiple transcripts. Helix Insight collapses each set of duplicates to the most pathogenic interpretation: it takes MIN(sift_score), since lower SIFT scores are more pathogenic, and MAX for all other scores. Each categorical prediction is taken from the same entry as the extreme score it accompanies.
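The aggregation rule above can be sketched in Python. This is an illustrative sketch only, not the Helix Insight implementation: the column names (`sift_score`, `alphamissense_pred`, and so on), the dict-per-row layout, and the function name `resolve_duplicates` are all assumptions made for the example.

```python
# Hypothetical (score column, prediction column, aggregator) triples.
# SIFT uses min because lower scores are more pathogenic; the others use max.
PAIRED = [
    ("sift_score", "sift_pred", min),
    ("alphamissense_score", "alphamissense_pred", max),
    ("metasvm_score", "metasvm_pred", max),
]
# Hypothetical names for the score-only columns (no paired prediction).
UNPAIRED_MAX = ["dann_score", "phylop", "gerp_rs"]


def resolve_duplicates(rows: list[dict]) -> dict:
    """Collapse duplicate rows for one variant into the most pathogenic view."""
    merged: dict = {}
    for score_col, pred_col, pick in PAIRED:
        # Ignore rows where this predictor has no score (NULL).
        scored = [r for r in rows if r.get(score_col) is not None]
        if scored:
            best = pick(scored, key=lambda r: r[score_col])
            merged[score_col] = best[score_col]
            # Take the prediction from the same row as the extreme score.
            merged[pred_col] = best.get(pred_col)
        else:
            merged[score_col] = None
            merged[pred_col] = None
    for col in UNPAIRED_MAX:
        values = [r[col] for r in rows if r.get(col) is not None]
        merged[col] = max(values) if values else None
    return merged
```

Selecting the whole row with the extreme score, rather than aggregating score and prediction separately, is what keeps a "D" or "T" label consistent with the score reported next to it.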
Columns Loaded (9)
Variants are matched by exact positional coordinates. From the 434 available fields, Helix Insight loads 9 columns covering 4 functional predictors and 2 conservation metrics:
SIFT prediction: "D" (Deleterious) or "T" (Tolerated). Based on sequence homology across related proteins.
SIFT score (0-1). Lower values indicate higher probability of being deleterious. Threshold: < 0.05 = Deleterious.
AlphaMissense prediction: "P" (Pathogenic), "A" (Ambiguous), or "B" (Benign). Developed by DeepMind; based on protein structure.
AlphaMissense score (0-1). Higher values indicate higher probability of pathogenicity.
MetaSVM prediction: "D" (Damaging) or "T" (Tolerated). Ensemble of 10 individual predictors combined with a support vector machine (SVM).
MetaSVM score (continuous). Positive = damaging, negative = tolerated.
DANN score (0-1). Deep neural network pathogenicity score. Higher = more likely pathogenic. Can score any SNV.
PhyloP conservation score across 100 vertebrate species. Positive = conserved, negative = fast-evolving, zero = neutral.
GERP++ rejected substitution score. Higher = more constrained. Scores > 2 suggest constraint, > 4 strong constraint.
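The score directions and cutoffs listed above can be restated as small helper functions. This is a minimal sketch for readers checking their understanding of the columns; the function names are hypothetical, and treating a MetaSVM score of exactly zero as tolerated is an assumption, since only the positive and negative cases are stated here.

```python
from typing import Optional


def sift_call(score: Optional[float]) -> Optional[str]:
    """SIFT: lower is more deleterious; below 0.05 is Deleterious."""
    if score is None:
        return None
    return "D" if score < 0.05 else "T"


def metasvm_call(score: Optional[float]) -> Optional[str]:
    """MetaSVM: positive is damaging, negative is tolerated."""
    if score is None:
        return None
    # Assumption: a score of exactly 0 is treated as tolerated.
    return "D" if score > 0 else "T"


def phylop_label(score: Optional[float]) -> Optional[str]:
    """PhyloP: positive = conserved, negative = fast-evolving, zero = neutral."""
    if score is None:
        return None
    if score > 0:
        return "conserved"
    if score < 0:
        return "fast-evolving"
    return "neutral"


def gerp_label(score: Optional[float]) -> Optional[str]:
    """GERP++ RS: above 2 suggests constraint, above 4 strong constraint."""
    if score is None:
        return None
    if score > 4:
        return "strong constraint"
    if score > 2:
        return "constraint"
    return "no clear constraint"
```

For example, `sift_call(0.01)` returns `"D"` and `gerp_label(3.2)` returns `"constraint"`, while any helper called with `None` returns `None` rather than a default label.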
Limitations
dbNSFP covers non-synonymous single nucleotide variants (and, as of v4, splice-site SNVs) only. Indels, structural variants, and synonymous variants are not included.
Predictor scores may be NULL for variants not covered by specific prediction algorithms.
Predictors draw on overlapping training datasets and input features, and ensemble methods such as MetaSVM build on other predictors, so their predictions are not fully independent.
Conservation scores reflect evolutionary constraint at a position, not the impact of a specific amino acid substitution.
BayesDel_noAF deliberately excludes allele frequency to avoid circular reasoning with PM2/BA1/BS1 criteria.
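Because predictor scores may be NULL, downstream code needs to distinguish "no score available" from a low score. The sketch below shows one way to do that; the column names and the function `split_by_coverage` are hypothetical and are not taken from Helix Insight.

```python
# Hypothetical names for the four predictor score columns.
PREDICTOR_COLUMNS = [
    "sift_score",
    "alphamissense_score",
    "metasvm_score",
    "dann_score",
]


def split_by_coverage(row: dict) -> tuple[dict, list]:
    """Separate predictor scores that are present from those that are NULL."""
    # Compare against None explicitly: 0.0 is a valid score (for SIFT it is
    # the most deleterious value) and must not be mistaken for missing.
    present = {c: row[c] for c in PREDICTOR_COLUMNS if row.get(c) is not None}
    missing = [c for c in PREDICTOR_COLUMNS if row.get(c) is None]
    return present, missing
```

A truthiness check such as `if row.get(c)` would silently drop a SIFT score of 0.0, which is why the comparison is written against `None`.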
Reference
Liu X, et al. "dbNSFP v4: a comprehensive database of transcript-specific functional predictions and annotations for human nonsynonymous and splice-site SNVs." Genome Medicine. 2020;12(1):103. PMID: 33261662.
For details on individual predictors, see the Computational Predictors section.