dbNSFP

dbNSFP (database of Non-synonymous Functional Predictions) provides precomputed pathogenicity predictions and conservation scores for all possible non-synonymous single nucleotide variants in the human genome. It serves as the source for both BayesDel_noAF (used in ACMG PP3/BP4 criteria) and individual predictor scores used in screening prioritization.

Database Details

Version: 4.9c

Records: ~80.6 million variant sites

Total Fields: 434 (9 loaded by Helix Insight)

Genome Build: GRCh38

Source: sites.google.com/site/jpaborern/dbNSFP

Producer: Liu X, Li C, Mou C, Dong Y, Tu Y (USF Genomics)

Role in Classification and Screening

dbNSFP data serves two distinct roles in Helix Insight:

ACMG Classification

BayesDel_noAF (included in dbNSFP) is the primary computational predictor for PP3 and BP4 criteria. ClinGen SVI calibrated thresholds allow PP3 to reach Supporting, Moderate, or Strong evidence strength.

Screening Prioritization

Individual predictor scores (SIFT, AlphaMissense, MetaSVM, DANN) and conservation metrics (PhyloP, GERP) contribute to the weighted deleteriousness component of the screening score.

Duplicate Variant Handling

dbNSFP contains approximately 701,000 duplicate variant entries (0.87% of the dataset) because the same variant can be annotated against multiple transcripts. Helix Insight collapses each set of duplicates to the most pathogenic interpretation: it takes MIN(sift_score), because lower SIFT scores are more pathogenic, and MAX for every other score. Each categorical prediction is taken from the same entry as the extreme score it belongs to.
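The aggregation rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Helix Insight's actual implementation: the row layout (one dict per dbNSFP entry), the grouping key (`chrom`, `pos`, `ref`, `alt`), and the function names are hypothetical. The score and prediction column names are the ones listed on this page.

```python
from collections import defaultdict

# Score column -> (matching prediction column, function that picks the
# most pathogenic extreme). SIFT is the only score where lower is worse.
PAIRED = {
    "sift_score": ("sift_pred", min),
    "alphamissense_score": ("alphamissense_pred", max),
    "metasvm_score": ("metasvm_pred", max),
}
# Scores without a categorical prediction: keep the maximum.
SCORE_ONLY = ("dann_score", "phylop100way_vertebrate", "gerp_rs")


def most_pathogenic(rows):
    """Collapse duplicate rows for one variant into a single record."""
    merged = {}
    for score_col, (pred_col, pick) in PAIRED.items():
        scored = [r for r in rows if r.get(score_col) is not None]
        if scored:
            # Take score and prediction from the same row so they stay matched.
            best = pick(scored, key=lambda r: r[score_col])
            merged[score_col] = best[score_col]
            merged[pred_col] = best.get(pred_col)
        else:
            merged[score_col] = None
            merged[pred_col] = None
    for col in SCORE_ONLY:
        values = [r[col] for r in rows if r.get(col) is not None]
        merged[col] = max(values) if values else None
    return merged


def aggregate_duplicates(rows):
    """Group rows by a hypothetical (chrom, pos, ref, alt) key and merge each group."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["chrom"], r["pos"], r["ref"], r["alt"])].append(r)
    return {key: most_pathogenic(group) for key, group in groups.items()}
```

Selecting the whole row with `min`/`max` and a `key` function, rather than aggregating score and prediction separately, is what keeps each prediction paired with its extreme score. NULL scores are skipped so that a missing value never wins the comparison.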

Columns Loaded (9)

Variants are matched by exact positional coordinates. From the 434 available fields, Helix Insight loads 9 columns covering 4 functional predictors and 2 conservation metrics:

sift_pred (VARCHAR)

SIFT prediction: "D" (Deleterious) or "T" (Tolerated). Based on sequence homology across related proteins.

sift_score (FLOAT)

SIFT score (0-1). Lower values indicate higher probability of being deleterious. Threshold: < 0.05 = Deleterious.

alphamissense_pred (VARCHAR)

AlphaMissense prediction: "P" (Pathogenic), "A" (Ambiguous), or "B" (Benign). Protein structure-based predictor from DeepMind.

alphamissense_score (FLOAT)

AlphaMissense score (0-1). Higher values indicate higher probability of pathogenicity.

metasvm_pred (VARCHAR)

MetaSVM prediction: "D" (Damaging) or "T" (Tolerated). Ensemble of 10 individual predictors combined with SVM.

metasvm_score (FLOAT)

MetaSVM score (continuous). Positive = damaging, negative = tolerated.

dann_score (FLOAT)

DANN score (0-1). Deep neural network pathogenicity score. Higher = more likely pathogenic. Can score any SNV.

phylop100way_vertebrate (FLOAT)

PhyloP conservation score across 100 vertebrate species. Positive = conserved, negative = fast-evolving, zero = neutral.

gerp_rs (FLOAT)

GERP++ rejected substitution score. Higher = more constrained. Scores > 2 suggest constraint, > 4 strong constraint.
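As a rough illustration of how the cutoffs in the column descriptions read in practice, the sketch below turns raw scores into text labels. The function name, the label strings, and the dict-based record are assumptions made for this example. Only the thresholds (SIFT < 0.05, the sign of MetaSVM and PhyloP, GERP++ > 2 and > 4) come from the descriptions above.

```python
def interpret_scores(record):
    """Label loaded scores using the cutoffs from the column descriptions.

    `record` is a hypothetical dict keyed by column name; NULL scores are
    represented as None and produce no label.
    """
    labels = {}

    sift = record.get("sift_score")
    if sift is not None:
        labels["sift"] = "deleterious" if sift < 0.05 else "tolerated"

    metasvm = record.get("metasvm_score")
    # The page defines only positive and negative scores, so an exact
    # zero is left unlabeled here (an assumption of this sketch).
    if metasvm is not None and metasvm != 0:
        labels["metasvm"] = "damaging" if metasvm > 0 else "tolerated"

    phylop = record.get("phylop100way_vertebrate")
    if phylop is not None:
        if phylop > 0:
            labels["phylop"] = "conserved"
        elif phylop < 0:
            labels["phylop"] = "fast-evolving"
        else:
            labels["phylop"] = "neutral"

    gerp = record.get("gerp_rs")
    if gerp is not None:
        if gerp > 4:
            labels["gerp"] = "strong constraint"
        elif gerp > 2:
            labels["gerp"] = "constraint"
        else:
            labels["gerp"] = "no constraint suggested"

    return labels
```

AlphaMissense and DANN are omitted because the descriptions above give only a direction for those scores (higher is more pathogenic), not a numeric cutoff.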

Limitations

dbNSFP covers non-synonymous single nucleotide variants (and, per the v4 reference below, splice-site SNVs). Indels, structural variants, and synonymous variants are not included.

Predictor scores may be NULL for variants not covered by specific prediction algorithms.

Several predictors draw on overlapping training data, and MetaSVM is itself an ensemble of other predictors, so their predictions are not fully independent.

Conservation scores reflect evolutionary constraint at a position, not the impact of a specific amino acid substitution.

BayesDel_noAF deliberately excludes allele frequency to avoid circular reasoning with PM2/BA1/BS1 criteria.
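The NULL limitation above matters whenever predictions are tallied: a missing prediction should count as "no evidence", not as a benign call. The sketch below shows one way to keep the two apart. The function name and the dict-based record are hypothetical, and this tally is not the Helix Insight screening score; it only illustrates NULL-aware counting over the three categorical columns loaded from dbNSFP.

```python
# Prediction column -> the code that denotes a damaging call, taken from
# the column descriptions on this page.
DAMAGING_CALLS = {
    "sift_pred": "D",
    "alphamissense_pred": "P",
    "metasvm_pred": "D",
}


def tally_predictions(record):
    """Return (damaging_calls, available_calls) for one variant record.

    Predictions that are NULL (None) are excluded from both counts, so a
    variant scored by a single predictor is not mistaken for one that
    two other predictors called benign.
    """
    available = {
        col: record.get(col)
        for col in DAMAGING_CALLS
        if record.get(col) is not None
    }
    damaging = sum(
        1 for col, call in available.items() if call == DAMAGING_CALLS[col]
    )
    return damaging, len(available)
```

Because the predictors are not fully independent, agreement among them is weaker evidence than a raw count suggests; reporting the denominator alongside the count at least makes the coverage visible.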

Reference

Liu X, et al. "dbNSFP v4: a comprehensive database of transcript-specific functional predictions and annotations for human nonsynonymous and splice-site SNVs." Genome Medicine. 2020;12(1):103. PMID: 33261662.

For details on individual predictors, see the Computational Predictors section.