Reference Databases
Helix Insight uses eight reference databases for variant annotation and ACMG classification. All databases are stored locally on EU-based infrastructure in Helsinki, Finland. No variant data is sent to external APIs during processing.
Database versions are fixed per deployment. Each version is validated before production deployment to confirm that it reproduces the expected classification outcomes. The current versions and their roles in ACMG classification are documented below.
Zero External API Calls
During variant processing, Helix Insight makes zero external API calls. All reference databases are stored locally. Ensembl VEP runs with a local cache. No patient data leaves the server at any processing stage.
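As an illustration only, a test suite could check the no-external-calls property by refusing outbound socket connections while a processing step runs. The sketch below is not Helix Insight's implementation. The names `no_network` and `ExternalCallBlocked` are hypothetical. It relies only on the Python standard library.

```python
import socket
from contextlib import contextmanager
from unittest import mock


class ExternalCallBlocked(RuntimeError):
    """Raised when code inside the guard tries to open a connection."""


def _refuse_connect(self, address):
    # Replaces socket.socket.connect while the guard is active.
    raise ExternalCallBlocked(f"outbound connection attempted: {address!r}")


@contextmanager
def no_network():
    """Hypothetical guard: any socket.connect call in this process fails."""
    with mock.patch.object(socket.socket, "connect", _refuse_connect):
        yield


# Usage: wrap a processing step and let the test fail loudly on any connect.
# with no_network():
#     run_annotation(variants)   # run_annotation is a placeholder name
```

A guard like this only covers connections opened through Python's `socket.socket.connect` in the same process. Tools launched as subprocesses would need a network-level egress restriction instead.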
Database Summary
| Database | Version | Records | Primary Use | ACMG Criteria |
|---|---|---|---|---|
| gnomAD | v4.1.0 | ~759M variants | Population frequencies | BA1, BS1, BS2, PM2 |
| ClinVar | 2025-01 | ~4.1M variants | Clinical significance | PS1, PP5, BP6, ClinVar override |
| dbNSFP | 4.9c | ~80.6M sites | Functional predictions | PP3, BP4 (BayesDel_noAF) |
| SpliceAI | MANE R113 | All coding variants | Splice impact | PP3_splice, BP7 guard |
| gnomAD Constraint | v4.1.0 | ~18.2K genes | Gene-level tolerance | PVS1, PP2, BP1 |
| HPO | Latest release | ~320K associations | Gene-phenotype mapping | PP4 |
| ClinGen | Latest release | ~1.6K genes | Dosage sensitivity | BS1, BP2 |
| Ensembl VEP | Release 113 | All consequences | Variant effect prediction | PVS1, PM1, PM4, BP1, BP3, BP7 |
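The table can also be read in the other direction, from a criterion back to the databases that feed it. The sketch below transcribes the ACMG Criteria column into a dictionary and inverts it. The names `CRITERIA_BY_DATABASE` and `databases_for` are hypothetical. The "ClinVar override" entry is omitted because it is not a criterion code, and the "BP7 guard" entry is recorded simply as BP7.

```python
from typing import Dict, List

# Transcribed from the Database Summary table above.
CRITERIA_BY_DATABASE: Dict[str, List[str]] = {
    "gnomAD": ["BA1", "BS1", "BS2", "PM2"],
    "ClinVar": ["PS1", "PP5", "BP6"],
    "dbNSFP": ["PP3", "BP4"],
    "SpliceAI": ["PP3_splice", "BP7"],
    "gnomAD Constraint": ["PVS1", "PP2", "BP1"],
    "HPO": ["PP4"],
    "ClinGen": ["BS1", "BP2"],
    "Ensembl VEP": ["PVS1", "PM1", "PM4", "BP1", "BP3", "BP7"],
}


def databases_for(criterion: str) -> List[str]:
    """Return the databases listed against a criterion, in table order."""
    return [
        database
        for database, criteria in CRITERIA_BY_DATABASE.items()
        if criterion in criteria
    ]


if __name__ == "__main__":
    print(databases_for("PVS1"))  # ['gnomAD Constraint', 'Ensembl VEP']
    print(databases_for("PP4"))   # ['HPO']
```

Several criteria draw on more than one source. PVS1, for example, is listed against both gnomAD Constraint and Ensembl VEP.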
Annotation Pipeline
Reference data is loaded into each variant record during Stage 4 of the processing pipeline. After annotation, every variant carries all reference columns directly, so no database lookups are needed during classification or clinical review. The annotation order is:
gnomAD v4.1
Population allele frequencies. Positional match on chromosome, position, reference allele, and alternate allele. Loads 6 columns.
ClinVar
Clinical significance assertions. Uses the same positional match as gnomAD. Loads 7 columns including review stars and disease associations.
dbNSFP 4.9c
Functional predictions from SIFT, AlphaMissense, MetaSVM, DANN, BayesDel, and conservation scores. Loads 9 columns, aggregating duplicate rows for the same variant.
gnomAD Constraint
Gene-level tolerance metrics. Joined on gene symbol. Loads 4 columns: pLI, LOEUF, o/e LoF, and missense Z-score.
HPO
Gene-phenotype associations. Joined on gene symbol with deduplication and aggregation. Loads 6 columns.
ClinGen
Dosage sensitivity scores. Joined on gene symbol. Loads 2 columns: haploinsufficiency and triplosensitivity.
Ensembl VEP runs as a separate stage (Stage 3) before database annotation, providing consequence predictions and transcript selection that the annotation phases then build upon. SpliceAI scores are accessed from precomputed data during VEP annotation.
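The annotation phases above use two kinds of join: a positional match on (chromosome, position, reference allele, alternate allele) and a gene-level match on gene symbol. The following is a minimal sketch of that idea using in-memory dictionaries. It is not the pipeline's actual storage or code. The class `ReferenceTable`, the function `annotate`, the column names, and all values are hypothetical placeholders, not real reference data.

```python
from dataclasses import dataclass
from typing import Any, Dict, Hashable, Iterable, Tuple


@dataclass
class ReferenceTable:
    """Hypothetical stand-in for one locally stored reference database."""
    name: str
    columns: Tuple[str, ...]
    rows: Dict[Hashable, Dict[str, Any]]

    def lookup(self, key: Hashable) -> Dict[str, Any]:
        # Always return every column; unmatched keys yield None values.
        hit = self.rows.get(key, {})
        return {col: hit.get(col) for col in self.columns}


# Placeholder tables with made-up values, for illustration only.
GNOMAD = ReferenceTable(
    "gnomAD", ("gnomad_af",),
    {("1", 1000, "A", "G"): {"gnomad_af": 0.001}},
)
CONSTRAINT = ReferenceTable(
    "gnomAD Constraint", ("pli", "loeuf"),
    {"GENE1": {"pli": 0.5, "loeuf": 1.0}},
)


def annotate(
    variant: Dict[str, Any],
    positional: Iterable[ReferenceTable] = (GNOMAD,),
    by_gene: Iterable[ReferenceTable] = (CONSTRAINT,),
) -> Dict[str, Any]:
    """Copy reference columns onto the variant record, in table order."""
    record = dict(variant)
    key = (variant["chrom"], variant["pos"], variant["ref"], variant["alt"])
    for table in positional:
        record.update(table.lookup(key))
    for table in by_gene:
        record.update(table.lookup(variant.get("gene_symbol")))
    return record
```

Because every reference column is written onto the record, including `None` for unmatched keys, later steps can read the record without touching the reference tables again.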
In This Section
gnomAD
Population allele frequencies from 807,162 individuals across 8 genetic ancestry groups.
ClinVar
Clinical significance assertions from submitting laboratories worldwide.
dbNSFP
Functional predictions and conservation scores for all possible coding SNVs.
HPO
Gene-phenotype associations from the Human Phenotype Ontology.
ClinGen
Gene dosage sensitivity curation from the Clinical Genome Resource.
Ensembl VEP
Variant Effect Predictor for consequence annotation and transcript selection.
SpliceAI Precomputed
Precomputed splice impact delta scores for all coding variants.
Update Policy
How and when reference databases are updated, validated, and versioned.
For details on how these databases are combined during ACMG classification, see the Criteria Reference.