Helix Insight

Reference Databases

Helix Insight uses eight reference databases for variant annotation and ACMG classification. All databases are stored locally on EU-based infrastructure in Helsinki, Finland. No variant data is sent to external APIs during processing.

Database versions are fixed per deployment. Each version undergoes validation testing before production deployment to ensure consistency with expected classification outcomes. The current versions and their roles in ACMG classification are documented below.
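Version pinning of this kind can be checked mechanically at startup. The following is a minimal sketch, not the Helix Insight implementation: the names `PINNED_VERSIONS` and `version_mismatches` are hypothetical, and the version strings are copied from the Database Summary table. HPO and ClinGen are left out because the table lists them only as "Latest release".

```python
# Hypothetical manifest of pinned versions, copied from the Database Summary
# table. HPO and ClinGen are omitted: the table gives no specific version.
PINNED_VERSIONS = {
    "gnomAD": "v4.1.0",
    "ClinVar": "2025-01",
    "dbNSFP": "4.9c",
    "SpliceAI": "MANE R113",
    "gnomAD Constraint": "v4.1.0",
    "Ensembl VEP": "Release 113",
}


def version_mismatches(installed: dict) -> dict:
    """Return {database: (expected, found)} for each pinned database that differs.

    `installed` maps database name to the version string found on disk;
    a missing database is reported with `found` set to None.
    """
    return {
        name: (expected, installed.get(name))
        for name, expected in PINNED_VERSIONS.items()
        if installed.get(name) != expected
    }


if __name__ == "__main__":
    # Example: ClinVar has drifted from the pinned release.
    found = dict(PINNED_VERSIONS, ClinVar="2025-02")
    for name, (expected, actual) in version_mismatches(found).items():
        print(f"{name}: expected {expected}, found {actual}")
```

A deployment script could refuse to start processing whenever the returned dictionary is non-empty.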

Zero External API Calls

During variant processing, Helix Insight makes zero external API calls. All reference databases are stored locally. Ensembl VEP runs with a local cache. No patient data leaves the server at any processing stage.
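One way to verify a no-external-calls guarantee in a test suite is to make any outbound socket connection fail loudly while the pipeline runs. The sketch below is an illustration under assumptions, not a description of how Helix Insight enforces the guarantee: `no_network` and `ExternalCallBlocked` are hypothetical names, and the guard only intercepts `socket.socket.connect` inside the current Python process.

```python
import socket
from contextlib import contextmanager


class ExternalCallBlocked(RuntimeError):
    """Raised when code inside the guard tries to open a connection."""


@contextmanager
def no_network():
    """Temporarily replace socket.socket.connect with a function that raises.

    This is a test-time guard only. It does not cover connect_ex, child
    processes, or non-Python code, so it complements network-level policy
    rather than replacing it.
    """
    original_connect = socket.socket.connect

    def blocked_connect(self, address):
        raise ExternalCallBlocked(f"blocked outbound connection to {address!r}")

    socket.socket.connect = blocked_connect
    try:
        yield
    finally:
        # Always restore the real method, even if the guarded code failed.
        socket.socket.connect = original_connect
```

Wrapping a processing run in `with no_network():` turns any accidental outbound connection into an immediate test failure.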

Database Summary

Database | Version | Records | Primary Use | ACMG Criteria
gnomAD | v4.1.0 | ~759M variants | Population frequencies | BA1, BS1, BS2, PM2
ClinVar | 2025-01 | ~4.1M variants | Clinical significance | PS1, PP5, BP6, ClinVar override
dbNSFP | 4.9c | ~80.6M sites | Functional predictions | PP3, BP4 (BayesDel_noAF)
SpliceAI | MANE R113 | All coding variants | Splice impact | PP3_splice, BP7 guard
gnomAD Constraint | v4.1.0 | ~18.2K genes | Gene-level tolerance | PVS1, PP2, BP1
HPO | Latest release | ~320K associations | Gene-phenotype mapping | PP4
ClinGen | Latest release | ~1.6K genes | Dosage sensitivity | BS1, BP2
Ensembl VEP | Release 113 | All consequences | Variant effect prediction | PVS1, PM1, PM4, BP1, BP3, BP7
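The summary table can also be read in the other direction, from a criterion to the databases that feed it. The following sketch encodes the ACMG Criteria column as a lookup. The names `ACMG_CRITERIA_BY_DATABASE` and `databases_for` are hypothetical. Two simplifications are assumptions of this sketch: the SpliceAI "BP7 guard" is recorded as BP7, and the "ClinVar override" entry is omitted because it is not a criterion code.

```python
# Transcribed from the ACMG Criteria column of the Database Summary table.
# Assumptions: "BP7 guard" is recorded as "BP7"; "ClinVar override" is omitted.
ACMG_CRITERIA_BY_DATABASE = {
    "gnomAD": ["BA1", "BS1", "BS2", "PM2"],
    "ClinVar": ["PS1", "PP5", "BP6"],
    "dbNSFP": ["PP3", "BP4"],
    "SpliceAI": ["PP3_splice", "BP7"],
    "gnomAD Constraint": ["PVS1", "PP2", "BP1"],
    "HPO": ["PP4"],
    "ClinGen": ["BS1", "BP2"],
    "Ensembl VEP": ["PVS1", "PM1", "PM4", "BP1", "BP3", "BP7"],
}


def databases_for(criterion: str) -> list:
    """Return the databases listed against a criterion, in table order."""
    return [
        database
        for database, criteria in ACMG_CRITERIA_BY_DATABASE.items()
        if criterion in criteria
    ]


if __name__ == "__main__":
    for code in ("BS1", "PVS1", "PP4"):
        print(code, "<-", ", ".join(databases_for(code)))
```

A lookup like this makes it easy to see which criteria depend on more than one source, for example PVS1, which the table lists under both gnomAD Constraint and Ensembl VEP.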

Annotation Pipeline

Reference data is loaded into each variant record during Stage 4 of the processing pipeline. After annotation, every variant carries all reference columns directly, so no database lookups are needed during classification or clinical review. The annotation order is:

1. gnomAD v4.1
   Population allele frequencies. Positional match on chromosome, position, reference allele, and alternate allele. Loads 6 columns.

2. ClinVar
   Clinical significance assertions. Same positional match. Loads 7 columns including review stars and disease associations.

3. dbNSFP 4.9c
   Functional predictions from SIFT, AlphaMissense, MetaSVM, DANN, BayesDel, and conservation scores. Loads 9 columns with duplicate variant aggregation.

4. gnomAD Constraint
   Gene-level tolerance metrics. Joined on gene symbol. Loads 4 columns: pLI, LOEUF, o/e LoF, and missense Z-score.

5. HPO
   Gene-phenotype associations. Joined on gene symbol with deduplication and aggregation. Loads 6 columns.

6. ClinGen
   Dosage sensitivity scores. Joined on gene symbol. Loads 2 columns: haploinsufficiency and triplosensitivity.

Ensembl VEP runs as a separate stage (Stage 3) before database annotation (Stage 4). It provides the consequence predictions and transcript selection that the annotation steps then build upon. SpliceAI scores are read from precomputed data during VEP annotation.
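The two join styles used in Stage 4, a positional match on (chromosome, position, reference allele, alternate allele) and a join on gene symbol, can be sketched with plain dictionaries. This is an illustration under assumptions, not the production code. The table contents are toy values, and the column names (`af`, `significance`, `review_stars`, `pli`, and so on) and the `annotate` function are hypothetical. dbNSFP is left out because the text does not specify its aggregation rule.

```python
from typing import Any, Dict, Tuple

VariantKey = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)

# Toy stand-ins for the local reference tables; all values are made up.
GNOMAD: Dict[VariantKey, Dict[str, Any]] = {
    ("1", 1000, "A", "G"): {"af": 0.001},
}
CLINVAR: Dict[VariantKey, Dict[str, Any]] = {
    ("1", 1000, "A", "G"): {"significance": "Benign", "review_stars": 2},
}
CONSTRAINT: Dict[str, Dict[str, Any]] = {
    "GENE1": {"pli": 0.99, "loeuf": 0.2, "oe_lof": 0.1, "mis_z": 3.5},
}
HPO: Dict[str, Dict[str, Any]] = {
    "GENE1": {"phenotypes": ["toy phenotype"]},
}
CLINGEN: Dict[str, Dict[str, Any]] = {
    "GENE1": {"haploinsufficiency": 3, "triplosensitivity": 0},
}

# Order follows the documented annotation order (dbNSFP omitted, see lead-in).
POSITIONAL_SOURCES = [("gnomad", GNOMAD), ("clinvar", CLINVAR)]
GENE_SOURCES = [("constraint", CONSTRAINT), ("hpo", HPO), ("clingen", CLINGEN)]


def annotate(variant: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the variant with reference columns attached.

    Positional sources are matched on (chrom, pos, ref, alt); gene-level
    sources are matched on the gene symbol. Unmatched sources add nothing.
    """
    annotated = dict(variant)
    key = (variant["chrom"], variant["pos"], variant["ref"], variant["alt"])
    for prefix, table in POSITIONAL_SOURCES:
        for column, value in table.get(key, {}).items():
            annotated[f"{prefix}_{column}"] = value
    for prefix, table in GENE_SOURCES:
        for column, value in table.get(variant["gene"], {}).items():
            annotated[f"{prefix}_{column}"] = value
    return annotated


if __name__ == "__main__":
    record = {"chrom": "1", "pos": 1000, "ref": "A", "alt": "G", "gene": "GENE1"}
    print(annotate(record))
```

Because every reference column is copied onto the record, later stages can work from the annotated variant alone, which matches the statement that no database lookups are needed during classification or clinical review.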

For details on how these databases are combined during ACMG classification, see the Criteria Reference.