Helix Insight

No External Calls

During variant processing, Helix Insight makes zero outbound network calls. No patient data, genomic coordinates, variant identifiers, or query parameters are sent to any external service at any point in the analysis pipeline. This is a fundamental architectural decision, not a configuration option.

What Runs Locally

Every component of the variant analysis pipeline operates on local data stored on the Helsinki server. No external API calls are made during processing:

Ensembl VEP

Variant Effect Predictor runs locally against an offline cache (Release 113, GRCh38). No calls are made to the Ensembl REST API. Consequence annotation, transcript selection, and impact classification all run on local data.

gnomAD

Population frequency database (v4.1.0, 759 million variants) is stored locally. Allele frequencies, homozygote counts, and population-specific data are queried from local storage.

ClinVar

Clinical significance database (2025-01, 4.1 million variants) is stored locally. Clinical significance, review status, and star ratings are queried without any connection to NCBI.

dbNSFP

Functional prediction scores (4.9c, 80.6 million sites) from SIFT, AlphaMissense, MetaSVM, DANN, BayesDel, PhyloP, and GERP are all stored and queried locally.

HPO Ontology

Human Phenotype Ontology with 17,000+ terms and 320,000+ gene-phenotype associations is stored locally. Phenotype matching and semantic similarity are computed locally using the pyhpo library.

ClinGen

Dosage sensitivity data (1,600+ genes) for haploinsufficiency and triplosensitivity is stored locally.

SpliceAI Precomputed

Splice impact delta scores for MANE transcripts are stored locally. No calls are made to the Illumina API or the SpliceAI web interface.

Literature Database

Local PubMed mirror with 2-3 million genetics-relevant publications. Literature search, relevance scoring, and evidence assessment all run against local data. No calls to NCBI PubMed.
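
As one illustration of the offline setup listed above, the following is a minimal sketch of how a VEP run could be pinned to a local cache. The flags are standard Ensembl VEP command-line options; the function name, file names, and cache directory are hypothetical and do not describe Helix Insight's actual invocation.

```python
from __future__ import annotations

from pathlib import Path


def build_vep_command(input_vcf: str, output_vcf: str, cache_dir: str) -> list[str]:
    """Build a VEP argument list that annotates from a local cache only.

    Paths are placeholders; release and assembly match the values stated
    in this page (Release 113, GRCh38).
    """
    return [
        "vep",
        "--offline",                      # no database connections are made
        "--cache",                        # annotate from the on-disk cache
        "--dir_cache", str(Path(cache_dir)),
        "--cache_version", "113",
        "--assembly", "GRCh38",
        "--input_file", input_vcf,
        "--output_file", output_vcf,
        "--vcf",                          # write annotated output as VCF
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_vep_command("sample.vcf", "annotated.vcf", "/data/vep_cache")))
```

The resulting list can be passed to `subprocess.run` on a host where VEP and its cache are installed.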

Network Architecture

The variant processing pipeline has no outbound network access by design. The network architecture enforces this at the infrastructure level:

Inbound only

The platform accepts VCF uploads and API requests from authenticated users over TLS 1.3. These are the only inbound connections.

No outbound from processing

The variant analysis, phenotype matching, screening, and literature search services have no outbound network routes. They cannot make HTTP requests, DNS queries, or any other network calls to external services.

Separate update channel

Reference database updates (ClinVar quarterly, PubMed daily) are fetched by a separate background process that does not have access to patient data. The update process only downloads public data; it never uploads data to any external service.
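
The isolation described above can be spot-checked from inside a processing container. The sketch below is an assumption about how such a check might look, not part of the platform: it tries to open TCP connections to a few public hosts and reports any that succeed. The target list and function name are hypothetical; an isolated service is expected to reach none of them, because both DNS resolution and connection attempts fail.

```python
import socket
from typing import Callable, Iterable, List, Tuple

Target = Tuple[str, int]

# Example probe targets (hypothetical choice): public endpoints that the
# processing services should be unable to reach.
DEFAULT_TARGETS: List[Target] = [
    ("rest.ensembl.org", 443),
    ("eutils.ncbi.nlm.nih.gov", 443),
]


def reachable_targets(
    targets: Iterable[Target] = DEFAULT_TARGETS,
    connect: Callable[..., object] = socket.create_connection,
    timeout: float = 2.0,
) -> List[Target]:
    """Return the targets that accepted a connection.

    `connect` is injectable so the function can be exercised without
    touching the network.
    """
    reached: List[Target] = []
    for host, port in targets:
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError:
            # Covers DNS failure (socket.gaierror), refused connections,
            # unreachable networks, and timeouts.
            continue
        close = getattr(conn, "close", None)
        if callable(close):
            close()
        reached.append((host, port))
    return reached


if __name__ == "__main__":
    leaks = reachable_targets()
    print("outbound reachable:", leaks if leaks else "none")
```

A probe like this only demonstrates that specific destinations are blocked; it complements an inspection of the firewall and routing configuration rather than replacing it.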

Why This Matters

No data leakage risk

If the processing pipeline cannot make network calls, it cannot leak data over the network -- even in the event of a software vulnerability or misconfiguration.

No third-party dependency

Processing does not depend on external API availability. The platform operates at full capacity even if every external service is offline.

No third-party data access

During core analysis, no external organization receives genomic data, variant identifiers, or query parameters. There is no risk of external services logging, caching, or retaining patient-related data.

Verifiable by design

The network isolation is enforced at the infrastructure level (firewall rules, Docker network configuration), not at the application level. It can be independently verified through a network audit.
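
One way an auditor might check the Docker side of this is a static review of the network definitions. The sketch below assumes, purely for illustration, a Docker Compose style configuration already parsed into a dictionary, and flags any listed service attached to a network that is not declared `internal: true` (the Compose option that creates an externally isolated network). The service and network names are hypothetical, and the source does not state that Compose is what Helix Insight uses.

```python
from typing import Any, Dict, List


def non_isolated_services(compose: Dict[str, Any], services: List[str]) -> List[str]:
    """Return the services attached to at least one non-internal network."""
    networks = compose.get("networks") or {}
    offenders: List[str] = []
    for name in services:
        service = compose["services"][name]
        # With no explicit networks, Compose attaches the service to "default".
        attached = service.get("networks") or ["default"]
        for net in attached:  # iterating works for list and mapping syntax
            definition = networks.get(net) or {}
            if not definition.get("internal", False):
                offenders.append(name)
                break
    return offenders


# Hypothetical configuration used only to exercise the check.
EXAMPLE = {
    "services": {
        "variant-analysis": {"networks": ["processing"]},
        "api-gateway": {"networks": ["processing", "edge"]},
    },
    "networks": {
        "processing": {"internal": True},
        "edge": {},
    },
}

if __name__ == "__main__":
    print(non_isolated_services(EXAMPLE, ["variant-analysis", "api-gateway"]))
    # prints ['api-gateway']
```

The check is deliberately narrow: it ignores settings such as `network_mode` and host firewall rules, which an actual audit would also need to cover.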

AI Clinical Assistant

The AI Clinical Assistant is the one component that may use an external language model API. When this occurs, only anonymized variant data is transmitted -- genomic coordinates and classification results, never patient identifiers, sample IDs, or phenotype data. The AI assistant is an optional feature; all core analysis (classification, phenotype matching, screening, literature search) runs entirely locally without any external calls.
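
The restriction on what the assistant may transmit can be pictured as an allowlist: only named fields are copied into the outgoing payload, so anything else is dropped by construction. The following is a minimal sketch under that assumption; the field names and function name are hypothetical and are not taken from the Helix Insight codebase.

```python
from typing import Any, Dict, Tuple

# Hypothetical field names: genomic coordinates plus the classification result.
ALLOWED_FIELDS: Tuple[str, ...] = ("chrom", "pos", "ref", "alt", "classification")


def build_assistant_payload(variant: Dict[str, Any]) -> Dict[str, Any]:
    """Copy only allowlisted fields into the payload for the external model.

    Patient identifiers, sample IDs, and phenotype data are never copied
    because they are not on the allowlist.
    """
    missing = [field for field in ALLOWED_FIELDS if field not in variant]
    if missing:
        raise ValueError(f"variant record is missing fields: {missing}")
    return {field: variant[field] for field in ALLOWED_FIELDS}


if __name__ == "__main__":
    record = {
        "chrom": "17", "pos": 43045712, "ref": "G", "alt": "A",
        "classification": "Likely pathogenic",
        "patient_id": "P-0001", "sample_id": "S-42", "hpo_terms": ["HP:0003002"],
    }
    print(build_assistant_payload(record))
```

An allowlist is chosen here over a denylist so that a field added to the record later is excluded unless someone explicitly permits it.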