Helix Insight

Literature Evidence

The platform maintains a local, pre-processed subset of PubMed focused on clinical genetics literature. Roughly 2-3 million publications are indexed, selected from the full PubMed corpus of about 35 million articles by matching curated MeSH (Medical Subject Headings) descriptors for genetics relevance.
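The MeSH-based selection can be pictured as a set-membership test. This is a minimal sketch, not the platform's pipeline: the descriptor allowlist, the `is_genetics_relevant` function, and the article records are assumptions made for illustration, and the actual curated descriptor list is not given here.

```python
# Hypothetical allowlist; the platform's curated descriptor list is not published here.
GENETICS_MESH = {
    "Genetic Diseases, Inborn",
    "Mutation",
    "Genetic Predisposition to Disease",
}


def is_genetics_relevant(mesh_descriptors, allowlist=GENETICS_MESH):
    """Keep an article if at least one of its MeSH descriptors is on the allowlist."""
    return not allowlist.isdisjoint(mesh_descriptors)


# Illustrative records only; real PubMed records carry many more fields.
articles = [
    {"pmid": "1", "mesh": ["Mutation", "Humans"]},
    {"pmid": "2", "mesh": ["Diet", "Humans"]},
]
kept = [a["pmid"] for a in articles if is_genetics_relevant(a["mesh"])]
```

Under these assumptions only the first record survives the filter, because it shares the `Mutation` descriptor with the allowlist.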

When a case is analyzed, the platform searches this local database for publications relevant to the patient's genes, variants, and phenotype. Results are ranked by clinical relevance and annotated with evidence strength categories aligned to ACMG criteria.

How It Works

1. Query Construction

The platform collects the genes that carry candidate variants and the patient’s HPO (Human Phenotype Ontology) terms. Together these form the search query against the local literature database.
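One way to picture this step is as a function that reduces the case data to a deduplicated set of gene symbols and HPO identifiers. The sketch below is an assumption about shape only: `LiteratureQuery`, `build_query`, and the variant record fields are hypothetical names, and the variants and HPO ID are placeholder inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiteratureQuery:
    """Hypothetical query object: gene symbols plus HPO term IDs."""
    genes: tuple
    hpo_terms: tuple


def build_query(candidate_variants, hpo_terms):
    # Several variants may fall in the same gene, so genes are deduplicated.
    genes = sorted({v["gene"] for v in candidate_variants if v.get("gene")})
    terms = sorted(set(hpo_terms))
    return LiteratureQuery(tuple(genes), tuple(terms))


# Placeholder case data for illustration.
query = build_query(
    [
        {"gene": "BRCA2", "hgvs": "c.5946delT"},
        {"gene": "BRCA2", "hgvs": "c.68-7T>A"},
        {"gene": "TP53", "hgvs": "c.743G>A"},
    ],
    ["HP:0003002", "HP:0003002"],
)
```

Sorting and deduplicating gives the same query object for the same case regardless of input order, which is convenient if results are cached per query.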

2. Publication Discovery

Two search strategies run in parallel: an indexed lookup of direct gene mentions and a full-text search over titles and abstracts. Their results are merged into a single, deduplicated candidate set.
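The two-strategy search can be sketched with an in-memory stand-in for the local database. Everything named here (`PUBLICATIONS`, `GENE_INDEX`, `gene_lookup`, `text_search`, `discover`) is hypothetical, and the substring match is only a placeholder for whatever full-text engine the platform actually uses.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for the local literature database (illustrative records).
PUBLICATIONS = {
    "101": {"title": "BRCA2 variants in hereditary breast cancer",
            "abstract": "Cohort study of BRCA2 carriers."},
    "102": {"title": "TP53 and tumour suppression",
            "abstract": "Review of TP53 function."},
    "103": {"title": "Li-Fraumeni syndrome case series",
            "abstract": "Clinical features in twelve families."},
}
# Hypothetical precomputed index: gene symbol -> publication IDs.
GENE_INDEX = {"BRCA2": {"101"}, "TP53": {"102"}}


def gene_lookup(genes):
    """Indexed strategy: union of the publication IDs recorded for each gene."""
    hits = set()
    for gene in genes:
        hits |= GENE_INDEX.get(gene, set())
    return hits


def text_search(terms):
    """Text strategy: case-insensitive substring match on title and abstract."""
    lowered = [t.lower() for t in terms]
    return {
        pmid
        for pmid, pub in PUBLICATIONS.items()
        if any(t in (pub["title"] + " " + pub["abstract"]).lower() for t in lowered)
    }


def discover(genes, text_terms):
    # Run both strategies concurrently, then merge; the set union deduplicates.
    with ThreadPoolExecutor(max_workers=2) as pool:
        by_gene = pool.submit(gene_lookup, genes)
        by_text = pool.submit(text_search, text_terms)
        return by_gene.result() | by_text.result()
```

For example, `discover(["BRCA2"], ["Li-Fraumeni"])` combines one hit from the gene index with one hit from the text search.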

3. Evidence Enrichment

Each candidate publication is enriched with extracted data: gene mentions with frequency counts, variant mentions with HGVS notation, and phenotype associations from MeSH descriptors.
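As a rough illustration of the extraction involved, the sketch below counts gene mentions and pulls out variant mentions with a deliberately simplified pattern. The `enrich` function and `HGVS_PATTERN` are assumptions: the pattern covers only simple coding substitutions and three-letter protein changes, whereas full HGVS notation is much broader, and the platform's own extractor is not described here.

```python
import re

# Simplified pattern (an assumption): matches forms such as c.743G>A and p.Arg248Gln.
HGVS_PATTERN = re.compile(
    r"\b(?:c\.\d+[ACGT]>[ACGT]|p\.[A-Z][a-z]{2}\d+[A-Z][a-z]{2})"
)


def enrich(text, genes):
    """Return gene mention counts and the distinct variant mentions found in text."""
    gene_mentions = {}
    for gene in genes:
        # Word boundaries prevent matching a symbol inside a longer token.
        count = len(re.findall(rf"\b{re.escape(gene)}\b", text))
        if count:
            gene_mentions[gene] = count
    variant_mentions = sorted(set(HGVS_PATTERN.findall(text)))
    return {"gene_mentions": gene_mentions, "variant_mentions": variant_mentions}


# Made-up abstract text for illustration.
abstract = (
    "TP53 c.743G>A (p.Arg248Gln) was found in two probands; "
    "TP53 loss was confirmed."
)
enriched = enrich(abstract, ["TP53", "BRCA2"])
```

Genes with no mentions are omitted from the result, so downstream scoring can treat absence as zero.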

4. Relevance Scoring

Publications are scored across six weighted dimensions: phenotype match, publication type, gene centrality, functional data, variant match, and recency. Low-relevance results are filtered out.
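A weighted sum is one straightforward way to combine six dimensions into a single score. The sketch below assumes that form; the weights, the `MIN_SCORE` cutoff, and the convention that each dimension is normalized to the range 0 to 1 are all hypothetical, since the actual values are not stated here.

```python
# Hypothetical weights (they sum to 1.0); the platform's real weights are not given.
WEIGHTS = {
    "phenotype_match": 0.25,
    "publication_type": 0.20,
    "gene_centrality": 0.20,
    "functional_data": 0.15,
    "variant_match": 0.15,
    "recency": 0.05,
}
MIN_SCORE = 0.3  # hypothetical cutoff for "low relevance"


def relevance_score(features):
    """Weighted sum of per-dimension scores, each assumed to lie in [0, 1]."""
    return sum(weight * features.get(name, 0.0) for name, weight in WEIGHTS.items())


def rank(candidates):
    """Drop candidates below the cutoff and return IDs from highest to lowest score."""
    scored = [(relevance_score(f), pmid) for pmid, f in candidates.items()]
    kept = [(score, pmid) for score, pmid in scored if score >= MIN_SCORE]
    return [pmid for score, pmid in sorted(kept, reverse=True)]


# Illustrative feature vectors; missing dimensions count as 0.
candidates = {
    "101": {"phenotype_match": 1.0, "gene_centrality": 1.0, "variant_match": 1.0},
    "102": {"recency": 1.0},
    "103": {"publication_type": 1.0, "functional_data": 1.0},
}
ranked = rank(candidates)
```

With these made-up numbers, publication 102 scores only on recency and falls below the cutoff, while 101 outranks 103.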

5. Clinical Review

Ranked results are presented with evidence strength labels (Strong, Moderate, Supporting, Weak) aligned to ACMG criteria. The geneticist reviews the publications to determine their applicability.
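If the label were derived from a numeric score, the mapping could look like the sketch below. That derivation is an assumption, as are the threshold values and the `evidence_label` function; the text does not say how labels are assigned. Only the four label names come from the description above.

```python
import bisect

# Hypothetical score boundaries between the four labels.
THRESHOLDS = [0.4, 0.6, 0.8]
LABELS = ["Weak", "Supporting", "Moderate", "Strong"]


def evidence_label(score):
    """Map a score in [0, 1] to a label; a score equal to a boundary takes the higher label."""
    return LABELS[bisect.bisect_right(THRESHOLDS, score)]
```

Whatever the mapping, the label is a prompt for review rather than a classification: the geneticist still decides whether a publication applies to the case.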

Local Database, No External Queries

All literature searches run against a local database stored on EU infrastructure. No patient data or query parameters are sent to PubMed or any external service during clinical search. The database is updated daily from PubMed's public FTP server as a background process, separate from clinical queries.

Performance

Clinical literature search completes in under 500 milliseconds for a typical query (3-5 genes, 5-10 HPO terms). Results are pre-generated and cached for streaming to the frontend.
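The caching and streaming behaviour can be pictured with a small in-process cache keyed on the normalized query. This is only a sketch under assumptions: `_CACHE`, `cache_key`, `get_results`, `stream`, and the newline-delimited JSON output format are hypothetical, and the platform's actual cache and transport are not described here.

```python
import json

_CACHE = {}  # hypothetical in-process cache: normalized query -> results


def cache_key(genes, hpo_terms):
    """Normalize a query so that input order and duplicates do not matter."""
    return (tuple(sorted(set(genes))), tuple(sorted(set(hpo_terms))))


def get_results(genes, hpo_terms, search_fn):
    """Run search_fn once per distinct query and serve repeats from the cache."""
    key = cache_key(genes, hpo_terms)
    if key not in _CACHE:
        _CACHE[key] = search_fn(*key)
    return _CACHE[key]


def stream(results):
    """Yield one JSON line per result so a frontend can render incrementally."""
    for result in results:
        yield json.dumps(result) + "\n"


# Demonstration with a stand-in search function that records its calls.
calls = []


def fake_search(genes, hpo_terms):
    calls.append((genes, hpo_terms))
    return [{"pmid": "101"}]


first = get_results(["TP53", "BRCA2"], ["HP:0003002"], fake_search)
second = get_results(["BRCA2", "TP53"], ["HP:0003002"], fake_search)
lines = list(stream(second))
```

The second call reorders the genes but maps to the same key, so the stand-in search function runs only once.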
