Helix Insight

Literature Evidence

Literature evidence supports variant classification by identifying published studies that describe the same gene, variant, or phenotype. The platform automates what would otherwise require hours of manual PubMed searching per variant, producing ranked results with transparent scoring in under 500 milliseconds.

What the Platform Extracts

During database ingestion, each genetics-relevant publication is processed to extract structured entities from free text:

Gene Mentions

Gene symbols identified in titles and abstracts, validated against the human protein-coding gene database. Each mention records the gene symbol, frequency count, and surrounding context. Strict validation filters out false positives from common abbreviations that resemble gene symbols (DNA, HIV, PCR).
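The validation step can be illustrated with a minimal sketch. It assumes a small in-memory set of approved symbols (KNOWN_GENES) standing in for the real gene database, and a simple uppercase-token pattern. The function name, the pattern, and the context window size are illustrative assumptions, not the platform's implementation.

```python
import re
from collections import Counter

# Hypothetical stand-in for the validated protein-coding gene list.
KNOWN_GENES = {"TP53", "BRCA1", "BRCA2", "CFTR"}

# Candidate tokens: an uppercase letter followed by uppercase letters or digits,
# optionally with hyphenated uppercase parts (for example HLA-A).
TOKEN_RE = re.compile(r"\b[A-Z][A-Z0-9]+(?:-[A-Z0-9]+)*\b")


def extract_gene_mentions(text, known_genes=KNOWN_GENES, context_chars=40):
    """Return one record per validated gene symbol found in the text."""
    counts = Counter()
    contexts = {}
    for match in TOKEN_RE.finditer(text):
        symbol = match.group(0)
        # Tokens such as DNA, HIV, or PCR look like symbols but are not in
        # the validated set, so they are dropped here.
        if symbol not in known_genes:
            continue
        counts[symbol] += 1
        if symbol not in contexts:
            # Keep the text around the first occurrence as context.
            start = max(0, match.start() - context_chars)
            end = min(len(text), match.end() + context_chars)
            contexts[symbol] = text[start:end]
    return [
        {"symbol": symbol, "count": count, "context": contexts[symbol]}
        for symbol, count in counts.items()
    ]
```

Validating against a closed set, rather than relying on a token pattern alone, is what keeps laboratory abbreviations out of the index in this sketch.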

Variant Mentions

Variant notations extracted using HGVS patterns (c.123A>G, p.Arg248Gln) and legacy notation (R248Q). Each variant is associated with the nearest gene mention in the text and includes the surrounding sentence for clinical context.
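As a rough illustration of the extraction described above, the following sketch matches the three example notations with regular expressions and attaches each hit to the closest gene symbol by character offset. The patterns cover simple substitutions only, and the function names, the sentence splitter, and the distance heuristic are assumptions made for this example rather than the platform's actual rules.

```python
import re

# Simplified patterns for substitution variants only (an assumption;
# full HGVS also covers deletions, insertions, frameshifts, and more).
VARIANT_RE = re.compile(
    r"(?P<cdna>\bc\.\d+[ACGT]>[ACGT])"                     # c.123A>G
    r"|(?P<protein>\bp\.[A-Z][a-z]{2}\d+[A-Z][a-z]{2}\b)"  # p.Arg248Gln
    r"|(?P<legacy>\b[ACDEFGHIKLMNPQRSTVWY]\d+[ACDEFGHIKLMNPQRSTVWY]\b)"  # R248Q
)
# Split after sentence-ending punctuation that is followed by whitespace,
# so the dots inside "c.123A>G" or "p.Arg248Gln" do not split a sentence.
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


def sentence_spans(text):
    """Return (start, end) character offsets for each sentence."""
    spans, pos = [], 0
    for sentence in SENTENCE_SPLIT_RE.split(text):
        start = text.index(sentence, pos)
        pos = start + len(sentence)
        spans.append((start, pos))
    return spans


def extract_variant_mentions(text, gene_symbols):
    """Find variant notations and pair each with the nearest gene symbol."""
    gene_hits = [
        (m.start(), symbol)
        for symbol in gene_symbols
        for m in re.finditer(rf"\b{re.escape(symbol)}\b", text)
    ]
    spans = sentence_spans(text)
    mentions = []
    for match in VARIANT_RE.finditer(text):
        nearest = None
        if gene_hits:
            # Nearest gene by absolute character distance to the variant.
            nearest = min(gene_hits, key=lambda h: abs(h[0] - match.start()))[1]
        sentence = next(
            (text[s:e] for s, e in spans if s <= match.start() < e), ""
        )
        mentions.append({
            "notation": match.group(0),
            "kind": match.lastgroup,
            "gene": nearest,
            "sentence": sentence,
        })
    return mentions
```

A purely distance-based association can pick the wrong gene when a variant sits between two gene mentions, which is one reason the surrounding sentence is kept for manual review.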

Phenotype Associations

Phenotype terms from curated MeSH descriptors assigned by NLM indexers. These provide high-confidence phenotype mappings with HPO, OMIM, and MeSH identifiers.

Search Strategy

When a clinical literature search is triggered, the platform runs two complementary search strategies and merges the results:

Indexed Gene Lookup

Direct query against the gene mentions index. Returns publications where the gene was identified and validated during ingestion. Fast (approximately 50 ms) and high-precision.

Full-Text Search

Text search in titles and abstracts to catch mentions not captured by the gene extraction stage. Broader recall but lower precision. Limited to 10,000 results per query to maintain performance.

Both result sets are merged and deduplicated. Each candidate publication is then enriched with all extracted entities (gene mentions, variant mentions, phenotype associations) before relevance scoring.
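The merge and deduplication step can be sketched as follows. The example assumes each strategy returns a list of PubMed IDs and that the PMID is the deduplication key. The function name, the source labels, and the record shape are hypothetical; only the 10,000-result cap on the text search comes from the description above.

```python
def merge_candidates(indexed_hits, fulltext_hits, fulltext_limit=10_000):
    """Merge two PMID lists into unique candidates, tracking their origin."""
    merged = {}  # PMID -> candidate record; dicts preserve insertion order
    sources = (
        ("gene_index", indexed_hits),
        # The text search is capped to keep query time bounded.
        ("full_text", fulltext_hits[:fulltext_limit]),
    )
    for source, pmids in sources:
        for pmid in pmids:
            entry = merged.setdefault(pmid, {"pmid": pmid, "sources": []})
            if source not in entry["sources"]:
                entry["sources"].append(source)
    return list(merged.values())
```

Recording which strategy produced each candidate is optional, but it makes it easy to see whether a paper was found through the validated index, the broader text search, or both.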

What Is Reported

For each relevant publication, the platform reports:

Publication metadata: title, authors, journal, date, DOI, PubMed Central ID

Abstract text for quick review without leaving the platform

Six-component relevance score with individual score breakdown

Evidence strength category (Strong, Moderate, Supporting, Weak)

Matched genes, variants, and phenotypes highlighted

Presence of functional study data (animal models, cell assays)

Direct link to the full publication on PubMed
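One way to picture the reported fields is as a single record per publication. The sketch below is an illustrative data structure only: the class and field names are assumptions, and the six score components are left as an open mapping because their names are not listed in this section. The four evidence categories are the ones named above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class EvidenceStrength(Enum):
    STRONG = "Strong"
    MODERATE = "Moderate"
    SUPPORTING = "Supporting"
    WEAK = "Weak"


@dataclass
class LiteratureHit:
    """Hypothetical shape of one reported publication."""
    pmid: str
    title: str
    authors: list[str]
    journal: str
    publication_date: str
    abstract: str
    relevance_score: float
    # Per-component breakdown; expected to hold six entries.
    component_scores: dict[str, float]
    evidence_strength: EvidenceStrength
    doi: Optional[str] = None
    pmc_id: Optional[str] = None
    matched_genes: list[str] = field(default_factory=list)
    matched_variants: list[str] = field(default_factory=list)
    matched_phenotypes: list[str] = field(default_factory=list)
    has_functional_data: bool = False

    @property
    def pubmed_url(self) -> str:
        # Standard PubMed record URL built from the PMID.
        return f"https://pubmed.ncbi.nlm.nih.gov/{self.pmid}/"
```

Keeping the component breakdown alongside the total score reflects the transparency goal: a reviewer can see why a paper ranked where it did before opening it.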

Advisory, Not Diagnostic

Literature evidence categories are advisory. The geneticist must review the actual publications to determine whether they provide sufficient evidence for specific ACMG criteria. The platform identifies relevant papers and estimates their strength; the clinician determines their applicability to the specific case.

For details on how publications are ranked, see Relevance Scoring.