Helix Insight

Database Queries

Helix AI can query two databases through natural language: the patient's classified variant database and the biomedical literature database. The assistant automatically determines which database to query based on your question, generates SQL, executes it, and incorporates the results into its response.

Two Databases

| Database | Content | Scale |
| --- | --- | --- |
| Patient Variants | Classified variants with ACMG annotations, population frequencies, functional predictions, phenotype matches, and screening scores. | ~2.3M variants, 70 columns per variant |
| Biomedical Literature | PubMed publications with gene mentions, variant mentions, abstracts, MeSH terms, and publication metadata. | 1M+ publications, 400K gene mentions, 100K variant mentions |

How Queries Work

1. Question Analysis

The assistant determines whether the question requires a database query. Questions about specific patient data trigger a query; general genetics knowledge is answered directly.
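The routing decision can be pictured with a deliberately simple keyword sketch. This is only an illustration: the cue lists, the `Route` enum, and `route_question` are hypothetical, and the actual assistant presumably makes this decision with the language model rather than with string matching.

```python
from enum import Enum


class Route(Enum):
    DIRECT = "answer directly"
    VARIANTS = "patient variant database"
    LITERATURE = "biomedical literature database"


# Hypothetical cue phrases, chosen only for this illustration.
LITERATURE_CUES = ("publication", "pubmed", "papers", "studies", "literature")
VARIANT_CUES = ("this patient", "patient's", "variants in", "how many variants")


def route_question(question: str) -> Route:
    """Pick a target: literature, patient variants, or no query at all."""
    q = question.lower()
    if any(cue in q for cue in LITERATURE_CUES):
        return Route.LITERATURE
    if any(cue in q for cue in VARIANT_CUES):
        return Route.VARIANTS
    return Route.DIRECT  # general genetics knowledge needs no query
```

For example, `route_question("What does autosomal recessive mean?")` falls through both cue lists and returns `Route.DIRECT`.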

2. SQL Generation

The question is sent to a specialized SQL generation module that translates natural language into DuckDB SQL. The generator uses a low sampling temperature (0.1) so that its output is precise and nearly deterministic, and it has access to the complete database schema.
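A minimal sketch of how such a generation request might be assembled is shown below. The prompt wording, the `SqlGenerationRequest` type, and `build_sql_request` are assumptions made for illustration; only the temperature value of 0.1 and the idea of supplying the schema come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SqlGenerationRequest:
    prompt: str
    temperature: float = 0.1  # low temperature, as described above


def build_sql_request(question: str, schema_ddl: str) -> SqlGenerationRequest:
    """Combine the schema and the user's question into one prompt."""
    prompt = (
        "Translate the question into a single DuckDB SQL SELECT statement.\n"
        "Use only the tables and columns in this schema:\n"
        f"{schema_ddl}\n\n"
        f"Question: {question}\n"
        "SQL:"
    )
    return SqlGenerationRequest(prompt=prompt)


# Hypothetical schema excerpt; the real schema has 70 columns per variant.
schema = "CREATE TABLE variants (gene_symbol VARCHAR, acmg_class VARCHAR, gnomad_af DOUBLE);"
request = build_sql_request("Which genes carry pathogenic variants?", schema)
```

Passing the schema with every request is what lets the generator refer to real column names instead of guessing them.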

3. Execution

The SQL query runs against a read-only DuckDB connection with a 30-second timeout. Because the connection is read-only, the assistant cannot modify patient data.

4. Result Filtering

For detail queries (specific variants), results are filtered to 20 clinically essential columns out of 70, reducing token usage by approximately 70%. Aggregation queries preserve all columns.
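The filtering rule can be sketched as follows. The column subset shown is a hypothetical stand-in for the 20 essential columns, which are not listed here, and `filter_result` is an illustrative name. The last line reproduces the arithmetic behind the roughly 70% figure: keeping 20 of 70 columns drops about 71% of them.

```python
# Hypothetical stand-in for the 20 essential columns (the real list is not given).
ESSENTIAL_COLUMNS = (
    "gene_symbol",
    "hgvs_protein",
    "acmg_class",
    "gnomad_af",
    "clinvar_significance",
)


def filter_result(rows: list, is_aggregation: bool) -> list:
    """Trim detail rows to the essential columns; leave aggregations untouched."""
    if is_aggregation:
        return rows  # aggregation results keep every column
    return [
        {name: row[name] for name in ESSENTIAL_COLUMNS if name in row}
        for row in rows
    ]


# Made-up detail row with one non-essential column (dann_score).
detail = [{"gene_symbol": "GENE_A", "acmg_class": "Pathogenic", "dann_score": 0.99}]
trimmed = filter_result(detail, is_aggregation=False)

# Share of columns removed when 20 of 70 are kept.
column_reduction = 1 - 20 / 70
```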

5. Response Integration

The assistant receives the query results and incorporates them into its clinical response, adding visualization suggestions for chart-appropriate data.

Queryable Data

The patient variant database contains 70 columns per variant. The most commonly queried fields include:

| Category | Fields |
| --- | --- |
| Identity | gene_symbol, chromosome, position, hgvs_protein, hgvs_cdna, rsid, transcript_id |
| Classification | acmg_class, acmg_criteria, confidence_score |
| Consequence | consequence, impact, biotype, exon_number, domains |
| Population Frequency | gnomad_af, gnomad_popmax, gnomad_popmax_af, gnomad_hom |
| ClinVar | clinvar_significance, clinvar_review_status, stars |
| Functional Predictions | sift_prediction, alphamissense_prediction, metasvm_prediction, dann_score |
| Gene Constraint | gene_pli, gene_oe_lof, gene_loeuf |
| Phenotype | hpo_terms, hpo_count, hpo_phenotypes |
| Screening | priority_score, priority_tier |
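To make the field list concrete, here is the kind of query the generator might produce for a question such as "show rare pathogenic variants, highest priority first". The table name `variants` and the `acmg_class` value strings are assumptions, and the rows are made-up placeholders. The query is run against an in-memory `sqlite3` database as a stand-in for DuckDB; the SQL itself uses only standard constructs.

```python
import sqlite3

EXAMPLE_SQL = """
SELECT gene_symbol, hgvs_protein, acmg_class, gnomad_af
FROM variants
WHERE acmg_class IN ('Pathogenic', 'Likely pathogenic')
  AND (gnomad_af IS NULL OR gnomad_af < 0.01)
ORDER BY priority_score DESC
LIMIT 20
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants (gene_symbol TEXT, hgvs_protein TEXT, "
    "acmg_class TEXT, gnomad_af REAL, priority_score REAL)"
)
# Placeholder rows for illustration only; not real patient data.
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?, ?, ?)",
    [
        ("GENE_A", "p.Arg10Ter", "Pathogenic", 0.0001, 90.0),
        ("GENE_B", "p.Ala20Val", "Benign", 0.2, 10.0),
        ("GENE_C", "p.Gly30Asp", "Likely pathogenic", None, 95.0),
    ],
)
rows = conn.execute(EXAMPLE_SQL).fetchall()
conn.close()

genes = [row[0] for row in rows]
```

The `gnomad_af IS NULL` branch keeps variants that are absent from gnomAD, which would otherwise be silently excluded by the frequency comparison.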

Query Performance

| Operation | Typical Latency |
| --- | --- |
| SQL generation | 1-3 seconds |
| Variant database query | Under 200 milliseconds |
| Literature database query | Under 500 milliseconds |
| Total (generation + execution) | 2-4 seconds |

Safety

All database access is strictly read-only: the assistant cannot insert, update, or delete any data. Query execution has a 30-second timeout to prevent runaway queries, and result sets are capped in size to keep the conversation responsive.
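The result cap could look something like the sketch below. The limit of 100 rows and the `cap_rows` helper are assumptions for illustration; the actual limit is not stated here.

```python
def cap_rows(rows: list, max_rows: int = 100) -> tuple:
    """Return at most max_rows rows plus a flag saying whether any were dropped."""
    truncated = len(rows) > max_rows
    return rows[:max_rows], truncated


capped, was_truncated = cap_rows(list(range(250)))
```

Returning the truncation flag alongside the rows lets the assistant tell the user that more results exist, so a partial list is not presented as complete.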