Database Queries

Helix AI can query two databases through natural language: the patient's classified variant database and the biomedical literature database. The assistant automatically determines which database to query based on your question, generates SQL, executes it, and incorporates the results into its response.

Two Databases

Database	Content	Scale
Patient Variants	Classified variants with ACMG annotations, population frequencies, functional predictions, phenotype matches, and screening scores.	~2.3M variants, 70 columns per variant
Biomedical Literature	PubMed publications with gene mentions, variant mentions, abstracts, MeSH terms, and publication metadata.	1M+ publications, 400K gene mentions, 100K variant mentions

How Queries Work

Question Analysis

The assistant first determines whether the question requires a database query at all. Questions about specific patient data or the published literature trigger a query against the matching database; general genetics knowledge is answered directly.

SQL Generation

The question is sent to a specialized SQL generation module that translates natural language into DuckDB SQL. The generator has access to the complete database schema and uses a low temperature (0.1) so that its output is precise and nearly deterministic.

Execution

The SQL query runs against a read-only DuckDB connection with a 30-second timeout. Because the connection itself is read-only, the assistant cannot modify patient data.
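The read-only and timeout behavior described above can be illustrated with a small sketch. It is not the Helix AI implementation: it uses Python's built-in sqlite3 module as a stand-in for DuckDB so that it runs without extra dependencies, and the function names, the `variants` table, and the demo row are all hypothetical.

```python
import os
import sqlite3
import tempfile
import time

TIMEOUT_SECONDS = 30  # timeout stated in the documentation


def run_read_only(path: str, sql: str, timeout: float = TIMEOUT_SECONDS) -> list:
    """Run sql on a read-only connection, aborting after `timeout` seconds."""
    # mode=ro makes SQLite (the stand-in engine) reject every write.
    con = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    deadline = time.monotonic() + timeout
    # The handler runs every 10,000 VM steps; a nonzero return aborts the query.
    con.set_progress_handler(lambda: int(time.monotonic() > deadline), 10_000)
    try:
        return con.execute(sql).fetchall()
    finally:
        con.close()


def make_demo_db() -> str:
    """Create a throwaway database file with one hypothetical variants row."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE variants (gene_symbol TEXT, acmg_class TEXT)")
    con.execute("INSERT INTO variants VALUES ('GENE_A', 'Pathogenic')")
    con.commit()
    con.close()
    return path
```

With this sketch, a SELECT returns rows as usual, while a DELETE or a query that runs past the deadline raises `sqlite3.OperationalError` instead of changing data or blocking the conversation.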

Result Filtering

For detail queries (specific variants), results are filtered to 20 clinically essential columns out of 70, reducing token usage by approximately 70%. Aggregation queries preserve all columns.
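As a rough illustration of this filtering step, the sketch below keeps only an allow-list of columns for detail queries and passes aggregation results through unchanged. The documentation does not list the 20 essential columns, so `ESSENTIAL_COLUMNS` here is a shortened, hypothetical subset, and both function names are assumptions.

```python
# Hypothetical subset: the actual list of 20 essential columns is not documented.
ESSENTIAL_COLUMNS = [
    "gene_symbol",
    "hgvs_protein",
    "acmg_class",
    "gnomad_af",
    "clinvar_significance",
]


def filter_result(rows: list[dict], is_aggregation: bool,
                  keep: list[str] = ESSENTIAL_COLUMNS) -> list[dict]:
    """Drop non-essential columns from detail results; keep aggregations whole."""
    if is_aggregation:
        return rows
    return [{col: row[col] for col in keep if col in row} for row in rows]


def column_reduction(total: int = 70, kept: int = 20) -> float:
    """Fraction of columns removed when `kept` of `total` columns survive."""
    return 1 - kept / total
```

Keeping 20 of 70 columns removes about 71% of the columns, which is in line with the approximately 70% token reduction quoted above if the columns are of broadly similar size.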

Response Integration

The assistant receives the query results and incorporates them into its clinical response, suggesting a visualization when the data is suitable for a chart.

Queryable Data

The patient variant database contains 70 columns per variant. The most commonly queried fields include:

Category	Fields
Identity	gene_symbol, chromosome, position, hgvs_protein, hgvs_cdna, rsid, transcript_id
Classification	acmg_class, acmg_criteria, confidence_score
Consequence	consequence, impact, biotype, exon_number, domains
Population Frequency	gnomad_af, gnomad_popmax, gnomad_popmax_af, gnomad_hom
ClinVar	clinvar_significance, clinvar_review_status, stars
Functional Predictions	sift_prediction, alphamissense_prediction, metasvm_prediction, dann_score
Gene Constraint	gene_pli, gene_oe_lof, gene_loeuf
Phenotype	hpo_terms, hpo_count, hpo_phenotypes
Screening	priority_score, priority_tier
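To give a feel for the kind of SQL these fields support, here is a small sketch of one detail query and one aggregation query. The column names come from the table above, but the table name `variants`, the `acmg_class` values, and the sample rows are hypothetical placeholders, and sqlite3 again stands in for DuckDB so the example runs with the standard library alone.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE variants (gene_symbol TEXT, acmg_class TEXT, gnomad_af REAL)"
)
# Placeholder rows, not real patient data.
con.executemany(
    "INSERT INTO variants VALUES (?, ?, ?)",
    [
        ("GENE_A", "Pathogenic", 0.0001),
        ("GENE_B", "Benign", 0.25),
        ("GENE_C", "Likely Pathogenic", 0.002),
        ("GENE_D", "Pathogenic", 0.05),
    ],
)

# Detail query: rare variants classified as pathogenic or likely pathogenic.
DETAIL_SQL = """
SELECT gene_symbol, acmg_class, gnomad_af
FROM variants
WHERE acmg_class IN ('Pathogenic', 'Likely Pathogenic')
  AND gnomad_af < 0.01
ORDER BY gnomad_af
"""

# Aggregation query: number of variants per ACMG class.
AGGREGATE_SQL = """
SELECT acmg_class, COUNT(*) AS n
FROM variants
GROUP BY acmg_class
ORDER BY acmg_class
"""

detail_rows = con.execute(DETAIL_SQL).fetchall()
class_counts = con.execute(AGGREGATE_SQL).fetchall()
con.close()
```

The first query corresponds to a question such as "which rare pathogenic variants does this patient have?", and the second to "how many variants fall into each class?".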

Query Performance

Operation	Typical Latency
SQL generation	1-3 seconds
Variant database query	Under 200 milliseconds
Literature database query	Under 500 milliseconds
Total (generation + execution)	2-4 seconds

Safety

All database access is strictly read-only: the assistant cannot insert, update, or delete any data. Query execution has a 30-second timeout to prevent runaway queries, and result size is capped so that the conversation stays responsive.