Helix Insight


HPO Term Selection Guide

The quality of phenotype matching depends directly on the HPO terms selected by the geneticist. More specific, comprehensive term sets produce more accurate and discriminating phenotype match scores.

General Principles

Be specific

Use "Focal clonic seizure" (HP:0002266) instead of just "Seizure" (HP:0001250). More specific terms have higher information content and produce more discriminating similarity scores. The platform’s autocomplete helps find the most specific matching term.
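The link between specificity and information content can be illustrated with a minimal sketch. It assumes the common definition IC(t) = -ln(p(t)), where p(t) is the fraction of annotated genes associated with term t or any of its descendants. The gene counts below are invented for illustration; they are not taken from the HPO database, and the platform's actual scoring method may differ.

```python
import math

# Hypothetical counts: number of annotated genes associated with each term
# (including its descendants). These numbers are illustrative only.
TOTAL_ANNOTATED_GENES = 5000
GENES_PER_TERM = {
    "HP:0001250 Seizure": 1500,
    "HP:0002266 Focal clonic seizure": 60,
}

def information_content(gene_count: int, total: int) -> float:
    """IC = -ln(p), where p is the fraction of genes annotated with the term."""
    return -math.log(gene_count / total)

ic_scores = {
    term: information_content(count, TOTAL_ANNOTATED_GENES)
    for term, count in GENES_PER_TERM.items()
}

for term, ic in ic_scores.items():
    print(f"{term}: IC = {ic:.2f}")
```

Under these assumed counts, the rarer, more specific term receives the higher information content, which is why it separates candidate genes more sharply than a broad term does.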

Be comprehensive

Include findings from all organ systems, not just the primary complaint. A patient referred for seizures may also have microcephaly, feeding difficulties, and hypotonia, all of which help narrow the differential diagnosis.

Include negative findings

When clinically significant, include negative findings. The platform’s text extraction handles negation (e.g., "no hearing loss" will identify the term but mark it as negated). Negative findings help exclude conditions.

Aim for 5-15 terms

This range is optimal for most cases. Fewer than 5 terms may miss relevant matches. More than 15 terms can dilute the average score if some of them are poorly characterized (sparsely annotated) in the HPO database.
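The dilution effect can be sketched with simple arithmetic. The example assumes the overall match score is the plain mean of per-term similarity scores, which is an assumption made for illustration; the platform's aggregation may differ. All per-term scores are made up.

```python
from statistics import mean

# Hypothetical per-term similarity scores (0 to 1) against one candidate gene.
informative_terms = [0.9, 0.85, 0.8, 0.75, 0.7, 0.7]
# Extra terms that are sparsely annotated and therefore match weakly.
weak_terms = [0.1, 0.05, 0.1, 0.0, 0.05, 0.1, 0.0, 0.05, 0.1, 0.05, 0.0, 0.1]

focused_score = mean(informative_terms)
diluted_score = mean(informative_terms + weak_terms)

print(f"6 informative terms:            {focused_score:.2f}")
print(f"plus 12 weakly matched terms:   {diluted_score:.2f}")
```

With an averaging scheme like this, the same six strong matches count for less once weakly matched terms are added, so the true candidate gene can drop in the ranking.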

Prefer observed over inferred

Select HPO terms for findings that have been directly observed or documented, not suspected diagnoses. "Seizure" is better than "Epilepsy" if the patient has had a seizure but no formal epilepsy diagnosis.

Domain-Specific Guidance

Neurodevelopmental

Include seizure type and age of onset, developmental milestones (sitting, walking, speech), brain MRI findings (e.g., "Cortical dysplasia", "Thin corpus callosum"), EEG patterns if specific, behavioral features (e.g., "Stereotypy", "Self-injurious behavior"), and tone abnormalities (hypotonia or spasticity).

Cardiology

Specify the cardiomyopathy type (dilated, hypertrophic, restrictive), arrhythmia type, ECG findings (e.g., "Prolonged QT interval"), echocardiographic measurements, family history of sudden cardiac death, and any extracardiac features.

Nephrology

Include the specific renal finding (cysts, proteinuria, hematuria), biopsy findings if available, extrarenal manifestations (hearing loss, eye abnormalities), and age of onset. General terms like "Renal insufficiency" are less informative than a specific finding such as "Multiple renal cysts" when that finding has been documented.

Metabolic

Include specific metabolites elevated or decreased, enzyme activity levels, organ involvement, and response to treatment. Terms describing biochemical abnormalities (e.g., "Elevated serum lactate") are often more specific than clinical descriptions.

Neonatal

Particularly relevant for newborn screening contexts. Include gestational age-related findings, birth parameters, feeding difficulties, hypotonia, seizure onset age, congenital anomalies, and metabolic screening results. Early-onset findings are often the most diagnostically informative.

Using the Platform Tools

Autocomplete Search

Start typing a clinical finding in the HPO term input field. The platform searches across all 17,000+ HPO terms in real time, including synonyms. Select the most specific term from the suggestions.

Free-Text Extraction

Paste a clinical description or referral letter into the text extraction field. The platform identifies HPO terms mentioned in the text, handles negation (marking terms preceded by "no", "without", "denies", etc.), and returns only positive findings. Review the extracted terms and adjust as needed before running phenotype matching.
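The negation handling described above can be approximated with a simple cue-word rule. The sketch below is a hypothetical illustration, not the platform's extraction code: the function name `extract_terms`, the four-entry vocabulary, and the cue list are all assumptions, and a real extractor would search every HPO term and synonym.

```python
import re

# Tiny illustrative vocabulary; a real system would cover all HPO terms and synonyms.
PHRASE_TO_TERM = {
    "seizure": "HP:0001250",
    "hearing loss": "HP:0000365",
    "hypotonia": "HP:0001252",
    "microcephaly": "HP:0000252",
}
NEGATION_CUES = ("no", "not", "without", "denies")

def extract_terms(text: str) -> list[dict]:
    """Return the HPO terms found in text, each flagged as negated or not."""
    findings = []
    # Split on clause punctuation so a cue only affects its own clause.
    for clause in re.split(r"[.;,]", text.lower()):
        for phrase, hpo_id in PHRASE_TO_TERM.items():
            match = re.search(rf"\b{re.escape(phrase)}", clause)
            if not match:
                continue
            # A term is negated if any cue word precedes it within the clause.
            words_before = re.findall(r"[a-z]+", clause[: match.start()])
            negated = any(cue in words_before for cue in NEGATION_CUES)
            findings.append({"id": hpo_id, "phrase": phrase, "negated": negated})
    return findings

letter = "Referred for seizures and hypotonia. No hearing loss, without microcephaly."
results = extract_terms(letter)
positive_ids = [f["id"] for f in results if not f["negated"]]
print(positive_ids)
```

A rule this simple misses cases such as a single "no" governing a comma-separated list, which is one reason to review the extracted terms before running phenotype matching.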

Common Mistakes

Using only the chief complaint

Entering only "Seizure" for an epilepsy patient misses the opportunity to differentiate between hundreds of epilepsy-associated genes. Add seizure type, developmental status, MRI findings, and any other relevant features.

Selecting parent terms when children exist

"Abnormality of the nervous system" matches thousands of genes equally. If the patient has seizures, select "Seizure" or an even more specific child term.

Omitting extraprimary findings

The term set for a cardiac patient with tall stature and lens dislocation should include all three features. The combination may point to a specific syndrome (e.g., Marfan syndrome) that no individual feature would identify.

Iterative Refinement

Phenotype matching can be re-run at any time with updated HPO terms. If the initial results do not include expected candidate genes, consider adding more specific terms or terms from additional organ systems. The platform preserves previous matching results until a new run replaces them.