HPO Overview
The Human Phenotype Ontology (HPO) is a standardized vocabulary of phenotypic abnormalities observed in human disease. It provides the foundation for computational phenotype comparison in Helix Insight.
What is HPO?
HPO is a structured hierarchy of clinical terms that describes signs, symptoms, and findings in patients with genetic disease. Each term has a unique identifier (e.g., HP:0001250 for "Seizure"), a definition, and relationships to other terms in the hierarchy. Because the hierarchy is a directed acyclic graph rather than a tree, a term can have more than one parent.
The hierarchy captures the specificity of clinical findings. "Focal clonic seizure" (HP:0002266) is a descendant of "Motor seizure" (HP:0020219), which descends from "Seizure" (HP:0001250), which in turn descends from "Abnormality of the nervous system" (HP:0000707). The deeper a term sits in the hierarchy, the more specific it is and the more diagnostic information it carries.
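To make the hierarchy concrete, the following minimal sketch walks a toy is-a graph built from the four terms named above. The edge table is a simplification for illustration only: the real ontology contains intermediate terms and multiple parents, and the names `PARENTS`, `ancestors`, and `depth` are hypothetical, not part of Helix Insight.

```python
from collections import deque

# Toy is-a edges (term -> list of parents), simplified from the chain in the text.
# Intermediate HPO terms are omitted, so this is not the real ontology structure.
PARENTS = {
    "HP:0002266": ["HP:0020219"],  # Focal clonic seizure -> Motor seizure
    "HP:0020219": ["HP:0001250"],  # Motor seizure -> Seizure
    "HP:0001250": ["HP:0000707"],  # Seizure -> Abnormality of the nervous system
    "HP:0000707": [],              # treated as the root of this toy graph
}


def ancestors(term, parents=PARENTS):
    """Return every ancestor of `term` (excluding the term itself)."""
    seen = set()
    queue = deque(parents.get(term, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # a DAG can reach the same ancestor by several paths
        seen.add(current)
        queue.extend(parents.get(current, []))
    return seen


def depth(term, parents=PARENTS):
    """Longest path from `term` up to a root; a rough proxy for specificity."""
    direct = parents.get(term, [])
    return 0 if not direct else 1 + max(depth(p, parents) for p in direct)
```

In this toy graph, `ancestors("HP:0002266")` contains the three broader terms, and `depth` grows as terms become more specific.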
Scale
Why Standardized Terms Matter
Clinical descriptions in medical records use varied language. One physician writes "fits", another "seizures", a third "epileptic episodes". Without standardization, a computer cannot reliably tell that these phrases describe the same finding, so comparison across patients, diseases, and databases breaks down. HPO solves this by assigning each clinical concept a single unique identifier.
When a geneticist selects "Seizure" (HP:0001250) in Helix Insight, the platform knows exactly which concept is meant and can compute its similarity to every other term in the ontology, including terms the geneticist did not explicitly select.
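The idea of collapsing varied wording onto one identifier can be sketched as a simple lookup. The synonym table below is hypothetical and only mirrors the three phrasings mentioned above; a real table would be built from the labels and synonyms distributed with HPO, and this is not a description of how Helix Insight extracts terms.

```python
# Hypothetical synonym table for illustration; keys are lower-cased phrases.
SYNONYMS = {
    "seizure": "HP:0001250",
    "seizures": "HP:0001250",
    "fits": "HP:0001250",
    "epileptic episodes": "HP:0001250",
}


def normalize(text):
    """Map a free-text phrase to an HPO ID, or None if the phrase is unknown."""
    # Lower-case and collapse runs of whitespace before the lookup.
    key = " ".join(text.lower().split())
    return SYNONYMS.get(key)
```

All three phrasings resolve to the same identifier, which is what makes downstream comparison possible.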
How HPO Is Used in Helix Insight
Patient Phenotype Input
The geneticist selects HPO terms describing the patient's clinical presentation. The platform provides autocomplete search across all 17,000+ terms and can extract terms automatically from free-text clinical descriptions.
Gene-Phenotype Database
Every gene in the analysis carries its HPO associations from the reference database. These associations link each gene to the clinical features observed in patients with pathogenic variants in that gene.
Semantic Comparison
The matching algorithm compares the patient's HPO terms against each gene's HPO profile using semantic similarity. Related terms are recognized through their shared ancestry in the ontology graph.
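The text does not name the similarity measure that Helix Insight uses, so the sketch below swaps in one simple option for illustration: Jaccard overlap of ancestor sets, combined with a best-match average across the patient's terms. The toy graph, its term names, and the functions `closure`, `term_similarity`, and `profile_score` are all assumptions.

```python
# Toy ontology (term -> list of parents); names are placeholders, not HPO terms.
PARENTS = {
    "root": [],
    "nervous": ["root"],
    "seizure": ["nervous"],
    "motor_seizure": ["seizure"],
    "epileptic_spasm": ["seizure"],
    "growth": ["root"],
    "short_stature": ["growth"],
}


def closure(term):
    """Return the term together with all of its ancestors."""
    result = {term}
    for parent in PARENTS[term]:
        result |= closure(parent)
    return result


def term_similarity(a, b):
    """Jaccard overlap of the two ancestor closures (1.0 for identical terms)."""
    ca, cb = closure(a), closure(b)
    return len(ca & cb) / len(ca | cb)


def profile_score(patient_terms, gene_terms):
    """Average, over patient terms, of the best-matching gene term."""
    if not patient_terms or not gene_terms:
        return 0.0
    best = [max(term_similarity(p, g) for g in gene_terms) for p in patient_terms]
    return sum(best) / len(best)
```

Two sibling seizure terms share most of their ancestry and score well above a seizure term paired with an unrelated growth term, even though none of the labels match as text.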
ACMG PP4 Criterion
When 3 or more patient HPO terms match a gene's phenotype profile (or 2 for highly specific genes), the PP4 criterion is applied during ACMG classification, providing supporting pathogenic evidence for phenotype specificity.
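The threshold rule can be written as a small predicate. How a term is judged to "match" and how a gene is flagged as highly specific are not defined here, so the sketch takes both as inputs; the function name and parameters are hypothetical.

```python
def pp4_applies(matched_terms, highly_specific_gene=False):
    """Return True when enough distinct patient terms match the gene profile.

    matched_terms: iterable of patient HPO IDs already judged to match.
    highly_specific_gene: assumed flag that lowers the threshold from 3 to 2.
    """
    threshold = 2 if highly_specific_gene else 3
    # Count distinct terms so that a repeated ID is not counted twice.
    return len(set(matched_terms)) >= threshold
```

For example, two matching terms are not enough for an ordinary gene but are enough when the gene is flagged as highly specific.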
Hierarchy Example
Consider a patient with "Infantile spasms" (HP:0012469) being matched against a gene associated with "Epileptic encephalopathy" (HP:0200134). These terms are not identical, but they share close ancestors in the HPO hierarchy. The semantic similarity algorithm recognizes this relationship and produces a high similarity score. This is the key advantage over simple keyword matching: clinical knowledge encoded in the ontology structure is leveraged automatically.
Reference
Gargano MA, et al. "The Human Phenotype Ontology in 2024: phenotypes around the world." Nucleic Acids Research. 2024;52(D1):D1333-D1346. PMID: 37953324.