HPO
The Human Phenotype Ontology (HPO) provides a standardized vocabulary of phenotypic abnormalities and their associations with genes and diseases. In Helix Insight, HPO data enables genotype-phenotype correlation by matching patient clinical features to gene-associated phenotypes.
Database Details
Role in Classification and Analysis
HPO data supports three analysis functions:
ACMG PP4 Criterion
When patient HPO terms are provided, PP4 is triggered if at least 3 patient HPO terms match the gene's HPO profile, or if at least 2 terms match for a highly specific gene (one with 5 or fewer total HPO associations). PP4 contributes supporting-level pathogenic evidence.
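The PP4 rule can be sketched as a small predicate. This is an illustrative sketch, not Helix Insight's implementation: it assumes a "match" means the exact same HPO ID appears in both the patient's terms and the gene's profile (the text does not say whether ancestor or descendant terms also count), and the function name `pp4_triggered` is hypothetical.

```python
def pp4_triggered(patient_hpo: set[str], gene_hpo: set[str]) -> bool:
    """Sketch of the PP4 phenotype-match rule described above.

    Assumption: a "match" is an exact HPO ID present in both sets.
    gene_hpo is the gene's deduplicated HPO profile.
    """
    if not patient_hpo:
        # PP4 is only evaluated when patient HPO terms are provided.
        return False
    matches = len(patient_hpo & gene_hpo)
    # A "highly specific" gene has 5 or fewer total HPO associations.
    highly_specific = len(gene_hpo) <= 5
    return matches >= 3 or (highly_specific and matches >= 2)


# Usage: two matches suffice for a gene with a small (3-term) profile.
patient = {"HP:0001250", "HP:0001263"}
gene = {"HP:0000726", "HP:0001250", "HP:0001263"}
print(pp4_triggered(patient, gene))  # True
```

The lower threshold for small profiles reflects the idea that matching 2 of only a handful of gene phenotypes is more informative than matching 2 of dozens.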
Phenotype Matching Service
The dedicated phenotype matching module uses HPO semantic similarity to measure the overlap between the patient's phenotypes and the gene-associated phenotypes, assigning clinical tiers (Tier 1 through Tier 4) for variant prioritization.
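The documentation does not name the similarity measure or the tier cut-offs, so the following is only a sketch of the general idea under stated assumptions. It uses simUI, a simple ontology-aware measure (the Jaccard overlap of the two profiles after expanding each term to all of its ancestors), as a stand-in; production tools often use information-content measures such as Resnik instead. The hierarchy uses placeholder term names rather than real HPO IDs, and the tier thresholds in `tier` are invented for illustration.

```python
from __future__ import annotations

# Toy is_a hierarchy (child -> parents). Term names are placeholders for
# illustration only; they are NOT real HPO IDs or the real HPO structure.
TOY_PARENTS: dict[str, set[str]] = {
    "ROOT": set(),
    "NEURO": {"ROOT"},
    "SEIZURE": {"NEURO"},
    "FOCAL_SEIZURE": {"SEIZURE"},
    "DEV_DELAY": {"NEURO"},
    "SKELETAL": {"ROOT"},
    "SCOLIOSIS": {"SKELETAL"},
}


def closure(terms: set[str], parents: dict[str, set[str]]) -> set[str]:
    """Return the terms plus all of their ancestors (transitive is_a closure)."""
    seen: set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, ()))
    return seen


def sim_ui(patient: set[str], gene: set[str], parents: dict[str, set[str]]) -> float:
    """simUI: Jaccard overlap of the two profiles' ancestor closures."""
    a, b = closure(patient, parents), closure(gene, parents)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def tier(score: float) -> int:
    """Map a similarity score to Tier 1-4. Thresholds are hypothetical."""
    if score >= 0.75:
        return 1
    if score >= 0.5:
        return 2
    if score >= 0.25:
        return 3
    return 4


# A focal seizure partially matches a gene annotated with seizure and delay,
# even though no term ID is shared exactly.
score = sim_ui({"FOCAL_SEIZURE"}, {"SEIZURE", "DEV_DELAY"}, TOY_PARENTS)
print(round(score, 2), tier(score))  # 0.6 2
```

Expanding to ancestors is what makes the comparison semantic: a patient term that is more specific than a gene term still earns credit through their shared parents, which an exact-ID comparison would miss.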
Screening Prioritization
When no patient HPO terms are available, the hpo_count field serves as a proxy for clinical relevance. Genes associated with more phenotypes receive higher screening scores, reflecting broader clinical significance.
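The text states only that the screening score increases with `hpo_count`; the exact formula is not documented. The sketch below assumes a log-scaled score capped at 1.0. The function name `screening_score`, the `saturation` parameter, and the gene names and counts are all hypothetical.

```python
import math


def screening_score(hpo_count: int, saturation: int = 100) -> float:
    """Hypothetical screening score in [0, 1] that increases with hpo_count.

    log1p compresses large counts so a gene with hundreds of terms does not
    dwarf one with dozens; `saturation` is an assumed count at which the
    score reaches 1.0.
    """
    if hpo_count <= 0:
        return 0.0
    return min(1.0, math.log1p(hpo_count) / math.log1p(saturation))


genes = {"GENE_A": 4, "GENE_B": 120, "GENE_C": 35}  # hypothetical hpo_count values
ranked = sorted(genes, key=lambda g: screening_score(genes[g]), reverse=True)
print(ranked)  # ['GENE_B', 'GENE_C', 'GENE_A']
```

Any monotonic transform preserves the ranking. A log scale is one reasonable choice when counts are heavily skewed, because it keeps mid-sized profiles distinguishable from small ones.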
Data Deduplication
The same HPO term can be associated with a gene through multiple diseases. During annotation, HPO terms are deduplicated per gene before aggregation, ensuring each unique phenotype is counted once. Both hpo_ids and hpo_names are sorted by HPO ID to maintain a reliable 1:1 correspondence between identifiers and names.
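The deduplicate-then-sort step can be sketched with a dictionary keyed by HPO ID. This is a minimal sketch that assumes a hypothetical input of `(gene_symbol, hpo_id, hpo_name, disease_id)` tuples, one per gene-phenotype-disease association. The function name, the `GENE1` symbol, and the `DISEASE:n` identifiers are placeholders; only the output field names come from this page.

```python
from collections import defaultdict


def aggregate_gene_hpo(rows):
    """Collapse gene-phenotype-disease rows into one HPO profile per gene.

    rows: iterable of (gene_symbol, hpo_id, hpo_name, disease_id) tuples
    (a hypothetical input shape, one row per association).
    """
    # gene -> {hpo_id: hpo_name}; keying by ID drops the duplicates that
    # arise when one phenotype links to the gene through several diseases.
    terms: dict[str, dict[str, str]] = defaultdict(dict)
    for gene, hpo_id, hpo_name, _disease_id in rows:
        terms[gene][hpo_id] = hpo_name

    profiles = {}
    for gene, id_to_name in terms.items():
        # HPO IDs are zero-padded (HP: plus 7 digits), so string order = numeric order.
        ids = sorted(id_to_name)
        profiles[gene] = {
            "hpo_ids": ";".join(ids),
            # Names are emitted in the same order as ids, keeping a 1:1 pairing.
            "hpo_names": ";".join(id_to_name[i] for i in ids),
            "hpo_count": len(ids),
        }
    return profiles


rows = [
    ("GENE1", "HP:0001250", "Seizure", "DISEASE:1"),
    ("GENE1", "HP:0001250", "Seizure", "DISEASE:2"),  # same term via a second disease
    ("GENE1", "HP:0001263", "Global developmental delay", "DISEASE:1"),
    ("GENE1", "HP:0000726", "Dementia", "DISEASE:2"),
]
profile = aggregate_gene_hpo(rows)["GENE1"]
print(profile["hpo_ids"])    # HP:0000726;HP:0001250;HP:0001263
print(profile["hpo_names"])  # Dementia;Seizure;Global developmental delay
print(profile["hpo_count"])  # 3
```

Because both strings are built from the same sorted ID list, splitting them on `;` and zipping the results recovers the original ID-to-name pairs.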
Columns Loaded (6)
HPO data is joined on gene symbol. Each variant inherits the complete HPO profile of its associated gene.
hpo_ids: Semicolon-separated HPO term identifiers associated with the gene (e.g., "HP:0000726;HP:0001250;HP:0001263"). Sorted by HPO ID to maintain 1:1 correspondence with hpo_names.
hpo_names: Semicolon-separated HPO term names in the same order as hpo_ids (e.g., "Dementia;Seizure;Global developmental delay").
hpo_count: Number of unique HPO terms associated with the gene. Used in screening mode as a proxy for clinical breadth when no patient HPO terms are available.
Frequency of each phenotype in the associated condition, when available. Not all gene-phenotype associations have frequency data.
OMIM or Orphanet disease identifiers that link the gene to each HPO phenotype.
Gene identifier used for cross-referencing. HPO's gene annotation files identify genes by NCBI Gene ID rather than by an HPO-specific identifier.
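The gene-symbol join can be sketched as a left join over plain dictionaries. This sketch makes two assumptions that the page does not specify: the variant record fields (`variant_id`, `gene_symbol`) are hypothetical, and variants whose gene has no HPO annotation are given empty values. Only the `hpo_ids`, `hpo_names`, and `hpo_count` field names come from this page.

```python
def annotate_variants(variants, gene_profiles):
    """Attach each variant's gene-level HPO profile, joining on gene symbol.

    Variants whose gene has no HPO annotation get empty values (an assumption;
    the behavior for unmatched genes is not specified).
    """
    empty = {"hpo_ids": "", "hpo_names": "", "hpo_count": 0}
    return [{**v, **gene_profiles.get(v["gene_symbol"], empty)} for v in variants]


gene_profiles = {
    "GENE1": {
        "hpo_ids": "HP:0001250;HP:0001263",
        "hpo_names": "Seizure;Global developmental delay",
        "hpo_count": 2,
    },
}
variants = [
    {"variant_id": "var1", "gene_symbol": "GENE1"},
    {"variant_id": "var2", "gene_symbol": "GENE1"},
    {"variant_id": "var3", "gene_symbol": "GENE2"},  # gene with no HPO profile
]
annotated = annotate_variants(variants, gene_profiles)
for row in annotated:
    print(row["variant_id"], row["hpo_count"], row["hpo_ids"])
```

Both `GENE1` variants receive an identical profile. This is the gene-level behavior noted under Limitations: the join cannot distinguish variants in the same gene that cause different phenotypes.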
Limitations
HPO coverage varies by disease. Well-studied conditions have comprehensive phenotype profiles, while rare diseases may have minimal HPO annotation.
HPO terms are associated at the gene level, not the variant level. Different variants in the same gene may produce different phenotypes.
Frequency data for phenotype-disease associations is incomplete. The absence of frequency data does not mean the phenotype is rare.
HPO is primarily focused on rare diseases. Common complex conditions may have less comprehensive ontology coverage.
Phenotype matching depends on accurate HPO term selection by the clinician. Overly broad or imprecise terms reduce matching specificity.
Reference
Gargano MA, et al. "The Human Phenotype Ontology in 2024: phenotypes around the world." Nucleic Acids Research. 2024;52(D1):D1333-D1346. PMID: 37953324.
For details on phenotype-based analysis, see the Phenotype Matching section.