Relevance Scoring
Each candidate publication is scored from 0.0 to 1.0 using a six-component weighted system optimized for ACMG evidence assessment. The total score determines the publication's rank in search results.
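As a rough illustration of how six component scores could combine into one total, the sketch below uses a weighted sum. The component weights are not stated in this section, so equal weights are assumed purely as placeholders. The component keys, `total_score`, and `rank` are hypothetical names, not the platform's API.

```python
# Hypothetical component names with placeholder equal weights.
# The real weights are not given in this section.
WEIGHTS = {
    "phenotype_match": 1 / 6,
    "publication_type": 1 / 6,
    "gene_centrality": 1 / 6,
    "functional_evidence": 1 / 6,
    "variant_match": 1 / 6,
    "recency": 1 / 6,
}


def total_score(components: dict[str, float]) -> float:
    """Weighted sum of component scores, each expected in [0.0, 1.0]."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    score = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    # Clamp to guard against floating-point drift just outside [0, 1].
    return min(1.0, max(0.0, score))


def rank(publications: list[tuple[str, dict[str, float]]]) -> list[tuple[str, float]]:
    """Return (identifier, total score) pairs, highest score first."""
    scored = [(pub_id, total_score(parts)) for pub_id, parts in publications]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because the weights sum to 1 and every component lies in [0.0, 1.0], the total stays within the 0.0 to 1.0 range described above.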
Six Scoring Components
Measures how many of the patient’s HPO terms are mentioned in the publication. Uses morphological matching (stemming) so that "seizure" matches "seizures" and "epileptic" matches "epilepsy". The score equals the fraction of the patient’s HPO terms found in the title and abstract.
Prioritizes study types with the highest clinical relevance for variant interpretation. Case reports receive the highest score because they directly describe patient phenotypes and variant associations.
Measures how prominently the query gene is discussed in the publication, based on mention frequency. A paper mentioning the gene 20+ times is likely focused on that gene, while a single mention may be incidental.
Detects the presence of functional studies, such as animal models (zebrafish, mouse), knockout experiments, cell line assays, and molecular biology techniques. Functional evidence is critical for assessing the ACMG PS3 criterion.
Awards a bonus when the exact variant notation is found in the publication. An exact match (1.0) indicates the specific variant has been studied; a gene-only match (0.3) indicates relevance at the gene level.
More recent publications score higher, using linear decay over 10 years. Taking 2025 as the reference year, a 2025 publication scores 1.0 and a 2015 publication scores approximately 0.0. This reflects the evolving understanding of variant pathogenicity.
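The last three components described above can be sketched as small functions. This is a minimal illustration under stated assumptions. The keyword list is invented for the example, the functional-evidence score is assumed to be binary, and unmatched publications are assumed to score 0.0 for variant match. None of these details are specified in this section, and all function names are hypothetical.

```python
import re
from datetime import date
from typing import Optional

# Illustrative keywords only; the platform's actual vocabulary is not documented here.
FUNCTIONAL_KEYWORDS = ("zebrafish", "mouse", "knockout", "cell line", "assay")


def functional_evidence_score(text: str) -> float:
    """Assumed binary: 1.0 if any functional-study keyword appears, else 0.0."""
    lowered = text.lower()
    return 1.0 if any(keyword in lowered for keyword in FUNCTIONAL_KEYWORDS) else 0.0


def variant_match_score(text: str, gene: str, variant: str) -> float:
    """1.0 for an exact variant notation match, 0.3 for a gene-only match."""
    if variant and variant in text:
        return 1.0
    if re.search(rf"\b{re.escape(gene)}\b", text, flags=re.IGNORECASE):
        return 0.3
    return 0.0  # assumption: neither the variant nor the gene is mentioned


def recency_score(pub_year: int, current_year: Optional[int] = None,
                  window: int = 10) -> float:
    """Linear decay from 1.0 (current year) to 0.0 (window years old or older)."""
    if current_year is None:
        current_year = date.today().year
    age = max(0, current_year - pub_year)
    return max(0.0, 1.0 - age / window)
```

With `current_year=2025`, `recency_score(2025)` gives 1.0, `recency_score(2020)` gives 0.5, and `recency_score(2015)` gives 0.0, which matches the decay described above.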
Publication Type Scoring
| Publication Type | Score | Rationale |
|---|---|---|
| Case Report | 1.0 | Directly describes patient phenotypes and variant associations |
| Clinical Trial | 0.9 | Strong clinical evidence with structured methodology |
| Research Article | 0.7 | Original research contributing new findings |
| Journal Article | 0.5 | General scientific publication |
| Review | 0.3 | Secondary source, lower novelty for classification |
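The table above maps naturally to a lookup. The sketch below assumes that a publication may carry several type labels and that the highest-scoring label wins, and it assumes a default of 0.0 for types not listed in the table. Neither behavior is stated in this section, and `publication_type_score` is a hypothetical name.

```python
# Scores copied from the table above, keyed by lowercase type label.
PUBLICATION_TYPE_SCORES = {
    "case report": 1.0,
    "clinical trial": 0.9,
    "research article": 0.7,
    "journal article": 0.5,
    "review": 0.3,
}


def publication_type_score(pub_types: list[str], default: float = 0.0) -> float:
    """Return the highest table score among a publication's type labels.

    Taking the maximum and defaulting unlisted types to 0.0 are assumptions.
    """
    normalized = (label.strip().lower() for label in pub_types)
    scores = [PUBLICATION_TYPE_SCORES[label] for label in normalized
              if label in PUBLICATION_TYPE_SCORES]
    return max(scores, default=default)
```

For example, a record labeled both "Journal Article" and "Case Report" would score 1.0 under the take-the-maximum assumption.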
Gene Centrality Scoring
| Mention Count | Score |
|---|---|
| 20 or more | 1.0 |
| 10-19 | 0.8 |
| 5-9 | 0.6 |
| 2-4 | 0.4 |
| 1 | 0.2 |
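The tiers above can be expressed as a threshold scan. In this sketch, mentions are counted as case-insensitive whole-word matches of the gene symbol, and zero mentions are assumed to score 0.0. The table does not cover that case, and the counting rule is also an assumption. Both function names are hypothetical.

```python
import re

# (minimum mention count, score) pairs from the table, highest tier first.
CENTRALITY_TIERS = ((20, 1.0), (10, 0.8), (5, 0.6), (2, 0.4), (1, 0.2))


def count_gene_mentions(text: str, gene: str) -> int:
    """Count whole-word, case-insensitive occurrences of the gene symbol."""
    pattern = rf"\b{re.escape(gene)}\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def gene_centrality_score(mention_count: int) -> float:
    """Map a mention count to the score of the first tier it reaches."""
    for minimum, score in CENTRALITY_TIERS:
        if mention_count >= minimum:
            return score
    return 0.0  # assumption: the table does not list zero mentions
```

The whole-word boundary keeps a symbol such as `BRCA1` from being counted inside a longer token such as `BRCA12`.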
Morphological Matching
Phenotype matching uses stemming (NLTK SnowballStemmer, English) to handle morphological variations in clinical language. The stem of each HPO term name is compared against stemmed words in the publication title and abstract. This ensures that "seizure" matches "seizures", "epileptic" matches "epilepsy", and "developmental" matches "development" without requiring exact string matches.
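A minimal sketch of stem-based phenotype matching follows. It uses NLTK's `SnowballStemmer("english")` when NLTK is installed and otherwise falls back to a toy plural stripper (`crude_stem`), which is not a substitute for Snowball. Treating a multi-word HPO term as matched only when all of its stems appear in the text is an assumption, and the function names are hypothetical.

```python
import re
from typing import Callable, Iterable


def crude_stem(word: str) -> str:
    """Toy fallback that strips one trailing plural 's'; not the Snowball algorithm."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


try:
    from nltk.stem.snowball import SnowballStemmer
    default_stem: Callable[[str], str] = SnowballStemmer("english").stem
except ImportError:  # keep the sketch runnable without NLTK
    default_stem = crude_stem


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def phenotype_match_score(hpo_term_names: Iterable[str], title: str, abstract: str,
                          stem: Callable[[str], str] = default_stem) -> float:
    """Fraction of patient HPO terms whose stems all occur in title + abstract."""
    terms = list(hpo_term_names)
    if not terms:
        return 0.0
    doc_stems = {stem(token) for token in tokenize(f"{title} {abstract}")}
    matched = 0
    for name in terms:
        term_stems = {stem(token) for token in tokenize(name)}
        # Assumption: every word of a multi-word term must be present.
        if term_stems and term_stems <= doc_stems:
            matched += 1
    return matched / len(terms)
```

Which word pairs collapse to a shared stem depends on the stemmer, so specific pairs are worth checking against the stemmer actually in use.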
Score Interpretation
Parallel Scoring
Relevance scoring runs across 16 parallel workers, enabling the platform to score thousands of candidate publications in under 500 milliseconds. Each worker is pre-initialized with the stemming engine to eliminate per-request overhead.
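The pre-initialization pattern described above can be sketched with the standard library's `concurrent.futures`. This example uses a thread pool to stay portable; whether the platform uses threads or processes is not stated, and `ProcessPoolExecutor` accepts the same `initializer` argument. The scoring function is a simplified stand-in and `toy_stem` is a placeholder for a real stemmer. All names are hypothetical, and the sketch makes no claim about throughput.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

WORKER_COUNT = 16  # worker count taken from the text above
_worker_state = threading.local()


def toy_stem(word: str) -> str:
    """Placeholder stemmer that strips a trailing 's'."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def _init_worker() -> None:
    """Runs once per worker thread, so setup cost is not paid per request."""
    _worker_state.stem = toy_stem  # stand-in for constructing a real stemmer


def _score_one(job: tuple[list[str], str]) -> float:
    """Simplified phenotype fraction using the worker's pre-built stemmer."""
    patient_terms, text = job
    if not patient_terms:
        return 0.0
    stem = _worker_state.stem
    doc_stems = {stem(word) for word in text.lower().split()}
    hits = sum(1 for term in patient_terms if stem(term.lower()) in doc_stems)
    return hits / len(patient_terms)


def score_all(patient_terms: list[str], texts: list[str],
              workers: int = WORKER_COUNT) -> list[float]:
    """Score every text in parallel; results keep the input order."""
    jobs = [(patient_terms, text) for text in texts]
    with ThreadPoolExecutor(max_workers=workers, initializer=_init_worker) as pool:
        return list(pool.map(_score_one, jobs))
```

For CPU-bound scoring in CPython, a process pool is usually the better fit, since threads share the global interpreter lock. The initializer pattern stays the same.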