Ensembl VEP
The Ensembl Variant Effect Predictor (VEP) determines the consequence of each variant on genes, transcripts, and protein sequence. It runs as Stage 3 of the processing pipeline, before reference database annotation, and provides the foundational consequence information that guides all downstream classification logic.
Configuration
Impact Classification
VEP assigns each consequence an impact level (HIGH, MODERATE, LOW, or MODIFIER). The impact level determines which ACMG criteria pathways are evaluated:
| Impact | Example Consequences | ACMG Pathway |
|---|---|---|
| HIGH | frameshift_variant, stop_gained, splice_acceptor_variant, splice_donor_variant | PVS1 pathway |
| MODERATE | missense_variant, inframe_insertion, inframe_deletion | PP3/BP4 (BayesDel), PM4, BP3 |
| LOW | synonymous_variant, splice_region_variant | BP7 |
| MODIFIER | intron_variant, upstream_gene_variant, downstream_gene_variant | Typically no ACMG criteria triggered |
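The routing in the table can be sketched as a simple lookup keyed by impact level. This is an illustrative sketch, not the platform's implementation; the names `ACMG_PATHWAYS` and `pathways_for_impact` are hypothetical.

```python
# Hypothetical lookup that mirrors the impact table above.
ACMG_PATHWAYS = {
    "HIGH": ["PVS1"],
    "MODERATE": ["PP3", "BP4", "PM4", "BP3"],
    "LOW": ["BP7"],
    "MODIFIER": [],  # typically no ACMG criteria triggered
}


def pathways_for_impact(impact: str) -> list[str]:
    """Return the candidate ACMG criteria for a VEP impact level."""
    try:
        # Return a copy so callers cannot mutate the shared table.
        return list(ACMG_PATHWAYS[impact.upper()])
    except KeyError:
        raise ValueError(f"unknown VEP impact: {impact!r}") from None
```

Raising on an unrecognized impact value, rather than returning an empty list, keeps a malformed annotation from being silently treated as MODIFIER.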
Local Execution
VEP runs entirely locally using a pre-downloaded offline cache. No variant data is sent to Ensembl servers. The local FASTA reference file provides sequence context for HGVS notation generation. This keeps variant data private and makes processing speed independent of network availability.
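An offline invocation of this kind might be assembled as follows. The sketch only builds the argument list; the function name and all paths are placeholders, and the flag selection is an assumption about what such a setup needs, not the platform's actual configuration. The flags themselves (`--offline`, `--cache`, `--dir_cache`, `--fasta`, `--hgvs`, `--hgvsg`, `--symbol`, `--biotype`, `--numbers`, `--domains`, `--vcf`) are standard VEP command-line options.

```python
def build_offline_vep_command(input_vcf, output_vcf, cache_dir, fasta):
    """Build an argument list for a fully offline VEP run (illustrative only)."""
    return [
        "vep",
        "--input_file", str(input_vcf),
        "--output_file", str(output_vcf),
        "--offline",                    # never contact Ensembl servers
        "--cache",                      # read annotation from the local cache
        "--dir_cache", str(cache_dir),
        "--fasta", str(fasta),          # sequence context needed for HGVS
        "--hgvs",                       # HGVSc and HGVSp
        "--hgvsg",                      # genomic HGVS
        "--symbol",                     # gene symbol
        "--biotype",                    # transcript biotype
        "--numbers",                    # exon and intron numbers
        "--domains",                    # overlapping protein domains
        "--vcf",                        # write annotations in VCF format
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems with paths.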
Fields Extracted (11)
For each variant, VEP produces annotations across all overlapping transcripts. Helix Insight selects the transcript with the most severe consequence and extracts the following fields:
HGNC gene symbol from the most severe transcript annotation.
Ensembl gene identifier (ENSG format).
Ensembl transcript identifier (ENST format) from the most severe consequence.
Genomic HGVS notation (e.g., NC_000001.11:g.12345A>G).
cDNA-level HGVS notation relative to the transcript (e.g., NM_000123.4:c.456A>G).
Protein-level HGVS notation (e.g., NP_000114.1:p.Arg152Gly). NULL for non-coding variants.
Sequence Ontology consequence term(s). Examples: missense_variant, frameshift_variant, splice_donor_variant.
Impact severity: HIGH, MODERATE, LOW, or MODIFIER. Determines which ACMG criteria paths are evaluated.
Transcript biotype (e.g., protein_coding, nonsense_mediated_decay).
Exon number within the transcript, if applicable.
Protein domain annotations (e.g., "Pfam:PF00533,InterPro:IPR011364"). Used for PM1 and PM4 criteria.
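The selection and extraction step can be sketched as below. The input shape is an assumption loosely modelled on VEP's JSON output (a `transcript_consequences` list of dictionaries), the output key names are hypothetical, and ranking by impact level alone is a simplification of "most severe consequence".

```python
IMPACT_RANK = {"HIGH": 3, "MODERATE": 2, "LOW": 1, "MODIFIER": 0}


def pick_most_severe(transcripts):
    """Return the annotation with the highest impact; ties keep the first seen."""
    if not transcripts:
        return None
    return max(transcripts, key=lambda t: IMPACT_RANK.get(t.get("impact"), -1))


def extract_fields(variant):
    """Flatten the most severe transcript annotation into the 11 fields."""
    best = pick_most_severe(variant.get("transcript_consequences", []))
    if best is None:
        return None
    domains = ",".join(f"{d['db']}:{d['name']}" for d in best.get("domains", []))
    return {
        "gene_symbol": best.get("gene_symbol"),
        "gene_id": best.get("gene_id"),
        "transcript_id": best.get("transcript_id"),
        "hgvs_g": best.get("hgvsg"),
        "hgvs_c": best.get("hgvsc"),
        "hgvs_p": best.get("hgvsp"),  # None for non-coding variants
        "consequence": ",".join(best.get("consequence_terms", [])),
        "impact": best.get("impact"),
        "biotype": best.get("biotype"),
        "exon": best.get("exon"),
        "domains": domains or None,
    }
```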
ACMG Criteria Dependent on VEP
VEP consequence and domain annotations directly determine which ACMG criteria are evaluated:
Impact = HIGH + specific consequence types (frameshift, stop_gained, splice_acceptor, splice_donor)
Domains field contains Pfam annotation
In-frame indel consequence + Pfam domain + not repetitive region
Missense consequence type
Missense consequence + MODERATE impact
In-frame indel + repetitive region or no Pfam domain
Synonymous consequence + not splice region
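Several of these conditions can be expressed as simple predicates over the extracted fields. The sketch below is illustrative only: the criterion labels are inferred from the impact table and field descriptions on this page rather than stated alongside each condition, the two missense conditions are omitted because their labels cannot be recovered with confidence, and `in_repeat` is a hypothetical flag that would have to come from a source other than VEP.

```python
LOF_TERMS = {
    "frameshift_variant", "stop_gained",
    "splice_acceptor_variant", "splice_donor_variant",
}
INFRAME_TERMS = {"inframe_insertion", "inframe_deletion"}


def candidate_criteria(impact, consequences, domains="", in_repeat=False):
    """Return ACMG criteria that become candidates for evaluation.

    `domains` is the comma-separated domain string (e.g. "Pfam:PF00533");
    `in_repeat` is an assumed, externally supplied repetitive-region flag.
    """
    terms = set(consequences)
    has_pfam = "Pfam:" in (domains or "")
    candidates = []
    if impact == "HIGH" and terms & LOF_TERMS:
        candidates.append("PVS1")
    if has_pfam:
        candidates.append("PM1")
    if terms & INFRAME_TERMS:
        # In a Pfam domain and outside a repeat: PM4; otherwise: BP3.
        candidates.append("PM4" if has_pfam and not in_repeat else "BP3")
    if "synonymous_variant" in terms and "splice_region_variant" not in terms:
        candidates.append("BP7")
    return candidates
```

A returned label only means the criterion is eligible for evaluation; whether it is applied depends on further evidence outside VEP.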
Limitations
Only one transcript annotation is retained per variant. Variants with different consequences across alternative transcripts may have clinically relevant effects not captured by the primary annotation.
VEP indel representation differs from VCF format. The platform reconciles both formats during annotation matching, but complex multi-allelic sites may require manual review.
Domain annotations depend on Pfam coverage. Novel or uncharacterized protein domains are not represented.
VEP does not predict gain-of-function effects. Consequence terms describe the change to the transcript or protein sequence, not the direction of the resulting functional effect.
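The indel representation difference noted above can be illustrated with a minimal conversion. VCF anchors an indel on the preceding reference base, while VEP's own notation drops that shared base and uses `-` for the empty allele. The function name is hypothetical, and the sketch handles a single ALT allele only; multi-allelic sites are exactly the case that may need manual review.

```python
def vcf_to_vep_style(pos, ref, alt):
    """Convert one VCF allele pair to (start, end, ref, alt) in VEP style."""
    # For indels, drop the shared anchor base and shift the start position.
    if len(ref) != len(alt) and ref[:1] == alt[:1]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    # For a pure insertion ref is empty, so end is start - 1.
    end = pos + len(ref) - 1
    return pos, end, ref or "-", alt or "-"
```

For example, the VCF deletion `100 CAT C` becomes start 101, end 102, alleles `AT/-`, and the VCF insertion `100 C CAT` becomes start 101, end 100, alleles `-/AT`.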
Reference
McLaren W, et al. "The Ensembl Variant Effect Predictor." Genome Biology. 2016;17(1):122. PMID: 27268795.