Helix Insight


Ensembl VEP

The Ensembl Variant Effect Predictor (VEP) determines the consequence of each variant on genes, transcripts, and protein sequence. It runs as Stage 3 of the processing pipeline, before reference database annotation, and provides the foundational consequence information that guides all downstream classification logic.

Configuration

Version: Ensembl Release 113
Genome Build: GRCh38
Cache: Local offline cache (no external API calls)
Processing: Parallelized across chromosomes (up to 48 workers)
Source: ensembl.org/vep
Producer: EMBL's European Bioinformatics Institute (EMBL-EBI)
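
The per-chromosome parallelism listed above could be organized roughly as follows. This is a minimal sketch, not the platform's actual scheduler: `annotate_by_chromosome`, `annotate_one`, and the `CHROMOSOMES` list are hypothetical names introduced for illustration, and only the 48-worker ceiling comes from the configuration.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 48  # upper bound stated in the configuration above

# GRCh38 primary chromosomes (assumption: the real pipeline may also
# process unplaced contigs or use "chr"-prefixed names).
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y", "MT"]


def annotate_by_chromosome(chromosomes, annotate_one, max_workers=MAX_WORKERS):
    """Run annotate_one(chrom) for every chromosome in parallel.

    annotate_one is a hypothetical callable that annotates a single
    chromosome (for example by launching one VEP process) and returns
    its result.
    """
    # Never start more workers than there are chromosomes to process.
    workers = max(1, min(max_workers, len(chromosomes)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results align with chromosomes.
        results = list(pool.map(annotate_one, chromosomes))
    return dict(zip(chromosomes, results))
```

Threads are sufficient here on the assumption that each worker mostly waits on an external VEP process; a CPU-bound worker would call for a process pool instead.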

Impact Classification

VEP assigns each consequence a severity level that determines which ACMG criteria pathways are evaluated:

Impact | Example Consequences | ACMG Pathway
HIGH | frameshift_variant, stop_gained, splice_acceptor_variant, splice_donor_variant | PVS1 pathway
MODERATE | missense_variant, inframe_insertion, inframe_deletion | PP3/BP4 (BayesDel), PM4, BP3
LOW | synonymous_variant, splice_region_variant | BP7
MODIFIER | intron_variant, upstream_gene_variant, downstream_gene_variant | Typically no ACMG criteria triggered
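
The impact-to-pathway routing in the table can be expressed as a simple lookup. The sketch below is illustrative only: `IMPACT_PATHWAYS` and `pathways_for_impact` are hypothetical names, and the mapping copies the table rather than any internal Helix Insight code.

```python
# Impact level -> ACMG criteria pathways, copied from the table above.
IMPACT_PATHWAYS = {
    "HIGH": ["PVS1"],
    "MODERATE": ["PP3", "BP4", "PM4", "BP3"],
    "LOW": ["BP7"],
    "MODIFIER": [],  # typically no ACMG criteria triggered
}


def pathways_for_impact(impact):
    """Return the ACMG pathways to evaluate for a VEP impact level."""
    key = impact.strip().upper()
    if key not in IMPACT_PATHWAYS:
        raise ValueError(f"Unknown VEP impact: {impact!r}")
    # Return a copy so callers cannot mutate the shared table.
    return list(IMPACT_PATHWAYS[key])
```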

Local Execution

VEP runs entirely locally using a pre-downloaded offline cache, so no variant data is sent to Ensembl servers. A local FASTA reference file provides the sequence context needed to generate HGVS notation. Local execution keeps variant data private and makes processing speed independent of network availability.
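
For orientation, an offline VEP invocation of this kind might be assembled as shown below. This is a sketch under assumptions: the function name, the file paths, and the `forks` default are placeholders, and the exact option set used by Helix Insight is not documented here. The flags themselves are standard VEP command-line options.

```python
def build_vep_command(input_vcf, output_vcf, cache_dir, fasta_path, forks=4):
    """Build an argument list for an offline, cache-based VEP run.

    All paths are caller-supplied placeholders; nothing here contacts
    Ensembl servers because --offline disables database connections.
    """
    return [
        "vep",
        "--input_file", input_vcf,
        "--output_file", output_vcf,
        "--vcf",                      # write annotations as a VCF CSQ field
        "--offline",                  # no database or network access
        "--cache",
        "--dir_cache", cache_dir,
        "--cache_version", "113",
        "--assembly", "GRCh38",
        "--fasta", fasta_path,        # needed for HGVS in offline mode
        "--hgvs", "--hgvsg",
        "--symbol", "--biotype", "--numbers", "--domains",
        "--fork", str(forks),
        "--force_overwrite",
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; building it as a list rather than a shell string avoids quoting problems with file paths.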

Fields Extracted (11)

For each variant, VEP produces annotations for every overlapping transcript. Helix Insight selects the transcript with the most severe consequence and extracts the following fields from it:

gene_symbol

HGNC gene symbol from the most severe transcript annotation.

gene_id

Ensembl gene identifier (ENSG format).

transcript_id

Ensembl transcript identifier (ENST format) from the most severe consequence.

hgvs_genomic

Genomic HGVS notation (e.g., NC_000001.11:g.12345A>G).

hgvs_cdna

cDNA-level HGVS notation relative to the transcript (e.g., NM_000123.4:c.456A>G).

hgvs_protein

Protein-level HGVS notation (e.g., NP_000114.1:p.Arg152Gly). NULL for non-coding variants.

consequence

Sequence Ontology consequence term(s). Examples: missense_variant, frameshift_variant, splice_donor_variant.

impact

Impact severity: HIGH, MODERATE, LOW, or MODIFIER. Determines which ACMG criteria paths are evaluated.

biotype

Transcript biotype (e.g., protein_coding, nonsense_mediated_decay).

exon_number

Exon number within the transcript, if applicable.

domains

Protein domain annotations (e.g., "Pfam:PF00533,InterPro:IPR011364"). Used for the PM1, PM4, and BP3 criteria.
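
The 11 fields and the "most severe transcript" selection can be sketched as a small record type plus a ranking function. This is an illustration under assumptions: the class and function names are hypothetical, and ranking by impact level alone is a simplification (ties are broken by input order, whereas a fuller implementation would rank individual consequence terms).

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Lower rank means more severe.
IMPACT_RANK = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}


@dataclass(frozen=True)
class VepAnnotation:
    """One transcript-level annotation holding the 11 extracted fields."""
    gene_symbol: Optional[str]
    gene_id: Optional[str]
    transcript_id: Optional[str]
    hgvs_genomic: Optional[str]
    hgvs_cdna: Optional[str]
    hgvs_protein: Optional[str]   # None for non-coding variants
    consequence: str
    impact: str
    biotype: Optional[str]
    exon_number: Optional[str]
    domains: Optional[str]


def select_most_severe(annotations: Sequence[VepAnnotation]) -> Optional[VepAnnotation]:
    """Pick the annotation with the most severe impact (first one wins ties)."""
    if not annotations:
        return None
    # Unknown impact values sort after MODIFIER.
    return min(annotations, key=lambda a: IMPACT_RANK.get(a.impact, len(IMPACT_RANK)))


def has_pfam_domain(domains: Optional[str]) -> bool:
    """True if the comma-separated domains string contains a Pfam entry."""
    if not domains:
        return False
    return any(entry.strip().startswith("Pfam:") for entry in domains.split(","))
```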

ACMG Criteria Dependent on VEP

VEP consequence and domain annotations directly determine which ACMG criteria are evaluated:

PVS1

Impact = HIGH + specific consequence types (frameshift, stop_gained, splice_acceptor, splice_donor)

PM1

Domains field contains Pfam annotation

PM4

In-frame indel consequence + Pfam domain + not repetitive region

PP2

Missense consequence type

BP1

Missense consequence + MODERATE impact

BP3

In-frame indel + repetitive region or no Pfam domain

BP7

Synonymous consequence + not splice region
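
The rules above can be read as a first-pass filter that decides which criteria are candidates for evaluation. The sketch below encodes only the VEP-dependent conditions listed in this section; it is not a full ACMG classifier. `candidate_criteria` is a hypothetical name, and `in_repeat_region` is an assumed input supplied from elsewhere, since repeat status is not one of the VEP fields.

```python
import re

PVS1_TERMS = {"frameshift_variant", "stop_gained",
              "splice_acceptor_variant", "splice_donor_variant"}
INFRAME_TERMS = {"inframe_insertion", "inframe_deletion"}


def candidate_criteria(consequence, impact, domains=None, in_repeat_region=False):
    """Return the set of ACMG criteria whose VEP preconditions are met."""
    # VEP joins multiple terms with "&" (VCF output) or "," (other formats).
    terms = {t for t in re.split(r"[&,]", consequence) if t}
    has_pfam = bool(domains) and any(
        d.strip().startswith("Pfam:") for d in domains.split(","))

    criteria = set()
    if impact == "HIGH" and terms & PVS1_TERMS:
        criteria.add("PVS1")
    if has_pfam:
        criteria.add("PM1")
    if terms & INFRAME_TERMS:
        if has_pfam and not in_repeat_region:
            criteria.add("PM4")
        else:  # repetitive region or no Pfam domain
            criteria.add("BP3")
    if "missense_variant" in terms:
        criteria.add("PP2")
        if impact == "MODERATE":
            criteria.add("BP1")
    if "synonymous_variant" in terms and "splice_region_variant" not in terms:
        criteria.add("BP7")
    return criteria
```

A returned criterion means only that its VEP-derived precondition holds; whether it is ultimately applied depends on the additional evidence each criterion requires.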

Limitations

Only one transcript annotation is retained per variant. Variants with different consequences across alternative transcripts may have clinically relevant effects not captured by the primary annotation.

VEP indel representation differs from VCF format. The platform reconciles both formats during annotation matching, but complex multi-allelic sites may require manual review.
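
To illustrate the difference: VCF anchors an indel on the preceding reference base, while VEP's own allele notation drops that shared base and uses "-" for the empty allele. The following is a simplified sketch of the conversion for a single alternate allele. `vcf_to_vep_alleles` is a hypothetical helper, not the platform's reconciliation code, and it does not handle multi-allelic sites.

```python
def vcf_to_vep_alleles(pos, ref, alt):
    """Convert one VCF (pos, ref, alt) to VEP-style (start, end, ref, alt).

    Simplification: the shared leading base is trimmed only for indels
    (alleles of different length) with a single alternate allele.
    """
    if len(ref) != len(alt) and ref[:1] == alt[:1]:
        ref_trimmed, alt_trimmed = ref[1:], alt[1:]
        start = pos + 1
        # For a pure insertion the trimmed ref is empty, so end = start - 1.
        end = start + len(ref_trimmed) - 1
        return start, end, ref_trimmed or "-", alt_trimmed or "-"
    # SNVs and same-length substitutions keep their VCF coordinates.
    return pos, pos + len(ref) - 1, ref, alt
```

For example, the VCF deletion `100 CAT C` becomes start 101, end 102, alleles `AT/-`, and the insertion `100 C CAT` becomes start 101, end 100, alleles `-/AT`.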

Domain annotations depend on Pfam coverage. Novel or uncharacterized protein domains are not represented.

VEP does not predict gain-of-function effects. Its consequence annotations describe how a variant alters the transcript or protein sequence, and the platform interprets them as loss or disruption of normal function.

Reference

McLaren W, et al. "The Ensembl Variant Effect Predictor." Genome Biology. 2016;17(1):122. PMID: 27268795.