Continuous Notation¶
All continuous notation follows a similar pattern that is loosely defined as:
<feature>:<prefix>.<pos><type><seq>
The reference feature is the name of the gene (or chromosome, transcript, etc.) on which the variant occurs. The prefix denotes the coordinate type (see prefixes). The position element (<pos>) is the position, or range of positions, of the variant. For a deletion, this is the range that is deleted. For an insertion, these are the two positions between which the sequence is inserted. The sequence element depends on the type of variant being described, but it is often the untemplated/inserted sequence, and it is often optional.
Every notation type has general and more specific ways of notating the same event. More specific notation is preferred where possible, but notation coming from outside sources may not always provide all of the information. For each variant type, the equivalent notation options are shown below in order of increasing specificity.
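As a rough illustration of the pattern above, the top-level pieces of a notation string can be pulled apart with a regular expression. This is a minimal sketch rather than a reference parser: the helper name `split_notation` is hypothetical, and it assumes that the prefix is a run of lowercase letters and that the feature name contains no colon.

```python
import re

# Assumed top-level shape: <feature>:<prefix>.<everything else>
TOP_LEVEL = re.compile(r"^(?P<feature>[^:]+):(?P<prefix>[a-z]+)\.(?P<rest>.+)$")


def split_notation(notation):
    """Split a notation string into (feature, prefix, rest)."""
    match = TOP_LEVEL.match(notation)
    if match is None:
        raise ValueError(f"not continuous notation: {notation!r}")
    return match.group("feature"), match.group("prefix"), match.group("rest")


# The remainder (<pos><type><seq>) is left unparsed on purpose,
# because its shape depends on the variant type.
print(split_notation("KRAS:p.G12D"))  # ('KRAS', 'p', 'G12D')
```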
Examples¶
Substitution¶
Genomic/CDS substitution variants are notated differently from protein substitution variants, so an example of each is given.
A protein missense mutation where G is replaced with D
KRAS:p.G12D
A genomic substitution from A to C
chr11:g.1234A>C
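The difference between the two substitution forms is easiest to see when both are parsed side by side. The sketch below is illustrative only: `parse_substitution` is a hypothetical name, protein residues are assumed to be single uppercase letters, and any prefix other than `p` is assumed to use the nucleotide form `<pos><ref>><alt>`.

```python
import re

# Protein form: reference residue, position, alternate residue (e.g. G12D).
PROTEIN_SUB = re.compile(r"^(?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z])$")
# Nucleotide form: position, reference base, '>', alternate base (e.g. 1234A>C).
NUCLEOTIDE_SUB = re.compile(r"^(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$")


def parse_substitution(notation):
    """Parse a substitution into feature, prefix, position, ref and alt."""
    feature, _, body = notation.partition(":")
    prefix, _, rest = body.partition(".")
    pattern = PROTEIN_SUB if prefix == "p" else NUCLEOTIDE_SUB
    match = pattern.match(rest)
    if match is None:
        raise ValueError(f"not a substitution: {notation!r}")
    return {
        "feature": feature,
        "prefix": prefix,
        "pos": int(match.group("pos")),
        "ref": match.group("ref"),
        "alt": match.group("alt"),
    }


protein = parse_substitution("KRAS:p.G12D")
genomic = parse_substitution("chr11:g.1234A>C")
print(protein["pos"], protein["ref"], protein["alt"])  # 12 G D
print(genomic["pos"], genomic["ref"], genomic["alt"])  # 1234 A C
```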
Indel¶
A protein deletion of amino acids GH and insertion of three amino acids TTA
EGFR:p.G512_H513delins
EGFR:p.G512_H513delins3
EGFR:p.G512_H513delGHins
EGFR:p.G512_H513delGHins3
EGFR:p.G512_H513delinsTTA
EGFR:p.G512_H513delGHinsTTA
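The six equivalent forms above differ only in which optional pieces are filled in. As a sketch (the function name `indel_forms` and its parameters are hypothetical), they can be generated from the fullest description of the event:

```python
def indel_forms(feature, start, end, ref_seq, alt_seq):
    """List equivalent protein indel notations in the order shown above."""
    base = f"{feature}:p.{start}_{end}del"
    count = len(alt_seq)  # number of inserted amino acids
    return [
        f"{base}ins",                    # neither sequence given
        f"{base}ins{count}",             # inserted length only
        f"{base}{ref_seq}ins",           # deleted sequence only
        f"{base}{ref_seq}ins{count}",    # deleted sequence, inserted length
        f"{base}ins{alt_seq}",           # inserted sequence only
        f"{base}{ref_seq}ins{alt_seq}",  # both sequences given
    ]


for form in indel_forms("EGFR", "G512", "H513", "GH", "TTA"):
    print(form)
```

Running this with the values from the example prints the six EGFR notations listed above, one per line.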
Insertion¶
Insertions must be given as a range so that it is clear which two coordinates the inserted sequence falls between. A single coordinate would leave it ambiguous which side of that position the sequence is inserted on.
A protein insertion of four amino acids between G123 and H124. The sequence element here is optional and can also be given as a number if the number of inserted amino acids is known but the sequence is not.
EGFR:p.G123_H124ins
EGFR:p.G123_H124ins4
EGFR:p.G123_H124insCCST
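A possible way to check the range requirement and read the optional sequence element is sketched below. The name `parse_protein_insertion` is hypothetical, and the check that the two positions are adjacent is an assumption drawn from the "inserted between" wording, not a rule stated by this document.

```python
import re

# Two flanking residues, then 'ins', then an optional length or sequence.
INSERTION = re.compile(
    r"^[A-Z](?P<start>\d+)_[A-Z](?P<end>\d+)ins(?P<seq>[A-Z]+|\d+)?$"
)


def parse_protein_insertion(rest):
    """Parse the part after 'p.' of a protein insertion, e.g. 'G123_H124ins4'."""
    match = INSERTION.match(rest)
    if match is None:
        raise ValueError(f"insertion needs a two-position range: {rest!r}")
    start, end = int(match.group("start")), int(match.group("end"))
    if end != start + 1:  # assumption: the flanking positions are adjacent
        raise ValueError(f"flanking positions are not adjacent: {rest!r}")
    seq = match.group("seq")
    if seq is None:
        length, sequence = None, None      # nothing known about the insert
    elif seq.isdigit():
        length, sequence = int(seq), None  # only the length is known
    else:
        length, sequence = len(seq), seq   # full sequence is known
    return {"start": start, "end": end, "length": length, "sequence": sequence}


print(parse_protein_insertion("G123_H124insCCST")["length"])  # 4
```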
Deletion¶
The reference sequence is optional when denoting a deletion. For example, the same deletion could be notated in any of the ways shown below.
EGFR:p.R10_G14del
EGFR:p.R10_G14del5
EGFR:p.R10_G14delRSTGG
If the reference sequence is known, it is always better to provide more information than less.
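When the length or reference sequence is supplied, it can be cross-checked against the range. The following is a minimal sketch under stated assumptions: `deletion_is_consistent` is a hypothetical helper, it handles only range deletions of the form shown above, and it treats a mismatch between the range and the sequence element as an inconsistency.

```python
import re

DELETION = re.compile(
    r"^(?P<first>[A-Z])(?P<start>\d+)_(?P<last>[A-Z])(?P<end>\d+)"
    r"del(?P<seq>[A-Z]+|\d+)?$"
)


def deletion_is_consistent(rest):
    """Return True if the optional length or sequence agrees with the range."""
    match = DELETION.match(rest)
    if match is None:
        raise ValueError(f"not a range deletion: {rest!r}")
    span = int(match.group("end")) - int(match.group("start")) + 1
    seq = match.group("seq")
    if seq is None:
        return True  # nothing to cross-check
    if seq.isdigit():
        return int(seq) == span
    # A full sequence must span the range and begin and end with
    # the residues named at the two range positions.
    return (
        len(seq) == span
        and seq[0] == match.group("first")
        and seq[-1] == match.group("last")
    )


print(deletion_is_consistent("R10_G14delRSTGG"))  # True
print(deletion_is_consistent("R10_G14del4"))      # False
```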
Duplication¶
Five amino acids are duplicated. Once again, the sequence element is optional.
EGFR:p.R10_G14dup
EGFR:p.R10_G14dup5
EGFR:p.R10_G14dupRSTGG
Frameshift¶
Frameshifts are only applicable to variants denoted with protein coordinates. Frameshift notation follows the pattern below:
<feature>:p.<pos><first alternate AA>fs*<position of next truncating AA>
The first alternate AA and the position of the next truncating AA are both optional elements. For example, the same protein frameshift variant might be notated in multiple ways:
PTEN:p.G123fs
PTEN:p.G123fs*10
PTEN:p.G123Afs
PTEN:p.G123Afs*10
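The frameshift pattern, with its two optional elements, maps naturally onto a regular expression with optional groups. This is an illustrative sketch only: `parse_frameshift` is a hypothetical name, and residues are assumed to be single uppercase letters.

```python
import re

# <pos> is a residue letter plus number; the alternate AA and the
# '*<position>' truncation suffix are each optional.
FRAMESHIFT = re.compile(
    r"^(?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z])?fs(?:\*(?P<trunc>\d+))?$"
)


def parse_frameshift(notation):
    """Parse a protein frameshift such as 'PTEN:p.G123Afs*10'."""
    feature, _, body = notation.partition(":")
    prefix, _, rest = body.partition(".")
    if prefix != "p":
        raise ValueError("frameshifts apply only to protein coordinates")
    match = FRAMESHIFT.match(rest)
    if match is None:
        raise ValueError(f"not a frameshift: {notation!r}")
    trunc = match.group("trunc")
    return {
        "feature": feature,
        "ref": match.group("ref"),
        "pos": int(match.group("pos")),
        "alt": match.group("alt"),  # None when not given
        "truncation": int(trunc) if trunc is not None else None,
    }


result = parse_frameshift("PTEN:p.G123Afs*10")
print(result["alt"], result["truncation"])  # A 10
```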