Expression Variants

To keep expression variants extensible and quick to use, only two fields are required: the name of the gene (gene) and whether expression of that gene is up-regulated or down-regulated (kbCategory).

Field	Type	Example	Description
gene	string	KRAS	the gene name (or source identifier)
kbCategory	string	increased expression	the GraphKB expression variant vocabulary term this variant belongs to. One of: increased expression, reduced expression

The kbCategory field tells IPR whether a row should be treated as a variant (an expression outlier) or only as expression information provided for context. This means the thresholds and cut-off values that define an outlier are chosen by the users uploading the reports, not by IPR.
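Because the outlier decision is left to the uploader, kbCategory is typically assigned by a script before upload. The sketch below is a minimal illustration assuming percentile-based cut-offs; the function name `assign_kb_category` and the 95/5 thresholds are hypothetical choices, not IPR defaults, and `GENE_B`/`GENE_C` are placeholder gene names.

```python
from typing import Optional

# Hypothetical cut-offs chosen by the uploader; IPR does not define these.
HIGH_PERCENTILE = 95.0
LOW_PERCENTILE = 5.0


def assign_kb_category(percentile: float,
                       high: float = HIGH_PERCENTILE,
                       low: float = LOW_PERCENTILE) -> Optional[str]:
    """Map a comparator-cohort percentile to a kbCategory term.

    Returns None when the gene passes neither cut-off, i.e. it is not
    treated as an expression outlier under these thresholds.
    """
    if percentile >= high:
        return "increased expression"
    if percentile <= low:
        return "reduced expression"
    return None


# Illustrative input: KRAS plus two hypothetical gene names.
percentiles = {"KRAS": 98.0, "GENE_B": 50.0, "GENE_C": 2.0}

variants = []
for gene, pct in percentiles.items():
    category = assign_kb_category(pct)
    if category is not None:  # keep only genes that pass a cut-off
        variants.append({"gene": gene, "kbCategory": category})

print(variants)
```

Percentiles are only one option; the same pattern works with z-scores or kIQR values if those are the metrics your pipeline thresholds on.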

Computing the various expression metrics is largely optional and is done by the user before upload/report creation. Any metrics provided are displayed, along with the variant status, to the analyst reviewing the case. The standard fields IPR accepts are listed below.

As with the other variant types, expression variants are passed to the IPR Python adapter in the main report content JSON:

{
    "expressionVariants": [
        {
            "gene": "KRAS",
            "kbCategory": "increased expression"
        }
    ]
}
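A minimal sketch of assembling this content in Python before handing it to the adapter, using only the standard library `json` module. The helper name `build_content` is hypothetical, the `tpm` value is illustrative, and the adapter call itself is not shown because its API is not described on this page.

```python
import json

REQUIRED_FIELDS = ("gene", "kbCategory")
KB_CATEGORIES = {"increased expression", "reduced expression"}


def build_content(variants):
    """Check the required fields of each expression variant and wrap
    the list in the top-level report content structure."""
    for variant in variants:
        missing = [field for field in REQUIRED_FIELDS if field not in variant]
        if missing:
            raise ValueError(f"variant {variant!r} is missing {missing}")
        if variant["kbCategory"] not in KB_CATEGORIES:
            raise ValueError(f"unexpected kbCategory {variant['kbCategory']!r}")
    return {"expressionVariants": list(variants)}


# Illustrative variant; tpm is one of the optional fields.
content = build_content([
    {"gene": "KRAS", "kbCategory": "increased expression", "tpm": 120.5},
])
text = json.dumps(content, indent=4)
print(text)
```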

Each variant is an object which may contain any of the following fields (in addition to the required fields). Examples of how these fields are calculated can be found in the scripting examples section.

Field	Type	Description
biopsySiteFoldChange	`number?`	the fold change with respect to the median of the biopsy site expression comparator cohort
biopsySitePercentile	`number?`	the percentile with respect to the biopsy site expression comparator cohort
biopsySiteQC	`number?`
biopsySiteZScore	`number?`	the z-score with respect to the biopsy site expression comparator cohort
biopsySitekIQR	`number?`	the kIQR with respect to the biopsy site expression comparator cohort
diseaseFoldChange	`number?`	the fold change with respect to the median of the disease expression comparator cohort
diseasePercentile	`number?`	the percentile with respect to the disease expression comparator cohort
diseaseQC	`number?`
diseaseZScore	`number?`	the z-score with respect to the disease expression comparator cohort
diseasekIQR	`number?`	the kIQR with respect to the disease expression comparator cohort
expressionState	`string`	overrides the kbCategory value for display purposes only (does not affect matching)
histogramImage	`string?`	path to the expression density/histogram plot
primarySiteFoldChange	`number?`	the fold change with respect to the median of the primary site expression comparator cohort
primarySitePercentile	`number?`	the percentile with respect to the primary site expression comparator cohort
primarySiteQC	`number?`
primarySiteZScore	`number?`	the z-score with respect to the primary site expression comparator cohort
primarySitekIQR	`number?`	the kIQR with respect to the primary site expression comparator cohort
rnaReads	`number?`
rpkm	`number?`	reads per kilobase of transcript, per million mapped reads
tpm	`number?`	transcripts per million
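The *FoldChange fields are described as the fold change relative to the median of a comparator cohort. This page does not say whether a log transform or pseudocount is applied, so the sketch below assumes a plain ratio of the sample value to the cohort median; the helper name `fold_change_vs_median` and the cohort values are illustrative, and you should match whatever convention your own pipeline uses.

```python
import statistics
from typing import Sequence


def fold_change_vs_median(value: float, cohort: Sequence[float]) -> float:
    """Plain (non-log) fold change of `value` relative to the cohort median.

    Assumption: no log transform and no pseudocount. A cohort median of
    zero raises ZeroDivisionError.
    """
    return value / statistics.median(cohort)


# Illustrative expression values for one gene in the disease comparator cohort.
cohort = [1.0, 2.0, 4.0, 8.0, 16.0]
variant = {
    "gene": "KRAS",
    "kbCategory": "increased expression",
    "diseaseFoldChange": fold_change_vs_median(12.0, cohort),
}
print(variant)
```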

Metrics

Z-Score

The z-score (a.k.a. standard score) describes a data point relative to a distribution as the number of standard deviations it lies from the distribution's mean. Formally it is defined as

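\[ z = \frac{x - \mu}{\sigma} \]

where \(x\) is the sample's expression value and \(\mu\) and \(\sigma\) are the mean and standard deviation of the comparator distribution.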
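A minimal sketch of the calculation, assuming the comparator cohort is available as a list of expression values for the same gene. It uses the conventional sign, (x - mean) / sd, so positive values mean the sample is above the cohort mean, and the population standard deviation (`statistics.pstdev`); whether a given pipeline uses the population or sample form is an assumption to check. The cohort values are illustrative.

```python
import statistics
from typing import Sequence


def z_score(x: float, cohort: Sequence[float]) -> float:
    """Standard score of x relative to a comparator cohort.

    Sign convention: (x - mean) / sd, so values above the cohort mean
    are positive. Uses the population standard deviation.
    """
    mu = statistics.mean(cohort)
    sigma = statistics.pstdev(cohort)
    return (x - mu) / sigma


cohort = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, population sd 2
print(z_score(9.0, cohort))  # 2.0
```

Swapping in `statistics.stdev` gives the sample standard deviation instead; for large comparator cohorts the difference is small.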

k-IQR

The k interquartile range (kIQR) is a metric used to compare a point against a distribution. It is defined as the distance of a given point from the median, divided by the interquartile range:

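\[ k = \frac{x - Q_2}{Q_3 - Q_1} \]

where \(Q_1\), \(Q_2\) (the median), and \(Q_3\) are the first, second, and third quartiles of the comparator distribution.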

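This metric is similar to the z-score but more robust to outliers, because the median and quartiles are less affected by extreme values in the comparator cohort than the mean and standard deviation are.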
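A minimal sketch of the kIQR calculation, using the same sign convention as the z-score (positive means above the median). Quartile definitions differ between tools; this assumes the "inclusive" method of Python's `statistics.quantiles` (Python 3.8+), and the helper name `k_iqr` and cohort values are illustrative.

```python
import statistics
from typing import Sequence


def k_iqr(x: float, cohort: Sequence[float]) -> float:
    """Distance of x from the cohort median, in units of the IQR.

    Quartiles use the "inclusive" method of statistics.quantiles; other
    tools may use a different quartile convention. An IQR of zero raises
    ZeroDivisionError.
    """
    q1, q2, q3 = statistics.quantiles(cohort, n=4, method="inclusive")
    return (x - q2) / (q3 - q1)


cohort = [1.0, 2.0, 3.0, 4.0, 5.0]  # Q1 = 2, median = 3, Q3 = 4
print(k_iqr(7.0, cohort))  # 2.0
```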

Percentile

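The percentile rank is a non-parametric way of describing a given data point relative to the distribution: it gives the percentage of values in the comparator distribution that fall below (or at) that point.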
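Percentile rank has several conventions (strictly below, at or below, or averaging ties). The sketch below assumes the "at or below" definition, which corresponds to `kind="weak"` in SciPy's `scipy.stats.percentileofscore`; the helper name `percentile_rank` and cohort values are illustrative.

```python
from bisect import bisect_right
from typing import Sequence


def percentile_rank(x: float, cohort: Sequence[float]) -> float:
    """Percentage (0-100) of cohort values less than or equal to x."""
    ordered = sorted(cohort)
    # bisect_right counts how many sorted values are <= x
    return 100.0 * bisect_right(ordered, x) / len(ordered)


cohort = [1.0, 2.0, 3.0, 4.0, 5.0]
print(percentile_rank(3.0, cohort))  # 60.0
```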

Comparators

All of the standard expression metrics in IPR are expected to be calculated against a reference distribution of expression samples. To this end, IPR provides a number of fields to record which distributions were used. This ensures that the final result is interpretable and reproducible. A complete list of comparators can be found in the comparators section of the user manual.

Expression comparators fall into three main groups: disease, primary site, and biopsy site.

Disease

The reference distribution that most closely matches the diagnosis of the current sample.

Primary Site

The reference distribution that most closely matches the non-diseased/normal expression of the primary site tissue.

Biopsy Site

The reference distribution that most closely matches the non-diseased/normal expression of the biopsy site tissue. This is important for metastatic samples where the biopsy site and primary site differ.

Images

The user can optionally include expression density plots, which let the analyst view the relative expression of the current sample compared to a specific distribution.

These are passed to the report upload function via the images section of the JSON input, using keys that match the following pattern:

key: expDensity\.(\S+)

That is, each key is expected to have the form expDensity.<gene name>, where the gene name matches the gene name(s) used in the expression variant definitions. When plots are included for genes listed as variants, they are shown alongside the expression data in the expression variants section.

[Figure: expression density plot]
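A minimal sketch of building and checking image keys against the expDensity\.(\S+) pattern. The helper names are hypothetical, and the {"key": ..., "path": ...} entry shape and plot file paths are assumptions; check the images format your upload tooling actually expects. It only includes plots for genes listed as expression variants, since those are the ones shown in the expression variants section.

```python
import re

# The key pattern documented for expression density plots.
EXP_DENSITY_KEY = re.compile(r"expDensity\.(\S+)")


def density_plot_gene(key: str):
    """Return the gene name encoded in an image key, or None if the key
    does not match the expression density pattern."""
    match = EXP_DENSITY_KEY.fullmatch(key)
    return match.group(1) if match else None


def density_image_entries(plot_paths: dict, variant_genes: set) -> list:
    """Build image entries for genes that are also expression variants.

    Assumption: each entry is a {"key": ..., "path": ...} dict.
    """
    return [
        {"key": f"expDensity.{gene}", "path": path}
        for gene, path in plot_paths.items()
        if gene in variant_genes
    ]


# Hypothetical plot files; only KRAS is an expression variant here.
entries = density_image_entries(
    {"KRAS": "plots/KRAS.png", "GENE_B": "plots/GENE_B.png"},
    {"KRAS"},
)
print(entries)
```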

In the interface, these plots appear in the actions tab where available.

[Figure: image action]

This opens the expression density plot in a popup.

[Figure: image popup]