GraphKB Variant Matching Tutorial
This tutorial is an interactive notebook that can be run using Google Colab or a local Jupyter server (recommended if matching patient data). It covers basic matching of variants using the Python GraphKB adapter against an instance of the GraphKB API.
Users must first have login credentials for an instance of the GraphKB API (or use the demo server). Note that the demo server holds only limited data; a production instance of GraphKB would be expected to return more complete annotations.
For the purposes of this tutorial we will be matching the known KRAS variant p.G12D against the demo instance of GraphKB. You can adjust the API instance by changing the setup variables below.
To run this locally, download this file and start the server from the command line as follows:
jupyter notebook notebook.ipynb
You should now be able to see the notebook by opening http://localhost:8888 in your browser.
!pip3 install graphkb
from graphkb import GraphKBConnection
GKB_API_URL = 'https://pori-demo.bcgsc.ca/graphkb-api/api'
GKB_USER = 'colab_demo'
GKB_PASSWORD = 'colab_demo'
graphkb_conn = GraphKBConnection(GKB_API_URL, use_global_cache=False)
graphkb_conn.login(GKB_USER, GKB_PASSWORD)
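When pointing the notebook at a non-demo instance, it may be preferable not to hard-code credentials in a cell. The following is a minimal sketch of one way to do that, assuming the settings are supplied through environment variables. The variable names and the `load_gkb_settings` helper are hypothetical and not part of the graphkb package; the fallback values are the demo settings used above.

```python
import os

# Demo values taken from this tutorial. The environment variable names
# below are hypothetical and chosen only for this sketch.
DEMO_URL = 'https://pori-demo.bcgsc.ca/graphkb-api/api'
DEMO_USER = 'colab_demo'
DEMO_PASSWORD = 'colab_demo'


def load_gkb_settings(env=None):
    """Return (url, user, password), falling back to the demo values."""
    env = os.environ if env is None else env
    return (
        env.get('GKB_API_URL', DEMO_URL),
        env.get('GKB_USER', DEMO_USER),
        env.get('GKB_PASSWORD', DEMO_PASSWORD),
    )


GKB_API_URL, GKB_USER, GKB_PASSWORD = load_gkb_settings()
```

The resulting three values can then be passed to `GraphKBConnection` and `login` exactly as in the cell above.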
Matching Variants
Now you are ready to match variants:
from graphkb.match import match_positional_variant
variant_name = 'KRAS:p.G12D'
variant_matches = match_positional_variant(graphkb_conn, variant_name)
print(f'{variant_name} matched {len(variant_matches)} other variant representations')
print()
for match in variant_matches:
    print(variant_name, 'will match', match['displayName'])
We can see above that the KRAS protein variant has been matched to a number of other, less specific mentions (e.g. KRAS:p.G12mut) and also to genomic equivalents (e.g. chr12:g.25398284C>T). Note that the results here depend on the instance of GraphKB you are accessing.
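To get a quick overview of what kinds of representations were matched, the records can be grouped by record class. This is a rough sketch that assumes each matched record carries an `@class` field alongside `displayName`, as the statement conditions do later in this tutorial. The sample records are made up for illustration and are not output from any GraphKB instance; `group_by_class` is a hypothetical helper.

```python
from collections import defaultdict

# Made-up records shaped like variant matches, for illustration only
sample_matches = [
    {'@class': 'PositionalVariant', 'displayName': 'KRAS:p.G12D'},
    {'@class': 'PositionalVariant', 'displayName': 'KRAS:p.G12mut'},
    {'@class': 'PositionalVariant', 'displayName': 'chr12:g.25398284C>T'},
    {'@class': 'CategoryVariant', 'displayName': 'KRAS mutation'},
]


def group_by_class(records):
    """Map each record class to the display names of its records."""
    groups = defaultdict(list)
    for record in records:
        groups[record.get('@class', 'unknown')].append(record['displayName'])
    return dict(groups)


for class_name, names in sorted(group_by_class(sample_matches).items()):
    print(class_name, len(names), names)
```

In the notebook, `sample_matches` would be replaced by `variant_matches`.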
Annotating Variants
Now that we have matched the variant, we will fetch the related statements to annotate it with its possible relevance.
from graphkb.constants import BASE_RETURN_PROPERTIES, GENERIC_RETURN_PROPERTIES
from graphkb.util import convert_to_rid_list
# return properties should be customized to the user's needs
return_props = (
    BASE_RETURN_PROPERTIES
    + ['sourceId', 'source.name', 'source.displayName']
    + [f'conditions.{p}' for p in GENERIC_RETURN_PROPERTIES]
    + [f'subject.{p}' for p in GENERIC_RETURN_PROPERTIES]
    + [f'evidence.{p}' for p in GENERIC_RETURN_PROPERTIES]
    + [f'relevance.{p}' for p in GENERIC_RETURN_PROPERTIES]
    + [f'evidenceLevel.{p}' for p in GENERIC_RETURN_PROPERTIES]
)
statements = graphkb_conn.query(
    {
        'target': 'Statement',
        'filters': {'conditions': convert_to_rid_list(variant_matches), 'operator': 'CONTAINSANY'},
        'returnProperties': return_props,
    }
)
print(f'annotated {len(variant_matches)} variant matches with {len(statements)} statements')
print()
for statement in statements[:5]:
    print(
        [c['displayName'] for c in statement['conditions'] if c['@class'].endswith('Variant')],
        statement['relevance']['displayName'],
        statement['subject']['displayName'],
        statement['source']['displayName'] if statement['source'] else '',
        [c['displayName'] for c in statement['evidence']],
    )
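If the statements are to be exported or reviewed as a table rather than printed, the same fields can be flattened into one dictionary per statement. The sketch below assumes the statement layout used in the loop above; `statement_to_row` is a hypothetical helper, and the sample statement is a made-up placeholder, not real GraphKB content.

```python
def statement_to_row(statement):
    """Flatten one statement record into a dict of display strings."""
    # source may be missing, as handled in the print loop above
    source = statement.get('source') or {}
    variants = [
        c['displayName']
        for c in statement['conditions']
        if c['@class'].endswith('Variant')
    ]
    return {
        'variants': ';'.join(variants),
        'relevance': statement['relevance']['displayName'],
        'subject': statement['subject']['displayName'],
        'source': source.get('displayName', ''),
        'evidence': ';'.join(e['displayName'] for e in statement['evidence']),
    }


# Made-up placeholder record for illustration only
sample_statement = {
    'conditions': [
        {'@class': 'PositionalVariant', 'displayName': 'KRAS:p.G12D'},
        {'@class': 'Disease', 'displayName': 'example disease'},
    ],
    'relevance': {'displayName': 'example relevance'},
    'subject': {'displayName': 'example subject'},
    'source': None,
    'evidence': [{'displayName': 'example publication'}],
}

print(statement_to_row(sample_statement))
```

A list of such rows can be handed to `csv.DictWriter` or a data frame constructor.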
Categorizing Statements
We often want to know whether a statement is therapeutic, prognostic, and so on. The naive approach is to base this on a list of known terms or a regex pattern. In GraphKB we can leverage the ontology structure instead.
In this example we will look for all terms that would indicate a therapeutically relevant statement.
To do this we pick our 'base' terms. These are the terms we consider to be the highest level of the ontology tree: the most general term for that category.
from graphkb.vocab import get_term_tree
BASE_THERAPEUTIC_TERMS = 'therapeutic efficacy'
therapeutic_terms = get_term_tree(graphkb_conn, BASE_THERAPEUTIC_TERMS, include_superclasses=False)
print(f'Found {len(therapeutic_terms)} equivalent terms')
for term in therapeutic_terms:
    print('-', term['name'])
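To illustrate the idea of expanding a base term through the ontology, here is a small self-contained sketch that walks a toy subclass hierarchy breadth-first. The hierarchy and the `collect_term_tree` helper are invented for illustration; this is not how `get_term_tree` is implemented, and the toy terms are not the actual GraphKB vocabulary tree.

```python
from collections import deque

# Toy parent -> subclasses mapping, invented for this sketch
TOY_SUBCLASSES = {
    'therapeutic efficacy': ['sensitivity', 'resistance'],
    'sensitivity': ['targetable'],
    'resistance': ['acquired resistance'],
    'prognostic indicator': ['favourable prognosis'],
}


def collect_term_tree(base_term, subclasses):
    """Return the base term and all of its descendants, breadth-first."""
    seen = {base_term}
    queue = deque([base_term])
    ordered = []
    while queue:
        term = queue.popleft()
        ordered.append(term)
        for child in subclasses.get(term, []):
            if child not in seen:  # guard against cycles and duplicates
                seen.add(child)
                queue.append(child)
    return ordered


for term in collect_term_tree('therapeutic efficacy', TOY_SUBCLASSES):
    print('-', term)
```

Because the expansion follows the subclass links, a term such as 'targetable' is picked up without having to appear in a hand-maintained list or match a pattern, while unrelated branches such as 'prognostic indicator' are left out.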
We can filter the statements we have already retrieved, or we can add this to our original query and filter before we retrieve from the API:
statements = graphkb_conn.query(
    {
        'target': 'Statement',
        'filters': {
            'AND': [
                {'conditions': convert_to_rid_list(variant_matches), 'operator': 'CONTAINSANY'},
                {'relevance': convert_to_rid_list(therapeutic_terms), 'operator': 'IN'},
            ]
        },
        'returnProperties': return_props,
    }
)
for statement in statements:
    print(
        [c['displayName'] for c in statement['conditions'] if c['@class'].endswith('Variant')],
        statement['relevance']['displayName'],
        statement['subject']['displayName'],
        statement['source']['displayName'] if statement['source'] else '',
        [c['displayName'] for c in statement['evidence']],
    )
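The other option mentioned above, filtering statements that have already been retrieved, could look roughly like the sketch below. It assumes that each term and each statement's relevance record include an `@rid` field (which depends on the return properties requested). `filter_by_relevance` is a hypothetical helper and the sample records and record IDs are made up.

```python
def filter_by_relevance(statements, terms):
    """Keep only statements whose relevance is one of the given terms."""
    wanted = {term['@rid'] for term in terms}
    return [s for s in statements if s['relevance']['@rid'] in wanted]


# Made-up records for illustration only
sample_terms = [
    {'@rid': '#1:0', 'name': 'therapeutic efficacy'},
    {'@rid': '#1:1', 'name': 'sensitivity'},
]
sample_statements = [
    {'relevance': {'@rid': '#1:1', 'displayName': 'sensitivity'}},
    {'relevance': {'@rid': '#1:9', 'displayName': 'unrelated term'}},
]

kept = filter_by_relevance(sample_statements, sample_terms)
print(f'kept {len(kept)} of {len(sample_statements)} statements')
```

Filtering in the query keeps the response small, while filtering locally lets one retrieved set of statements be split into several categories without further API calls.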