Query Basics

The API is documented in its OpenAPI specification at /api/spec. Here we cover only the /query endpoint, which is the most commonly used endpoint because it handles all searches. The /query endpoint accepts a JSON body in a POST request; this is how the user passes filters and other search-related parameters. The most important fields and concepts are defined below.

All the running examples below use the Python GraphKB adapter. They assume the user has already initialized the connector and logged in as shown below (using the demo database and credentials).

First install the adapter

!pip install graphkb

Collecting graphkb
  Downloading graphkb-1.5.4-py3-none-any.whl (33 kB)
Requirement already satisfied: requests<3,>=2.22.0 in /usr/local/lib/python3.7/dist-packages (from graphkb) (2.23.0)
Requirement already satisfied: typing-extensions<4,>=3.7.4.2 in /usr/local/lib/python3.7/dist-packages (from graphkb) (3.7.4.3)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.22.0->graphkb) (1.24.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.22.0->graphkb) (2021.5.30)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.22.0->graphkb) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.22.0->graphkb) (2.10)
Installing collected packages: graphkb
Successfully installed graphkb-1.5.4

Then set up the connector

from graphkb import GraphKBConnection

GKB_API_URL = 'https://pori-demo.bcgsc.ca/graphkb-api/api'
GKB_USER = 'colab_demo'
GKB_PASSWORD = 'colab_demo'

graphkb_conn = GraphKBConnection(GKB_API_URL)

graphkb_conn.login(GKB_USER, GKB_PASSWORD)

Important Fields and Concepts

Query Target

The target is the class/table that the user wishes to query. If it is at the top level of the request body, it is also the type of record that will be returned. For example, the query below gets a list of all publications in GraphKB, limited to the first 3 for the purposes of this demo.

graphkb_conn.query({
    'target': 'Publication'
}, paginate=False, limit=3)

[{'@class': 'Publication',
  '@rid': '#38:0',
  'alias': False,
  'createdAt': 1612980878029,
  'createdBy': '#14:0',
  'deprecated': False,
  'displayName': 'pmid:25500544',
  'journalName': 'oncogene',
  'name': 'the landscape and therapeutic relevance of cancer-associated transcript fusions.',
  'source': '#17:21',
  'sourceId': '25500544',
  'updatedAt': 1612980878029,
  'updatedBy': '#14:0',
  'url': 'https://pubmed.ncbi.nlm.nih.gov/25500544',
  'uuid': '1294db97-ee26-4bd4-9b50-d122436905be',
  'year': 2015},
 {'@class': 'Publication',
  '@rid': '#38:1',
  'alias': False,
  'createdAt': 1612981149054,
  'createdBy': '#14:0',
  'deprecated': False,
  'displayName': 'pmid:16081687',
  'journalName': 'blood',
  'name': 'the jak2v617f activating mutation occurs in chronic myelomonocytic leukemia and acute myeloid leukemia, but not in acute lymphoblastic leukemia or chronic lymphocytic leukemia.',
  'source': '#17:21',
  'sourceId': '16081687',
  'updatedAt': 1612981149054,
  'updatedBy': '#14:0',
  'url': 'https://pubmed.ncbi.nlm.nih.gov/16081687',
  'uuid': '52e0d70f-07f8-48b9-b59d-b0258d60b9ae',
  'year': 2005},
 {'@class': 'Publication',
  '@rid': '#38:2',
  'alias': False,
  'createdAt': 1612981150038,
  'createdBy': '#14:0',
  'deprecated': False,
  'displayName': 'pmid:15146165',
  'journalName': 'laboratory investigation; a journal of technical methods and pathology',
  'name': 'a great majority of gists with pdgfra mutations represent gastric tumors of low or no malignant potential.',
  'source': '#17:21',
  'sourceId': '15146165',
  'updatedAt': 1612981150038,
  'updatedBy': '#14:0',
  'url': 'https://pubmed.ncbi.nlm.nih.gov/15146165',
  'uuid': '9e600c55-3247-4e24-bcd8-4a20bb0fb794',
  'year': 2004}]
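
Since the /query endpoint takes a JSON body, the dictionary passed to the adapter above corresponds to a small JSON document. The sketch below only serializes that dictionary with the standard library to show its shape. It does not contact the server, and it makes no claim about how the adapter transmits the body or handles the limit argument.

```python
import json

# Same request body as the Publication example above
query_body = {'target': 'Publication'}

# Serialize to the JSON text that a POST to /query would carry as its body
json_body = json.dumps(query_body)
print(json_body)
```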

Filters

Any field that is accessible at the current user's permission level can be queried via this endpoint. Most commonly, users want to filter on things like a record's name or source ID (its ID in the external database it was imported from). Continuing the example from above, let's search for publications with the word "cancer" in their name.

Note: The current full-text index only searches on words and word prefixes. Future iterations will support a full Lucene index.

graphkb_conn.query({
    'target': 'Publication',
    'filters': {'name': 'cancer', 'operator': 'CONTAINSTEXT'}
}, paginate=False, limit=3)

[{'@class': 'Publication',
  '@rid': '#38:0',
  'alias': False,
  'createdAt': 1612980878029,
  'createdBy': '#14:0',
  'deprecated': False,
  'displayName': 'pmid:25500544',
  'journalName': 'oncogene',
  'name': 'the landscape and therapeutic relevance of cancer-associated transcript fusions.',
  'source': '#17:21',
  'sourceId': '25500544',
  'updatedAt': 1612980878029,
  'updatedBy': '#14:0',
  'url': 'https://pubmed.ncbi.nlm.nih.gov/25500544',
  'uuid': '1294db97-ee26-4bd4-9b50-d122436905be',
  'year': 2015},
 {'@class': 'Publication',
  '@rid': '#38:19',
  'alias': False,
  'createdAt': 1612981162739,
  'createdBy': '#14:0',
  'deprecated': False,
  'displayName': 'pmid:24666267',
  'journalName': 'acta oncologica (stockholm, sweden)',
  'name': 'the predictive value of kras, nras, braf, pik3ca and pten for anti-egfr treatment in metastatic colorectal cancer: a systematic review and meta-analysis.',
  'source': '#17:21',
  'sourceId': '24666267',
  'updatedAt': 1612981162739,
  'updatedBy': '#14:0',
  'url': 'https://pubmed.ncbi.nlm.nih.gov/24666267',
  'uuid': '06981f31-59d0-439e-b5cd-71f503f9c50e',
  'year': 2014},
 {'@class': 'Publication',
  '@rid': '#38:22',
  'alias': False,
  'createdAt': 1612981164833,
  'createdBy': '#14:0',
  'deprecated': False,
  'displayName': 'pmid:21030459',
  'journalName': 'cancer research',
  'name': 'the neuroblastoma-associated f1174l alk mutation causes resistance to an alk kinase inhibitor in alk-translocated cancers.',
  'source': '#17:21',
  'sourceId': '21030459',
  'updatedAt': 1612981164833,
  'updatedBy': '#14:0',
  'url': 'https://pubmed.ncbi.nlm.nih.gov/21030459',
  'uuid': '4f088978-1736-4e3e-83fb-6d2c70e88873',
  'year': 2010}]

You can also filter on multiple conditions. To do this, nest the filters in an object that has a single AND or OR property containing a list of regular conditions. For example, to find diseases with the name "cancer" or "carcinoma":

graphkb_conn.query({
    'target': 'Disease',
    'filters': {
        'OR': [
            {'name': 'cancer'},
            {'name': 'carcinoma'},
        ]
    },
})

[{'@class': 'Disease',
  '@rid': '#43:99077',
  'alias': False,
  'createdAt': 1612926192944,
  'createdBy': '#14:0',
  'deprecated': True,
  'displayName': 'carcinoma',
  'name': 'carcinoma',
  'out_DeprecatedBy': ['#29:1400'],
  'source': '#17:19',
  'sourceId': 'doid:2428',
  'updatedAt': 1612926192944,
  'updatedBy': '#14:0',
  'uuid': '2c61fd80-43fb-4cd2-941c-889a020cbbde'},
 {'@class': 'Disease',
  '@rid': '#43:99076',
  'alias': False,
  'createdAt': 1612926192912,
  'createdBy': '#14:0',
  'deprecated': True,
  'displayName': 'carcinoma',
  'name': 'carcinoma',
  'out_DeprecatedBy': ['#29:1399'],
  'source': '#17:19',
  'sourceId': 'doid:6570',
  'updatedAt': 1612926192912,
  'updatedBy': '#14:0',
  'uuid': 'baedee00-47a8-4d78-8ff5-7d9d8fadc03f'},
 {'@class': 'Disease',
  '@rid': '#43:99072',
  'alias': False,
  'createdAt': 1612926192816,
  'createdBy': '#14:0',
  'deprecated': False,
  'description': 'A cell type cancer that has_material_basis_in abnormally proliferating cells derives_from epithelial cells.',
  'displayName': 'carcinoma',
  'history': '#43:107676',
  'in_AliasOf': ['#26:160594', '#26:160595', '#26:160596'],
  'in_DeprecatedBy': ['#29:1399', '#29:1400'],
  'in_SubClassOf': [],
  'name': 'carcinoma',
  'out_CrossReferenceOf': ['#28:37788'],
  'out_SubClassOf': ['#33:13951'],
  'source': '#17:19',
  'sourceId': 'doid:305',
  'subsets': ['doid#do_flybase_slim',
   'doid#ncithesaurus',
   'doid#do_cancer_slim'],
  'updatedAt': 1612980618003,
  'updatedBy': '#14:0',
  'uuid': '1775c2a3-b923-49f4-9e28-d5ccfcb32bc3'},
 {'@class': 'Disease',
  '@rid': '#43:68962',
  'alias': True,
  'createdAt': 1612863235878,
  'createdBy': '#14:0',
  'deprecated': False,
  'displayName': 'cancer [c9305]',
  'name': 'cancer',
  'out_AliasOf': ['#26:137530'],
  'source': '#17:0',
  'sourceId': 'c9305',
  'updatedAt': 1612863235878,
  'updatedBy': '#14:0',
  'uuid': 'ed0fffc2-31ef-435b-ae11-6efd6b193dd3'},
 {'@class': 'Disease',
  '@rid': '#43:100548',
  'alias': False,
  'createdAt': 1612926228848,
  'createdBy': '#14:0',
  'deprecated': False,
  'description': 'A disease of cellular proliferation that is malignant and primary, characterized by uncontrolled cellular proliferation, local cell invasion and metastasis.',
  'displayName': 'cancer',
  'history': '#43:107678',
  'in_AliasOf': ['#26:161531', '#26:161532', '#26:161533'],
  'in_SubClassOf': ['#33:5210', '#33:5268'],
  'name': 'cancer',
  'out_CrossReferenceOf': ['#28:37957'],
  'out_ElementOf': ['#30:82543'],
  'out_SubClassOf': ['#33:7594'],
  'source': '#17:19',
  'sourceId': 'doid:162',
  'subsets': ['doid#do_flybase_slim',
   'doid#ncithesaurus',
   'doid#do_cancer_slim',
   'doid#do_agr_slim',
   'doid#do_gxd_slim'],
  'updatedAt': 1612980640268,
  'updatedBy': '#14:0',
  'uuid': '6a051270-4611-4af2-a5ff-1b31a872b4e0'}]

The operator was omitted in the query above since = is the default operator. Conditions can also be combined with AND:

graphkb_conn.query({
    'target': 'Disease',
    'filters': {
        'AND': [
            {'name': 'cancer', 'operator': 'CONTAINSTEXT'},
            {'name': 'pancreatic', 'operator': 'CONTAINSTEXT'},
        ]
    },
}, paginate=False, limit=3)

[{'@class': 'Disease',
  '@rid': '#43:1683',
  'alias': True,
  'createdAt': 1612854000193,
  'createdBy': '#14:0',
  'deprecated': False,
  'displayName': 'recurrent pancreatic neuroendocrine cancer [c115433]',
  'name': 'recurrent pancreatic neuroendocrine cancer',
  'out_AliasOf': ['#26:5496'],
  'source': '#17:0',
  'sourceId': 'c115433',
  'updatedAt': 1612854000193,
  'updatedBy': '#14:0',
  'uuid': 'a0831763-7bcb-4c68-a9dc-7aee6b3795c3'},
 {'@class': 'Disease',
  '@rid': '#43:8254',
  'alias': True,
  'createdAt': 1612855453528,
  'createdBy': '#14:0',
  'deprecated': False,
  'displayName': 'pancreatic cancer by ajcc v6 and v7 stage [c134902]',
  'name': 'pancreatic cancer by ajcc v6 and v7 stage',
  'out_AliasOf': ['#26:18191'],
  'source': '#17:0',
  'sourceId': 'c134902',
  'updatedAt': 1612855453528,
  'updatedBy': '#14:0',
  'uuid': '21e9a523-1982-4bf0-824d-c50bbe9b11b9'},
 {'@class': 'Disease',
  '@rid': '#43:8255',
  'alias': True,
  'createdAt': 1612855453548,
  'createdBy': '#14:0',
  'deprecated': False,
  'displayName': 'exocrine and endocrine pancreatic cancer by ajcc v6 and v7 stage [c134902]',
  'name': 'exocrine and endocrine pancreatic cancer by ajcc v6 and v7 stage',
  'out_AliasOf': ['#26:18192'],
  'source': '#17:0',
  'sourceId': 'c134902',
  'updatedAt': 1612855453548,
  'updatedBy': '#14:0',
  'uuid': 'd66fdfc2-578c-49cf-b3d7-cfe693fae104'}]

The above will look for diseases that have both 'cancer' and 'pancreatic' in the name.
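
When many filters are built programmatically, it can help to wrap the patterns above in small helpers. The following is a minimal sketch: `condition` and `combine` are hypothetical helper names, not part of the GraphKB adapter, and they only assemble the same dictionaries shown in the examples above.

```python
def condition(field, value, operator=None):
    """Build one filter condition; leave operator unset to use the default (=)."""
    cond = {field: value}
    if operator is not None:
        cond['operator'] = operator
    return cond


def combine(logic, conditions):
    """Nest a list of conditions under a single 'AND' or 'OR' property."""
    if logic not in ('AND', 'OR'):
        raise ValueError(f'logic must be AND or OR, got {logic!r}')
    return {logic: list(conditions)}


# Rebuilds the filters of the pancreatic cancer query above
pancreatic_cancer = combine('AND', [
    condition('name', 'cancer', 'CONTAINSTEXT'),
    condition('name', 'pancreatic', 'CONTAINSTEXT'),
])
print(pancreatic_cancer)
```

The resulting dictionary can be passed as the 'filters' value in the same way as before, for example graphkb_conn.query({'target': 'Disease', 'filters': pancreatic_cancer}, paginate=False, limit=3).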

Subquery Filters

Sometimes we would like to filter records on a linked field (essentially a foreign key). We can do this with subquery filters.

graphkb_conn.query({
    'target': 'Disease',
    'filters': {
        'source': {'target': 'Source', 'filters': {'name': 'disease ontology'}}
    },
}, paginate=False, limit=3)

[{'@class': 'Disease',
  '@rid': '#43:72269',
  'alias': True,
  'createdAt': 1612924874193,
  'createdBy': '#14:0',
  'deprecated': False,
  'displayName': 'von reklinghausen disease',
  'name': 'von reklinghausen disease',
  'out_AliasOf': ['#26:145059'],
  'source': '#17:19',
  'sourceId': 'doid:8712',
  'updatedAt': 1612924874193,
  'updatedBy': '#14:0',
  'uuid': '5647ae2d-8837-48fd-8b46-7669ec046e8e'},
 {'@class': 'Disease',
  '@rid': '#43:72251',
  'alias': True,
  'createdAt': 1612924873047,
  'createdBy': '#14:0',
  'deprecated': False,
  'displayName': 'other named variants of lymphosarcoma and reticulosarcoma involving lymph nodes of axilla and upper limb',
  'name': 'other named variants of lymphosarcoma and reticulosarcoma involving lymph nodes of axilla and upper limb',
  'out_AliasOf': ['#26:145052'],
  'source': '#17:19',
  'sourceId': 'doid:8716',
  'updatedAt': 1612924873047,
  'updatedBy': '#14:0',
  'uuid': '9586920c-fc10-4c5d-9cdb-f92d234d7cb3'},
 {'@class': 'Disease',
  '@rid': '#43:72256',
  'alias': True,
  'createdAt': 1612924873474,
  'createdBy': '#14:0',
  'deprecated': False,
  'displayName': 'other named variants of lymphosarcoma and reticulosarcoma involving intrapelvic lymph nodes',
  'name': 'other named variants of lymphosarcoma and reticulosarcoma involving intrapelvic lymph nodes',
  'out_AliasOf': ['#26:145057'],
  'source': '#17:19',
  'sourceId': 'doid:8716',
  'updatedAt': 1612924873474,
  'updatedBy': '#14:0',
  'uuid': 'e8cf5e88-9624-4900-8986-177e607fd95a'}]

Above we are only returning disease records that have been imported from the disease ontology.
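
A subquery has the same shape as a top-level query (a target plus filters), placed as the value of the linked field. The sketch below makes that structure explicit with a hypothetical `subquery` helper (not part of the adapter) and rebuilds the request body used above without sending it.

```python
def subquery(target, filters):
    """Build a nested query with the same target/filters shape as a top-level one."""
    return {'target': target, 'filters': filters}


# The linked field ('source') takes the subquery as its filter value
query_body = {
    'target': 'Disease',
    'filters': {
        'source': subquery('Source', {'name': 'disease ontology'}),
    },
}
print(query_body)
```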

Return Properties (Fields)

The return properties field (returnProperties) lets the user specify which fields are returned. This can mean returning a subset of fields for a large query so that the client can digest the data faster, or it can be used to de-nest fields. By default the query returns only the immediate properties of the class being queried, so linked fields are listed by their record ID. De-nesting these fields lets you return their contents without additional queries.

graphkb_conn.query({
    'target': 'Disease',
    'filters': {
        'AND': [
            {'source': {'target': 'Source', 'filters': {'name': 'disease ontology'}}},
            {'name': 'cancer'}
        ],
    },
})

[{'@class': 'Disease',
  '@rid': '#43:100548',
  'alias': False,
  'createdAt': 1612926228848,
  'createdBy': '#14:0',
  'deprecated': False,
  'description': 'A disease of cellular proliferation that is malignant and primary, characterized by uncontrolled cellular proliferation, local cell invasion and metastasis.',
  'displayName': 'cancer',
  'history': '#43:107678',
  'in_AliasOf': ['#26:161531', '#26:161532', '#26:161533'],
  'in_SubClassOf': ['#33:5210', '#33:5268'],
  'name': 'cancer',
  'out_CrossReferenceOf': ['#28:37957'],
  'out_ElementOf': ['#30:82543'],
  'out_SubClassOf': ['#33:7594'],
  'source': '#17:19',
  'sourceId': 'doid:162',
  'subsets': ['doid#do_flybase_slim',
   'doid#ncithesaurus',
   'doid#do_cancer_slim',
   'doid#do_agr_slim',
   'doid#do_gxd_slim'],
  'updatedAt': 1612980640268,
  'updatedBy': '#14:0',
  'uuid': '6a051270-4611-4af2-a5ff-1b31a872b4e0'}]

We probably are not interested in all of these fields so let's pick a few to return.

graphkb_conn.query({
    'target': 'Disease',
    'filters': {
        'AND': [
            {'source': {'target': 'Source', 'filters': {'name': 'disease ontology'}}},
            {'name': 'cancer'}
        ],
    },
    'returnProperties': ['name', 'source', 'sourceId', 'alias', 'deprecated']
})

[{'alias': False,
  'deprecated': False,
  'name': 'cancer',
  'source': '#17:19',
  'sourceId': 'doid:162'}]

The new result is much more manageable. However, the source field is still just a record ID, so with the current query we would have to fetch that record separately to see its details. This can be done in a single query with nested return properties: simply delimit properties and sub-properties with a period.

graphkb_conn.query({
    'target': 'Disease',
    'filters': {
        'AND': [
            {'source': {'target': 'Source', 'filters': {'name': 'disease ontology'}}},
            {'name': 'cancer'}
        ],
    },
    'returnProperties': ['name', 'source.name', 'sourceId', 'alias', 'deprecated']
})

[{'alias': False,
  'deprecated': False,
  'name': 'cancer',
  'source': {'name': 'disease ontology'},
  'sourceId': 'doid:162'}]
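
Note that the nested property comes back as a nested dictionary rather than under a dotted key. A minimal sketch for reading such values on the client side with the same dotted notation follows; `get_nested` is a hypothetical helper, not part of the adapter, and the record is copied from the output above.

```python
def get_nested(record, dotted_path, default=None):
    """Follow a period-delimited path (e.g. 'source.name') through nested dicts."""
    current = record
    for key in dotted_path.split('.'):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Record copied from the query output above
record = {
    'alias': False,
    'deprecated': False,
    'name': 'cancer',
    'source': {'name': 'disease ontology'},
    'sourceId': 'doid:162',
}

source_name = get_nested(record, 'source.name')
print(source_name)
```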