Loading Data¶

We have provided a number of modules to automate loading external resources into GraphKB. Users can pick and choose which resources they would like to load, or use the snakemake pipeline to load them all (see instructions here). By default, the pipeline downloads the content and loads it into your newly created GraphKB instance.

Popular Resources¶

The most popular resources with pre-built GraphKB loaders are listed below. For an exhaustive list of all available loaders, please see the loader project itself.

Cancer Genome Interpreter¶

https://www.cancergenomeinterpreter.org/home

CC BY-NC 4.0

This is an external knowledge base which can be imported as statements into GraphKB.

ChEMBL¶

https://www.ebi.ac.uk/chembl

CC BY-SA 3.0

Drug definitions and relationships can be loaded from ChEMBL via their REST API.

CIViC¶

https://civicdb.org

CC0 1.0

This is an external knowledge base which can be imported as statements into GraphKB.

ClinicalTrials.gov¶

https://clinicaltrials.gov/ct2/home

Attribution

Contains details for clinical trials around the world. Where possible, the drug and disease terms associated with a trial are matched and linked to the trial when the data is loaded.

COSMIC¶

https://cancer.sanger.ac.uk/cosmic

Non-commercial

Catalogue of Somatic Mutations in Cancer. Loaders are provided for importing both the resistance mutations and the recurrent fusions information.

DGIdb¶

https://www.dgidb.org

Open Access

Loads gene-drug interactions into GraphKB. These are used in exploring novel mutation targets.

Disease Ontology¶

https://disease-ontology.org

CC0 1.0 Universal

Disease definitions and relationships are loaded from data files provided by the Disease Ontology.

DoCM¶

http://docm.info

CC BY 4.0

This is an external knowledge base which can be imported as statements into GraphKB.

DrugBank¶

https://go.drugbank.com

Attribution-NonCommercial 4.0 International

Drug definitions and relationships, along with cross-references to the FDA drugs list, are loaded from the XML database dumps of DrugBank.

Ensembl¶

https://uswest.ensembl.org/index.html

No Restrictions

Loads gene, transcript, and protein definitions as well as cross-mappings to RefSeq versions.

Entrez API¶

https://www.ncbi.nlm.nih.gov/books/NBK25501

No Restrictions

Module used by other loaders to fetch publications (PubMed, PMC), genes (Entrez gene), RS IDs (snp), etc. from the NCBI Entrez API utilities.

FDA Approval Announcements¶

https://www.fda.gov/drugs/resources-information-approved-drugs/hematologyoncology-cancer-approvals-safety-notifications

Parses oncology approval announcements from the FDA site and stores them as evidence items.

FDA SRS¶

https://precision.fda.gov/uniisearch

The FDA Global Substance Registration System (SRS) contains drug definitions and names.

GraphKB Ontology JSON¶

https://github.com/bcgsc/pori_graphkb_loader/tree/master/src/ontology

This loads a simple JSON format describing a set of ontology terms. We have included some examples and helpful ontology JSON files in the data folder of the corresponding repository.

HGNC¶

https://www.genenames.org

No Restrictions

Gene names and definitions as well as cross-mappings to several other gene resources such as Ensembl and Entrez.

MOAlmanac¶

https://moalmanac.org

ODbL v1.0

A collection of putative alteration/action relationships identified in clinical, preclinical, and inferential studies.

NCIt¶

https://ncithesaurus.nci.nih.gov/ncitbrowser

CC BY 4.0

The NCI Thesaurus contains therapy, anatomical entity, and disease definitions.

OncoKB¶

https://www.oncokb.org

Restricted

This is an external knowledge base which can be imported as statements into GraphKB. The loader is a legacy loader, written to load the actionability JSON files provided by OncoKB. As this is not an open data resource, using this loader requires licensing specific to your user/instance.

Uberon¶

https://uberon.github.io

CC BY 3.0

The Uberon ontology contains anatomical entity definitions.

Custom Content¶

If you have your own instance of GraphKB and would like to transform your existing knowledge base to load it into GraphKB, please look at the other knowledge base loaders for examples. There are some commonly used helper modules and functions available in the code base to make this process simpler. Documentation for each loader is kept alongside the loader itself, in its corresponding README.md:

src/
`--loader/
  |-- index.js
  `-- README.md
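
As a starting point for a custom loader, the layout above could be scaffolded with a few shell commands. This is only a minimal sketch: the loader name `myKnowledgeBase` and the `scaffold_loader` helper are hypothetical names chosen for illustration, and the example works in a temporary directory rather than in a real checkout of the loaders repo. The contents of `index.js` still need to be written by following one of the existing knowledge base loaders.

```shell
#!/usr/bin/env bash
# Sketch: create an empty loader directory matching the layout shown above.
# "scaffold_loader" and "myKnowledgeBase" are hypothetical names used for
# illustration only; they are not part of the loaders repo.

scaffold_loader() {
  local name="$1"          # loader (directory) name
  local root="${2:-src}"   # parent folder, defaults to src/
  mkdir -p "${root}/${name}"
  # index.js will hold the loader code; README.md documents the loader
  touch "${root}/${name}/index.js" "${root}/${name}/README.md"
}

# Work in a temporary directory so an existing checkout is not modified
workdir="$(mktemp -d)"
scaffold_loader "myKnowledgeBase" "${workdir}/src"

# List the files that were created
ls "${workdir}/src/myKnowledgeBase"
```

Inside a real checkout, the second argument would be omitted so that the new folder is created under `src/`.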

If you have any issues or questions, please open an issue in the loaders repo.

Loading Content¶

For convenience, a snakemake workflow is included to run all available loaders in an optimal order to initialize the content in a new instance of GraphKB. The workflow uses snakemake, which is a Python package. To set up snakemake in a virtual environment, run the following:

python3 -m venv venv
source venv/bin/activate
pip install -U pip setuptools wheel
pip install snakemake

The workflow can then be run as follows (single core by default, but this can be adjusted depending on your server settings):

snakemake -j 1

default workflow

You will want to pass snakemake the specific GraphKB instance you are working with as well as the credentials of the user that will be uploading. If you have followed the docker install demo instructions, this might look something like this:

snakemake -j 1 \
  --config gkb_user='graphkb_importer' \
  gkb_pass='secret' \
  gkb_url='http://localhost:8080/api'

The COSMIC and DrugBank options require licensing and are therefore not run by default. If you have a license to use them, you can include one or both of them by providing the email and password as config parameters:

snakemake -j 1 \
  --config drugbank_email="YOUR EMAIL" \
  drugbank_password="YOUR PASSWORD" \
  cosmic_email="YOUR EMAIL" \
  cosmic_password="YOUR PASSWORD"

full workflow
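
The instance settings and the licensed-resource credentials shown above could also be collected from environment variables and assembled into a single command. The sketch below is one possible way to do this, not part of the loader project: the `build_snakemake_cmd` helper and the upper-case variable names (`GKB_USER`, `DRUGBANK_EMAIL`, and so on) are assumptions made for this example, while the lower-case config keys are the ones used in the commands above. It also assumes that the instance settings and the licensed-resource credentials can be passed together in one `--config` list. The block only prints the command; it does not run snakemake.

```shell
#!/usr/bin/env bash
# Sketch: assemble the snakemake command from environment variables.
# The helper name and the upper-case variable names are assumptions for this
# example; the lower-case config keys come from the commands shown above.

build_snakemake_cmd() {
  # The first argument is the number of cores (defaults to 1)
  SNAKEMAKE_CMD=(snakemake -j "${1:-1}")
  local config=()

  # GraphKB instance and uploading user: each is added only if it is set
  if [ -n "${GKB_USER:-}" ]; then config+=("gkb_user=${GKB_USER}"); fi
  if [ -n "${GKB_PASS:-}" ]; then config+=("gkb_pass=${GKB_PASS}"); fi
  if [ -n "${GKB_URL:-}" ]; then config+=("gkb_url=${GKB_URL}"); fi

  # Licensed resources: added only when both email and password are set
  if [ -n "${DRUGBANK_EMAIL:-}" ] && [ -n "${DRUGBANK_PASSWORD:-}" ]; then
    config+=("drugbank_email=${DRUGBANK_EMAIL}")
    config+=("drugbank_password=${DRUGBANK_PASSWORD}")
  fi
  if [ -n "${COSMIC_EMAIL:-}" ] && [ -n "${COSMIC_PASSWORD:-}" ]; then
    config+=("cosmic_email=${COSMIC_EMAIL}")
    config+=("cosmic_password=${COSMIC_PASSWORD}")
  fi

  if [ "${#config[@]}" -gt 0 ]; then
    SNAKEMAKE_CMD+=(--config "${config[@]}")
  fi
}

# Example values only (taken from the commands above); COSMIC is left unset
GKB_USER='graphkb_importer'
GKB_PASS='secret'
GKB_URL='http://localhost:8080/api'
DRUGBANK_EMAIL='YOUR EMAIL'
DRUGBANK_PASSWORD='YOUR PASSWORD'
unset COSMIC_EMAIL COSMIC_PASSWORD

build_snakemake_cmd 1

# Print the command in shell-quoted form instead of running it
printf '%q ' "${SNAKEMAKE_CMD[@]}"
echo
```

To actually start the workflow, the final `printf` would be replaced with `"${SNAKEMAKE_CMD[@]}"`. Because the COSMIC variables are unset in this example, only the DrugBank credentials are added to the command.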
