Loading Data¶
We provide a number of modules that automate loading external resources into GraphKB. You can load only the resources you choose, or use the snakemake pipeline to load them all (see instructions here). By default, the pipeline downloads the content and loads it into your newly created GraphKB instance.
Popular Resources¶
The most popular resources with pre-built GraphKB loaders are listed below. For an exhaustive list of all available loaders, see the loader project itself.
-
Cancer Genome Interpreter¶
https://www.cancergenomeinterpreter.org/home
This is an external knowledge base which can be imported as statements into GraphKB.
-
ChEMBL¶
Drug definitions and relationships can be loaded from ChEMBL via their REST API.
-
CIViC¶
This is an external knowledge base which can be imported as statements into GraphKB.
-
ClinicalTrials.gov¶
https://clinicaltrials.gov/ct2/home
Contains details for clinical trials around the world. Where possible, the drugs and disease terms associated with a trial are matched and linked to it when the data is loaded.
-
COSMIC¶
https://cancer.sanger.ac.uk/cosmic
Catalogue of Somatic Mutations in Cancer. Loaders are provided for importing both the resistance mutations and the recurrent fusions data.
-
DGIdb¶
Loads gene-drug interactions into GraphKB. These are used in exploring novel mutation targets.
-
Disease Ontology¶
Disease definitions and relationships are loaded from data files provided by the Disease Ontology.
-
DoCM¶
This is an external knowledge base which can be imported as statements into GraphKB.
-
DrugBank¶
Attribution-NonCommercial 4.0 International
Drug definitions and relationships, along with cross-references to the FDA drugs list, are loaded from the XML database dumps of DrugBank.
-
Ensembl¶
https://uswest.ensembl.org/index.html
Gene, transcript, and protein definitions as well as cross-mappings to RefSeq versions.
-
Entrez API¶
https://www.ncbi.nlm.nih.gov/books/NBK25501
Module used in other loaders for fetching publications (PubMed, PMC), genes (Entrez gene), RS IDs (snp), etc. from the NCBI Entrez API utilities.
-
FDA Approval Announcements¶
Parses oncology approval announcements from the FDA site and stores them as evidence items.
-
FDA SRS¶
https://precision.fda.gov/uniisearch
The FDA global substance registration system contains drug definitions and names.
-
GraphKB Ontology JSON¶
https://github.com/bcgsc/pori_graphkb_loader/tree/master/src/ontology
This loads a simple JSON format describing a set of ontology terms. We have included some examples and helpful ontology JSON files in the data folder of the corresponding repository.
-
HGNC¶
Gene names and definitions as well as cross-mappings to several other gene resources such as Ensembl and Entrez.
-
MOAlmanac¶
A collection of putative alteration/action relationships identified in clinical, preclinical, and inferential studies.
-
NCIt¶
https://ncithesaurus.nci.nih.gov/ncitbrowser
NCI Thesaurus which contains therapies, anatomical entities, and disease definitions.
-
OncoKB¶
This is a legacy loader. It is written to load the actionability JSON files provided by OncoKB. As this is not an open data resource, using this loader will require licensing specific to your user/instance. This is an external knowledge base which can be imported as statements into GraphKB.
-
Uberon¶
The Uberon ontology contains anatomical entity definitions.
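As a rough illustration of the kind of record the Entrez API module listed above fetches, the sketch below builds a request URL for the NCBI E-utilities `esummary` endpoint. The helper name `entrez_summary_url` is hypothetical and is not part of the loaders. The loaders themselves may use different E-utilities calls.

```shell
# Base URL of the NCBI E-utilities; esummary returns record summaries.
EUTILS_BASE="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# entrez_summary_url DB ID  (hypothetical helper, not part of the loaders)
# Prints the esummary URL for one record in the given Entrez database,
# for example pubmed, pmc, gene, or snp.
entrez_summary_url() {
    printf '%s/esummary.fcgi?db=%s&id=%s&retmode=json\n' \
        "$EUTILS_BASE" "$1" "$2"
}
```

To inspect a record by hand, pass the printed URL to a client such as curl, for example `curl -s "$(entrez_summary_url pubmed YOUR_PUBMED_ID)"`.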
Custom Content¶
If you have your own instance of GraphKB and would like to transform your existing knowledge base to load it into GraphKB, please look at the other knowledge base loaders for examples. There are some commonly used helper modules and functions available in the code base to make this process simpler. Documentation for each loader is kept alongside it, in the corresponding README.md.
src/
`-- loader/
    |-- index.js
    `-- README.md
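The layout above can be scaffolded for a new loader with a few shell commands. This is a minimal sketch: the helper name `scaffold_loader` is hypothetical, and it assumes that `loader` in the tree stands for the name of your own loader directory. It only creates empty placeholder files and does not generate any loader logic.

```shell
# scaffold_loader NAME  (hypothetical helper, not part of the loaders repo)
# Creates src/NAME/ with an empty index.js and a stub README.md.
scaffold_loader() {
    name="${1:-}"
    if [ -z "$name" ]; then
        echo "usage: scaffold_loader NAME" >&2
        return 1
    fi
    dir="src/${name}"
    mkdir -p "$dir" || return 1
    # Create the two files without overwriting any existing content.
    [ -e "$dir/index.js" ] || : > "$dir/index.js"
    [ -e "$dir/README.md" ] || printf '# %s loader\n' "$name" > "$dir/README.md"
}
```

Run from the root of your checkout, `scaffold_loader myKnowledgeBase` would create `src/myKnowledgeBase/index.js` and `src/myKnowledgeBase/README.md`.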
If you have any issues or questions, please open an issue in the loaders repo.
Loading Content¶
For convenience, a snakemake workflow is included to run all available loaders in an optimal order to initialize the content in a new instance of GraphKB. Snakemake is a Python package. To set it up in a virtual environment, run the following:
python3 -m venv venv
source venv/bin/activate
pip install -U pip setuptools wheel
pip install snakemake
The workflow can then be run as follows (a single core is used here; adjust `-j` to suit your server):
snakemake -j 1
You will want to pass snakemake the specific GraphKB instance you are working with as well as the credentials of the user that will be uploading. If you have followed the docker install demo instructions, this might look something like this:
snakemake -j 1 \
--config gkb_user='graphkb_importer' \
gkb_pass='secret' \
gkb_url='http://localhost:8080/api'
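To keep credentials out of the command itself, the same call can be wrapped in a small shell function that reads them from environment variables. This is a sketch, not part of the workflow: the variable names `GKB_USER`, `GKB_PASS`, and `GKB_URL` and the function name `run_loaders` are hypothetical, and the defaults simply mirror the demo values above.

```shell
# Hypothetical environment variable names; defaults mirror the demo values.
GKB_USER="${GKB_USER:-graphkb_importer}"
GKB_PASS="${GKB_PASS:-secret}"
GKB_URL="${GKB_URL:-http://localhost:8080/api}"

# run_loaders [SNAKEMAKE_FLAGS...]
# Extra flags (for example -n for a dry run) are placed before --config
# so that they are not read as config entries.
run_loaders() {
    snakemake -j 1 "$@" --config \
        "gkb_user=${GKB_USER}" \
        "gkb_pass=${GKB_PASS}" \
        "gkb_url=${GKB_URL}"
}
```

Calling `run_loaders -n` asks snakemake for a dry run, which lists the jobs it would execute without running them. Calling `run_loaders` with no arguments starts the actual load.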
The COSMIC and DrugBank options require licensing and are therefore not run by default. If you have a license to use them, you can include one or both by providing your email and password for each as config parameters:
snakemake -j 1 \
--config drugbank_email="YOUR EMAIL" \
drugbank_password="YOUR PASSWORD" \
cosmic_email="YOUR EMAIL" \
cosmic_password="YOUR PASSWORD"
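Since either licensed resource can be included on its own, one option is to add each pair of parameters only when its credentials are available. The sketch below does this from environment variables. The variable names (`DRUGBANK_EMAIL`, `COSMIC_EMAIL`, and so on) and the function name `run_all_loaders` are hypothetical; only the config keys come from the command above.

```shell
# run_all_loaders [KEY=VALUE...]  (hypothetical helper)
# Arguments are extra config entries, for example gkb_user=... ; the
# DrugBank and COSMIC entries are appended only when both the email
# and the password for that resource are set in the environment.
run_all_loaders() {
    if [ -n "${DRUGBANK_EMAIL:-}" ] && [ -n "${DRUGBANK_PASSWORD:-}" ]; then
        set -- "$@" "drugbank_email=${DRUGBANK_EMAIL}" \
                    "drugbank_password=${DRUGBANK_PASSWORD}"
    fi
    if [ -n "${COSMIC_EMAIL:-}" ] && [ -n "${COSMIC_PASSWORD:-}" ]; then
        set -- "$@" "cosmic_email=${COSMIC_EMAIL}" \
                    "cosmic_password=${COSMIC_PASSWORD}"
    fi
    # Pass --config only when there is at least one entry to give it.
    if [ "$#" -gt 0 ]; then
        snakemake -j 1 --config "$@"
    else
        snakemake -j 1
    fi
}
```

With only the DrugBank variables exported, `run_all_loaders gkb_user='graphkb_importer'` would pass the GraphKB user and the DrugBank credentials and leave COSMIC out.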