What is the DRKG?
The Drug Repurposing Knowledge Graph (DRKG) is a comprehensive biological knowledge graph relating genes, compounds, diseases, biological processes, side effects and symptoms. DRKG draws on six existing databases (DrugBank, Hetionet, GNBR, STRING, IntAct and DGIdb) and additional sources. It includes 97,238 entities belonging to 13 entity-types, and 5,874,261 triplets belonging to 107 edge-types. Each of the 107 edge-types describes a type of interaction between one of 17 entity-type pairs (several types of interaction are possible between the same entity pair), as depicted in the figure below.
With DRKG you can score potential relations between two previously unconnected entities.
How does the DRKG work?
Graphs are made up of heads, tails, and the relations between them. In DRKG, heads and tails are represented by the following formats:
Here, Entity-type can be any of the 13 types shown in the top image: Side-effect, Atc, Pharmacologic class, Pathway, Compound, Symptom, Disease, Molecular function, Gene, Anatomy, Biological process, Tax, Cellular component.
The paper found at the bottom of this section has the following to say about the IDs of heads and tails:
“Data sources use one of several ID spaces to represent genes, compounds, diseases and others. For example, the same chemical compound may be represented in the drugbank compound ID space in DrugBank and in the chembl compound ID space in the DGIdb. To ensure that information from different sources is integrated correctly, we map biological entities to a common ID space using the following rules:
These rules are applied to the biological entities per database to map the entities to the common ID space. Finally, in order to avoid relations for which we do not have enough data to train good embeddings, we exclude relations types that have less than 50 edges”
Relations between heads and tails are formatted as:
<Database>::<Relation type>::<Head Category>:<Tail Category>
e.g. GNBR::Sa::Compound:Disease or INTACT::UBIQUITINATION REACTION::Gene:Gene
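The relation format above can be split into its four parts with plain string handling. The following is a minimal sketch, not code from the DRKG app: the `Relation` type and the `parse_relation` function are hypothetical names introduced here, and the sketch assumes that the database and relation type never contain `::` and that the head category never contains `:`.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """The four parts of a relation name (hypothetical helper type)."""
    database: str
    relation_type: str
    head_category: str
    tail_category: str


def parse_relation(name: str) -> Relation:
    """Split '<Database>::<Relation type>::<Head Category>:<Tail Category>'."""
    parts = name.split("::")
    if len(parts) != 3:
        raise ValueError(f"expected two '::' separators in {name!r}")
    database, relation_type, categories = parts
    # The last field holds both categories, separated by a single ':'.
    head_category, sep, tail_category = categories.partition(":")
    if not sep or not head_category or not tail_category:
        raise ValueError(f"expected '<Head Category>:<Tail Category>' in {name!r}")
    return Relation(database, relation_type, head_category, tail_category)


# The two example relations given in the text.
for example in ("GNBR::Sa::Compound:Disease",
                "INTACT::UBIQUITINATION REACTION::Gene:Gene"):
    print(parse_relation(example))
```

Relation types may contain spaces, as in the IntAct example, so the sketch splits only on the separators and never on whitespace.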
The possible databases are GNBR, Drugbank, Hetionet, STRING, IntAct, DGIdb and bioarx. Each database has its own relation types:
The score of a head, relation, tail triplet represents how likely that triplet is. A score is calculated for every h, r, t combination as follows:
$$ d = \gamma - \|\mathbf{h}+\mathbf{r}-\mathbf{t}\|_{2} $$
$$ \textup{score} = \log\left(\frac{1}{1+\exp(-d)}\right) $$
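$\mathbf{h}$, $\mathbf{r}$, $\mathbf{t}$ are the embeddings of the head, relation and tail. They are learned by minimising the following loss, where $f(\mathbf{h},\mathbf{r},\mathbf{t})$ is the model's scoring function for a triplet and $y$ is $+1$ for triplets that exist ($\mathbb{D}^+$) and $-1$ for triplets that do not ($\mathbb{D}^-$). Minimising it pushes $f$ up for existing triplets and down for non-existing ones:
$$ \textup{min} \sum_{\mathbf{h},\mathbf{r},\mathbf{t} \in\mathbb{D}^+\cup \mathbb{D}^-} \textup{log}\left(1+\textup{exp}(-y\times f(\mathbf{h},\mathbf{r},\mathbf{t}))\right) $$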
All scores are less than 0, because the score is the logarithm of a value between 0 and 1. The closer the score is to 0, the more likely it is that $\mathbf{h}$ has relation $\mathbf{r}$ with $\mathbf{t}$.
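The formulas above can be worked through numerically. The following is a minimal sketch in plain Python, not the app's implementation: the function names are hypothetical, the embeddings and the value of $\gamma$ are made-up illustrative numbers, and the loss function assumes that $f$ is the distance term $d$ and that $y$ is $+1$ or $-1$.

```python
import math
from typing import Iterable, Sequence, Tuple

Vector = Sequence[float]


def distance_term(h: Vector, r: Vector, t: Vector, gamma: float) -> float:
    """d = gamma - ||h + r - t||_2."""
    l2_norm = math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))
    return gamma - l2_norm


def log_sigmoid(x: float) -> float:
    """log(1 / (1 + exp(-x))), arranged so exp() never overflows."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def triplet_score(h: Vector, r: Vector, t: Vector, gamma: float) -> float:
    """score = log(1 / (1 + exp(-d)))."""
    return log_sigmoid(distance_term(h, r, t, gamma))


def logistic_loss(samples: Iterable[Tuple[Vector, Vector, Vector, int]],
                  gamma: float) -> float:
    """Sum of log(1 + exp(-y * f)), taking f to be d (an assumption)."""
    return sum(-log_sigmoid(y * distance_term(h, r, t, gamma))
               for h, r, t, y in samples)


gamma = 2.0                      # arbitrary illustrative margin
h, r = [1.0, 0.0], [0.0, 1.0]    # made-up 2-dimensional embeddings
t_near, t_far = [1.0, 1.0], [4.0, 5.0]

print(triplet_score(h, r, t_near, gamma))  # h + r equals t_near: close to 0
print(triplet_score(h, r, t_far, gamma))   # h + r is far from t_far: more negative
print(logistic_loss([(h, r, t_near, +1), (h, r, t_far, -1)], gamma))
```

A tail whose embedding lies close to $\mathbf{h}+\mathbf{r}$ gives a score near 0, while a distant tail gives a strongly negative score, which matches the interpretation of the scores given above.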
How to use our DRKG app?
Our DRKG app takes three input files. Note that all input entities and relations must already be in the DRKG (see entities.csv and relations.csv below):
A CSV file of head entities, e.g.:
A CSV file of tail entities, e.g.:
A CSV file of relations, e.g.:
The examples included above are the same as the example data in the DRKG app.
Files containing all possible entities and all possible relations:
Or view them in the browser here (note: the page may take a few seconds to load when scrolling through the database):
The DRKG app returns a downloadable list of scores for all head, relation, tail triplets. For example:
In the results page of the DRKG app,
If the DRKG app receives input entities or relations that it cannot find in the knowledge graph, it lists them in a "not found" table on the results page.
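The behaviour described in this section (separating out inputs that are not in the knowledge graph, then scoring every head, relation, tail combination) can be sketched as follows. This is an illustration only, not the app's code: the function names are hypothetical, and the entity and relation names are placeholders rather than real DRKG identifiers. In practice the known names would come from entities.csv and relations.csv.

```python
from itertools import product
from typing import Iterable, List, Set, Tuple


def split_known(items: Iterable[str], known: Set[str]) -> Tuple[List[str], List[str]]:
    """Separate inputs into those present in the graph and those not found."""
    found: List[str] = []
    not_found: List[str] = []
    for item in items:
        (found if item in known else not_found).append(item)
    return found, not_found


def all_triplets(heads: Iterable[str], relations: Iterable[str],
                 tails: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Every (head, relation, tail) combination of the given inputs."""
    return list(product(heads, relations, tails))


# Placeholder names standing in for the contents of entities.csv / relations.csv.
known_entities = {"head_a", "head_b", "tail_x", "tail_y"}
known_relations = {"relation_1", "relation_2"}

heads, missing_heads = split_known(["head_a", "head_b", "head_typo"], known_entities)
tails, missing_tails = split_known(["tail_x", "tail_y"], known_entities)
relations, missing_relations = split_known(["relation_1"], known_relations)

triplets = all_triplets(heads, relations, tails)
print(len(triplets), "triplets to score")
print("not found:", missing_heads + missing_tails + missing_relations)
```

Because every combination is scored, the number of result rows is the product of the numbers of valid heads, relations and tails, so large input files produce large result lists.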