Discovering drug hypotheses through open biomedical literature

Open-source infrastructure for literature-based drug discovery.

Example extracted claim
{
  "subject": "disulfiram",
  "subject_type": "drug",
  "predicate": "inhibits",
  "object": "ALDH1A3",
  "object_type": "gene",
  "context": "glioblastoma stem cells",
  "polarity": "positive",
  "confidence": 0.85
}
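Here is a minimal Python sketch of how a claim like the one above might be typed and validated. The field names come from the example. The `Claim` class, the `parse_claim` helper, the allowed polarity values, and the 0 to 1 confidence range are assumptions for illustration, not Robertium's published schema.

```python
import json
from dataclasses import dataclass

# Field names mirror the example claim. The allowed polarity values and
# the 0..1 confidence range are assumptions made for this sketch.
ALLOWED_POLARITIES = {"positive", "negative"}


@dataclass
class Claim:
    subject: str
    subject_type: str
    predicate: str
    object: str
    object_type: str
    context: str
    polarity: str
    confidence: float

    def __post_init__(self) -> None:
        # Reject values that downstream graph steps could not interpret.
        if self.polarity not in ALLOWED_POLARITIES:
            raise ValueError(f"unknown polarity: {self.polarity!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


def parse_claim(text: str) -> Claim:
    """Parse one JSON claim; missing or unexpected keys raise TypeError."""
    return Claim(**json.loads(text))


example = parse_claim(
    '{"subject": "disulfiram", "subject_type": "drug", "predicate": "inhibits", '
    '"object": "ALDH1A3", "object_type": "gene", '
    '"context": "glioblastoma stem cells", "polarity": "positive", "confidence": 0.85}'
)
print(example.subject, example.predicate, example.object)  # disulfiram inhibits ALDH1A3
```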

The Problem

Over 4,000 biomedical papers are published every day, and the knowledge they contain stays fragmented across journals, disciplines, and languages. Connections that could lead to new treatments often already exist in the published literature, but no human team can read enough of it to find them systematically.

Approach

Robertium reads open biomedical literature, extracts structured claims about drugs, genes, and diseases, and connects them into a knowledge graph. From this graph, it surfaces contradictions, gaps, and reasoning chains that point to new drug repurposing hypotheses.
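Here is one simple way to surface contradictions from claims shaped like the example, sketched in Python. It assumes that two claims contradict when they share subject, predicate, object, and context but differ in polarity. The grouping rule, the `find_contradictions` function, and the `paper_id` field are hypothetical. The entities are placeholders, not real findings.

```python
from collections import defaultdict


def find_contradictions(claims):
    """Return claim groups in which both polarities are asserted.

    Claims are grouped by (subject, predicate, object, context). The
    "paper_id" field is hypothetical and only used to cite the evidence.
    """
    groups = defaultdict(list)
    for claim in claims:
        key = (claim["subject"], claim["predicate"], claim["object"], claim["context"])
        groups[key].append((claim["polarity"], claim["paper_id"]))

    conflicts = {}
    for key, evidence in groups.items():
        polarities = {polarity for polarity, _ in evidence}
        if {"positive", "negative"} <= polarities:
            conflicts[key] = sorted(evidence)
    return conflicts


def make_claim(obj, polarity, paper_id):
    # Placeholder entities, not real findings.
    return {"subject": "drug_a", "predicate": "inhibits", "object": obj,
            "context": "cell line 1", "polarity": polarity, "paper_id": paper_id}


claims = [
    make_claim("GENE_X", "positive", "P1"),
    make_claim("GENE_X", "negative", "P2"),
    make_claim("GENE_Y", "positive", "P3"),
]
for key, evidence in find_contradictions(claims).items():
    print(key, evidence)
```

Matching on exact context strings is conservative. It misses contradictions phrased in different words, but it avoids flagging claims that disagree only because they were made in different experimental settings.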

The pipeline is domain-agnostic. The first domain is glioblastoma. The second will be epilepsy.
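A domain-agnostic pipeline usually keeps every domain-specific choice in one place. This sketch assumes, purely for illustration, that a domain is just a name plus relevance terms used for filtering. The `Domain` class, the term lists, and the keyword filter are hypothetical and do not describe how Robertium actually filters papers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """Everything domain-specific lives here (hypothetical structure)."""
    name: str
    terms: tuple[str, ...]  # illustrative relevance terms, not the real filter


GLIOBLASTOMA = Domain("glioblastoma", ("glioblastoma", "glioma"))
EPILEPSY = Domain("epilepsy", ("epilepsy", "seizure"))


def is_relevant(domain: Domain, title: str, abstract: str) -> bool:
    """Keep a paper if its title or abstract mentions any domain term."""
    text = f"{title} {abstract}".lower()
    return any(term in text for term in domain.terms)


def filter_corpus(domain: Domain, papers: list[dict]) -> list[dict]:
    # The same code path serves every domain; only the Domain value changes.
    return [p for p in papers if is_relevant(domain, p["title"], p.get("abstract", ""))]


papers = [
    {"title": "Metabolic targets in glioma stem cells"},
    {"title": "Seizure outcomes after surgery", "abstract": "Retrospective cohort study."},
]
print([p["title"] for p in filter_corpus(GLIOBLASTOMA, papers)])  # ['Metabolic targets in glioma stem cells']
print([p["title"] for p in filter_corpus(EPILEPSY, papers)])  # ['Seizure outcomes after surgery']
```

Under this design, adding a domain means defining a new `Domain` value. The extraction and graph code stay untouched.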

Read

Open biomedical literature from OpenAlex, PubMed, and bioRxiv
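Works returned by the OpenAlex API do not carry their abstract as plain text. Instead, an `abstract_inverted_index` field maps each word to the positions where it occurs. Here is a minimal sketch of turning that back into text before extraction. The `rebuild_abstract` helper and the toy record are illustrative.

```python
from typing import Optional


def rebuild_abstract(inverted_index: Optional[dict[str, list[int]]]) -> str:
    """Rebuild plain text from an OpenAlex-style abstract_inverted_index.

    Each word is placed at every position it occupies, and the words are
    then read back in position order.
    """
    if not inverted_index:
        return ""  # works without an abstract have a null index
    by_position = {}
    for word, positions in inverted_index.items():
        for position in positions:
            by_position[position] = word
    return " ".join(by_position[i] for i in sorted(by_position))


# A toy record shaped like an OpenAlex work (only the relevant field is shown).
work = {"abstract_inverted_index": {"the": [0, 3], "cat": [1], "saw": [2], "dog": [4]}}
print(rebuild_abstract(work["abstract_inverted_index"]))  # the cat saw the dog
```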

Extract

Structured claims about drugs, genes, and diseases, extracted with language models
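Language model output needs checking before it becomes a claim. This is a hedged sketch that assumes the model is prompted to return a JSON array of objects with the keys of the example claim, possibly wrapped in a Markdown code fence. The prompt contract, the `parse_model_output` function, and its 0.5 default confidence threshold are all assumptions.

```python
import json
import re
from typing import Any

# Hypothetical contract: the model returns a JSON array of claims using the
# keys of the example claim, sometimes wrapped in a fenced code block.
REQUIRED_KEYS = {"subject", "subject_type", "predicate", "object",
                 "object_type", "context", "polarity", "confidence"}
FENCED = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_model_output(raw: str, min_confidence: float = 0.5) -> list[dict[str, Any]]:
    """Extract well-formed claims from raw model text and drop everything else."""
    match = FENCED.search(raw)
    payload = match.group(1) if match else raw
    try:
        records = json.loads(payload)
    except json.JSONDecodeError:
        return []  # unparseable output yields no claims rather than an error
    if isinstance(records, dict):
        records = [records]
    if not isinstance(records, list):
        return []

    kept = []
    for record in records:
        if not isinstance(record, dict) or not REQUIRED_KEYS.issubset(record):
            continue  # skip partial or malformed claims
        try:
            confidence = float(record["confidence"])
        except (TypeError, ValueError):
            continue
        if confidence >= min_confidence:
            kept.append({**record, "confidence": confidence})
    return kept


raw = json.dumps([
    {"subject": "drug_a", "subject_type": "drug", "predicate": "inhibits",
     "object": "GENE_X", "object_type": "gene", "context": "cell line 1",
     "polarity": "positive", "confidence": 0.9},
    {"subject": "drug_a"},  # incomplete, so it is dropped
])
print(len(parse_model_output(raw)))  # 1
```

Returning an empty list on bad output keeps a single malformed response from halting a batch run. The trade-off is that dropped records should be logged somewhere if extraction quality is being audited.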

Connect

Knowledge graph reveals contradictions, gaps, and reasoning chains
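Reasoning chains can be illustrated with the classic ABC pattern from literature-based discovery, associated with Don Swanson. If the literature links A to B and B to C, but no paper links A to C directly, the A to C pair becomes a candidate hypothesis. This sketch applies that pattern to drug, gene, and disease nodes. The source does not say Robertium uses exactly this rule, and `abc_chains` and the placeholder entities are hypothetical. The missing direct link is also one possible reading of a "gap".

```python
from collections import defaultdict


def abc_chains(claims):
    """List drug -> gene -> disease paths lacking a direct drug -> disease claim.

    Predicates and polarity are ignored here for brevity; a fuller version
    would keep them on the edges and score each chain.
    """
    edges = defaultdict(set)
    node_type = {}
    for claim in claims:
        edges[claim["subject"]].add(claim["object"])
        node_type[claim["subject"]] = claim["subject_type"]
        node_type[claim["object"]] = claim["object_type"]

    chains = []
    for drug, targets in edges.items():
        if node_type[drug] != "drug":
            continue
        for gene in targets:
            if node_type[gene] != "gene":
                continue
            for disease in edges.get(gene, ()):
                # The missing direct link is what makes the chain a candidate.
                if node_type[disease] == "disease" and disease not in targets:
                    chains.append((drug, gene, disease))
    return sorted(chains)


def make_claim(s, s_type, o, o_type):
    # Placeholder entities, not real findings.
    return {"subject": s, "subject_type": s_type, "object": o, "object_type": o_type}


claims = [
    make_claim("drug_a", "drug", "GENE_X", "gene"),
    make_claim("GENE_X", "gene", "disease_1", "disease"),
    make_claim("drug_b", "drug", "GENE_X", "gene"),
    make_claim("drug_b", "drug", "disease_1", "disease"),  # already linked directly
]
print(abc_chains(claims))  # [('drug_a', 'GENE_X', 'disease_1')]
```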

Open

Robertium is open-source under the MIT license. The code, the extracted claims, and the knowledge graph are all freely available.

This is intentional. Drug discovery infrastructure should belong to the scientific community, not to private platforms.

MIT License · Open data · Active development

Current Status

9,933 papers ingested
4,628 passed filtering
13,000+ structured claims
Knowledge graph: in progress
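The status figures imply a rough yield for each stage. This is simple arithmetic on the numbers above. It assumes claims were extracted only from papers that passed filtering. Because the claim count is given as 13,000+, the per-paper ratio is a lower bound.

```latex
\begin{aligned}
\text{filter pass rate} &= \frac{4{,}628}{9{,}933} \approx 0.466 \;(46.6\%) \\
\text{claims per retained paper} &\geq \frac{13{,}000}{4{,}628} \approx 2.8
\end{aligned}
```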

The first domain, glioblastoma, is being processed. This page is updated as work progresses.

Roadmap

Done

Foundation

Glioblastoma corpus processed. Knowledge graph operational. First contradictions identified.

Active

Second domain

Epilepsy added. Pipeline validated as domain-agnostic.

Planned

Publication

Preprint on bioRxiv describing methodology and findings.

Planned

Expansion

Additional therapeutic areas where current treatments fall short.

Get in Touch