Knowledge emergence - what we learn from 100K monthly publications

Artur Saudabayev
published on October 30, 2018

Publication and citation counts are today among the very few metrics that measure scholars' output and its value. Growing competition in academia pushes people to publish more, subjecting them to the "Publish or Perish" phenomenon. The trend is highly controversial, with a growing number of sceptics pointing to the declining quality of papers, not to mention bogus publications and "predatory journals". While these concerns are not far-fetched, one obvious fact is often overlooked amid the controversy: the amount of impactful research is growing naturally, and it is growing fast.

Years of increasing popularization and democratization of science, growing government and private-sector investment in research, and the geographic expansion of “major publishing” countries have all had their effect. More people doing science in more countries around the world with more resources leads to more findings, and more findings pave the way for even more research.

Since the early days of Causaly, we have been trying to solve a very personal problem: the need to navigate the vast amount of literature faster and better. As a former academic (if academia can really stay in one's past), I had to stay on top of the information flow in multiple domains. Short and systematic literature reviews, as well as quick keyword searches to validate a hypothesis or generate a new one, were an inherent part of the work and took up a substantial amount of my time.

While human reading capacity has clear limits, there is hardly an end in sight to the growth of scientific publishing. To address this growing gap, we have been building a machine reading platform: one that processes millions of papers at superhuman speed and distills the knowledge (evidence) they contain into a knowledge graph.

A point of evidence can be defined as a relationship between two entities or events, expressed in a statement such as "BRCA1 gene is linked to the breast cancer disease mechanism." Tens of millions of biomedical publications connect the entire world of biomedical concepts and events together. More evidence becomes available every day, yet it is becoming ever harder to reach at scale.
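To make this definition concrete, a point of evidence can be modelled as a directed triple (subject, relation, object) tagged with the publication it came from, and a knowledge graph as an index of such triples. The sketch below is a minimal illustration under that assumption; the `Evidence` and `KnowledgeGraph` classes, the relation label `"linked_to"` and the document ID `"doc-001"` are hypothetical and do not describe Causaly's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One point of evidence: a directed relationship between two entities.

    Hypothetical schema for illustration only.
    """

    subject: str   # entity or event the statement is about, e.g. a gene
    relation: str  # type of connection, e.g. "linked_to" (hypothetical label)
    obj: str       # the other entity or event, e.g. a disease
    source: str    # identifier of the publication stating the relationship


class KnowledgeGraph:
    """Adjacency-list graph mapping each subject to its outgoing evidence."""

    def __init__(self) -> None:
        self._edges: defaultdict[str, set[Evidence]] = defaultdict(set)

    def add(self, ev: Evidence) -> None:
        # Frozen dataclasses are hashable, so identical evidence is stored once.
        self._edges[ev.subject].add(ev)

    def outgoing(self, subject: str) -> set[Evidence]:
        # .get() avoids creating empty entries for unknown subjects.
        return set(self._edges.get(subject, set()))


kg = KnowledgeGraph()
kg.add(Evidence("BRCA1", "linked_to", "breast cancer", "doc-001"))
kg.add(Evidence("BRCA1", "linked_to", "breast cancer", "doc-001"))  # duplicate
print(len(kg.outgoing("BRCA1")))  # 1
```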

Every month we process more than 100,000 scientific documents, adding several million relationships to our knowledge graph.
While the majority of these relationships were already present, there is also a vast amount of newly discovered knowledge.
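Separating new knowledge from relationships the graph already holds is, at its simplest, a set difference. A minimal sketch, assuming each relationship is reduced to a hypothetical `(subject, relation, object)` key that ignores the source paper, so that the same finding reported by several papers counts once; the data is toy data for illustration:

```python
# Hypothetical key: (subject, relation, object); the source paper is ignored.
Relationship = tuple[str, str, str]


def new_relationships(
    existing: set[Relationship], monthly_batch: list[Relationship]
) -> set[Relationship]:
    """Return relationships in this month's batch that the graph lacked.

    Converting the batch to a set collapses duplicates, e.g. one finding
    reported by several papers in the same month.
    """
    return set(monthly_batch) - existing


# Toy data for illustration only.
existing_graph = {
    ("TRIM family", "affects", "cancer cell growth"),
    ("BRCA1", "linked_to", "breast cancer"),
}
batch = [
    ("BRCA1", "linked_to", "breast cancer"),            # already known
    ("TRIM52", "affects", "hepatocellular carcinoma"),  # new
    ("TRIM52", "affects", "hepatocellular carcinoma"),  # same finding, another paper
]
fresh = new_relationships(existing_graph, batch)
print(sorted(fresh))  # [('TRIM52', 'affects', 'hepatocellular carcinoma')]
```

Keying on the triple rather than on the paper is a design choice: it answers "is this connection new?" rather than "is this paper new?".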

Starting in October 2018, we are compiling all emerging knowledge into a report that can be scoped by domain of interest, with the aim of providing periodic digests to people within their areas of expertise and interest.
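Scoping the report by domain can be pictured as filtering a month's new relationships down to those that touch at least one concept from the reader's domain. A minimal sketch, assuming a hypothetical `concept_domains` mapping from concept names to domain tags; how concepts are actually assigned to domains is not described here, and the data is illustrative:

```python
from collections.abc import Iterable

# Hypothetical key: (subject, relation, object).
Relationship = tuple[str, str, str]


def scope_digest(
    new_rels: Iterable[Relationship],
    concept_domains: dict[str, set[str]],
    domain: str,
) -> list[Relationship]:
    """Keep relationships whose subject or object belongs to `domain`.

    `concept_domains` (hypothetical) maps a concept to the set of domains it
    is tagged with; concepts missing from the mapping belong to no domain.
    """

    def in_domain(concept: str) -> bool:
        return domain in concept_domains.get(concept, set())

    return [r for r in new_rels if in_domain(r[0]) or in_domain(r[2])]


# Toy data for illustration only.
concept_domains = {
    "liver carcinoma": {"oncology", "hepatology"},
    "TRIM52": {"oncology"},
    "insulin": {"endocrinology"},
}
new_rels = [
    ("TRIM52", "affects", "liver carcinoma"),
    ("insulin", "affects", "glucose uptake"),
]
digest = scope_digest(new_rels, concept_domains, "hepatology")
print(digest)  # [('TRIM52', 'affects', 'liver carcinoma')]
```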

As of the end of October, taking Liver Carcinoma as an example, our graph holds more than 10,000 relationships describing its "affectors":


Within just one month of new publications, we discovered more than 200 new relationships "affecting" and "affected by" Liver Carcinoma, as visualized below:
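The "affecting" / "affected by" distinction is a question of edge direction: in the first case Liver Carcinoma is the object of a relationship, in the second it is the subject. A minimal sketch of that split, assuming hypothetical `(subject, relation, object)` tuples and toy data:

```python
# Hypothetical key: (subject, relation, object).
Relationship = tuple[str, str, str]


def split_by_direction(
    rels: list[Relationship], target: str
) -> tuple[list[Relationship], list[Relationship]]:
    """Partition relationships around `target` by edge direction.

    Returns (affectors, affected):
      affectors: edges where `target` is the object (X -> target)
      affected:  edges where `target` is the subject (target -> Y)
    Edges that do not touch `target` are dropped.
    """
    affectors = [r for r in rels if r[2] == target]
    affected = [r for r in rels if r[0] == target]
    return affectors, affected


# Toy data for illustration only.
rels = [
    ("TRIM52", "affects", "liver carcinoma"),
    ("liver carcinoma", "affects", "liver function"),
    ("BRCA1", "linked_to", "breast cancer"),
]
affectors, affected = split_by_direction(rels, "liver carcinoma")
print(len(affectors), len(affected))  # 1 1
```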


While the mere existence of a relationship between two entities may already be known, the type of connection can be rediscovered or redefined:
There is a well-established association between the tripartite motif (TRIM) family of proteins and cancer cell growth (including in the context of Liver Carcinoma). Among the new knowledge, however, we found a study linking a specific member of the TRIM family, TRIM52 (tripartite motif containing 52), with the proliferation, migration and invasion of hepatocellular carcinoma cells.

We believe our new feature can find a place in every researcher's professional life, helping them cope with the ever-increasing amount of evidence published in academic journals.
