Knowledge emergence: what we learn from 100K monthly publications

Artur Saudabayev
published on October 30, 2018

Publication and citation counts are today among the very few metrics that measure scholars' output and its value. Growing competition in academia pushes researchers to publish more, subjecting them to the "Publish or Perish" phenomenon. The trend is highly controversial, with an increasing number of sceptics pointing to the declining quality of papers, not to mention bogus publications and "predatory journals". While these concerns are not far-fetched, one obvious fact is often overlooked amid the controversy: the amount of impactful research is growing naturally, and it is growing fast.

Years of increasing popularization and democratization of science, growing government and private-sector investment in research, and the geographic expansion of “major publishing” countries have all had their effect. More people doing science in more countries around the world with more resources leads to more findings. And more findings pave the way for even more research.

Since the early days of Causaly, we have been trying to solve a very personal problem: the need to navigate the vast amount of literature faster and better. As a former academic (if academia can ever really stay in one's past), I had to stay on top of the information flow in multiple domains. Short and systematic literature reviews, as well as quick keyword searches to validate a hypothesis or generate a new one, were an inherent part of the work and demanded a substantial amount of my time.

While human reading capacity has clear limits, there is hardly an end to the growth of scientific publishing. To address this imbalance and the widening gap, we have been building a machine reading platform that processes millions of papers with superhuman speed and distills the knowledge (evidence) they contain into a knowledge graph.

A point of evidence can be defined as a relationship between two entities or events, expressed in a statement such as "BRCA1 gene is linked to the breast cancer disease mechanism." There are tens of millions of biomedical publications that together connect the entire world of biomedical concepts and events. More evidence becomes available every day, yet it is increasingly hard to reach at scale.
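To make the notion of a point of evidence more concrete, here is a minimal Python sketch that models each piece of evidence as a subject-relation-object triple tied to its source document, and groups such triples into a simple adjacency map. The class `Evidence`, its fields, the relation label `"linked_to"`, and the document identifier `"doc-001"` are hypothetical illustrations, not Causaly's actual data model.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class Evidence:
    """One point of evidence: a directed relationship between two entities or events.

    Hypothetical schema for illustration only.
    """
    subject: str    # e.g. a gene
    relation: str   # type of connection, e.g. "linked_to" (hypothetical label)
    obj: str        # e.g. a disease or disease mechanism
    source_id: str  # identifier of the publication the statement came from (hypothetical)


def build_graph(evidence: Iterable[Evidence]) -> Dict[str, Set[Tuple[str, str]]]:
    """Group evidence into an adjacency map: subject -> set of (relation, object).

    The same relationship reported by several publications collapses into one edge.
    """
    graph: Dict[str, Set[Tuple[str, str]]] = {}
    for ev in evidence:
        graph.setdefault(ev.subject, set()).add((ev.relation, ev.obj))
    return graph


if __name__ == "__main__":
    ev = Evidence("BRCA1", "linked_to", "breast cancer", "doc-001")
    print(build_graph([ev]))  # {'BRCA1': {('linked_to', 'breast cancer')}}
```

Keeping the source identifier on each `Evidence` record while deduplicating edges in the graph is one possible design choice: it lets many publications support the same relationship without inflating the graph.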

Every month we process more than 100,000 scientific documents, adding several million relationships to our knowledge graph.
While most of these relationships were already in the graph, a large number of them represent newly discovered knowledge.

Starting in October 2018, we are compiling all emerging knowledge into a report that can be scoped by domain of interest, with the aim of providing people with periodic digests in their areas of expertise and interest.

Taking Liver Carcinoma as an example, as of the end of October our graph contains more than 10,000 relationships for the "affectors" of Liver Carcinoma:


Within just one month of new publications, we discovered more than 200 new relationships "affecting" and "affected by" Liver Carcinoma, as visualized below:
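The idea of a monthly digest scoped to one concept can be illustrated with a minimal sketch, assuming relationships are stored as plain (subject, relation, object) triples: the month's new knowledge is the set difference between this month's triples and those already in the graph, and scoping to a concept keeps the triples where it appears as the object ("affecting" it) or as the subject ("affected by" it). The function names, the `"affects"` relation label, and the `ENTITY_*` placeholders are hypothetical; this is not a description of how Causaly's report is actually built.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation of a relationship: (subject, relation, object)
Triple = Tuple[str, str, str]


def new_relationships(existing: Set[Triple], this_month: Set[Triple]) -> Set[Triple]:
    """Relationships extracted this month that were not already in the graph."""
    return this_month - existing


def scope_to_concept(triples: Iterable[Triple], concept: str) -> Tuple[Set[Triple], Set[Triple]]:
    """Split triples into those affecting `concept` and those affected by it."""
    affecting: Set[Triple] = set()    # concept is the object: something acts on it
    affected_by: Set[Triple] = set()  # concept is the subject: it acts on something
    for subject, relation, obj in triples:
        if obj == concept:
            affecting.add((subject, relation, obj))
        if subject == concept:
            affected_by.add((subject, relation, obj))
    return affecting, affected_by


if __name__ == "__main__":
    # Placeholder data only; entity names are not real findings.
    existing = {("ENTITY_A", "affects", "Liver Carcinoma")}
    this_month = existing | {
        ("ENTITY_B", "affects", "Liver Carcinoma"),
        ("Liver Carcinoma", "affects", "ENTITY_C"),
        ("ENTITY_D", "affects", "ENTITY_E"),
    }
    fresh = new_relationships(existing, this_month)
    affecting, affected_by = scope_to_concept(fresh, "Liver Carcinoma")
    print(sorted(affecting))    # [('ENTITY_B', 'affects', 'Liver Carcinoma')]
    print(sorted(affected_by))  # [('Liver Carcinoma', 'affects', 'ENTITY_C')]
```

Set difference keeps the "what is new" question independent of how a concept is scoped, so the same monthly delta could feed digests for any domain of interest.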


Even when the mere existence of a relationship between two entities is generally known, the nature of the connection can be rediscovered or redefined:
There is a well-established association between the tripartite motif (TRIM) family of proteins and cancer cell growth, including in the context of Liver Carcinoma. Among the new knowledge, however, we found a study linking a specific member of the TRIM family, TRIM52 (tripartite motif containing 52), with the proliferation, migration and invasion of hepatocellular carcinoma cells.

We believe our new feature can find a place in every researcher's professional life and help them cope with the ever-increasing amount of evidence published in academic journals.
