Knowledge emergence - what we learn from 100K monthly publications

Artur Saudabayev
published on October 30, 2018

Publication and citation counts are today among the very few metrics of scholars' output and its value. Growing competition in academia pushes people to publish more, subjecting them to the "Publish or Perish" phenomenon. The trend is highly controversial, with an increasing number of sceptics pointing to the declining quality of papers, not to mention bogus publications and "predatory journals". While these concerns are not far-fetched, one obvious fact is often overlooked amid the controversy: the amount of impactful research is growing naturally, and it is growing fast.

Years of increasing popularization and democratization of science, growing government and private-sector investment in research, and the geographic expansion of the “major publishing” countries have all had their effect. More people doing science in more countries around the world with more resources leads to more findings. And more findings pave the way for even more research.

Since the early days of Causaly, we have been trying to solve a very personal problem: the need to navigate vast amounts of literature faster and better. As a former academic (if academia can ever really stay in one's past), I had to stay on top of the information flow in multiple domains. Short and systematic literature reviews and quick keyword searches to validate a hypothesis or generate a new one were an inherent part of the work and demanded a substantial amount of my time.

While human reading speed has clear limits, there is hardly an end in sight to the growth of scientific publishing. To address this growing gap, we have been building a machine reading platform that processes millions of papers at superhuman speed and distills the knowledge (evidence) in them into a knowledge graph.

A point of evidence can be defined as a relationship between two entities or events expressed in a statement such as "BRCA1 gene is linked to the breast cancer disease mechanism." Tens of millions of biomedical publications connect the entire world of biomedical concepts and events. More evidence becomes available daily, yet it is becoming ever harder to reach at scale.

Every month we process more than 100,000 scientific documents, adding several million relationships to our knowledge graph.
While the majority of these relationships already existed in the graph, a vast amount of new knowledge is discovered as well.
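
Separating new knowledge from relationships that already exist amounts to a set difference. The following is a minimal sketch, assuming each point of evidence is stored as a (source, relation, target) triple. The `Relationship` type, the `new_relationships` helper and the relation label are hypothetical illustrations, not Causaly's actual data model.

```python
from typing import NamedTuple


class Relationship(NamedTuple):
    """One point of evidence: a typed link between two entities (hypothetical schema)."""
    source: str
    relation: str
    target: str


def new_relationships(graph: set[Relationship], monthly: list[Relationship]) -> set[Relationship]:
    """Return the relationships extracted this month that the graph does not hold yet."""
    return set(monthly) - graph


# Illustrative triples based on the examples in this article.
graph = {Relationship("BRCA1", "linked_to", "breast cancer")}
monthly = [
    Relationship("BRCA1", "linked_to", "breast cancer"),  # already in the graph
    Relationship("TRIM52", "linked_to", "hepatocellular carcinoma cell proliferation"),
]

fresh = new_relationships(graph, monthly)
graph |= fresh  # merge this month's new findings into the graph
```

In this sketch only the TRIM52 triple ends up in `fresh`, because the BRCA1 triple was already present before the monthly batch was processed.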

Starting in October 2018, we are compiling all emerging knowledge into a report that can be scoped by domain of interest. The idea is to provide periodic digests for people within their areas of expertise and interest.

Taking Liver Carcinoma as an example: as of the end of October, our graph holds more than 10,000 relationships for the "affectors" of Liver Carcinoma:


Within just one month of new publications, we discovered more than 200 new relationships "affecting" and "affected by" Liver Carcinoma as visualized below:


While the mere existence of a relationship between two entities may be generally known, the type of the connection can be re-discovered or re-defined:
There is a well-established association between the tripartite motif (TRIM) family of proteins and cancer cell growth (including in the context of Liver Carcinoma). Among the new knowledge, however, we discovered a paper that linked a specific member of the TRIM family, TRIM52 (tripartite motif containing 52), with the proliferation, migration and invasion of Hepatocellular Carcinoma cells.
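
The distinction between a previously unseen pair of entities and a known pair with a newly reported type of connection can be sketched as follows. This is an illustration under stated assumptions: the `classify` function, the dictionary layout and the placeholder entity and relation names are hypothetical and do not describe how Causaly's platform works internally.

```python
def classify(known: dict[tuple[str, str], set[str]],
             source: str, relation: str, target: str) -> str:
    """Label an extracted relationship relative to what is already known.

    `known` maps an entity pair to the set of relation types seen for it so far.
    """
    pair = (source, target)
    if pair not in known:
        return "new pair"
    if relation not in known[pair]:
        return "known pair, new relation type"
    return "already known"


# Placeholder entities and relation labels, for illustration only.
known = {("entity A", "entity B"): {"associated_with"}}

first = classify(known, "entity A", "associated_with", "entity B")
second = classify(known, "entity A", "inhibits", "entity B")
third = classify(known, "entity C", "associated_with", "entity B")
```

Here `first` is "already known", `second` is "known pair, new relation type" and `third` is "new pair".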

We believe our new feature can find a place in every researcher's professional life and help them cope with the ever-increasing amount of evidence published in academic journals.
