Knowledge emergence - what we learn from 100K monthly publications

Artur Saudabayev
published on October 30, 2018

Publication and citation counts are today among the very few metrics of scholars' output and its value. Growing competition in academia pushes people to publish more, subjecting them to the "Publish or Perish" phenomenon. The trend is highly controversial, with an increasing number of sceptics pointing to the declining quality of papers, not to mention bogus publications and "predatory journals". While these concerns are not far-fetched, one obvious fact is often overlooked amidst the controversy: the amount of impactful research is growing naturally, and it is growing fast.

Years of increasing popularization and democratization of science, growing government and private-sector investment in research, and the geographic expansion of “major publishing” countries have all had their effect. More people doing science in more countries around the world with more resources leads to more findings. And more findings pave the way for even more research.

Since the early days of Causaly, we have been trying to solve a very personal problem: the need to navigate vast amounts of literature faster and better. As an academic in my past (if academia can really stay in one's past), I had to stay on top of the information flow in multiple domains: short and systematic literature reviews, and quick keyword searches to validate a hypothesis or to generate a new one, were an inherent part of the work and demanded a substantial amount of my time.

While human reading speed has clear limits, there is hardly an end in sight to the growth of scientific publishing. To address this widening gap, we have been building a machine-reading platform that processes millions of papers at superhuman speed and distills the knowledge (evidence) in them into a knowledge graph.

A point of evidence can be defined as a relationship between two entities or events, expressed in a statement such as "BRCA1 gene is linked to the breast cancer disease mechanism." Tens of millions of biomedical publications connect the entire world of biomedical concepts and events. More evidence becomes available daily, yet it is becoming harder to reach at scale.
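To make the definition concrete, here is a minimal sketch of how one point of evidence could be represented in code. The class name `Evidence`, its field names, and the relation label `linked_to` are assumptions made for illustration; they are not Causaly's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One point of evidence: a relationship between two entities or events.

    Hypothetical structure for illustration only.
    """
    source: str    # first entity, e.g. a gene
    relation: str  # relationship label (the vocabulary here is an assumption)
    target: str    # second entity, e.g. a disease
    sentence: str  # the statement the relationship was read from


def as_triple(evidence: Evidence) -> tuple[str, str, str]:
    """Reduce a point of evidence to a (source, relation, target) graph edge."""
    return (evidence.source, evidence.relation, evidence.target)


example = Evidence(
    source="BRCA1",
    relation="linked_to",
    target="breast cancer",
    sentence="BRCA1 gene is linked to the breast cancer disease mechanism.",
)

print(as_triple(example))
```

Keeping the original sentence next to the edge means every relationship in the graph can be traced back to the statement it came from.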

Every month we process more than 100,000 scientific documents, adding several million relationships to our knowledge graph.
While the majority of these relationships already existed in the graph, a vast amount of newly discovered knowledge is added as well.
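The distinction between existing and new relationships can be sketched as a simple comparison of a monthly batch against what the graph already holds. This is only an illustration under stated assumptions: relationships are modeled as plain (source, relation, target) tuples, the function name `new_relationships` is hypothetical, and the sample data is a toy example rather than real output of the platform.

```python
def new_relationships(existing, monthly_batch):
    """Return relationships from the batch that are not yet in the graph.

    Duplicates inside the batch are reported once, in first-seen order.
    """
    seen = set(existing)
    fresh = []
    for triple in monthly_batch:
        if triple not in seen:
            seen.add(triple)
            fresh.append(triple)
    return fresh


# Toy data for illustration only (labels are assumptions).
graph = {
    ("BRCA1", "linked_to", "breast cancer"),
}
batch = [
    ("BRCA1", "linked_to", "breast cancer"),       # already known
    ("entity A", "linked_to", "Liver Carcinoma"),  # new
    ("entity A", "linked_to", "Liver Carcinoma"),  # repeated in another paper
]

fresh = new_relationships(graph, batch)
print(len(fresh), "new relationship(s):", fresh)
```

In this toy run, two of the three extracted relationships are repeats, so only one would count as emerging knowledge.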

Starting in October 2018, we are compiling all emerging knowledge into a report that can be scoped by domain of interest. The idea is to provide periodic digests for people within their areas of expertise and interest.

Taking Liver Carcinoma as an example: as of the end of October, our graph holds more than 10,000 relationships for the "affectors" of Liver Carcinoma:

[Screenshot, October 30, 2018: "affectors" of Liver Carcinoma in the knowledge graph]

Within just one month of new publications, we discovered more than 200 new relationships "affecting" and "affected by" Liver Carcinoma as visualized below:

[Screenshot: new relationships "affecting" and "affected by" Liver Carcinoma]

Even when the existence of a relationship between two entities is generally known, the type of the connection can be rediscovered or redefined.
There is a well-established association between the tripartite motif (TRIM) family of proteins and cancer cell growth (including in the context of Liver Carcinoma). But among the new knowledge, we discovered a paper that linked a specific member of the TRIM family, TRIM52 (tripartite motif containing 52), with the proliferation, migration and invasion of Hepatocellular Carcinoma cells.

We believe our new feature can find a place in every researcher's professional life and help them cope with the ever-increasing amount of evidence published in academic journals.
