Not all evidence is created equal: Machine-Reading in Biomedicine

Yiannis Kiachopoulos
published on November 19, 2018

Teaching computers to read and understand biomedical publications for cause-and-effect relationships is a challenging task, not least because it is not intuitive what we mean by "read" and "understand".

By "Reading" we broadly mean extracting all the information from a sentence that is relevant for understanding an affect-relationship. This task is concerned with syntactic and semantic understanding, i.e. what is the Subject-Predicate-Object agreement, what is the event, where is the action taking place, is it hypothetical or not, etc. Let's look at an example sentence taken from an academic publication.
[Figure: annotated example sentence]

Even this relatively common sentence shows the immense complexity of natural language: things that feel easy for a human reader, such as the indicative nature of the statement "medical procedures can lead to stress", are difficult for machines to comprehend. At Causaly we are developing algorithms for precisely this task. From the example above we would extract the following three affect-relationships:
[Figure: affect-relationships extracted from the example sentence]

We can immediately see that a statement of evidence cannot be reduced to just a relationship (A)-->(B): it is not the hospital setting but the lack of control over it that leads to stress. Likewise, the statement (Stress) --> (Anxiety) refers to hospitalized children in this context and not to all population groups in general. In addition, linguistic statements can be expressed in hypothetical or definite terms. The number of different forms of expression and their combinations is staggeringly high.
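To make the point concrete, here is a minimal sketch of a record that carries more than a bare (A)-->(B) pair. It is not Causaly's actual data model: the class name `AffectRelationship`, its fields, and the `Modality` values are hypothetical, chosen only to mirror the qualifiers mentioned above (a qualifier on the cause, a population, and hypothetical versus definite wording).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Modality(Enum):
    """Hypothetical labels for how a statement is worded."""
    HYPOTHETICAL = "hypothetical"
    DEFINITE = "definite"


@dataclass(frozen=True)
class AffectRelationship:
    """One extracted statement; field names are illustrative assumptions."""
    cause: str
    effect: str
    cause_qualifier: Optional[str] = None   # e.g. "lack of control over"
    population: Optional[str] = None        # e.g. "hospitalized children"
    modality: Optional[Modality] = None     # left unset when not determined

    def as_arrow(self) -> str:
        """Render the statement in the (A) --> (B) notation used in the text."""
        cause = f"{self.cause_qualifier} {self.cause}" if self.cause_qualifier else self.cause
        return f"({cause}) --> ({self.effect})"


# The two relationships discussed in the paragraph above, paraphrased from the text.
control = AffectRelationship("hospital setting", "stress",
                             cause_qualifier="lack of control over")
anxiety = AffectRelationship("stress", "anxiety",
                             population="hospitalized children")

print(control.as_arrow())
print(anxiety.as_arrow(), "| population:", anxiety.population)
```

Keeping the qualifier and the population as separate fields means the plain arrow form can still be produced for display while the context that changes its meaning is not lost.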

Our knowledge graph contains more than 110 million statements of affect-relationships in Biomedicine, which we obtained by "Reading" close to 20 million publications. But how can we make sense of the diversity of statements?

This is where the "Understanding" part of our platform comes into play. We have developed a hierarchical scheme for classifying evidence from very strong to weak and from hypothetical to definitive.

[Figure: Causaly knowledge hierarchy]

However, even with this classification we are only addressing the linguistic validity of evidence. A human reader, on the other hand, would (again intuitively) look for more context. In particular, it makes a difference whether the publication was a randomized controlled trial or a case report, whether the statement comes from the conclusion section of an article or from the introduction, whether it was published in a peer-reviewed journal or not, and more. These and several other parameters are computed on our platform for each of the 110 million statements.
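As a rough illustration of how linguistic strength and publication context could be combined, the sketch below orders statements by a tuple of such parameters. Everything in it is an assumption made for the example: the `Strength` levels, the `EvidenceContext` fields, the rank tables, and the order of precedence are hypothetical and do not describe how the Causaly platform actually scores evidence.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Tuple


class Strength(IntEnum):
    """Hypothetical linguistic strength levels, weakest to strongest."""
    WEAK = 1
    MODERATE = 2
    STRONG = 3
    VERY_STRONG = 4


@dataclass(frozen=True)
class EvidenceContext:
    strength: Strength      # linguistic strength of the statement
    definitive: bool        # False means the statement is hypothetical
    study_type: str         # e.g. "randomized controlled trial"
    section: str            # e.g. "conclusion"
    peer_reviewed: bool


# Assumed orderings for the example; unknown values rank lowest.
STUDY_TYPE_RANK = {"case report": 1, "randomized controlled trial": 2}
SECTION_RANK = {"introduction": 1, "conclusion": 2}


def sort_key(e: EvidenceContext) -> Tuple[int, bool, int, int, bool]:
    """Compare by linguistic strength first, then by publication context."""
    return (
        int(e.strength),
        e.definitive,
        STUDY_TYPE_RANK.get(e.study_type, 0),
        SECTION_RANK.get(e.section, 0),
        e.peer_reviewed,
    )


def rank(evidence: List[EvidenceContext]) -> List[EvidenceContext]:
    """Return evidence ordered from strongest to weakest under the assumed key."""
    return sorted(evidence, key=sort_key, reverse=True)


rct = EvidenceContext(Strength.STRONG, True, "randomized controlled trial", "conclusion", True)
case = EvidenceContext(Strength.STRONG, True, "case report", "introduction", False)
weak = EvidenceContext(Strength.WEAK, False, "randomized controlled trial", "conclusion", True)

ranked = rank([case, weak, rct])
```

A lexicographic tuple is only one possible design: it lets linguistic strength dominate and uses context to break ties, whereas a weighted score would let strong context compensate for weaker wording.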

We intend to write a dedicated blog post on this in the future - stay tuned. However, here is a preview of what is possible when evidence from the whole of PubMed has been synthesized and evaluated:
[Figure: evidence over time for (smoking)-->(COPD)]
The chart above shows the amount of evidence from academic publications over time for (smoking)-->(COPD). It is a big-picture view of the evidence the scientific community has produced over the past 40 years.
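A time series like this can be thought of as a count of extracted statements per publication year for one cause-effect pair. The sketch below shows that aggregation under stated assumptions: the function `evidence_per_year` and the `(cause, effect, year)` tuple layout are hypothetical, and the records are toy placeholders, not real counts from the knowledge graph.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Assumed record layout for the example: (cause, effect, publication year).
Statement = Tuple[str, str, int]


def evidence_per_year(statements: Iterable[Statement],
                      cause: str, effect: str) -> Dict[int, int]:
    """Count statements for one (cause)-->(effect) pair, grouped by year."""
    counts = Counter(
        year for c, e, year in statements
        if c == cause and e == effect
    )
    # Sort by year so the result can be plotted directly as a time series.
    return dict(sorted(counts.items()))


# Toy placeholder records, invented purely to exercise the function.
toy = [
    ("smoking", "COPD", 2005),
    ("smoking", "COPD", 1990),
    ("obesity", "breast cancer", 1990),
    ("smoking", "COPD", 1990),
]

series = evidence_per_year(toy, "smoking", "COPD")
print(series)
```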

With Causaly, our goal is to give researchers and decision-makers the tools to drill down to each point of evidence to the desired level of detail as discussed above.
