Not all evidence is created equal: Machine-Reading in Biomedicine

Yiannis Kiachopoulos
published on November 19, 2018

Teaching computers how to read and understand biomedical publications for cause-and-effect relationships is a challenging task. This is especially true because it might not be intuitive what we mean by "read" and "understand".

By "reading" we broadly refer to extracting from a sentence all the information relevant to understanding an affect-relationship. This task is concerned with syntactic and semantic understanding, i.e. what is the Subject-Predicate-Object agreement, what is the event, where is the action taking place, is it hypothetical or not, etc. Let's look at an example sentence taken from an academic publication.

Already this relatively common sentence indicates the immense complexity of natural language: things that feel easy for a human reader, such as the indicative nature of the statement "medical procedures can lead to stress", are difficult for machines to comprehend. At Causaly we are developing algorithms for precisely this task. From the example above we would extract the following three affect-relationships:

We can immediately see that a statement of evidence cannot be reduced to just a relationship (A)-->(B): it is not the hospital setting itself but the lack of control over it that leads to stress. Likewise, the statement (Stress)-->(Anxiety) refers to hospitalized children in this context, not to all population groups in general. In addition, linguistic statements can be expressed in hypothetical or definite terms. The number of different forms of expression and their combinations is staggeringly high.
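To make this concrete, here is a minimal sketch of how such a statement might be represented as a data structure. The class and field names (AffectRelationship, modality, population) are our own illustrative choices, not Causaly's actual schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Modality(Enum):
    # Whether the statement is expressed in definite or hypothetical terms.
    DEFINITIVE = "definitive"
    HYPOTHETICAL = "hypothetical"

@dataclass
class AffectRelationship:
    # One extracted statement of evidence, richer than a bare (A)-->(B) edge.
    subject: str                      # e.g. "lack of control over hospital setting"
    object: str                       # e.g. "stress"
    modality: Modality                # definite vs. hypothetical phrasing
    population: Optional[str] = None  # contextual qualifier, e.g. "hospitalized children"

# The (Stress)-->(Anxiety) statement from the text, with its context preserved:
stmt = AffectRelationship(
    subject="stress",
    object="anxiety",
    modality=Modality.HYPOTHETICAL,
    population="hospitalized children",
)
print(stmt.population)  # hospitalized children
```

Keeping the population qualifier and the modality on the statement itself is what prevents the loss of context that a plain (A)-->(B) edge would entail.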

Our knowledge graph contains more than 110 million statements of affect-relationships in Biomedicine, yielded by "reading" close to 20 million publications. But how can we make sense of the diversity of statements?

This is where the "understanding" part of our platform comes into play. We have developed a hierarchical way of classifying evidence, from very strong to weak and from hypothetical to definitive.


However, even with this classification we are only addressing the linguistic validity of evidence. A human reader, on the other hand, would (again intuitively) look for more context. In particular, it makes a difference whether the publication was a Randomized Controlled Trial or a case report, whether the statement comes from the Conclusion section of an article or from the Introduction, whether it was published in a peer-reviewed journal or not, and more. These and several other parameters are computed on our platform for each of the 110 million statements.
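One way such contextual parameters could feed into an overall evidence score is a simple weighting scheme. The weights and factor names below are entirely hypothetical, for illustration only; they are not Causaly's actual model:

```python
# Hypothetical weights -- illustrative only, not Causaly's actual parameters.
PUBLICATION_TYPE_WEIGHT = {
    "randomized_controlled_trial": 1.0,
    "cohort_study": 0.7,
    "case_report": 0.4,
}
SECTION_WEIGHT = {"conclusion": 1.0, "results": 0.9, "introduction": 0.5}

def evidence_score(linguistic_strength: float,
                   publication_type: str,
                   section: str,
                   peer_reviewed: bool) -> float:
    """Combine linguistic strength (0..1) with contextual metadata."""
    score = linguistic_strength
    score *= PUBLICATION_TYPE_WEIGHT.get(publication_type, 0.5)
    score *= SECTION_WEIGHT.get(section, 0.7)
    if not peer_reviewed:
        score *= 0.6  # down-weight statements outside peer-reviewed journals
    return score

# The same strongly-phrased statement scores very differently in different contexts:
strong = evidence_score(0.9, "randomized_controlled_trial", "conclusion", True)
weak = evidence_score(0.9, "case_report", "introduction", False)
print(strong > weak)  # True
```

The point of the sketch is the shape of the computation: the same sentence counts for more or less depending on where, and in what kind of study, it was published.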

We intend to write a dedicated blog post on this in the future - stay tuned. In the meantime, here is a preview of what is possible when evidence from the whole of PubMed has been synthesized and evaluated:

The chart above shows the amount of evidence in academic publications over time for (smoking)-->(COPD). It is a big-picture view of what the scientific community has evidenced over the past 40 years.
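Underneath such a chart is a straightforward aggregation: counting extracted statements for a given (cause, effect) pair per publication year. A minimal sketch with made-up toy data (the tuples below are illustrative, not real extraction output):

```python
from collections import Counter

# Hypothetical toy data: (publication_year, cause, effect) per extracted statement.
statements = [
    (1985, "smoking", "COPD"),
    (1992, "smoking", "COPD"),
    (1992, "smoking", "COPD"),
    (2005, "smoking", "COPD"),
]

# Count how many statements support (smoking)-->(COPD) in each year.
evidence_by_year = Counter(
    year for year, cause, effect in statements
    if (cause, effect) == ("smoking", "COPD")
)
print(sorted(evidence_by_year.items()))  # [(1985, 1), (1992, 2), (2005, 1)]
```

Plotting these per-year counts over four decades gives exactly the kind of big-picture trend line described above.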

With Causaly, our goal is to give researchers and decision-makers the tools to drill down to each point of evidence to the desired level of detail as discussed above.
