Not all evidence is created equal: Machine-Reading in Biomedicine

Yiannis Kiachopoulos
published on November 19, 2018

Teaching computers to read and understand biomedical publications for cause-and-effect relationships is a challenging task, not least because it is not obvious what we mean by "read" and "understand".

By "Reading" we broadly mean extracting all the information in a sentence that is relevant to understanding an affect-relationship. This task concerns syntactic and semantic understanding, i.e. what the subject-predicate-object structure is, what the event is, where the action takes place, whether the statement is hypothetical, and so on. Let's look at an example sentence taken from an academic publication.
[Figure: annotated example sentence]

Even this relatively common sentence shows the immense complexity of natural language: things that feel easy to a human reader, such as the indicative nature of the statement "medical procedures can lead to stress", are difficult for machines to comprehend. At Causaly we are developing algorithms for precisely this task. From the example above we would extract the following three affect-relationships:
[Figure: affect-relationship tuples extracted from the example sentence]

We can immediately see that a statement of evidence cannot be reduced to just a relationship (A)-->(B): it is not the hospital setting but the lack of control over it that leads to stress. Likewise, the statement (Stress)-->(Anxiety) refers to hospitalized children in this context and not to all population groups in general. In addition, statements can be expressed in hypothetical or definite terms. The number of different forms of expression and their combinations is staggeringly high.
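To make the point about context concrete, here is a minimal sketch of how such a statement could be represented as a record rather than a bare edge. It is an illustration only and not Causaly's actual schema: the class `EvidenceStatement`, its field names and the `Modality` values are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Modality(Enum):
    """Hypothetical labels for how a statement is expressed."""
    HYPOTHETICAL = "hypothetical"
    DEFINITE = "definite"


@dataclass(frozen=True)
class EvidenceStatement:
    """One extracted statement; all field names are illustrative assumptions."""
    cause: str
    effect: str
    cause_qualifier: Optional[str] = None  # e.g. which aspect of the cause matters
    population: Optional[str] = None       # group the statement applies to
    modality: Optional[Modality] = None    # left unset when not known

    def bare_edge(self) -> str:
        """Collapse the statement to (A)-->(B), discarding all context."""
        return f"({self.cause})-->({self.effect})"


# The two statements discussed above, with the context the text mentions.
control = EvidenceStatement("hospital setting", "stress",
                            cause_qualifier="lack of control")
in_children = EvidenceStatement("stress", "anxiety",
                                population="hospitalized children")
general = EvidenceStatement("stress", "anxiety")

# Same bare edge, but not the same statement of evidence.
same_edge = in_children.bare_edge() == general.bare_edge()
same_statement = in_children == general
```

Here `same_edge` is true while `same_statement` is false, which is exactly the information that is lost when evidence is stored as (A)-->(B) alone.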

Our knowledge graph contains more than 110 million statements of affect-relationships in biomedicine, obtained by "Reading" close to 20 million publications. But how can we make sense of the diversity of statements?

This is where the "Understanding" part of our platform comes into play. We have developed a hierarchy for classifying evidence, from very strong to weak and from hypothetical to definitive.

[Figure: Causaly knowledge hierarchy]

However, even with this classification we are only addressing the linguistic validity of evidence. A human reader, on the other hand, would (again intuitively) look for more context. In particular, it makes a difference whether the publication was a randomized controlled trial or a case report, whether the statement comes from the Conclusion section of an article or from the Introduction, whether it was published in a peer-reviewed journal or not, and more. These and several other parameters are computed on our platform for each of the 110 million statements.
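As a rough sketch of how such context parameters could be used, the example below orders statements by the three parameters named above. The direction of each ordering (trial above case report, Conclusion above Introduction, peer-reviewed above not) and their relative priority are assumptions made for illustration; the text only says that these parameters make a difference, and the names `EvidenceContext`, `StudyType` and `Section` are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class StudyType(IntEnum):
    # Assumed ordering: a higher value is treated as stronger context.
    CASE_REPORT = 1
    RANDOMIZED_CONTROLLED_TRIAL = 2


class Section(IntEnum):
    # Assumed ordering: a conclusion is ranked above an introduction.
    INTRODUCTION = 1
    CONCLUSION = 2


@dataclass(frozen=True)
class EvidenceContext:
    """Publication-level context attached to one statement (illustrative)."""
    study_type: StudyType
    section: Section
    peer_reviewed: bool

    def sort_key(self) -> tuple:
        # Lexicographic comparison: study type first, then section,
        # then peer review. The priority order is an assumption.
        return (self.study_type, self.section, self.peer_reviewed)


contexts = [
    EvidenceContext(StudyType.CASE_REPORT, Section.INTRODUCTION, False),
    EvidenceContext(StudyType.RANDOMIZED_CONTROLLED_TRIAL, Section.CONCLUSION, True),
    EvidenceContext(StudyType.CASE_REPORT, Section.CONCLUSION, True),
]

# Strongest assumed context first.
ranked = sorted(contexts, key=EvidenceContext.sort_key, reverse=True)
```

A tuple key keeps the ranking purely ordinal, so no numeric weights have to be invented; a real system would likely combine many more parameters.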

We intend to write a dedicated blog post on this in the future - stay tuned. In the meantime, here is a preview of what is possible when evidence from the whole of PubMed has been synthesized and evaluated:
[Figure: evidence for (smoking)-->(COPD) over time]
The chart above shows the amount of evidence in academic publications over time for (smoking)-->(COPD). It is a big-picture view of what the scientific community has reported over the past 40 years.
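The aggregation behind a chart of this kind can be sketched as a count of statements per publication year for one cause-effect pair. The records below are toy placeholders invented for this example and do not reflect real counts, and the function name `evidence_per_year` is an assumption.

```python
from collections import Counter

# Toy records of (cause, effect, publication_year); invented for illustration.
records = [
    ("smoking", "COPD", 1990),
    ("smoking", "COPD", 1990),
    ("smoking", "COPD", 2005),
    ("stress", "anxiety", 2005),
]


def evidence_per_year(records, cause, effect):
    """Count statements for one (cause)-->(effect) pair, keyed by year."""
    counts = Counter(
        year for c, e, year in records if (c, e) == (cause, effect)
    )
    # Return the years in ascending order so the result can be plotted directly.
    return dict(sorted(counts.items()))


timeline = evidence_per_year(records, "smoking", "COPD")
```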

With Causaly, our goal is to give researchers and decision-makers the tools to drill down into each point of evidence at the desired level of detail, as discussed above.
