AI-supported epidemiology evidence in orphan drug applications: Angelman syndrome case study

David Reeves
published on December 05, 2019


In the European Union, a disease is considered rare if it affects fewer than 5 in 10,000 people in the general population. There are at least 7,000 known rare diseases, and around five new rare diseases are described in the medical literature every week. Diseases with such low prevalence present a number of challenges to healthcare practitioners, public health bodies and biomedical researchers. There is often a lack of fundamental knowledge concerning the epidemiology of the disease, and the patient populations are usually very limited, heterogeneous and spread over large geographical distances, making studies logistically difficult and expensive.

Limited patient numbers represent a considerable disincentive for pharmaceutical companies to invest in new drug research and development, meaning that despite the high burden of rare disease on individual patients, treatment options are often limited. Both the US Food and Drug Administration (FDA) and the European Medicines Agency (EMA) can grant orphan designation to drugs treating rare diseases, as a means to encourage pharmaceutical companies to invest in rare disease research. An orphan drug designation can be very lucrative for a research company: if a drug is approved for orphan status by the regulator, the pharmaceutical company can benefit from tax incentives, research subsidies and enhanced patent protection and marketing rights. Such assistance and incentives can also often boost investments into the company. Studies indicate that orphan drugs often have greater profitability than non-orphan drugs thanks to these financial incentives, smaller clinical trial sizes, shorter clinical trial times and higher rates of regulatory success (1, 2).

Complete and accurate applications for orphan drug designation are therefore crucially important to pharmaceutical companies. A key aspect of orphan designation applications is the requirement to prove rarity using established prevalence figures from the medical literature. This is a time- and labour-intensive task, requiring researchers to conduct systematic literature searches of the biomedical bibliographic databases, an especially difficult process for rare diseases. Prevalence figures are often buried within full-text articles, meaning that literature searches that rely solely on indexing metadata can easily miss important information.

In this article, we will explore how researchers can use Causaly to significantly speed up the process of locating epidemiology data in the biomedical literature, transforming this part of the orphan drug application into a more efficient undertaking. We will use the example of Angelman Syndrome, a rare genetic disorder for which a number of therapies have very recently been awarded orphan status by both the FDA and EMA.

Using Causaly to speed up the orphan drug application process

This case study uses the EMA orphan drug application procedure as a template and takes the EMA's epidemiology evidence requirements as its basis. For orphan drug applications, the EMA requires evidence-based epidemiology data that relates specifically to the EU member states (geographic variation) and that is recent at the time of the application (temporal variation). Causaly can provide researchers with this information in seconds.

From the Causaly home screen, we can navigate to the epidemiology module and then run a simple search for Angelman Syndrome using the search box.


This search returns all epidemiology data that Causaly has found for Angelman Syndrome from within the biomedical literature corpus. The initial information provides an overview of the evidence in this area and illustrates how epidemiology data is distributed throughout the literature. We can see in our example that there are 52 articles containing 65 epidemiology evidence points. Scrolling down brings us directly to this data:


The 52 results listed represent every article that contains epidemiology data, and this data is automatically extracted in sentence and value form for easy readability.

Since we are completing a hypothetical EMA orphan drug application, we are only interested in recent literature that shows prevalence in EU member states. To find this information, we can use the Causaly filter system, restricting the results to ‘Europe’ with the ‘Geographical Area’ filter and choosing ‘Last 10 Years’ in the ‘Publication Year’ filter:


This gives us one relevant article containing the necessary data, which could be used to populate the EMA application. In this instance, we have a prevalence of 1 in 24,500 in Denmark in 2013. If further data is needed, researchers can simply amend the filters to widen the results, perhaps to include articles without defined geographical areas or areas that are analogous to the EU (the USA, for example).
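To make the comparison with the rarity threshold explicit, the short sketch below converts the reported prevalence into cases per 10,000 and checks it against the 5 in 10,000 figure quoted at the start of this article. It is only an illustration of the arithmetic: the function and constant names are hypothetical, it is not part of Causaly, and it does not represent the EMA's own assessment procedure.

```python
from fractions import Fraction

# Rarity threshold as quoted in this article: fewer than 5 in 10,000 people.
# (Illustrative constant; the name is an assumption, not an official identifier.)
RARE_THRESHOLD = Fraction(5, 10_000)


def per_10k(prevalence: Fraction) -> float:
    """Express a prevalence as cases per 10,000 people."""
    return float(prevalence * 10_000)


def is_below_threshold(prevalence: Fraction,
                       threshold: Fraction = RARE_THRESHOLD) -> bool:
    """Return True if the prevalence is strictly below the threshold."""
    return prevalence < threshold


# Figure reported in the text above: 1 in 24,500 (Denmark, 2013).
angelman_denmark = Fraction(1, 24_500)

print(f"{per_10k(angelman_denmark):.2f} per 10,000")  # 0.41 per 10,000
print(is_below_threshold(angelman_denmark))           # True
```

Exact fractions are used here rather than floating-point numbers so that the comparison with the threshold is not affected by rounding.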


Causaly was able to find reliable, high-quality epidemiology data for a rare disease in a matter of seconds, compared with the many hours normally required to conduct a database literature search and screen the results. This can make the orphan drug application process more cost- and time-efficient for pharmaceutical companies, and frees up researchers to engage in more creative research and development activities.


  1. Meekings, K.N., Williams, C.S.M., Arrowsmith, J.E. (2012), "Orphan drug development: an economically viable strategy for biopharma R&D", Drug Discovery Today, 17 (13–14): 660–664
  2. Gaze, L., Breen, J. (2012), "The Economic Power of Orphan Drugs", Thomson Reuters