ENLIGHT-funded research publishes an accessible, open, global dataset of pandemic- and epidemic-prone disease outbreaks in Springer Nature’s Scientific Data

The COVID-19 pandemic has demonstrated the hazard that infectious diseases can pose to global public health and development. According to World Health Organization (WHO) estimates available as of February 2023, COVID-19 had been confirmed to have affected over 750 million people worldwide and to have caused more than 6.8 million deaths.

In this context, a better scientific understanding of infectious diseases is essential for mitigating their negative effects. The first step is to have statistical information that is relevant, accurate, reliable, accessible, and coherent. Moreover, this information should be freely and openly available, and timely enough to meet the needs of decision-making processes.

Unfortunately, existing data on the matter have at least one of several limitations: they cover only a limited number of infectious diseases; they are specific to a population, country, or region; they are based on unofficial information, which may contain errors or disinformation from false reports; or they are not publicly available, which hampers their reuse.

With this in mind, and as part of the activities within the ENLIGHT initiative, an international team of researchers from the University of Göttingen (Prof. Inmaculada Martínez-Zarzoso, Juan Armando Torres Munguía, and Luis Rodrigo Díaz Pavez), the University of Groningen (Prof. Konstantin M. Wacker), and the University of Bordeaux (Prof. Florina Cristina Badarau) created an original database that is statistically sound for research purposes. Their work was published in Springer Nature’s Scientific Data.

The team collected the information from more than 2700 epidemiological, clinical, and laboratory investigations conducted by the official public health authorities, institutions, and research networks of the WHO and its partners all over the world. Using data- and text-mining techniques, they created a dataset containing information on 2227 disease outbreaks that occurred between January 1996 and March 2022. According to the authors, this dataset provides five key advantages over existing data on the matter. First, it offers wide geographic coverage of 233 countries and territories around the world. Second, it covers 70 infectious diseases. Third, it uses standardized concepts and definitions: the authors used the country and territory codes of the International Organization for Standardization (ISO 3166) and the tenth revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-10). Fourth, for transparency, replicability, and reproducibility, the researchers made the data, the metadata, and the code used to create the data publicly available. Finally, the data are interoperable, i.e. they can be easily integrated with other datasets.

Juan Armando Torres Munguía emphasizes the importance of ENLIGHT for him and the research team: "Inmaculada was my PhD supervisor and also established the connection with Bordeaux, another ENLIGHT partner. The ENLIGHT funds allowed me to work on the data and coding aspects of the project during the final stage of my PhD. We had online meetings at the start and towards the end of the project where we discussed conceptual questions and the publication strategy. Framing the paper and thoroughly describing the data was then a joint effort towards the end, where all of us took sequential turns."

Original publication: Torres Munguía, J.A., Badarau, F.C., Díaz Pavez, L.R., Martínez-Zarzoso, I. & Wacker, K.M. A global dataset of pandemic- and epidemic-prone disease outbreaks. Sci Data 9, 683 (2022). https://doi.org/10.1038/s41597-022-01797-2
