SARS-ANI: a global open access dataset of reported SARS-CoV-2 events in animals


The data for this dataset were collected and integrated from two major animal health databases: i) the Program for Monitoring Emerging Diseases (ProMED-mail), a program of the International Society for Infectious Diseases (ISID), and ii) the World Animal Health Information System (WAHIS) of the World Organization for Animal Health (WOAH, formerly OIE).

Step 1: Integration of ProMED-mail reports

ProMED-mail is the largest publicly available system for reporting global outbreaks of infectious diseases (an outbreak denotes the occurrence of one or more cases in an epidemiological unit). It provides reports (so-called “posts”) about outbreaks and occurrences of diseases. The flow of information leading to the publication of a ProMED-mail report is as follows. A disease event to be posted is selected from daily outbreak notifications received by email, from searches of the Internet and traditional media, and from scanning official and unofficial websites. All incoming information is reviewed and filtered by an editor or deputy editor, who then passes it on to a multidisciplinary team of subject matter expert moderators. The moderators assess the reliability and accuracy of the information, interpret it, comment on it, and cross-reference it to previous ProMED-mail reports and the scientific literature [35]. A ProMED-mail report, identified by a unique report identifier, can describe a single health event or multiple health events.

The ProMED-mail reports of interest were integrated in two steps:

i) Selection of ProMED-mail reports

Using the “Search Posts” feature on the ProMED-mail website, we identified reports describing SARS-CoV-2 events in animals, i.e. reporting at least one case of SARS-CoV-2 in an animal. We used the keywords “animal” and “COVID-19” (which appear in the “subject” of ProMED-mail posts conveying information related to SARS-CoV-2 in animals) to retrieve reports of natural and experimental infections or vaccination trials in animals, as well as general discussions of SARS-CoV-2 in animals. (Note: although COVID-19 refers to the disease caused by SARS-CoV-2 in humans and should not be used for animals, ProMED-mail uses this keyword for both humans and animals for convenience.) From these, we manually filtered and included for data extraction the reports describing naturally occurring infection (meaning that the presence of the virus was demonstrated by laboratory methods) or exposure (meaning that the presence of antibodies to SARS-CoV-2 was demonstrated by laboratory methods) in an individual or a group of individuals. At the time of submission (June 22, 2022), the ProMED-mail database contained 232 reports of SARS-CoV-2 in animals.
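The keyword step can be illustrated with a minimal sketch. It assumes the post subjects have already been collected as plain strings; the function name `is_candidate_post` and the sample subjects are hypothetical, and the manual filtering for naturally occurring infection or exposure is deliberately not automated here.

```python
def is_candidate_post(subject: str) -> bool:
    """Return True if a post subject contains both search keywords (case-insensitive)."""
    lowered = subject.lower()
    return "animal" in lowered and "covid-19" in lowered


# Hypothetical subject lines, for illustration only.
subjects = [
    "COVID-19 update: animal, cat, confirmed",
    "COVID-19 update: global case counts",
    "Avian influenza: animal, poultry",
]

# Only subjects that carry both keywords move on to manual review.
candidates = [s for s in subjects if is_candidate_post(s)]
```

A keyword match of this kind only narrows the pool; deciding whether a post describes a laboratory-confirmed natural infection or exposure still requires reading the report.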

ii) Link to previous reports

If a health event is ongoing, ProMED-mail publishes follow-up reports that link to previous ProMED-mail reports (at the end of the report or in the “See also” section at the end of the article). We used this information to identify the potential relationship of each reported event to a previous one (e.g. clinical follow-up, further spread of the virus and treatment outcome) and entered this data into the final dataset.
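As a rough illustration of how such links can be followed, the sketch below walks back from a follow-up report to the first report of an event. It assumes, as a simplification, that each report points to at most one previous report; the report identifiers and the function name `trace_to_first_report` are hypothetical.

```python
from typing import Dict, List, Optional


def trace_to_first_report(report_id: str, previous: Dict[str, Optional[str]]) -> List[str]:
    """Follow 'previous report' links back to the earliest linked report."""
    chain = [report_id]
    seen = {report_id}
    prev = previous.get(report_id)
    while prev is not None and prev not in seen:  # stop on missing or circular links
        chain.append(prev)
        seen.add(prev)
        prev = previous.get(prev)
    return chain


# Hypothetical identifiers: R3 follows up on R2, which follows up on R1.
previous = {"R3": "R2", "R2": "R1", "R1": None}
chain = trace_to_first_report("R3", previous)
```

The returned list runs from the most recent report to the earliest one, so its last element identifies the event that the follow-ups relate to.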

Step 2: Integration of WAHIS reports

WAHIS is a web-based computer system that processes animal disease data in real time. WAHIS data reflect the information collected by the veterinary services of WOAH (formerly OIE) members and of non-member countries and territories on WOAH-listed diseases of domestic animals and wildlife, as well as emerging and zoonotic diseases. In accordance with the WOAH Terrestrial Animal Health Code [36], the detection of infection with SARS-CoV-2 in animals meets the criteria for reporting to WOAH as an emerging infection. Only authorized users, i.e. the delegates of WOAH member countries and their authorized representatives, can enter data into the WAHIS platform to notify WOAH of relevant animal disease information.

A WAHIS report, identified by a unique report identifier, may contain a single or multiple outbreaks, each identified by a unique outbreak identifier. All information is publicly available on the WAHIS interface.

The WAHIS reports of interest were integrated in two steps:

i) Selection of WAHIS reports

We used the WAHIS dashboard for animal disease events to extract cases of SARS-CoV-2 infection in animals reported by WOAH member and non-member countries. WAHIS publishes immediate notifications (INs) and follow-up reports (FURs), recognizable by the prefixes “IN” and “FUR” in their respective names. Immediate notifications provide information about newly reported events, while FURs generally provide updates on previously reported ongoing events (e.g. number of newly infected animals and new deaths, new control measures introduced).
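The report structure described above (a report identifier, one or more outbreak identifiers, and an “IN” or “FUR” prefix in the report name) can be sketched as a small data structure. The class name, field names, and example report names below are hypothetical and do not reflect the actual WAHIS export format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WahisReport:
    report_id: str
    name: str
    outbreak_ids: List[str] = field(default_factory=list)

    @property
    def kind(self) -> Optional[str]:
        """Return 'IN' or 'FUR' according to the name prefix, or None if neither matches."""
        upper = self.name.strip().upper()
        for prefix in ("FUR", "IN"):
            if upper.startswith(prefix):
                return prefix
        return None


# Hypothetical reports: one newly reported event and one update with two outbreaks.
reports = [
    WahisReport("1001", "IN_1001", ["ob_1"]),
    WahisReport("1002", "FUR_1002", ["ob_2", "ob_3"]),
]
newly_reported = [r.report_id for r in reports if r.kind == "IN"]
```

Separating the two report kinds makes it easy to treat “IN” reports as new events and “FUR” reports as updates to events already in the dataset.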

We applied filters to the DISEASE field (“SARS-CoV-2 in animals (Inf. with)”) and the REPORT DATE field to retrieve SARS-CoV-2 events reported from 1 December 2019 to present. The reports can be viewed online or downloaded as a single PDF or Excel file, with each file corresponding to a country report (i.e. one report can include multiple outbreaks). At the time of submission (June 22, 2022), the WAHIS dashboard contained 311 reports related to SARS-CoV-2.

ii) Identification of gaps and completion of the data set

ProMED-mail draws on a wide variety of information sources, including WAHIS reports. ProMED-mail posts mention the event ID of the WAHIS reports used as an information source, so the original source can be consulted on the WAHIS dashboard. We therefore decided to first identify SARS-CoV-2 events in animals in the ProMED-mail database. In a second step, we used the WAHIS dashboard to fill gaps (i.e. in the information on sibling events, which appear in both sources) and to find other events not reported in ProMED-mail (Fig. 1).

Fig. 1

Schematic overview of the methodology: report integration and validation steps.

For each country (using the “COUNTRY/TERRITORY” filter on the WAHIS dashboard), we identified sibling events by comparing the WAHIS reports with all of that country’s previously entered ProMED-mail reports. The comparison used the species, the subnational administration, and the date of laboratory confirmation (with a buffer of ±7 days to allow for possible discrepancies between confirmations by different laboratories) or, if the date of laboratory confirmation was missing, the date of publication (with a buffer of 30 days, because the publication date is highly database dependent). We did not use city information, because reports may refer inconsistently to the city or village where the outbreak occurred owing to privacy concerns.
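The matching rule can be expressed as a minimal sketch. The comparison in the study was performed manually; the code below only restates the criteria under some assumptions: species and subnational names are already harmonized so that exact string comparison is meaningful, the 30-day buffer is applied symmetrically, and the publication-date fallback is used whenever either confirmation date is missing. The `Event` class and `are_siblings` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

CONFIRMATION_BUFFER_DAYS = 7   # buffer for laboratory confirmation dates
PUBLICATION_BUFFER_DAYS = 30   # buffer for publication dates (assumed symmetric)


@dataclass(frozen=True)
class Event:
    country: str
    species: str
    subnational_unit: str
    confirmation_date: Optional[date]
    publication_date: date


def are_siblings(a: Event, b: Event) -> bool:
    """Return True if two events from different sources plausibly describe the same outbreak."""
    if (a.country, a.species, a.subnational_unit) != (b.country, b.species, b.subnational_unit):
        return False
    if a.confirmation_date is not None and b.confirmation_date is not None:
        gap = abs((a.confirmation_date - b.confirmation_date).days)
        return gap <= CONFIRMATION_BUFFER_DAYS
    # Fall back to publication dates when a confirmation date is missing.
    gap = abs((a.publication_date - b.publication_date).days)
    return gap <= PUBLICATION_BUFFER_DAYS
```

City is intentionally absent from the comparison, mirroring the decision not to rely on city information.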

Although this strategy was time consuming, it was consistently applied throughout the data extraction process to ensure comprehensive collection of information for each outbreak, data accuracy, and method reproducibility.

Data extraction

ProMED-mail provides detailed, text-based (narrative) reports on health events; these data are unstructured. WAHIS presents its reports as both semi-structured data (.pdf files divided into sections that include free text) and structured data (.xlsx format). Each selected report was reviewed manually by a veterinarian to ensure a full understanding of its content and context. Information was extracted manually and coded by hand.

The following event information was extracted (if available) and entered into a structured template in a dedicated .csv file:

  • Animal host: common name (i.e. the most specific English-language designation given in the source(s)) and scientific name as mentioned in the source(s) (scientific names are harmonized so that only the first letter of the genus is capitalized);

  • Geographical location: country, subnational administration, city;

  • SARS-CoV-2 variant;

  • Dates: date the case was confirmed in the laboratory, date it was reported in WAHIS, and date it was published;

  • Metrics: number of cases, number of deaths, number of susceptible animals.

In addition, the following animal patient/case information was extracted to populate the dataset:

  • Age;

  • Sex;

  • Living conditions;

  • Main reason for the test;

  • Suspected source of infection;

  • Symptoms: the main reported clinical signs presumed to be associated with SARS-CoV-2, summarized using one or more keywords mentioned in the text. Multiple symptoms are separated by the “and” operator.
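For readers working with the .csv file, the following sketch shows one way to split a symptoms field on the “and” operator. The column names and the two sample rows are hypothetical and do not reproduce the dataset's actual template.

```python
import csv
import io
import re
from typing import List


def split_symptoms(field: str) -> List[str]:
    """Split a symptoms field on the standalone word 'and' into individual keywords."""
    parts = re.split(r"\s+and\s+", field.strip())
    return [p.strip() for p in parts if p.strip()]


# Hypothetical two-row extract; the column names are illustrative only.
sample = io.StringIO(
    "host_common_name,country,symptoms\n"
    "cat,Country A,cough and lethargy\n"
    "dog,Country B,\n"
)
rows = list(csv.DictReader(sample))
parsed = [split_symptoms(row["symptoms"]) for row in rows]
```

Requiring whitespace on both sides of “and” keeps words that merely contain those letters intact, and an empty field yields an empty list rather than a list with one blank entry.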

The data described above were entered into the dataset as stated in the reports, without any interpretation prior to entry. To make the data easier to understand, to integrate with other sources, and to analyze, we also added the following five attributes of the animal host:

  • The common and scientific name (resolved to the species or subspecies level, depending on available information) of the animal host, harmonized with the National Center for Biotechnology Information (NCBI) taxonomic backbone [37];

  • The host’s colloquial name, i.e. the name used in everyday language to identify the animal (e.g. “tiger” for “Sumatran tiger”);

  • The scientific name of the host resolved to the species level;

  • The higher taxonomy (i.e. family) of the animal host, taken from the report, expert knowledge or the literature.
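Two of the naming conventions mentioned in this section lend themselves to a short sketch: capitalizing only the first letter of the genus, and reducing a subspecies-level name to the species level. This is purely string handling and does not query the NCBI taxonomy, which was the reference actually used for harmonization; both function names are hypothetical.

```python
def harmonize_scientific_name(name: str) -> str:
    """Capitalize only the first letter of the genus; lowercase the remaining epithets."""
    parts = name.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def to_species_level(name: str) -> str:
    """Keep the genus and species epithet, dropping any subspecies epithet."""
    return " ".join(harmonize_scientific_name(name).split()[:2])


subspecies = harmonize_scientific_name("panthera Tigris Sumatrae")
species = to_species_level(subspecies)
```

Names that do not follow the plain genus-species-subspecies pattern (for example hybrids or entries ending in “sp.”) would need manual handling, which is consistent with the manual, expert-driven approach described above.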

Finally, for each SARS-CoV-2 event captured in the dataset, we provided the primary and secondary sources of information, i.e. the source name (ProMED-mail or WAHIS) and a link to the online report, as well as the original source of information cited by the primary source. A copy of each report used during data extraction was downloaded and saved as a PDF file. For ProMED-mail reports, we added a timestamp to the saved file; for WAHIS reports, on which a timestamp could not be placed, the download date is given in the filename.

The data documenting each event correspond to the information available in the ProMED-mail and/or WAHIS report at the time the report was consulted (see timestamp or download date). Any subsequent edits or modifications of the report by ProMED-mail and/or WAHIS were not taken into account.


Use of the data from the WAHIS platform requires mention of the following statement: “The World Organization for Animal Health (WOAH) bears no responsibility for the integrity or accuracy of the data contained herein, including but not limited to, deletion, manipulation or reformatting of data that may have occurred beyond its control”.

