Covid-19: HTTP API for German case numbers

Landing page: https://covid19-germany.appspot.com

The Robert Koch-Institut is certainly a cool organization, but I doubt they understand the role of (HTTP) APIs for data exchange. I believe that government institutions still vastly underestimate the power of collaboration on data.

Who would have believed that during a pandemic in 2020 we would communicate current numerical data such as case counts via PDF documents or complex websites that can only be scraped with brittle tooling and headless browsers?

I closely monitored the situation for days, asked people, asked organizations. Nothing.

Now I have built an HTTP API, providing the currently confirmed case numbers of Covid-19 infections in Germany:

https://covid19-germany.appspot.com/now

The primary concerns are:

  • convenience (easy to consume for you in your tooling!)
  • interface stability
  • data credibility
  • availability

$ curl https://covid19-germany.appspot.com/now 2> /dev/null | jq
{
  "current_totals": {
    "cases": 9348,
    "deaths": 25,
    "recovered": 72,
    "tested": "unknown"
  },
  "meta": {
    "contact": "Dr. Jan-Philip Gehrcke, jgehrcke@googlemail.com",
    "source": "zeit.de (aggregates data from individual ministries of health in Germany)",
    "time_source_last_consulted_iso8601": "2020-03-18T00:11:24+00:00",
    "time_source_last_updated_iso8601": "2020-03-17T21:22:00+01:00"
  }
}
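For consuming this from a script, here is a minimal Python sketch. It assumes the response keeps the shape shown above. The names `fetch_now` and `summarize` are made up for illustration and are not part of the API or its code base. The network call sits in its own function and is not invoked here, so the parsing logic can be tried against the sample response.

```python
import json
import urllib.request
from datetime import datetime

NOW_URL = "https://covid19-germany.appspot.com/now"


def fetch_now(url=NOW_URL, timeout=10):
    """Fetch and decode the JSON document served at /now (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(payload):
    """Pick out the totals and compute how old the source data was when polled."""
    totals = payload["current_totals"]
    meta = payload["meta"]
    # Both timestamps carry a UTC offset, so the subtraction is timezone-aware.
    consulted = datetime.fromisoformat(meta["time_source_last_consulted_iso8601"])
    updated = datetime.fromisoformat(meta["time_source_last_updated_iso8601"])
    return {
        "cases": totals["cases"],
        "deaths": totals["deaths"],
        "source_age_seconds": (consulted - updated).total_seconds(),
    }


# Sample response copied from the curl output shown above.
SAMPLE = {
    "current_totals": {
        "cases": 9348,
        "deaths": 25,
        "recovered": 72,
        "tested": "unknown",
    },
    "meta": {
        "source": "zeit.de (aggregates data from individual ministries of health in Germany)",
        "time_source_last_consulted_iso8601": "2020-03-18T00:11:24+00:00",
        "time_source_last_updated_iso8601": "2020-03-17T21:22:00+01:00",
    },
}

if __name__ == "__main__":
    print(summarize(SAMPLE))
    # For a live request, call summarize(fetch_now()) instead.
```

Note that `tested` is the string "unknown" in the sample, so a consumer should not assume that every field under `current_totals` is an integer.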

This is served by Google App Engine in Europe. The code can be found here: https://github.com/jgehrcke/covid-19-germany-gae

I plan to

  • add time series data
  • add more localized data for individual states (Bundesländer)
  • enhance caching

Feel free to use this. Feedback welcome.

Huge shoutout to zeit.de for doing the work of aggregating the numbers published by individual ministries of health.

For historical data, as of today, I recommend consuming https://github.com/CSSEGISandData/COVID-19. For getting the current state, use the zeit.de data exposed via the HTTP API described above.

For now, I am sure that the current case count as provided by zeit.de is the best in terms of credibility and freshness. The actual underlying data sources are all official: these are the individual ministries of health.

The individual ministries usually publish their numbers once or twice a day, at different times. The journalists from zeit.de try to incorporate these data points as quickly as possible right after publication, also during the afternoon and evening. In contrast, the Robert Koch-Institut (RKI) may incorporate a specific update from a specific health ministry only after 1-2 days.

The RKI also doesn’t compute what I call an atomic sum. Instead, it seems to sum numbers that the health ministries published at vastly different times. The RKI tries to find one number per day, and that number is not determined in the evening (after “all data has come in” from the individual states), but seemingly at some unfortunate mid-day point in time at which some ministries of health have just delivered a fresh update for the day and others haven’t yet. The result is non-atomic.

This explains why, for example, the RKI’s official number for March 17 was ~7000 confirmed cases, whereas zeit.de already reported ~9300 at the same time (biggest contributor here is specifically that the last update from Nordrhein-Westfalen from March 17 didn’t make it into RKI’s sum for March 17).
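To make the effect of a non-atomic sum concrete, here is a small Python sketch. The states, times, and case counts are invented for illustration and are not real data. The function `total_as_of` is a hypothetical helper, and the sketch only models the behavior described above, not the RKI's actual process.

```python
from datetime import datetime

# Hypothetical per-state reports: (state, publication time, cumulative cases).
REPORTS = [
    ("State A", datetime(2020, 3, 16, 18, 0), 1000),
    ("State A", datetime(2020, 3, 17, 17, 0), 1500),
    ("State B", datetime(2020, 3, 16, 19, 0), 2000),
    ("State B", datetime(2020, 3, 17, 10, 0), 2600),
]


def total_as_of(reports, cutoff):
    """Sum the latest cumulative count each state published at or before cutoff."""
    latest = {}
    # Walk the reports in chronological order so later ones overwrite earlier ones.
    for state, published, cases in sorted(reports, key=lambda r: r[1]):
        if published <= cutoff:
            latest[state] = cases
    return sum(latest.values())


# Mid-day cutoff: State B has already reported for March 17, State A has not,
# so the sum mixes a fresh number with one from the previous day.
midday = total_as_of(REPORTS, datetime(2020, 3, 17, 12, 0))

# Evening cutoff: both states have reported for March 17.
evening = total_as_of(REPORTS, datetime(2020, 3, 17, 23, 59))

print(midday, evening)
```

With these made-up numbers the mid-day total is 3600 and the evening total is 4100. The difference is exactly the update from State A that arrived after the mid-day cutoff.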

Update: an official statement of the RKI about the delays in data processing (translated from German):

In Germany, the roughly 400 local health authorities (Gesundheitsämter) electronically transmit pseudonymized data on confirmed COVID-19 cases to the federal states at least once a day (more often in the current situation), on the basis of the Infection Protection Act (Infektionsschutzgesetz). The states in turn transmit the data on COVID-19 cases electronically to the RKI. Since 18 March 2020, reporting has used the data as of 00:00 each day.

A certain amount of time passes between a case becoming known locally, its notification to the health authority, the entry of the data into the software, the transmission to the responsible state authority, and from there to the RKI. Under the provisions of the Infection Protection Act, this can take two to three working days. In the current situation, transmission is considerably faster than in routine operation because data is processed more quickly. That some cases are recorded electronically at the health authority with some delay is also because the health authorities first have to investigate the individual cases and their contacts and give priority to infection control measures, which already places heavy demands on their resources. The data is likewise validated at the RKI in order to publish reliable data. Minor delays can occur within this process as well.

(source: https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Fallzahlen.html)