Coronavirus Trends

Interpretation

The trend in reported cases may be distorted by changes in testing: when more tests are done, more cases are found. The trend in deaths is a more accurate description of how the epidemic is moving (this is also supported by research on seropositivity, which tests blood samples from a random sample of the population). For excellent research on excess deaths globally, visit: https://www.economist.com/graphic-detail/coronavirus-excess-deaths-estimates

At present, existing vaccines are still effective against the new mutations. However, natural selection can be expected to favour mutations that evade some of the vaccines, so the development of new vaccines to counter the coronavirus pandemic will continue.

About

The Data Science for Social Impact Research Group at the University of Pretoria maintains a database on GitHub, called the Coronavirus COVID-19 (2019-nCoV) Data Repository for South Africa. Using these data, I run the R scripts below.

Source code for faceted province graphs (R script).

Source code for individual regions (R script).

Hodrick-Prescott filter

The number of new cases reported each day is noisy (it has high variance); for example, testing laboratories may not work on weekends. The noise therefore needs to be smoothed out to get a good estimate of how the rate of infection is actually trending. My graphs use a Hodrick-Prescott (HP) filter, a statistical method that removes the noise from a time series and leaves the underlying trend. This trend curve is smoother than a rolling average, so it is easier to read and understand in the graphs.
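As a rough illustration of what an HP filter does (this is a Python sketch, not the R code behind the graphs), the trend can be found by solving a linear system that trades off closeness to the data against the curvature of the trend. The function name `hp_filter`, the smoothing value `lam=100.0` and the sample numbers are all assumptions made for the example; the smoothing parameter used for the graphs is not stated here.

```python
def hp_filter(y, lam):
    """Return the HP trend of the sequence y for smoothing parameter lam.

    The trend minimises sum((y - trend)^2) + lam * sum(second differences of trend ^ 2),
    which is equivalent to solving (I + lam * D'D) trend = y, where D is the
    second-difference matrix.
    """
    n = len(y)
    if n < 3:
        return [float(v) for v in y]  # too short to have a second difference

    # Build A = I + lam * D'D. A is symmetric, positive definite and
    # pentadiagonal (non-zero only where |row - col| <= 2).
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0
    coef = (1.0, -2.0, 1.0)
    for r in range(n - 2):
        for di, vi in enumerate(coef):
            for dj, vj in enumerate(coef):
                a[r + di][r + dj] += lam * vi * vj

    # Gaussian elimination restricted to the band (no pivoting is needed
    # because A is positive definite).
    b = [float(v) for v in y]
    for k in range(n):
        for i in range(k + 1, min(k + 3, n)):
            f = a[i][k] / a[k][k]
            for j in range(k, min(k + 3, n)):
                a[i][j] -= f * a[k][j]
            b[i] -= f * b[k]

    # Back substitution.
    trend = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(a[i][j] * trend[j] for j in range(i + 1, min(i + 3, n)))
        trend[i] = s / a[i][i]
    return trend


# Made-up daily case counts with weekend dips, for illustration only.
daily_cases = [120, 135, 90, 60, 150, 170, 165, 180, 110, 85, 200, 210, 220, 230]
trend = hp_filter(daily_cases, lam=100.0)
print([round(v, 1) for v in trend])
```

A larger `lam` gives a smoother trend, and as `lam` grows very large the trend approaches a straight line. In practice one would normally use an existing library routine rather than hand-written elimination.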

Note that the estimated HP-filter trend at the end-point (the most recent dates) can be slightly inaccurate, because future reported cases are not yet known. I address this problem by forecasting the trends, then cutting off the HP filter. The HP filter is nevertheless a good estimate of the true trend in the rate of cases; in comparison, the latest daily numbers reported elsewhere on the internet are unreliable on their own because of the high variance of that statistic. You can use the y-axis of my graphs to read off the latest daily rate of infections, but overall the focus of the graphs is on the shapes of the infection-rate trends over time.
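One way to read "forecast, then cut off" is: extend the series with a forecast, smooth the extended series, and keep only the part of the trend that covers observed days. The Python sketch below illustrates that idea under loud assumptions: the forecasting method behind the graphs is not described here, so a naive straight-line extrapolation is used, and a simple centred moving average stands in for the HP filter to keep the example short. The names `smooth_with_padding`, `moving_average` and `horizon` are hypothetical.

```python
def moving_average(y, half_width=3):
    """Centred moving average whose window shrinks at the edges (stand-in smoother)."""
    out = []
    for i in range(len(y)):
        lo, hi = max(0, i - half_width), min(len(y), i + half_width + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def smooth_with_padding(y, smooth, horizon=7):
    """Extend y with a naive forecast, smooth the extended series, then trim.

    `smooth` is any function that maps a list to a trend of the same length.
    """
    y = [float(v) for v in y]
    if len(y) < 2:
        return list(smooth(y))
    # Assumed forecast: continue the average slope of the last `horizon` steps.
    window = y[-(horizon + 1):]
    slope = (window[-1] - window[0]) / (len(window) - 1)
    forecast = [y[-1] + slope * (k + 1) for k in range(horizon)]
    trend = smooth(y + forecast)
    return trend[:len(y)]  # drop the trend values that cover forecast days


# A steadily rising made-up series: the unpadded smoother lags at the last point.
cases = [3.0 * i for i in range(20)]
print(moving_average(cases)[-1], smooth_with_padding(cases, moving_average)[-1])
```

The padded version only helps to the extent that the forecast is reasonable, so the last few points of any such trend are still best read with some caution.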

Feedback?

Twitter; Facebook; LinkedIn.

News24 article: Tom Moultrie on the limitations of aggregator dashboards.