Coronavirus Trends in South Africa


The trend in reported cases may be distorted by increased testing, since more tests find more cases. The trend in deaths describes the epidemic more accurately; this is also confirmed by seropositivity research, in which blood samples from a random sample of the population are tested.

However, the second wave is more widespread than the first, and it is being driven in South Africa by the novel 501.V2 variant. The existing vaccines are still effective against the new mutations at present. However, natural selection will begin to favour mutations which evade some of the vaccines, so the development of new vaccines to counter the coronavirus pandemic will continue.


The Data Science for Social Impact Research Group at the University of Pretoria maintains a database on GitHub, called the Coronavirus COVID-19 (2019-nCoV) Data Repository for South Africa. My computer runs the R scripts below on these data once a day.

Source code for faceted province graphs (R script).
Branch: 1st year only.

Source code for individual regions (R script).

Hodrick-Prescott filter

The daily count of newly reported cases is noisy (it has high variance); testing laboratories not working on weekends is one possible cause. The noise must therefore be smoothed out to get a good estimate of how the rate of infection is actually trending. My graphs use a Hodrick-Prescott (HP) filter, a statistical function that removes the noise from a time series and leaves the underlying trend. The resulting trend curve is smoother than a rolling average, which makes the graphs simpler to read and understand.
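To make the idea concrete, here is a minimal sketch of an HP filter, written in plain Python rather than the R used by the scripts linked above. It is not the code behind the graphs: the case counts are synthetic, the function name hp_filter is made up for this illustration, and the smoothing parameter lam = 100 is an assumption (the value used for the graphs is not stated here). The trend is the series that minimises the squared distance to the data plus lam times the sum of its squared second differences.

```python
def hp_filter(y, lam=100.0):
    """Return the Hodrick-Prescott trend of the sequence y.

    Solves (I + lam * D'D) * trend = y, where D is the second-difference
    matrix. lam is an assumed smoothing parameter: larger = smoother.
    """
    n = len(y)
    if n < 3:
        return [float(v) for v in y]
    # Build A = I + lam * D'D (a symmetric, pentadiagonal matrix).
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 1.0
    coeffs = (1.0, -2.0, 1.0)  # one row of D: trend[r] - 2*trend[r+1] + trend[r+2]
    for r in range(n - 2):
        for a in range(3):
            for b in range(3):
                A[r + a][r + b] += lam * coeffs[a] * coeffs[b]
    # Banded Gaussian elimination; A is positive definite, so no pivoting.
    rhs = [float(v) for v in y]
    for k in range(n):
        for i in range(k + 1, min(k + 3, n)):
            f = A[i][k] / A[k][k]
            for j in range(k, min(k + 3, n)):
                A[i][j] -= f * A[k][j]
            rhs[i] -= f * rhs[k]
    # Back substitution on the remaining upper band.
    trend = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * trend[j] for j in range(i + 1, min(i + 3, n)))
        trend[i] = (rhs[i] - s) / A[i][i]
    return trend


def roughness(series):
    """Sum of squared second differences: lower means smoother."""
    return sum((series[i + 1] - 2 * series[i] + series[i - 1]) ** 2
               for i in range(1, len(series) - 1))


# Synthetic daily counts (made up for illustration): a steady rise, with
# weekend reports cut in half to mimic laboratories being closed.
daily = [(100 + 2 * t) * (0.5 if t % 7 in (5, 6) else 1.0) for t in range(28)]
trend = hp_filter(daily)

print("roughness of raw counts:", round(roughness(daily)))
print("roughness of HP trend:  ", round(roughness(trend), 2))
```

The weekend dips dominate the raw series but barely show in the trend. A perfectly straight line passes through the filter unchanged, because its second differences are already zero.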

Note that the HP-filter trend estimate at the end-point is, understandably, slightly inaccurate, because future reported cases are still unknown. The end of the trend curve may therefore change a little from day to day as later reports come in. The HP filter is nevertheless a good estimate of the true trend in the rate of cases. In comparison, the latest daily numbers reported elsewhere on the internet are unreliable on their own because of the high variance of that statistic. You can read the latest daily rate of infections off the y-axis of my graphs, but overall their focus is on the shapes of the infection-rate trends over time.
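The end-point effect can be reproduced with a small experiment, sketched here in plain Python rather than the R used for the graphs. Everything in it is an assumption made for illustration: the data are synthetic, the name hp_filter is hypothetical, and lam = 100 is an arbitrary smoothing parameter. The filter is defined inside the snippet so that it runs on its own. The trend is estimated once with 40 days of data and again after 10 more days have arrived, and the two estimates for the same calendar days are compared.

```python
def hp_filter(y, lam=100.0):
    """HP trend of y: solves (I + lam * D'D) * trend = y (lam is assumed)."""
    n = len(y)
    if n < 3:
        return [float(v) for v in y]
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    c = (1.0, -2.0, 1.0)  # second-difference coefficients
    for r in range(n - 2):
        for a in range(3):
            for b in range(3):
                A[r + a][r + b] += lam * c[a] * c[b]
    rhs = [float(v) for v in y]
    for k in range(n):  # banded elimination, no pivoting needed
        for i in range(k + 1, min(k + 3, n)):
            f = A[i][k] / A[k][k]
            for j in range(k, min(k + 3, n)):
                A[i][j] -= f * A[k][j]
            rhs[i] -= f * rhs[k]
    trend = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(A[i][j] * trend[j] for j in range(i + 1, min(i + 3, n)))
        trend[i] = (rhs[i] - s) / A[i][i]
    return trend


# Synthetic series: cases rise for 40 days, then fall for 10 more days.
first_40_days = [float(t) for t in range(40)]
all_50_days = first_40_days + [float(78 - t) for t in range(40, 50)]

trend_then = hp_filter(first_40_days)  # estimate made on day 40
trend_now = hp_filter(all_50_days)     # estimate made on day 50

# Revision of the estimate for the same calendar day, once more data is known.
revision_end = trend_now[39] - trend_then[39]    # day 40 was the end-point
revision_early = trend_now[10] - trend_then[10]  # day 11 was far from the end

print("revision at the old end-point:", round(revision_end, 3))
print("revision at an early day:     ", round(revision_early, 6))
```

On day 40 the filter sees only a straight rise, so it extends that rise right up to the end-point. Once the later decline is known, the estimate for day 40 is pulled down, while estimates for days far from the end barely move.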



News24 article: Tom Moultrie on the limitations of aggregator dashboards.