How You Should Be Looking At Covid-19 Data

This article is more than 3 years old.

Covid-19 is a contagious disease. This means that it grows (or declines) multiplicatively. A disease with a reproduction number, designated R, around two will grow in four generations from one case (the first generation or index case) to two cases (the second generation) to four cases (the third generation) to eight cases (the fourth generation). A disease with a reproduction number around three will grow much faster: from one case (the first generation) to three cases (the second generation) to nine cases (the third generation) to twenty-seven cases (the fourth generation). The total size of the epidemic is the sum of the number of cases in each generation, i.e. 1+3+9+27=40 in the second example. This multiplicative property is often referred to as exponential growth and has implications for how you should be looking at Covid-19 data.
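The generation arithmetic above can be sketched in a few lines of Python. This is only an illustration, not an epidemic model: the function name `generation_sizes` is made up for this example, and it assumes every case infects exactly R others in each generation.

```python
def generation_sizes(r, generations):
    """Cases in each generation, starting from a single index case."""
    return [r ** g for g in range(generations)]

sizes_r2 = generation_sizes(2, 4)   # [1, 2, 4, 8]
sizes_r3 = generation_sizes(3, 4)   # [1, 3, 9, 27]
total_r3 = sum(sizes_r3)            # 1 + 3 + 9 + 27 = 40

print(sizes_r2, sizes_r3, total_r3)
```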

In most places, Covid-19 has a reproduction number somewhere between two and three, although in some places it has been estimated to be as high as four or five. The generation time of Covid-19 has been estimated to be around 5 days.

By now, most people are used to seeing data on the Covid-19 epidemic presented like this. This plot, from the University of Georgia Covid-19 Portal, shows the number of cases reported each day in the US from February 1 to November 29.

There’s a problem with looking at data like this that results from that multiplicative property. Namely, because the epidemic is prone to “blow up” due to exponential growth, it’s very hard to see how the growth rate has changed over time. In the example above, the epidemic grew from one to three cases (a difference of just two) between the first and second generations and from nine to twenty-seven between the third and fourth generations (a difference of eighteen), although in both cases the increase occurred at the same rate. 

This is easily fixed with a small change of perspective. Instead of looking at changes in the number of cases, we should look at changes in the ratios of cases. A logarithmic scale allows us to see ratios more clearly. On a graph with a logarithmic y-axis, increases by the same factor (say, times 3) always have the same relative height, so that an increase from 1 to 3 looks the same size as an increase from 3 to 9 or from 100 to 300. If a section of the graph has the same slope (or steepness) as another section, then they have the same rate of increase (3 in this example).  On a logarithmic scale, exponential growth looks like a straight line, and the slope of that straight line tells you about the rate of increase.
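A short Python sketch makes the contrast concrete, using the generation sizes from the R = 3 example. The variable names are illustrative only.

```python
import math

cases = [1, 3, 9, 27]  # generation sizes from the R = 3 example

# Differences between successive generations grow quickly ...
differences = [b - a for a, b in zip(cases, cases[1:])]

# ... but the ratios between them are constant ...
ratios = [b / a for a, b in zip(cases, cases[1:])]

# ... and so is the height of each step on a logarithmic axis.
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(cases, cases[1:])]

print(differences)  # [2, 6, 18]
print(ratios)       # [3.0, 3.0, 3.0]
```

Each entry of `log_steps` is (up to floating-point rounding) the same number, log10(3), which is why a constant growth factor shows up as a straight line on a logarithmic axis.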

For a more mathematical explanation, recall that a logarithm is the inverse of exponentiation. For instance, the product 3×3×3×3=81 can also be written as 3⁴. The number 3 is referred to as the base and the number 4 as the exponent. Since logarithms are the inverse function to exponentiation, we may think of them as “undoing” this relationship. That is, the base 3 logarithm of 81 is four, written log₃(81)=4. The logarithm of a number is the exponent to which the base must be raised in order to get that number. In our example, the reproduction number is the base, and the exponent counts the generations of spread.
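The same relationship can be checked with Python's standard `math` module. The result of `math.log` is a floating-point number, so the sketch rounds it before printing.

```python
import math

base, exponent = 3, 4
value = base ** exponent            # 3 * 3 * 3 * 3
recovered = math.log(value, base)   # "undo" the exponentiation

print(value)                 # 81
print(round(recovered, 6))   # 4.0
```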

So, to find the size of the fifth generation, we only need to take our initial size (1) and multiply by the reproduction number (3) once for each new generation (4), i.e. 1×3×3×3×3=81, or 1×3⁴=81.

This works even if we have reproduction numbers that aren’t integers. For instance, suppose we have R=2.5. Then the fifth generation is 1×2.5×2.5×2.5×2.5=39.0625, or 1×2.5⁴=39.0625.
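As a rough sketch, the calculation for any generation and any reproduction number fits in one small function. The name `generation_size` and its parameters are hypothetical, chosen for this example.

```python
def generation_size(r, generation, initial=1):
    """Cases in the given generation (1-based), assuming each case infects r others."""
    return initial * r ** (generation - 1)

fifth_r3 = generation_size(3, 5)      # 1 x 3 x 3 x 3 x 3
fifth_r25 = generation_size(2.5, 5)   # 1 x 2.5 x 2.5 x 2.5 x 2.5

print(fifth_r3, fifth_r25)  # 81 39.0625
```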

But, of course, we’re really more interested in the reproduction number itself (the number that we have been using as the base), which we have been pretending to know but in fact do not. That’s the beauty of logarithms: if we’re only interested in the relative growth rate, it doesn’t actually matter what base we use. As the following plot shows, we can use base 3, base 10, base 2.5 or anything else. The first panel shows the raw data from our example, i.e. the series of numbers {1, 3, 9, 27}, while the next two show the logarithms of those numbers, first with base 3 and then with base 10. The point is that the two logarithmic plots look identical. The only thing that differs is the set of numbers found on the associated y-axis, i.e. the scale. No matter what scale we use, the slope of the line indicates the rate of transmission.
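One way to see why the base does not matter is that changing it only multiplies every logarithm by the same constant. The sketch below, with illustrative variable names, checks this for the example series and reads the per-generation growth factor back from the slope in both bases.

```python
import math

cases = [1, 3, 9, 27]

log3 = [math.log(c, 3) for c in cases]    # roughly 0, 1, 2, 3
log10 = [math.log10(c) for c in cases]    # roughly 0, 0.477, 0.954, 1.431

# Each base-10 value is the base-3 value times the same constant, log10(3),
# so the two curves have the same shape and differ only in axis labels.
scale = math.log10(3)
rescaled = [v * scale for v in log3]

# The growth factor per generation can be recovered from the slope in either base.
slope3 = log3[-1] - log3[-2]
slope10 = log10[-1] - log10[-2]
factor_from_base3 = 3 ** slope3
factor_from_base10 = 10 ** slope10

print(round(factor_from_base3, 6), round(factor_from_base10, 6))
```

Both recovered factors come out at about 3, the reproduction number used to build the series.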

Now, most people do not think naturally in terms of logarithms. If I told you that the base 10 logarithm of the number of cases reported in the US on Saturday was 5.18, using a calculator you could probably figure out that meant there were about 150,000 new cases reported. (In fact, the New York Times reported 151,245 new cases.) But, you probably don’t think in logarithms. At best, you may realize that since 5.18 is between 5 and 6, then the number of cases was in the hundreds of thousands, and not millions, since 10⁵=100,000 and 10⁶=1,000,000.
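The calculator step can be reproduced in Python. The only input is the case count quoted above; the variable names are illustrative.

```python
import math

reported = 151_245                  # New York Times figure quoted above
log_cases = math.log10(reported)    # base 10 logarithm of the case count

approx = 10 ** 5.18                 # convert the rounded logarithm back to cases

print(round(log_cases, 2))   # 5.18
print(round(approx, -3))     # roughly 151,000
```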

Since it’s not easy to think in logarithms, why not present the data on a logarithmic scale, where the growth rates between subsequent generations are fairly presented, but label the y-axis with natural numbers using a semilog plot?
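The idea behind a semilog axis can be sketched without any plotting library: each value is placed at the height of its logarithm, but the tick is labeled with the ordinary number. The helper `semilog_ticks` below is hypothetical and only handles powers of ten.

```python
import math

def semilog_ticks(max_value):
    """Return (label, position) pairs for powers of ten up to max_value.

    The position is where the tick sits on the axis (its base 10 logarithm);
    the label is the ordinary case count a reader sees.
    """
    top = math.floor(math.log10(max_value))
    return [(f"{10 ** k:,}", k) for k in range(top + 1)]

ticks = semilog_ticks(200_000)
for label, position in ticks:
    print(f"{label:>8} is drawn at height {position}")
```

Plotting libraries generally do this in one step; in matplotlib, for instance, calling `set_yscale("log")` on an axes object switches the y-axis to a logarithmic scale while keeping the tick labels in the original units.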

Recalling that the slope of this line is what indicates the speed of spread (steep slopes mean fast spread), this plot enables us to discern the key features of the Covid-19 epidemic in the US much more readily.

  • During March, the rate at which new Covid-19 cases were reported in the US was far greater than at any other time since. In interpreting this figure, one should keep in mind that there is generally a lag of a week or two between when transmission occurs and when that new infection is reported. Thus, this increase throughout March actually corresponds to transmission in late February and the first half of March, i.e. before shelter-in-place orders were widely in effect.
  • Interventions taken throughout the spring actually reversed the course of the epidemic and cases started going down. This achievement was short lived, however: cases began going up again in June, although at a much lower rate than during the early spring.
  • This increased transmission resulted in a second “summer wave” that peaked in July. Cases started falling again at roughly the same rate that they fell during the time the country was mostly sheltering in place.
  • But, as predicted, cooler weather and the return to school for both children and university students brought new increases in transmission in September. The differences between this third wave and the first two are (1) it has been more sustained, and (2) it started with a much larger initial number of cases. Consequently, the US is currently reporting nearly 200,000 new cases and 1,500 or so deaths per day.

This semilog plot, and many others for US states, counties, territories, and other countries, can be viewed and downloaded from the University of Georgia Covid-19 Portal by selecting the data of interest and setting the “Y-scale” option to “Logarithmic”.

In conclusion, to better understand the shape of the Covid-19 pandemic, you should probably be looking at semilog plots, i.e. plots where the number of reported cases or deaths is shown on a logarithmic scale. Such a plot much more readily allows one to see how the growth rate of the epidemic has gone up or down over time by comparing slopes. The truly large numbers of cases reported in the last few weeks are indeed a result of the still sizable growth rates. But it is not just that the growth rate of the epidemic is high right now (in fact, in many places it is smaller than it has been at numerous other points over the spring, summer and fall); it is also that the starting point is a very large number of active cases.
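To illustrate how a slope on a semilog plot translates into a growth factor, here is a rough Python sketch. The case counts are hypothetical numbers chosen for illustration, not real data; the function name is made up; and the 5-day generation time is the estimate quoted earlier. The growth factor in reported cases is only a crude stand-in for the reproduction number, since reporting lags and testing changes also affect the counts.

```python
import math

GENERATION_DAYS = 5  # generation time estimate quoted earlier in the article

def growth_per_generation(cases_start, cases_end, days):
    """Growth factor per generation implied by the slope between two daily counts."""
    slope_per_day = (math.log(cases_end) - math.log(cases_start)) / days
    return math.exp(slope_per_day * GENERATION_DAYS)

# Hypothetical numbers, for illustration only (not real data).
early = growth_per_generation(100, 900, days=10)           # small start, steep slope
late = growth_per_generation(100_000, 200_000, days=30)    # large start, shallow slope

print(round(early, 2), round(late, 2))  # 3.0 1.12
```

In this made-up comparison the later period has a much smaller growth factor, yet it adds 100,000 daily cases while the earlier one adds 800, because it starts from a far larger base.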
