A Gentle Introduction to a Data-Enhanced SIRD Model of COVID

Traditionally, the SIRD model assumes that every member of the population is in one of four possible disease states:

Susceptible: the number of people who could potentially be infected;
Infected: the number of patients who tested positive;
Recovered: the number of infected patients who returned to negative status;
Deceased: the number of infected patients who died after being infected.

The SIRD model defines the state transitions of patients as shown in this figure.

SIRD Patient Status Transition

Furthermore, the SIRD model assumes that the rates at which patients transition between states are governed by the system of differential equations in the left column of this table.

Comparison of Traditional SIRD Model and Enhanced SIRD Model

Here, N is the size of the entire population and M is mobility data, which will be covered later. Given the available and observable data, I propose an enhanced SIRD model, shown in the right column of the table above.
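For reference, the traditional SIRD system is commonly written in the textbook form below. The rate symbols \beta (infection), \gamma (recovery), and \mu (mortality) are the usual textbook names and an assumption here; the table may use different notation.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{I S}{N}, \\
\frac{dI}{dt} &= \beta \frac{I S}{N} - \gamma I - \mu I, \\
\frac{dR}{dt} &= \gamma I, \\
\frac{dD}{dt} &= \mu I.
\end{aligned}
```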

Specifically,

For the susceptible population, the traditional SIRD model assumes that the number of newly susceptible cases per day is proportional to the infection rate (I/S), meaning that the number of newly susceptible patients scales linearly with the severity of the disease. This assumption implicitly requires access to the true status of every person, which is prohibitively expensive to obtain in terms of medical resources. In practice, the publicly available COVID dataset from Johns Hopkins University (JHU) only contains the number of people taking COVID tests, which may be either a biased or an unbiased sample of the entire population. Given the record of these samples, I gently re-define the people taking COVID tests as the susceptible population. Empirically, the growth of this newly defined susceptible population is close to quadratic for almost all states, regardless of disease severity. As a result, I simply fit a quadratic function of time to the cumulative number of susceptible cases for each state, so the daily increase in susceptible cases is linear in time. The result of this procedure is covered in a later blog.
For the infected population, the traditional SIRD model relies on the same unobservable quantity. However, the number of test-positive cases is available in the JHU COVID dataset. Similarly, I re-define the people who took a COVID test and tested positive as the infected population. Together with the recorded number of people taking the test, the proportion of infection (or infection rate), P_I, can be estimated by dividing the daily test-positive cases by the daily test cases. Furthermore, given that no COVID vaccines have been deployed yet, it is widely accepted that the major COVID control policy is quarantine. One measure of quarantine level is mobility, and it is believed that quarantine reduces the infection rate. Therefore, I propose a regression model that predicts the daily infection rate from current infection data and mobility data over all states, hoping to identify some general pattern. Details of the model are covered in a later blog.
For the recovered and deceased populations, I use fixed rates, as in the traditional SIRD model. Concretely, I take the most recent cumulative recovered and deceased cases, each divided by the cumulative test-positive cases, as the recovered rate and the deceased rate.
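The three data-driven steps above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: every number is synthetic (not JHU data), and the function and variable names are hypothetical, not taken from the actual implementation.

```python
import numpy as np

def fit_quadratic_growth(days, cumulative):
    """Least-squares fit of cumulative ~ a*t^2 + b*t + c; returns (a, b, c)."""
    a, b, c = np.polyfit(days, cumulative, deg=2)
    return a, b, c

def daily_infection_rate(daily_positive, daily_tested):
    """Estimate P_I as daily test-positive cases divided by daily tests."""
    return np.asarray(daily_positive, dtype=float) / np.asarray(daily_tested, dtype=float)

def fixed_rates(cum_recovered, cum_deceased, cum_positive):
    """Recovered and deceased rates from the most recent cumulative counts."""
    return cum_recovered / cum_positive, cum_deceased / cum_positive

# Synthetic, made-up series for one hypothetical state.
days = np.arange(10)
cum_tested = 1000 + 200 * days + 15 * days**2   # cumulative people tested
daily_tested = np.diff(cum_tested)              # tests per day (9 values)
daily_positive = np.array([20, 22, 25, 27, 30, 32, 35, 37, 40])

# Susceptible: quadratic fit to the cumulative test count.
a, b, c = fit_quadratic_growth(days, cum_tested)
# Derivative of the fitted quadratic, which is linear in time.
daily_increase = 2 * a * days + b

# Infected: daily proportion of positive tests.
p_i = daily_infection_rate(daily_positive, daily_tested)

# Recovered / deceased: fixed rates from hypothetical cumulative totals.
recovered_rate, deceased_rate = fixed_rates(600.0, 30.0, 1000.0)
```

The regression on mobility data is deliberately left out of this sketch, since its form is not specified here.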

Note that in the traditional SIRD model, the four rates sum to zero, so the total population stays constant. For the proposed model, this property does not hold, which implies that the total population is not constant. Although the U.S. has the largest number of COVID patients globally and outpaces the second place (Brazil) by an observable margin according to the WHO dashboard, the absolute number of deceased cases is small compared with the entire U.S. population, so I choose to ignore this inconsistency here.
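To make the conservation property concrete, here is a small sketch that integrates the textbook form of the traditional SIRD model with forward Euler steps and checks that S + I + R + D stays equal to N. The parameter names (beta, gamma, mu) and all numeric values are assumptions chosen for illustration only.

```python
def simulate_sird(s0, i0, r0, d0, beta, gamma, mu, steps, dt=1.0):
    """Forward-Euler integration of the textbook SIRD equations."""
    n = s0 + i0 + r0 + d0
    s, i, r, d = float(s0), float(i0), float(r0), float(d0)
    history = [(s, i, r, d)]
    for _ in range(steps):
        new_infected = beta * i * s / n * dt   # flow S -> I
        new_recovered = gamma * i * dt         # flow I -> R
        new_deceased = mu * i * dt             # flow I -> D
        # Every outflow of one compartment is an inflow of another,
        # so the four changes cancel and the total is preserved.
        s -= new_infected
        i += new_infected - new_recovered - new_deceased
        r += new_recovered
        d += new_deceased
        history.append((s, i, r, d))
    return history

# Hypothetical population and rates, chosen only for illustration.
history = simulate_sird(999_000, 1_000, 0, 0,
                        beta=0.3, gamma=0.1, mu=0.01, steps=100)
totals = [sum(row) for row in history]
```

Each entry of `totals` matches the initial population up to floating-point error. The enhanced model replaces some of these flows with data-driven estimates, which is why the same cancellation no longer applies.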
