8 - Calibrating the model

Epidemiological modelling and its use to manage COVID-19

Insights into mechanistic models, by the DYNAMO team

Over the next few weeks, we will present some key elements of epidemiological modelling through short educational articles. These articles will help you to better understand and decipher the assumptions underlying the epidemiological models that are currently widely used, and how these assumptions can affect predictions regarding the spread of pathogens, particularly SARS-CoV-2. The objective is to discover the advantages and limitations of mechanistic modelling, an approach that is at the core of the DYNAMO team's work. The example models are inspired by those used during the crisis, but are sometimes simplified to make them accessible.

#8 - The link to data: how are models calibrated?

The relevance of the predictions of epidemiological models is often limited by the uncertainty of their parameter values. But how are these parameters estimated? And how is the onset of the epidemic defined?

Different information sources can be used:

  • the scientific literature,
  • experimental and historical data,
  • monitoring data on the current epidemic dynamics (number of individuals tested positive, number of individuals developing severe clinical signs, number of deaths due to the disease). Since the beginning of the COVID-19 epidemic, all of this information has been collected and centralised.

Parameters of observable processes (such as the duration of the symptomatic state or of the hospital stay) are readily available. More uncertain, often unobservable parameters (transmission rate, latency duration) need to be estimated. For this purpose, various inference methods exist, each with its advantages and disadvantages. Inference methods that maximise the likelihood of the model are often used (for example, see this link). However, the likelihood of a model cannot always be evaluated, especially when the model is dynamic and stochastic with a large number of variables, or when the available data are spatiotemporal, incomplete, censored, or imperfect. Likelihood-free methods have been developed to overcome this methodological problem.
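To make the idea of maximising a likelihood concrete, here is a minimal sketch on a toy problem. It is not the model of this series: the exponential-growth mean, the Poisson observation model, the parameter names a and r, the synthetic data and the grid search are all assumptions chosen for illustration.

```python
import math
import random

def expected_deaths(t, a, r):
    """Toy mean model (an assumption): expected daily deaths on day t."""
    return a * math.exp(r * t)

def log_likelihood(counts, a, r):
    """Poisson log-likelihood of daily counts observed on days 0, 1, 2, ..."""
    total = 0.0
    for t, y in enumerate(counts):
        lam = expected_deaths(t, a, r)
        total += y * math.log(lam) - lam - math.lgamma(y + 1)
    return total

def poisson_draw(rng, lam):
    """Draw one Poisson variate with Knuth's method (adequate for moderate means)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

# Synthetic "observations" generated from known values, so the fit can be checked.
rng = random.Random(42)
true_a, true_r = 2.0, 0.2
counts = [poisson_draw(rng, expected_deaths(t, true_a, true_r)) for t in range(20)]

# Maximise the log-likelihood by exhaustive search over a coarse grid.
a_grid = [0.5 + 0.1 * i for i in range(46)]     # 0.5 .. 5.0
r_grid = [0.05 + 0.005 * j for j in range(71)]  # 0.05 .. 0.40
ll_hat, a_hat, r_hat = max(
    (log_likelihood(counts, a, r), a, r) for a in a_grid for r in r_grid
)
print(f"a_hat = {a_hat:.1f}, r_hat = {r_hat:.3f}")
```

The grid search only works here because the likelihood has a closed form and there are two parameters. When the likelihood cannot be written down or evaluated, this approach is no longer available, which is where likelihood-free methods come in.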

Although maximising the likelihood is possible in our example, we use a likelihood-free method here to illustrate how these methods work. They are preferred in the DYNAMO team because the epidemiological systems we usually study are complex. Here we used the ABC-SMC (Approximate Bayesian Computation - Sequential Monte Carlo) method. ABC methods are quite intuitive: (1) sets of parameters are generated by drawing parameter values from prior distributions; (2) a simulation of the model is carried out for each set of parameters and compared to the real data via summary statistics (i.e. a simplified representation of the data); (3) the sets of parameters for which the distance between simulated and observed summary statistics falls below a tolerance threshold are selected (the smaller the distance, the more likely the set of parameters); (4) the parameter values are then estimated from the selected sets (posterior distributions). A disadvantage of these methods is the very large number of simulations to be carried out, resulting in substantial computational costs.
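The four steps can be sketched with the simplest ABC variant, plain rejection sampling. This is not the ABC-SMC algorithm used in the article, which refines the tolerance over successive rounds. The toy SIR simulator, its fixed values (gamma, population size, noise level), the uniform prior on beta, the summary statistics and the tolerance are all assumptions made for illustration.

```python
import math
import random

def simulate_new_cases(beta, rng, gamma=0.2, n_days=30, pop=10_000, i0=5):
    """Toy discrete-time SIR simulator returning noisy daily new cases."""
    s, i = float(pop - i0), float(i0)
    cases = []
    for _ in range(n_days):
        new_inf = min(s, beta * s * i / pop)
        s -= new_inf
        i += new_inf - gamma * i
        # Multiplicative observation noise makes the simulator stochastic.
        cases.append(new_inf * rng.lognormvariate(0.0, 0.2))
    return cases

def summary(cases):
    """Summary statistics: log cumulative cases at days 10, 20 and 30."""
    total, stats = 0.0, []
    for day, c in enumerate(cases, start=1):
        total += c
        if day % 10 == 0:
            stats.append(math.log(total + 1.0))
    return stats

def distance(s1, s2):
    """Euclidean distance between two vectors of summary statistics."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))

def abc_rejection(observed, n_draws, tolerance, rng):
    obs_stats = summary(observed)
    accepted = []
    for _ in range(n_draws):
        beta = rng.uniform(0.1, 1.0)                       # (1) draw from the prior
        sim = simulate_new_cases(beta, rng)                # (2) simulate the model
        if distance(summary(sim), obs_stats) < tolerance:  # (3) keep close simulations
            accepted.append(beta)
    return accepted                                        # (4) approximate posterior sample

rng = random.Random(1)
true_beta = 0.5
observed = simulate_new_cases(true_beta, rng)  # synthetic data standing in for real data
posterior = abc_rejection(observed, n_draws=5000, tolerance=0.3, rng=rng)
beta_mean = sum(posterior) / len(posterior)
print(f"{len(posterior)} accepted draws, posterior mean of beta = {beta_mean:.2f}")
```

Only a small fraction of the draws is accepted, which illustrates the computational cost mentioned above: a tighter tolerance gives a better approximation of the posterior but requires more simulations.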

Let's return to the model of article #6 with lockdown from 16 March, and estimate 4 of its parameters: β (the transmission rate), σ (the multiplying factor reducing the excretion of Ip, Ia, Ips), 1/ε (the average duration of latency), and the date of introduction (which should be seen here as the date of perennial establishment of the infection in the population). The data used correspond to the hospital data of the COVID-19 epidemic, more specifically the number of deaths over time, represented in the model by the M-state. The amount of data available may affect the estimates, especially if there are few data. For this example, we estimated the parameters using the data available at 3 dates: one week before lockdown (t = 68 days), the day before lockdown (t = 75 days), and one week after lockdown (t = 82 days). Post-lockdown data were not used. These scenarios lead to parameter estimates that are sufficiently different to predict contrasting epidemic dynamics, clearly illustrating the need to update the models very regularly, especially if they are used for health management purposes.

Values of estimated parameters (mean and 90% confidence interval)

Scenario   β                    σ                    1/ε               date of intro. (t, days)
t = 68     1.89 [1.09 ; 2.76]   0.48 [0.35 ; 0.79]   3.3 [1.2 ; 4.7]   34 [33 ; 35]
t = 75     1.48 [0.77 ; 2.33]   0.43 [0.12 ; 0.84]   3.3 [1.0 ; 4.9]   19 [16 ; 21]
t = 82     1.48 [0.85 ; 2.40]   0.47 [0.17 ; 0.85]   3.3 [1.1 ; 4.8]   22 [20 ; 24]
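Summaries of the form "mean [lower ; upper]" can be computed from a sample of accepted parameter values along the following lines. This is a sketch only: the sample below is synthetic (drawn from an arbitrary normal distribution), not the article's posterior, and taking the 5th and 95th percentiles as the bounds of the 90% interval is an assumption about how the intervals were built.

```python
import random
import statistics

def summarise(sample):
    """Return (mean, 5th percentile, 95th percentile) of a posterior sample."""
    cuts = statistics.quantiles(sample, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    return statistics.fmean(sample), cuts[0], cuts[-1]

# Hypothetical posterior sample, for illustration only.
rng = random.Random(0)
sample = [rng.gauss(1.5, 0.4) for _ in range(2000)]

mean, low, high = summarise(sample)
print(f"{mean:.2f} [{low:.2f} ; {high:.2f}]")
```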

In addition to the model parameters, the initial conditions of the model can also be estimated. In our example, this indicates that the infection became established in the population about one month before the first deaths.

[Figure: distribution of the virus introduction date, estimated from 3 data sets]

Posterior distributions of the date of establishment of the infection according to the amount of usable data.

Using the data available as of March 10 (in blue) gives an estimated introduction date of February 5 (t=34 days). With more data available (in green then yellow), the date of introduction is estimated to be around January 24 (t=22 days).

[Figure, left panel: number of new deaths per day]
[Figure, right panel: cumulative number of deaths per day]

Model predictions in number of new deaths (left) and cumulative number of deaths (right), depending on the amount of data that can be used to estimate model parameters. The other parameters and simulation conditions are the same as in article #6.

When the data available one week after lockdown (t=82 days, in yellow) are used to estimate the model input parameters, the simulations are closer to the observed data than in the other cases (where fewer data are available).

We hope that this article has convinced you that it is essential that models and observational data work together! Updating the models in real time when new cases occur is a necessary challenge, especially at the beginning of an epidemic, to improve the predictive quality of the models and thus the confidence in their predictions. However, inference methods are not infallible. The results also depend on the assumptions (structure of the model) and on the type and quality of the data. Moreover, the more parameters there are to estimate, the more complex the task becomes. Calibrating a model is a process that can take a long time, and must be coupled with analyses to verify that the available data actually make it possible to estimate the desired parameters.

Article #9 will take a step back from the previous articles to discuss why (and how) to use a mechanistic modelling approach in epidemiology.