10 - Analysis of a model

Epidemiological modelling and its use to manage COVID-19

Insights into mechanistic models, by the DYNAMO team

Over the next few weeks, we will present some key elements of epidemiological modelling through short educational articles. These articles will help you to better understand and decipher the assumptions underlying the epidemiological models that are currently widely used, and how these assumptions can affect predictions regarding the spread of pathogens, particularly SARS-CoV-2. The objective is to discover the advantages and limitations of mechanistic modelling, an approach that is at the core of the DYNAMO team's work. The example models will be inspired by models used during the crisis, sometimes simplified to make them accessible.

#10 – Why and how to analyse the behaviour of a model?

Before being used to predict what can happen in the modelled system under certain conditions (control scenarios, changes in practices, climate change, etc.), a simulation model must undergo numerical analyses. These essentially aim to: (1) quantify the reliability of model predictions given the different possible sources of uncertainty (propagation of uncertainties; figure on the left); (2) understand the behaviour of the model, in particular how variations in its inputs affect its outputs, i.e. its predictions (model sensitivity; figure on the right).

[Figures: propagation of uncertainties (left); model sensitivity (right)]

Uncertainty analysis is based on the identification of sources of uncertainty (in yellow in the figure) throughout the modelling process (in blue), followed by their quantification and propagation, in order to assess the reliability of model predictions.
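
To make the propagation step concrete, here is a minimal sketch in Python. It is not the model used by the team: it assumes a simplified discrete-time deterministic SIR model, and the ranges chosen for the transmission rate (beta) and recovery rate (gamma) are hypothetical. The uncertain parameters are drawn at random, the model is run for each draw, and the spread of the resulting predictions (here the epidemic peak) is summarised.

```python
import random
import statistics

def sir_peak(beta, gamma, n=10_000, i0=10, days=300):
    """Peak number of infectious individuals in a simple
    discrete-time deterministic SIR model (illustrative only)."""
    s, i = n - i0, float(i0)
    peak = i
    for _ in range(days):
        new_inf = beta * s * i / n   # new infections during this day
        new_rec = gamma * i          # new recoveries during this day
        s -= new_inf
        i += new_inf - new_rec
        peak = max(peak, i)
    return peak

def propagate(n_draws=1000, seed=1):
    """Draw uncertain parameters, run the model, collect the outputs."""
    rng = random.Random(seed)
    peaks = []
    for _ in range(n_draws):
        beta = rng.uniform(0.2, 0.4)     # hypothetical uncertainty range
        gamma = rng.uniform(0.08, 0.12)  # hypothetical uncertainty range
        peaks.append(sir_peak(beta, gamma))
    return peaks

peaks = propagate()
# quantiles(n=20) returns 19 cut points: q[0] is 5%, q[9] is 50%, q[18] is 95%
q = statistics.quantiles(peaks, n=20)
print(f"median peak: {q[9]:.0f}, 90% interval: [{q[0]:.0f}, {q[18]:.0f}]")
```

The width of the interval, relative to the median, gives a first idea of how reliable the prediction is given the assumed parameter uncertainty. Other sources of uncertainty (model structure, data) are not covered by this sketch.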

Sensitivity analysis, which relates the variability of the model's outputs to the variability of its inputs, aims to answer four questions:

  • Does the model produce relevant predictions? This involves checking whether the predictions are consistent with potentially available observations, and with the expected behaviour of the system.
  • Which input factors contribute most to the variability of the outputs? Identifying these factors helps to prioritise where new knowledge is needed, and even to identify potential control points in the system.
  • Which factors contribute the least to output variability? Identifying them allows their values to be set arbitrarily within their range of variation without affecting the results, or even allows the model to be simplified.
  • Which factors interact with each other? The behaviour of the model may differ depending on whether one factor or several factors are modified at the same time. Interacting factors should be studied and observed together.

Different sensitivity analysis methods can be used depending on the complexity of the model (number of parameters, state variables, monotonic or non-monotonic variation of outputs, etc.) and its simulation performance (number of scenarios that can be numerically explored in a reasonable amount of time):

Sensitivity analysis methods according to the complexity of the model and the number of repetitions required for its proper use
  • Screening methods are relatively basic: they explore how the outputs vary when the inputs are varied one at a time. They therefore do not allow the impact of interactions between inputs to be assessed. On the other hand, they are numerically inexpensive.
  • Global analysis methods based on variance explore the impact of varying several inputs simultaneously, using the following approach:
  1. Select the input factors (or categories of factors) and outputs to be considered.
  2. Define the range of variation of the input factors (minimum, maximum, most likely values).
  3. Define the experimental or sampling design. Sampling must be done in the space of possible values, either with discrete values (complete or fractional factorial designs, FFD) or with probability distributions (Fourier Amplitude Sensitivity Test, FAST; Sobol decomposition; etc.).
  4. Simulate for each combination of inputs.
  5. Analyse and interpret the results: examine the range of variation and the distribution of the simulated outputs so as to interpret only what is interpretable, calculate sensitivity indices (e.g. with methods based on variance decomposition), and rank the influential factors.
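
The five steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the team's implementation: the model is a simplified deterministic SIR model, the three factors and their ranges are hypothetical, sampling is uniform and independent, and the indices are computed with the commonly used Saltelli (first-order) and Jansen (total-order) estimators of the Sobol decomposition.

```python
import random

# Steps 1 and 2: factors and their (min, max) ranges, hypothetical values
RANGES = {"beta": (0.2, 0.4), "gamma": (0.08, 0.12), "i0": (1.0, 50.0)}

def sir_peak(beta, gamma, i0, n=10_000, days=300):
    """Output of interest: epidemic peak of a simple discrete-time SIR model."""
    s, i = n - i0, i0
    peak = i
    for _ in range(days):
        new_inf = beta * s * i / n
        s -= new_inf
        i += new_inf - gamma * i
        peak = max(peak, i)
    return peak

def sample(rng, n_rows):
    """Step 3: independent uniform sampling of every factor."""
    return [[rng.uniform(lo, hi) for lo, hi in RANGES.values()]
            for _ in range(n_rows)]

def sobol_indices(model, n_base=500, seed=1):
    rng = random.Random(seed)
    a, b = sample(rng, n_base), sample(rng, n_base)
    # Step 4: simulate for each combination of inputs
    f_a = [model(*row) for row in a]
    f_b = [model(*row) for row in b]
    both = f_a + f_b
    mean = sum(both) / len(both)
    var = sum((y - mean) ** 2 for y in both) / (len(both) - 1)
    first, total = {}, {}
    for j, name in enumerate(RANGES):
        # Rows of A in which only factor j is taken from B
        f_ab = [model(*(ra[:j] + [rb[j]] + ra[j + 1:]))
                for ra, rb in zip(a, b)]
        # Step 5: first-order (Saltelli) and total-order (Jansen) estimators
        first[name] = sum(yb * (yab - ya) for ya, yb, yab
                          in zip(f_a, f_b, f_ab)) / n_base / var
        total[name] = sum((ya - yab) ** 2 for ya, yab
                          in zip(f_a, f_ab)) / (2 * n_base) / var
    return first, total

first, total = sobol_indices(sir_peak)
ranking = sorted(total, key=total.get, reverse=True)
for name in ranking:
    print(f"{name}: first-order = {first[name]:.2f}, total = {total[name]:.2f}")
```

A total index well above the first-order index for a given factor suggests that this factor acts partly through interactions with others. The estimates are noisy for a small base sample, so small or slightly negative first-order values should be read as "close to zero".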

However, a few questions remain unanswered:

  • How can the sensitivity of a stochastic model be analysed? What is the place of each stochastic repetition in the sampling design?
  • How can the variation of temporal (e.g. epidemic curve) or spatial (e.g. cluster location) model outputs be analysed? At present, analyses often focus on aggregated simulated data (date of epidemic peak, cumulative number of cases over a period, analysis of a few representative pixels, etc.), which leads to a great loss of information on the behaviour of the model.
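
To illustrate why these questions matter, the following Python sketch runs a small stochastic epidemic model several times for each design point and reduces each epidemic curve to two aggregated outputs. Everything here is an assumption made for illustration: the chain-binomial model, the three values of beta used as design points, and the choice of 50 repetitions. Repeating each design point and summarising the repetitions is one possible option, not an established answer to the questions above.

```python
import math
import random
import statistics

def binomial(rng, k, p):
    """Number of successes among k independent trials of probability p."""
    return sum(rng.random() < p for _ in range(k))

def stochastic_epidemic(rng, beta, gamma=0.1, n=200, i0=2, days=150):
    """Daily new cases from a simple chain-binomial SIR model."""
    s, i = n - i0, i0
    curve = []
    for _ in range(days):
        new_inf = binomial(rng, s, 1.0 - math.exp(-beta * i / n))
        new_rec = binomial(rng, i, 1.0 - math.exp(-gamma))
        s -= new_inf
        i += new_inf - new_rec
        curve.append(new_inf)
    return curve

def summarise(curve):
    """Aggregate a whole epidemic curve into (peak day, cumulative cases)."""
    peak_day = max(range(len(curve)), key=curve.__getitem__)
    return peak_day, sum(curve)

rng = random.Random(1)
design = [0.2, 0.3, 0.4]  # hypothetical design points for beta
n_reps = 50               # stochastic repetitions per design point
results = {}
for beta in design:
    summaries = [summarise(stochastic_epidemic(rng, beta))
                 for _ in range(n_reps)]
    totals = [cases for _, cases in summaries]
    peak_days = [day for day, _ in summaries]
    results[beta] = {
        "mean_cases": statistics.mean(totals),
        "var_cases": statistics.variance(totals),  # spread between repetitions
        "mean_peak_day": statistics.mean(peak_days),
    }
    print(beta, results[beta])
```

The variance between repetitions at a fixed design point is variability that no input factor explains, and each curve of 150 daily values is reduced to two numbers, which is the loss of information mentioned above.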

In article #11, we will illustrate such a sensitivity analysis with an example for the COVID-19 propagation model.