How does a pandemic start?

By Pier Paolo Ippolito

Pandemics usually develop thanks to a disease's ability to spread at an exponential rate. In the case of COVID-19, the number of cases from one day to the next was in fact equal to today's number of cases times some constant between 1.25 and 1.5 (depending on factors such as population density and the restrictions in place). The change in the number of cases from one day to the next can then be defined by the following equation: $$ \Delta N_{d} = E \times p \times N_{d} $$ Where E represents the average number of people we are exposed to every day, p represents the probability that an exposure might lead to an infection, and Nd is the number of cases as of today. Therefore, in this type of situation, the only possible way to slow down our exponential trend is by decreasing E and p.
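This daily-growth equation can be sketched in a few lines of Python. The values of E and p below are illustrative assumptions (chosen so that E × p ≈ 0.3, i.e. a daily growth factor of 1.3, within the 1.25–1.5 range mentioned above), not measured quantities:

```python
E = 5      # average daily exposures per person (assumed value)
p = 0.06   # probability an exposure leads to infection (assumed value)

N = 100                   # cases today
for day in range(5):
    delta = E * p * N     # ΔN_d = E × p × N_d
    N = N + delta         # tomorrow's case count
    print(f"day {day + 1}: {N:.0f} cases")
```

Since E × p = 0.3, each day's total is 1.3 times the previous one, and five days are enough to almost quadruple the case count.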


In order to make this possible, different techniques such as track and trace, social distancing and travel restrictions can be applied. However, even if no intervention at all is taken, an exponential trend is destined to convert into a logistic curve once a large part of the population gets infected by the disease (in fact, the probability that an exposure leads to an infection automatically decreases if the majority of the population, and of the people we meet, is already infected). Applying any type of restriction would then help us reach the inflection point between these two trends as soon as possible.
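The exponential-to-logistic transition can be sketched by scaling the infection probability with the fraction of the population still uninfected, a standard logistic-growth assumption (the population size and the values of E and p below are again illustrative):

```python
# Logistic variant of the growth equation: the effective infection
# probability shrinks as the uninfected pool empties, giving
# ΔN_d = E × p × N_d × (1 - N_d / POP).
E, p = 5, 0.06        # assumed illustrative values
POP = 1_000_000       # total population (assumed)

N = 100.0
for day in range(60):
    N += E * p * N * (1 - N / POP)

# Growth slows as N approaches POP: the curve flattens after the
# inflection point, instead of growing without bound.
print(f"share infected after 60 days: {N / POP:.2%}")
```

Early on, (1 − N/POP) ≈ 1 and the curve is indistinguishable from the pure exponential; once a sizeable fraction of the population is infected, the extra factor bends the trajectory into the logistic shape described above.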

Clinical Trials: A/B Testing

Clinical trials are one of the main examples of A/B testing in practice. By gathering the results of an A/B test, we can infer causal relationships about the potential effects of a treatment. In this setting, the causal question we are asking ourselves is: does a certain treatment decrease the COVID-19 mortality rate? This question can then be formulated in statistical terms as a null and an alternative hypothesis.

Under the null hypothesis, applying the treatment would not lead to any major change in the mortality rate, and therefore the treatment and control groups would be quite similar. Instead, under the alternative hypothesis, the treatment would cause a statistically significant difference between the two groups. In our case, patients affected by COVID-19 would be considered our population, and our intervention (providing an experimental medical treatment) would then be compared against no intervention. The variation in mortality rate could then be used as our metric to assess the results. Patients should be randomly assigned to either group so as to avoid introducing any form of bias (e.g. patient age, comorbidities, geographical location).
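Random assignment can be sketched as follows. The patient identifiers, the fixed seed and the 50/50 split are illustrative assumptions for the sake of the example:

```python
import random

def randomise(patients, seed=42):
    """Shuffle the patient list and split it 50/50 into
    treatment and control groups."""
    rng = random.Random(seed)
    shuffled = patients[:]          # copy, so the input is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

patients = [f"patient_{i}" for i in range(100)]
treatment, control = randomise(patients)
print(len(treatment), len(control))   # prints: 50 50
```

Because every patient is equally likely to land in either group, characteristics such as age or comorbidities are, on average, balanced between the two groups rather than systematically concentrated in one of them.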

If bias is unconsciously introduced, this could lead to some form of confounding, which would make it really difficult to disentangle which effects are due to the intervention and which ones are instead caused by a flaw in the randomization process. Following on with our example, the number of times an intervention leads to a substantial difference compared to the control group can then be summarised, over a number of trials, as a Binomial distribution. In this distribution, the X axis represents the count of possible outcomes, while the Y axis represents the probability associated with each outcome. Moreover, according to the Central Limit Theorem, as we increase our sample size we end up with a Gaussian distribution for each group in the experiment.
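This Binomial-to-Gaussian behaviour can be checked numerically. The sketch below simulates many repetitions of a trial with 400 Bernoulli outcomes each; the true success probability of 0.5 is an arbitrary assumption for illustration:

```python
import random
import statistics

rng = random.Random(0)
p_true = 0.5   # assumed probability of a "substantial difference"

def sample_proportion(n_trials):
    """Proportion of successes over n_trials Bernoulli draws."""
    return sum(rng.random() < p_true for _ in range(n_trials)) / n_trials

# Repeating the experiment many times, the sample proportions cluster
# around p_true with spread ~ sqrt(p(1-p)/n); by the Central Limit
# Theorem their distribution approaches a Gaussian as n grows.
samples = [sample_proportion(400) for _ in range(2000)]
print(statistics.mean(samples))    # close to p_true = 0.5
print(statistics.stdev(samples))   # close to sqrt(0.25 / 400) = 0.025
```

Plotting a histogram of `samples` would show the familiar bell shape centred on the true proportion.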

The adjacent image shows a possible outcome for our example. As can be seen from the diagram, four different areas are present: True Positive (TP), True Negative (TN), False Positive (FP), False Negative (FN). In the TP case, we can affirm that our treatment is beneficial, since it managed to pass our hypothesis test (e.g. by decreasing the mortality rate). In the TN case, we can instead affirm that our treatment is not beneficial. In the FP area, we might be deceived into believing that our intervention was beneficial while it wasn't (the opposite holds true in the FN area). In these last two cases, it is then vital to look for any possible form of bias interference.
Depending on how much data we have available, we can better define our distributions and reduce the uncertainty in the measurements (e.g. the bigger our sample size, the more likely it is to be representative of the whole population).

In this way, we reduce our likelihood of committing a mistake, and our distributions become narrower (the FP and FN areas shrink).
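The narrowing effect can be made concrete: the standard error of each group's estimated rate shrinks as 1/sqrt(n), so the two Gaussians separate and the overlapping FP/FN areas shrink as the sample grows. A sketch, assuming illustrative mortality rates of 10% (control) vs 7% (treatment):

```python
import math

p_control, p_treatment = 0.10, 0.07   # assumed mortality rates

def separation(n):
    """Difference between the group means measured in standard
    errors: larger values mean narrower, better-separated curves."""
    se = math.sqrt(p_control * (1 - p_control) / n
                   + p_treatment * (1 - p_treatment) / n)
    return (p_control - p_treatment) / se

for n in (100, 400, 1600):
    print(n, round(separation(n), 2))
# prints: 100 0.76 / 400 1.52 / 1600 3.05
```

Quadrupling the number of patients per group halves the standard error, so the same 3-percentage-point effect becomes twice as many standard errors wide and far harder to confuse with random noise.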


This interactive online presentation was realised using D3.js and the Scroller design by Cuthbert Chow and Jim Vallandingham.

Additional insights are available on the Reveal.js presentation and the Epidemic Modelling dashboard.