# Computational Physics Lab

## Modeling COVID-19

You'll be spending a lot of this quarter solving differential equations. Since there is a topic of current interest that can be modeled pretty nicely using differential equations, I thought to add in a bit of extra material this week in which we examine the disease process.

Let's construct a differential equation describing the number of infected and susceptible people. We are looking to find the rate of change of the number of infected people $I$: \begin{align} \frac{dI}{dt} &= \,\,??? \end{align} where $I$ is the number of infected people. Now this rate must be proportional to the number of infected people. We could simply write that \begin{align} \frac{dI}{dt} &= R_II \end{align} where $R_I$ is an unknown rate with dimensions of inverse time which defines how frequently infected people infect other people. This differential equation predicts an exponential growth in the number of infected people. This matches our experience here in the US:

Fact
As as I write this (during spring break) in the United States, COVID-19 deaths are doubling every 3 days. Deaths are a more reliable measure of the rate of infection than the number of people who test positive, because it does not so strongly depend on the availability of tests. Note that this rate probably corresponds to the growth of infection about a month ago (since the average death happens about 30 days after infection), well before several states instituted stay-at-home orders.

However, this exponential growth cannot continue indefinitely, or there will be more infected people than there are total people! So we need a correction such as:

\begin{align} \frac{dI}{dt} &= R_I\frac{N-I}{N}I \end{align} where $N$ is the total population. This correction introduces the notion that you can only infect someone who has not yet had the virus. We do not yet know if that is true, it depends on how rapidly the virus mutates. But we certainly hope it is true.

However, there is one additional factor that we haven't accounted for, which is that people who get sick recover after a couple of weeks, and then they aren't able to infect anyone else. So if we introduce the number of recovered people $R$, we have a pair of differential equations: \begin{align} \frac{dI}{dt} &= R_I\frac{N-I-R}{N}I - R_R I \\ \frac{dR}{dt} &= R_RI \end{align} I added a term in the first differential equation indicating that some fraction of infected people recover each day, and that those recovered people are no longer susceptible to infection (hopefully). The recovery rate $R_R$ is another constant with dimensions of inverse time indicating the rate at which people tend to recover. I also added a second differential equation to track the number of recovered people $R$. (Note that we don't include death in the model. Fortunately, that is an easy fix. We can just assume some fraction of $R$ die.)

We can solve these differential equations using a simple finite difference approach. We sadly cannot use a centered finite difference for this, so we just use \begin{align} \frac{dI}{dt} &= \frac{I(t+\Delta t)-I(t)}{\Delta t} \end{align} Given our finite difference equations, we can solve for the "future" number of infected and recovered people: \begin{align} \frac{I(t+\Delta t)-I(t)}{\Delta t} &= R_I\frac{N-I(t)-R(t)}{N}I - R_R I(t) \\ \frac{R(t+\Delta t)-R(t)}{\Delta t} &= R_RI(t) \\ I(t+\Delta t) &= I(t) + R_I\Delta t\frac{N-I(t)-R(t)}{N}I(t) - R_R\Delta t I(t)\label{eq:I} \\ R(t+\Delta t) &= R(t) + R_R\Delta tI(t)\label{eq:R} \end{align} We now have the two finite difference equations we will need in order to model the progression of this disease.

1. Create 1D arrays to hold $I(t)$ and $R(t)$. Pick values for $\Delta t$, $N$, $R_I$, and $R_R$. We will adjust the rates later, so don't spend much time making an informed guess for those. But please do get the total population correctly.

2. Write a loop to go through the arrays, setting each element based on Equations \ref{eq:I} and \ref{eq:R} and the previous values of $I$ and $R$.

3. Set the initial conditions to 300k (which is a guess at the current number of infected people in the United States) and no recovered people. Run your code and visualize the results. If your results don't match your expectations, first attempt to debug the code yourself. When your code seems to be working, or you can't see how to fix it, please raise your hand for a TA or myself to look at your results.

4. Set $R_I$ to zero, and start out with everyone infected. Adjust your recover rate until you find that half of your patients have recovered in 14 days.

A rate of 0.0?/day works out about right.

5. Now set your initial number of infected people to 300k, which is a guess at the current number of infected people in the United States, and adjust $R_I$ until you find that the number of infected people doubles in 3 days. How long does it take for the number of infected people to peak? What fraction of the population ends up infected?

A rate of 0.3/day works out for this. The peak of infections is in about ??? days or ??? months. Everyone ends up infected.

6. The recovery rate is hard for us to change, but the infection rate is adjustable, which is why we are meeting over zoom. What happens to your results when the infection rate is cut in half? How low do we need to drop the infection rate in order to protect most of the population?

Cutting the infection rate by half moves the peak infections to four months out, and cuts lets almost 20% of the population not get sick. Cutting by a third gives us almost a year before peak number of infected people and only about half the population is infected. Cutting the rate of infection by a factor of five causes basically no more people to get sick. Wouldn't that be nice.

Extra fun
Our model in effect assumes that some people recover instantly from getting sick, and that there is some possibility that a person may stay contagious for several weeks. Neither of these is medically likely. Create a new model in which everyone stops being contagious after 14 days. How much does this change your results?
We have to tweak the infection rate down a bit, and I find that the peak happens around two months, with about about 10% of the population never being infected. Cutting the infection rate in half gives us five or so months to the peak of infection, and we end up with maybe half of the country getting sick. Details matter here. I am not an epidemiologist, and I trust the professionals are doing a better job.
More fun
You could also throw in a period during which an infected person is not contagious, say two or three days, to see how this changes things. But if you're going this far, you could make the infection rate be itself a function of how long a person has been sick (as in reality it must be). We have no way to guess this function, but you can still set up a differential equation for the course of the epidemic as a function of this rate curve. Some approaches can change the shape of this curve, e.g. if you have people with a fever stay home that doesn't affect the infection rate early in the disease process (before a fever is detected).

# What we left out

The above model is manifestly extremely simplistic. The extra fun tasks hinted at a few improvements that could be made to account for the progress of the disease over time. However, even with that changed, our differential equation entirely omits geographical location. A person in New York City is less likely to infect me than to infect someone else in NYC, a distinction our differential equation fails to make.

We could recover this spatial dependence by making our quantity $I$ (and similarly $R$, and even $N$) into a scalar field that is the area density of infected individuals. We would then turn our ordinary differential equation into a partial differential equation, which could account for the locality of the infection process. This would look something like \begin{align} \frac{\partial I}{\partial t} &= R_I\frac{N-I-R}{N}I - R_R I + DN\nabla^2 (I/N) \end{align} where $D$ is a diffusion constant with dimensions of distance squared per time that describes how much people tend to wander around. This partial differential equation would cause cities to be infected separately, and if the population density were uniform would result in the infection gradually spreading across the country.

However, this differential equation is also totally unrealistic. People don't just diffuse around. We travel from city to city using airplanes or cars, rarely stopping in between. So treating geography is a scalar field is not really a good idea. Rather than geometrical distance between people, the "social distance" between people is what matters. How many links (of physical contact) does it take to get from one person to another.

My elderly neighbor is extremely unlikely to get coronavirus from me, because he and I intentionally don't come close to one another. I am far more likely to get the coronavirus from Lori in the office (who we can imagine lives far from me), since she and I are entering the same building when I need to enter Weniger 328 to teach this class.

To treat the pandemic more accurately, we need to model the network of contacts between people, which is far trickier than you might imagine. Social networks tend to be "scale free" networks, meaning that you can't accurately describe a network by the average number of contacts that a person has. This is because some people tend to have far more contacts than others.

The same scale-free properties hold for airports (which I find easier to reason about). Most airports may be small and only have flights to a few other airports. If you assumed that each airport had the average number of connections, you would find that many travelers would need to make four or more connections to reach their destination. But in reality, a few airports have far more than the average number of connections, with the result that it is unusual to have even two stops on a flight, and vanishingly rare to require more than that. Sadly, the same reasoning applies to social networks, which is how the coronavirus got everywhere so quickly.