Understanding COVID-19 as a Digital Analyst

We are inundated daily with numbers, charts, and forecasts as everyone from politicians to journalists to public health professionals tries to track the status of the COVID-19 pandemic. It’s hard to find an article, social media post, or news broadcast about COVID-19 that doesn’t include some measure of cases, deaths, testing rates, hospital capacity, PPE availability, and so on.

All this data can be a lot for the average person to take in, but as a digital analyst, making sense of data is something you do every day! Though you may not have expertise in medicine, public health, or epidemiology, your skills at interpreting and communicating data are valuable for helping yourself and others understand what’s going on in this rapidly changing environment.

Here are three issues that have become apparent in tracking COVID-19 and that are likely familiar to you as a digital analyst:

Choosing the Right KPI

An important step for any business or organization is to establish key metrics to assess performance and measure the success of its initiatives. When those key performance indicators (KPIs) are unclear or poorly defined, it becomes anyone’s guess as to how well things are going.

There are a variety of “KPIs” being used to measure the spread of and assess our response to COVID-19. The widely-cited dashboard and dataset from Johns Hopkins University (reportedly receiving 1 billion hits per day) presents total cases and total deaths as its primary metrics. Most other sources have taken a similar approach, focusing on cumulative cases and deaths. Some, like the Government of Canada, are also reporting new daily cases and deaths as KPIs.

Across the spectrum of reporting now available from governments, health agencies, and media organizations, you will find a variety of additional indicators being used to track COVID-19, including active cases, recovered cases, 7-day average cases, hospitalizations and ICU admissions, number of tests per capita, and the list goes on.

So, what’s the right KPI? Well, as always, the right KPI depends on what you’re aiming to measure or what question you need to answer.

There has been much debate for instance about the use of cumulative cases vs. cases per capita (or per some factor of population size) as the right KPI for tracking COVID-19. As explained by John Burn-Murdoch of the Financial Times, the population size of a country does not affect how quickly the virus spreads, so charting absolute counts is a better measure of how effectively a country is preventing the spread.

However, if you’re instead trying to assess the strain being placed on a country’s resources, calculating cases per capita or cases per million could be a more effective measure than an absolute count. For example, comparing the expected severe cases per million to ICU beds per million can help assess a region’s capacity to treat patients with severe symptoms.
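As a rough sketch of that comparison, the per-million normalization can be written out as below. The function names and every figure in the example are hypothetical placeholders chosen for illustration, not real data for any country or region.

```python
def per_million(count: float, population: int) -> float:
    """Normalize an absolute count to a rate per one million people."""
    return count * 1_000_000 / population


def icu_demand_ratio(expected_severe_cases: float, icu_beds: float, population: int) -> float:
    """Ratio of expected severe cases per million to ICU beds per million.

    A value above 1.0 suggests demand would exceed ICU capacity.
    """
    severe_pm = per_million(expected_severe_cases, population)
    beds_pm = per_million(icu_beds, population)
    return severe_pm / beds_pm


# Hypothetical region: all three numbers are made up for illustration.
population = 10_000_000
expected_severe_cases = 1_500
icu_beds = 2_000

print(per_million(expected_severe_cases, population))  # severe cases per million
print(per_million(icu_beds, population))               # ICU beds per million
print(icu_demand_ratio(expected_severe_cases, icu_beds, population))
```

Because both sides are divided by the same population, the ratio is the same as dividing the raw counts. The per-million step matters when you line up regions of different sizes side by side.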

Perhaps the most popularized objective in response to COVID-19 has been the effort to “flatten the curve”. Although it’s been explained visually through charts and animations, it’s less clear how we determine whether our efforts to flatten the curve are succeeding. What’s our KPI?

Flattening the curve is aimed at slowing the rate of spread of the virus so that our health care system does not get pushed beyond capacity. So, it would make sense to monitor the available capacity of hospitals, in particular critical care services. If we don’t exceed 100% usage of ICU and ventilator capacity for the duration of this pandemic, then we will have successfully “flattened the curve.”

Data is Sampled

Any seasoned Google Analytics user should be familiar with the trouble caused by sampling. If you are conducting analysis based on only a subset of your data, your conclusions may be far from reality, especially when the sample is not representative of the entire population. It’s not uncommon to see a conversion rate on sampled data be double or triple the actual rate.

We can see similar sampling problems in the tracking of COVID-19. Particularly in Canada and the United States, only a subset of people are currently being tested for the virus. That subset is also nowhere near representative of the entire population as it is biased towards patients with severe symptoms due to the testing criteria currently being followed.

An analysis conducted by Dr. Ben Fine at Trillium Health Partners observes that the rate of positive tests in Wuhan, China, based on a large testing study, was around 0.5%. In Canada, we are currently seeing positive test rates of over 6%, more than 10 times higher. So either COVID-19 is just everywhere or (more likely) we are under-testing by a factor of at least 10. As we expand testing capacity and increase the sample size, the rate of positive tests will likely fall considerably.
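The arithmetic behind that "factor of at least 10" is a simple ratio of the two positivity rates quoted above. A minimal sketch, with a hypothetical function name, and with the caveat that treating the Wuhan figure as a baseline is the cited analysis's framing rather than an established rule:

```python
def positivity_ratio(observed_pct: float, reference_pct: float) -> float:
    """How many times higher the observed positive-test rate is than a reference rate."""
    if reference_pct <= 0:
        raise ValueError("reference_pct must be positive")
    return observed_pct / reference_pct


# Rates quoted in the text: about 0.5% (Wuhan study) vs. over 6% (Canada).
ratio = positivity_ratio(observed_pct=6.0, reference_pct=0.5)
print(f"Observed positivity is about {ratio:.0f}x the reference rate")
```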

The problem of sampling also applies to measuring the death rate from COVID-19. We could calculate that about 5% of reported cases in Canada have been fatal. However, as described above, the number of cases is likely understated, meaning the mortality rate of COVID-19 could actually be much lower.
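To see how sensitive that fatality figure is to undercounted cases, here is a small sketch. It assumes deaths are counted fully while true cases are some multiple of reported cases. Both the function name and the undercount factors are hypothetical assumptions for illustration, not estimates.

```python
def adjusted_fatality_pct(reported_fatality_pct: float, undercount_factor: float) -> float:
    """Fatality rate if true cases are `undercount_factor` times the reported cases.

    Assumes the number of deaths is unchanged, so only the denominator grows.
    """
    if undercount_factor < 1:
        raise ValueError("undercount_factor must be at least 1")
    return reported_fatality_pct / undercount_factor


reported = 5.0  # about 5% of reported cases, as quoted in the text

# Hypothetical undercount factors; the real factor is unknown.
for factor in (1, 5, 10):
    print(f"{factor}x undercount -> {adjusted_fatality_pct(reported, factor):.1f}% fatality")
```

In practice deaths can also be undercounted or lag behind cases, so this only shows the direction of the bias, not its true size.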

We don’t have the option of simply extracting “unsampled” data for COVID-19, but by being aware of the potential error, we can avoid extrapolating sampled measures to the entire population.

Integrating Data Sources

You likely have an abundance of data sources available to you as a digital analyst. Every martech, adtech, and analytics tool creates its own stream of data, often with its own metrics and definitions. A survey of common tools will find multiple definitions for “clicks”, “sessions”, “users”, “conversions”, and other commonly used metrics. If these various sources are not properly integrated or understood, a considerable amount of time and energy is often spent on reconciling discrepancies. And if you have ever switched between platforms (say, from Adobe Analytics to Google Analytics), you know that it’s almost impossible to make a meaningful translation from one to the other.

We are seeing similar issues arise in the tracking and reporting of COVID-19. In Canada, there is no common national reporting system or methodology implemented across the country. Each province and each regional health unit may be tracking COVID-19 slightly differently. As a result, on any given day, you will likely see different numbers depending on which source you look at.

In Ontario, the province-wide reporting system iPHIS (integrated Public Health Information System) has performed so poorly that Toronto Public Health decided to abandon the system and create its own reporting platform. In the marketing world, this would be like using the Adobe stack across your organization and having one department decide it doesn’t work for them and start using Google Analytics instead.

Metric definitions are also not being applied consistently across the board. Some reports include “cases” of COVID-19 that are both confirmed and presumptive, while others include only laboratory-confirmed cases. When it comes to testing, some regions are reporting number of patients tested, while others are reporting the number of test samples processed. Since patients are being tested multiple times, this is akin to the difference between sessions and users.
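The gap between the two testing definitions is easy to see with a toy dataset. In the sketch below, the record layout and the patient and sample IDs are invented for illustration. Counting rows gives "samples processed" (like sessions), while counting distinct patient IDs gives "patients tested" (like users).

```python
# Each record is (patient_id, sample_id); all values are hypothetical.
test_records = [
    ("patient-a", "sample-001"),
    ("patient-b", "sample-002"),
    ("patient-a", "sample-003"),  # patient-a retested
    ("patient-c", "sample-004"),
    ("patient-a", "sample-005"),  # patient-a retested again
]

# "Samples processed": one count per test, analogous to sessions.
samples_processed = len(test_records)

# "Patients tested": one count per distinct person, analogous to users.
patients_tested = len({patient_id for patient_id, _ in test_records})

print(f"Samples processed: {samples_processed}")
print(f"Patients tested:   {patients_tested}")
```

Two regions reporting these two numbers under the same "tests" label would look very different even with identical testing activity.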

One of the leading efforts to deal with these challenges of integrating data from across the country has been the COVID-19 Canada Open Data Working Group, which has been compiling data from every region and province in Canada to create a “single source of truth.”

In digital analytics, we usually know that our data is inaccurate. There are many reasons why the tracking on a website could fail, be blocked, or otherwise result in missing sessions, users, or conversions. The best approach to deal with this inaccuracy is to focus on trends and ratios rather than absolute counts. The same can be said for monitoring COVID-19. We can assume that the data is inaccurate, but if a particular source is reporting consistently, the trends and ratios still provide a directional measure that can inform decision-making.
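A short sketch shows why ratios survive a consistent undercount. It assumes the source captures a fixed share of true cases every day, which is a strong simplification. The counts and the 50% capture rate are hypothetical.

```python
def growth_ratios(counts: list[float]) -> list[float]:
    """Day-over-day ratio of each count to the previous day's count."""
    return [today / yesterday for yesterday, today in zip(counts, counts[1:])]


# Hypothetical true daily counts and a source that consistently captures half of them.
true_counts = [100, 120, 144]
capture_rate = 0.5
reported_counts = [count * capture_rate for count in true_counts]

print(growth_ratios(true_counts))      # trend in the true data
print(growth_ratios(reported_counts))  # same trend, despite halved absolute counts
```

If the capture rate itself changes over time, for example when testing criteria are expanded, the reported trend no longer mirrors the true one, so it is worth checking for such changes before reading too much into a ratio.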

We will continue to see more numbers, charts, and dashboards over the coming weeks and months. The focus of the ongoing analysis will also shift as we move beyond the peak of new cases and look for signs to lift the state of emergency and ease social distancing restrictions. Through all of this, applying your skills as an analyst can help you filter out the noise, understand the risks, and make informed judgements about the current situation.

April 23rd, 2020
Categories: Analyst Capability