These logistical challenges mean that research can struggle to keep up with new outbreaks. During 2015 and 2016, Zika spread widely, spurring researchers to plan large-scale clinical studies and vaccine trials.[20] But by the time many of these studies were ready to start, the cases had stopped. This is a common frustration in outbreak research; by the time the infections end, fundamental questions about contagion can remain unanswered. That’s why building long-term research capacity is essential. Although our research team has managed to generate a lot of data on the Zika outbreak in Fiji, we were only able to do this because we already happened to be there investigating dengue. Similarly, some of the best data on Zika have come from a long-running Nicaraguan dengue study led by Eva Harris at the University of California, Berkeley.[21]
Researchers have also lagged behind outbreaks in other fields. Many studies of misinformation during the 2016 US election weren’t published until 2018 or 2019. Other research projects looking at election interference have struggled to get off the ground at all, while some are now impossible because social media companies – whether inadvertently or deliberately – have deleted the necessary data.[22] At the same time, fragmented and unreliable data sources are hindering research into banking crises, gun violence and opioid use.[23]
Getting data is only part of the problem, though. Even the best outbreak data will have quirks and caveats, which can hinder analysis. In her work tracking radiation and cancer, Alice Stewart noted that epidemiologists rarely have the luxury of a perfect dataset. ‘You’re not looking for a spot of trouble against a spotless backdrop,’ she said,[24] ‘you’re looking for a spot of trouble in a very messy situation.’ The same issue crops up in many fields, whether trying to estimate the spread of obesity in friendship data, uncover patterns of drug use in the opioid epidemic, or trace the effects of information across different social media platforms. Our lives are messy and complicated, and so are the datasets they produce.
If we want a better grasp of contagion, we need to account for its dynamic nature. That means tailoring our studies to different outbreaks, moving quickly to ensure our results are as useful as possible, and finding new ways to thread strands of information together. For example, disease researchers are now combining data on cases, human behaviour, population immunity, and pathogen evolution to investigate elusive outbreaks. Taken individually, each dataset has its own flaws, but together they can reveal a more complete picture of contagion. Describing such approaches, Caroline Buckee has quoted Virginia Woolf, who once said that ‘truth is only to be had by laying together many varieties of error.’[25]
As well as improving the methods we use, we should also focus on the questions that really matter. Take social contagion. Considering the amount of data now available, our understanding of how ideas spread is still remarkably limited. One reason is that the outcomes we care about aren’t necessarily the ones that technology companies prioritise. Ultimately, they want users to interact with their products in a way that brings in advertising revenue. This is reflected in the way we talk about online contagion. We tend to focus on the metrics designed by social media companies (‘How do I get more likes? How do I get this post to go viral?’) rather than outcomes that will actually make us healthier, happier, or more successful.
With modern computational tools, there is potential to get unprecedented insights into social behaviour, if we target the right questions. The irony, of course, is that the questions we care about are also the ones that are likely to lead to controversy. Recall that study looking at the spread of emotions on Facebook, in which researchers altered people’s News Feeds to show happier or sadder posts. Despite criticism of how this research was designed and carried out, the team was asking an important question: how does the content we see on social media affect our emotional state?
Emotions and personality are, by their very definition, emotive and personal topics. In 2013, psychologist Michal Kosinski and his colleagues published a study suggesting that it was possible to predict personality traits – such as extroversion and intelligence – from the Facebook pages that people liked.[26] Cambridge Analytica would later use a similar idea to profile voters, triggering widespread criticism.[27] When Kosinski and his team first published their method, they were aware that it could have uncomfortable alternative uses. In their original paper, they even anticipated a possible backlash against technology firms. The researchers speculated that as people became more aware of what could be extracted from their data, some might turn away from digital technology entirely.
If users are uncomfortable with exactly how their data is being used, researchers and companies have two options. One is to simply avoid telling them. Faced with concerns about privacy, many tech companies have downplayed the extent of data collection and analysis, fearing negative press coverage and uproar from users. Meanwhile, data brokers (who most of us have never heard of) have been making money selling data (which we weren’t aware they had) to external researchers (who we didn’t know were analysing it). In these cases, the assumption seems to have been that