Nuclear physics is one of the most prominent examples of a ‘dual-use technology’.[5] The research has brought huge scientific and social benefits, but it has also found extremely harmful uses. In the preceding chapters, we’ve met several other examples of technology that can be put to both positive and negative uses. Social media can connect us to old friends and useful new ideas. Yet it can also enable the spread of misinformation and other harmful content. Analysis of crime outbreaks can identify people who may be at risk, making it possible to interrupt transmission; it can also feed into biased policing algorithms that may over-target minority groups. Large-scale GPS data is revealing how to respond effectively to catastrophes, how to improve transport systems, and how new diseases might spread.[6] But it also risks leaking personal information without our knowledge, endangering our privacy and even our safety.
In March 2018, the Observer newspaper reported that Cambridge Analytica had secretly gathered data from tens of millions of Facebook users, with the aim of building psychological profiles of US and British voters.[7] Although the effectiveness of such profiling has been disputed by statisticians,[8] the scandal eroded public trust in technology firms. According to software engineer – and ex-physicist – Yonatan Zunger, the story was a modern retelling of the ethical debates that had already occurred in fields like nuclear physics or medicine.[9] ‘The field of computer science, unlike other sciences, has not yet faced serious negative consequences for the work its practitioners do,’ he wrote at the time. As new technology appears, we mustn’t forget the lessons that researchers in other fields have already learned the hard way.
When ‘big data’ became a popular buzzword in the early twenty-first century, the potential for multiple uses was a source of optimism. The hope was that data collected for one purpose could help tackle questions in other areas of life. A flagship example of this was Google Flu Trends (GFT).[10] By analysing the search patterns of millions of users, researchers suggested it would be possible to measure flu activity in real-time, rather than waiting a week or two for official US disease tallies to be published.[11] The initial version of GFT was announced in early 2009, with promising results. However, it didn’t take long for criticisms to emerge.
The GFT project had three main limitations. First, the predictions didn’t always work that well. GFT had reproduced the seasonal winter flu peaks in the US between 2003 and 2008, but when the 2009 swine flu pandemic took off unexpectedly that spring, GFT massively underestimated its size.[12] ‘The initial version of GFT was part flu detector, part winter detector,’ as one group of academics put it.[13]
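To see how a model can end up as a ‘winter detector’, consider a toy sketch in Python. This is not Google’s actual method; the search signal and all the numbers below are invented for illustration. If the search terms a model leans on mostly track the season rather than the disease itself, the model will match winter peaks well and then badly underestimate a pandemic that arrives in spring:

```python
# Toy illustration (not Google's actual model): a predictor trained on
# search activity that mainly tracks winter will reproduce seasonal flu
# peaks, but miss an out-of-season pandemic.
import numpy as np

rng = np.random.default_rng(0)
weeks = np.arange(52 * 6)                    # six flu seasons, roughly 2003-2008
season = np.cos(2 * np.pi * weeks / 52)      # peaks once per year ("winter")
flu = 100 + 80 * season + rng.normal(0, 5, weeks.size)

# Hypothetical search volume that mostly reflects winter, not flu itself
searches = 50 + 40 * season + rng.normal(0, 5, weeks.size)

# Fit a simple linear model: flu_estimate = a * searches + b
a, b = np.polyfit(searches, flu, 1)
print("average in-season error:", np.mean(np.abs(a * searches + b - flu)))

# Spring 2009: a pandemic arrives out of season. True flu activity is high,
# but the winter-tracking search signal is low, so the model underestimates.
spring_searches = 15.0                       # weak "winter" signal
spring_flu_true = 250.0                      # genuine pandemic activity
print("pandemic estimate:", a * spring_searches + b, "vs actual:", spring_flu_true)
```

In this made-up set-up the fitted model tracks the training years closely but predicts only a fraction of the spring pandemic’s true activity, a pattern analogous to GFT’s 2009 underestimate.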
The second problem was that it wasn’t clear how the predictions were actually made. GFT was essentially an opaque machine; search data went in one end and predictions came out the other. Google didn’t make the raw data or methods available to the wider research community, so it wasn’t possible for others to pick apart the analysis and work out why the algorithm performed well in some situations but badly in others.
Then there’s the final – and perhaps biggest – issue with GFT: it didn’t seem that ambitious. We get flu epidemics each winter because the virus evolves, making current vaccines less effective. Similarly, the main reason governments are so worried about a future pandemic flu virus is that we won’t have an effective vaccine against the new strain. In the event of a pandemic, it would take six months to develop one,[14] by which time the virus would have spread widely. To predict the shape of flu outbreaks, we need a better understanding of how viruses evolve, how people interact, and how populations build immunity.[15] Faced with this hugely challenging situation, GFT merely aimed to report flu activity a week or so earlier than official tallies would otherwise allow. It was an interesting idea in terms of data analysis, but not a revolutionary one when it comes to tackling outbreaks.
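By way of contrast, the models that do try to capture the shape of an outbreak are mechanistic transmission models. The sketch below is a minimal SIR (susceptible–infectious–recovered) model with made-up parameter values, not a forecast of any real epidemic; it simply illustrates how the rate at which people interact and the gradual build-up of immunity together determine when an epidemic peaks and how many people it ultimately reaches:

```python
# A minimal SIR sketch: a mechanistic model linking how people interact
# (transmission rate) and how immunity accumulates (recovery) to the
# overall shape of an outbreak. Parameter values are assumptions chosen
# purely for illustration.

def sir_epidemic(r0=1.8, infectious_days=3.0, population=1_000_000, days=365):
    beta = r0 / infectious_days        # transmission rate per day
    gamma = 1.0 / infectious_days      # recovery rate per day
    s, i, r = population - 1.0, 1.0, 0.0
    peak_day, peak_infected = 0, i
    for day in range(days):            # simple one-day time steps
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        if i > peak_infected:
            peak_day, peak_infected = day, i
    return peak_day, peak_infected, r  # r = recovered (and now immune)

print(sir_epidemic())                  # assumed reproduction number of 1.8
print(sir_epidemic(r0=1.3))            # a less transmissible strain: later, smaller peak
```

Lowering the assumed reproduction number shifts the peak later and shrinks the final size, which is the kind of question, about interaction and immunity, that a nowcasting tool like GFT never tried to answer.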
This is a common pitfall when researchers or companies talk about applying large datasets to wider aspects of life. The tendency is to assume that, because there is so much data, there must be other important questions it can answer. In effect, it becomes a solution in search of a problem.
In late 2016, epidemiologist Caroline Buckee attended a tech fundraising event, pitching her work to Silicon Valley insiders. Buckee has extensive experience of using technology to study outbreaks. In recent years, she has worked on several studies using GPS data to investigate malaria transmission. But she is also aware that such technology has its limitations. During the fundraising event, she became frustrated by the prevailing attitude that with enough money and coders, companies could solve the world’s health problems. ‘In a world where technology moguls are becoming major funders of research, we must not fall for the seductive idea that young, tech-savvy college grads can single-handedly fix public health on their computers,’ she wrote afterwards.[16]
Many tech approaches are neither feasible nor sustainable. Buckee has pointed to the many failed tech pilot studies and apps that hoped to ‘disrupt’ traditional methods. Then there’s the need to evaluate how well health measures actually work, rather than just assuming good ideas will emerge naturally like successful start-ups. ‘Pandemic preparedness requires a long-term engagement with politically complex, multidimensional problems – not disruption,’ as she put it.
Technology can still play a major role in modern outbreak analysis. Researchers routinely use mathematical models to help design control measures, smartphones to collect patient data, and pathogen sequences to track the spread of infection.[17] However, the biggest challenges are often practical rather than computational. Being