and prevent such infections, they will need all the data they can get.

Thanks to the availability of new information sources like genetic sequences, we are increasingly able to unravel how different diseases and traits spread through populations. Indeed, one of the biggest changes to human healthcare in the twenty-first century will be the ability to rapidly and cheaply sequence and analyse genomes. As well as uncovering outbreaks, researchers will be able to study how human genes influence conditions ranging from Alzheimer’s to cancer.[36] Genetics has social applications too. Because our genomes can reveal characteristics like ancestry, genetic testing kits have become popular gifts for people interested in their family history.

Yet the availability of such data can have unintended effects on privacy. Because we share so many genetic characteristics with our relatives, it’s possible to learn things about people who haven’t been tested. In 2013, for example, The Times reported that Prince William had Indian ancestry, after testing two distant cousins on his mother’s side. Genetics researchers soon criticised the story, because it had revealed personal information about the prince without his consent.[37] In some cases ancestry revelations can have devastating consequences: there have been several reports of families thrown into disarray after discovering hidden adoptions or infidelity in a Christmas ancestry test.[38]

We’ve already seen how data about our online behaviour is gathered and shared so that companies can target adverts. Marketers don’t just measure how many people clicked on an ad; they know what kind of person they are, where they came from, and what they did next. By combining these datasets, they can piece together how one thing influences another. The same approach is common when analysing human genetic data. Rather than look at genetic sequences in isolation, scientists will compare them with information like ethnic background or medical history. The aim is to uncover the patterns that link the different datasets. If researchers know what these look like, they can predict things like ethnicity or disease risk from the underlying genetic code. This is why genetic testing companies like 23andMe have attracted so many investors. They aren’t just collecting customers’ genetic data; they are gathering information about who these people are, which makes it possible to gain much deeper health insights.[39]

It’s not just for-profit companies that are building such datasets. Between 2006 and 2010, half a million people volunteered for the UK Biobank project, which aims to study patterns in genetics and health over the coming decades. As the dataset grows, it will be accessible to teams around the globe, creating a valuable scientific resource. Since 2017, thousands of researchers have signed up to access the data, with projects investigating diseases, injuries, nutrition, fitness, and mental health.[40]

There are huge benefits to sharing health information with researchers. But if datasets are going to be accessible to multiple groups, we need to think about how to protect people’s privacy. One way to reduce this risk is to remove information that could be used to identify participants. For example, when researchers get access to medical datasets, personal information like name and address will often have been removed. Even without such data, though, it may still be possible to identify people. When Latanya Sweeney was a graduate student at MIT in the mid-1990s, she suspected that if you knew a US citizen’s age, gender, and ZIP code, in many cases you could narrow it down to a single person. At the time, several medical databases included these three pieces of information. Combine them with an electoral register and Sweeney reckoned you could probably work out whose medical records you were looking at.[41]

So that’s what she did. ‘To test my hypothesis, I needed to look up someone in the data,’ she later recalled.[42] The state of Massachusetts had recently made ‘anonymised’ hospital records freely available to researchers. Although Governor William Weld had claimed the records still protected patients’ privacy, Sweeney’s analysis suggested otherwise. She paid $20 to access voter records for Cambridge, where Weld lived, then cross-referenced his age, gender, and ZIP code against the hospital dataset. She soon found his medical records, then mailed him a copy. The experiment – and the publicity it generated – would eventually lead to major changes in how health information is stored and shared in the US.[43]
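The cross-referencing step can be sketched in a few lines of Python. This is only an illustration of the general idea of linking two datasets on shared attributes, not a reconstruction of Sweeney's actual analysis: every record, name, and diagnosis below is made up, and the function names are hypothetical.

```python
from collections import defaultdict

# Made-up "anonymised" hospital records: names removed, but age,
# gender, and ZIP code are still present.
hospital_records = [
    {"age": 51, "gender": "M", "zip": "02138", "diagnosis": "condition A"},
    {"age": 34, "gender": "F", "zip": "02139", "diagnosis": "condition B"},
    {"age": 34, "gender": "F", "zip": "02139", "diagnosis": "condition C"},
]

# Made-up voter register: names alongside the same three attributes.
voter_roll = [
    {"name": "Voter One", "age": 51, "gender": "M", "zip": "02138"},
    {"name": "Voter Two", "age": 34, "gender": "F", "zip": "02139"},
    {"name": "Voter Three", "age": 34, "gender": "F", "zip": "02139"},
    {"name": "Voter Four", "age": 67, "gender": "F", "zip": "02140"},
]

SHARED_FIELDS = ("age", "gender", "zip")


def key(record):
    """Return the combination of attributes that both datasets share."""
    return tuple(record[field] for field in SHARED_FIELDS)


def link(records, register):
    """Pair each record with a voter when exactly one voter matches it."""
    voters_by_key = defaultdict(list)
    for voter in register:
        voters_by_key[key(voter)].append(voter["name"])

    matches = []
    for record in records:
        names = voters_by_key.get(key(record), [])
        if len(names) == 1:  # a unique match points to a single person
            matches.append((names[0], record["diagnosis"]))
    return matches


reidentified = link(hospital_records, voter_roll)
print(reidentified)
```

In this toy example only the first record is re-identified, because two voters share the attributes of the other records. The rarer a combination of age, gender, and ZIP code, the more likely it is to single out one person.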

As data spread from one computer to another, so do the resulting insights into people’s lives. It’s not just medical or genetic information we need to be careful with; even seemingly innocuous datasets can hold surprisingly personal details. In March 2014, a self-described ‘data junkie’ named Chris Whong used the Freedom of Information Act to request details of every yellow taxi ride in New York City during the previous year. When the New York City Taxi and Limousine Commission released the dataset, it included the time and location of the pick-up and drop-off, the fare, and how much each passenger tipped.[44] There were over 173 million trips in total. Rather than give the real licence plates, each taxi was identified by a string of apparently random digits. But it turned out the journeys were anything but anonymous. Three months after the dataset was released, computer scientist Vijay Pandurangan showed how to decipher the taxi codes, converting the scrambled digits back into the original licence plates. Then graduate student Anthony Tockar published a blog post explaining what else could be discovered. He’d found that with a few simple tricks, it was possible to extract a lot of sensitive information from the files.[45]
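One way such codes can be reversed is sketched below. It assumes the codes were produced by running each identifier through an unsalted hash function (MD5 is used here), and it assumes a made-up plate format of one digit, one letter, and two digits. Neither detail comes from the text above, and the function names are hypothetical. The point is only that when the set of possible plates is small, every candidate can be scrambled in advance and looked up.

```python
import hashlib
from itertools import product
from string import ascii_uppercase, digits


def scramble(plate):
    """Assumed scrambling step: an unsalted MD5 hash of the plate."""
    return hashlib.md5(plate.encode("ascii")).hexdigest()


def candidate_plates():
    """Yield every plate in a hypothetical digit-letter-digit-digit format."""
    for d1, letter, d2, d3 in product(digits, ascii_uppercase, digits, digits):
        yield f"{d1}{letter}{d2}{d3}"


def build_lookup():
    """Scramble every possible plate once, mapping each code to its plate."""
    return {scramble(plate): plate for plate in candidate_plates()}


lookup = build_lookup()

# A code as it might appear in a released dataset (made-up plate).
released_code = scramble("7B42")

# Reversing the code is now a single dictionary lookup.
recovered = lookup[released_code]
print(len(lookup), recovered)
```

Under these assumptions there are only 26,000 possible plates, so the whole table takes a fraction of a second to build. Scrambling identifiers in this way hides them only if an attacker cannot list all the possible inputs.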

First, he showed how a person might stalk celebrities. After hours spent trawling through a search of images for ‘celebrities in taxis in Manhattan in 2013’, Tockar found several pictures with a licence plate in view. Cross-referencing these with celebrity blogs and magazines, he worked out what the start point or destination was, and matched this against the supposedly anonymous taxi dataset. He could also see how much celebrities had – or hadn’t – tipped. ‘Now while this information is relatively benign, particularly a year down
