coherently as Google has indexed and organized the Web.”

This phenomenon is called ambient intelligence. It’s based on a simple observation: The items you own, where you put them, and what you do with them are, after all, a great signal about what kind of person you are and what kind of preferences you have. “In the near future,” writes a team of ambient intelligence experts led by David Wright, “every manufactured product—our clothes, money, appliances, the paint on our walls, the carpets on our floors, our cars, everything—will be embedded with intelligence, networks of tiny sensors and actuators, which some have termed ‘smart dust.’”

And there’s a third set of powerful signals that is getting cheaper and cheaper. In 1990, it cost about $10 to sequence a single base pair—one “letter”—of DNA. By 1999, that number had dropped to $0.90. In 2004, it fell below $0.01, and now, as I write in 2010, it costs one ten-thousandth of $0.01. By the time this book comes out, it will undoubtedly cost exponentially less. At some point in the middle of the decade, we ought to be able to sequence any random whole human genome for less than the cost of a sandwich.

It seems like something out of Gattaca, but the allure of adding this data to our profiles will be strong. While it’s increasingly clear that our DNA doesn’t determine everything about us—other cellular information sets, hormones, and our environment play a large role—there are undoubtedly many correlations to be found between genetic material and behavior. It’s not just that we’ll be able to predict and avert upcoming health issues with far greater accuracy—though that alone will be enough to get many of us in the door. By combining DNA and behavioral data—like the location information from iPhones or the text of Facebook status updates—an enterprising scientist could run regression analyses on an entire society.

In all this data lie patterns yet undreamed of. Properly harnessed, it will fuel a level of filtering acuity that’s hard to imagine—a world in which nearly all of our objective experience is quantified, captured, and used to inform our environments. The biggest challenge, in fact, may be thinking of the right questions to ask of these enormous flows of binary digits. And increasingly, code will learn to ask these questions itself.

The End of Theory

In December 2010, researchers at Harvard, Google, Encyclopædia Britannica, and the American Heritage Dictionary announced the results of a four-year joint effort. The team had built a database containing the full text of 5.2 million books published over more than five hundred years, in English, French, Chinese, German, and other languages. Now any visitor to Google’s “Ngram Viewer” page can query it and watch how phrases rise and fall in popularity over time, from neologism to the long fade into obscurity. For the researchers, the tool suggested even grander possibilities—a “quantitative approach to the humanities,” in which cultural changes can be scientifically mapped and measured.

The initial findings suggest how powerful the tool can be. By looking at references to previous dates, the team found that “humanity is forgetting its past faster with each passing year.” And, they argued, the data could provide “a powerful tool for automatically identifying censorship and propaganda” by flagging countries and languages in which certain ideas or phrases were absent to a statistically abnormal degree. Leon Trotsky, for example, shows up far less often in midcentury Russian books than in English or French books from the same time.

The project is undoubtedly a great service to researchers and the casually curious public. But serving academia probably wasn’t Google’s only motive. Remember Larry Page’s declaration that he wanted to create a machine “that can understand anything,” which some people might call artificial intelligence? In Google’s approach to creating intelligence, the key is data, and the 5 million digitized books contain an awful lot of it. To grow your artificial intelligence, you need to keep it well fed.

To get a sense of how this works, consider Google Translate, which can now do a passable job translating automatically among nearly sixty languages. You might imagine that Translate was built with a really big, really sophisticated set of translating dictionaries, but you’d be wrong. Instead, Google’s engineers took a probabilistic approach: They built software that could identify which words tended to appear in connection with which, and then sought out large chunks of text available in multiple languages to train the software on. One of the largest chunks was patent and trademark filings, which are useful because each filing says the same thing in every language version, they’re in the public domain, and they have to be filed globally in scores of different languages. Set loose on a hundred thousand patent applications in English and French, Translate could determine that when “word” showed up in the English document, “mot” was likely to show up in the corresponding French one. And as users correct Translate’s work over time, it gets better and better.

What Translate is doing with foreign languages Google aims to do with just about everything. Cofounder Sergey Brin has expressed his interest in plumbing genetic data. Google Voice captures millions of minutes of human speech, which engineers are hoping they can use to build the next generation of speech recognition software. Google Research has captured most of the scholarly articles in the world. And of course, Google’s search users pour billions of queries into the machine every day, which provide another rich vein of cultural information. If you had a secret plan to vacuum up an entire civilization’s data and use it to build artificial intelligence, you couldn’t do a whole lot better.

As Google’s protobrain increases in sophistication, it’ll open up remarkable new possibilities. Researchers in Indonesia can benefit from the latest papers from Stanford (and vice versa) without waiting for translations. In a matter of a few years, it may be possible to have an automatically translated voice conversation with someone speaking a different language, opening up whole new channels of cross-cultural communication and understanding.

But as these systems become increasingly “intelligent,” they also become harder to control and understand. It’s not quite right to say they take on a life of their own—ultimately, they’re still just code. But they reach a level of complexity at which even their programmers can’t fully explain any given output.

This is already true to a degree with Google’s search algorithm. Even to its engineers, the workings of the algorithm are somewhat mysterious. “If they opened up the mechanics,” says search expert Danny Sullivan, “you still wouldn’t understand it. Google could tell you all two hundred signals it uses and what the code is and you wouldn’t know what to do with them.” The core software engine of Google search is hundreds of thousands of lines of code. According to one Google employee I talked to who had spoken to the search team, “The team tweaks and tunes, they don’t really know what works or why it works, they just look at the result.”

Google promises that it doesn’t tilt the deck in favor of its own products. But the more complex and “intelligent” the system gets, the harder it’ll be to tell. The problem resembles one that neuroscientists face: Pinpointing where bias or error exists in a human brain is difficult or impossible, because there are just too many neurons and connections to narrow it down to a single malfunctioning chunk of tissue. And as we rely more on intelligent systems like Google’s, their opacity could cause real problems—like the still-mysterious machine-driven “flash crash” that caused the Dow to drop 600 points in a few minutes on May 6, 2010.

In a provocative article in Wired, editor-in-chief Chris Anderson argued that huge databases render scientific theory itself obsolete. Why spend time formulating human-language hypotheses, after all, when you can quickly analyze trillions of bits of data and find the clusters and correlations? He quotes Peter Norvig, Google’s research director: “All models are wrong, and increasingly you can succeed without them.” There’s plenty to be said for this approach, but it’s worth remembering the downside: Machines may be able to see results without models, but humans can’t understand without them. There’s value in making the processes that run our lives comprehensible to the humans who, at least in theory, are their beneficiaries.

Supercomputer inventor Danny Hillis once said that the greatest achievement of human technology is tools that allow us to create more than we understand. That’s true, but the same trait is also the source of our greatest disasters. The more the code driving personalization comes to resemble the complexity of human cognition, the harder it’ll be to understand why or how it’s making the decisions it makes. A simple coded rule that bars people from one group or class from certain kinds of access is easy to spot, but when the same action is the result of a swirling mass of correlations in a global supercomputer, it’s a trickier problem. And the result is that it’s harder to hold these systems and their tenders accountable for their actions.
