coherently as Google has indexed and organized the Web.”
This phenomenon is called ambient intelligence. It’s based on a simple observation: The items you own, where you put them, and what you do with them are, after all, great signals about what kind of person you are and what kind of preferences you have. “In the near future,” writes a team of ambient intelligence experts led by David Wright, “every manufactured product—our clothes, money, appliances, the paint on our walls, the carpets on our floors, our cars, everything—will be embedded with intelligence, networks of tiny sensors and actuators, which some have termed ‘smart dust.’”
And there’s a third set of powerful signals that is getting cheaper and cheaper. In 1990, it cost about $10 to sequence a single base pair—one “letter”—of DNA. By 1999, that number had dropped to $0.90. In 2004, it crossed the $0.01 threshold, and now, as I write in 2010, it costs a ten-thousandth of a cent. By the time this book comes out, it’ll undoubtedly cost far less still. At some point mid-decade, we ought to be able to sequence a whole human genome for less than the cost of a sandwich.
It seems like something out of
In all this data lie patterns yet undreamed of. Properly harnessed, it will fuel a level of filtering acuity that’s hard to imagine—a world in which nearly all of our objective experience is quantified, captured, and used to inform our environments. The biggest challenge, in fact, may be thinking of the right questions to ask of these enormous flows of binary digits. And increasingly, code will learn to ask these questions itself.
The End of Theory
In December 2010, researchers at Harvard, Google,
The initial findings suggest how powerful the tool can be. By looking at references to past dates, the team found that “humanity is forgetting its past faster with each passing year.” And, they argued, the method could serve as “a powerful tool for automatically identifying censorship and propaganda” by flagging countries and languages in which there was a statistically abnormal absence of certain ideas or phrases. Leon Trotsky, for example, shows up far less in midcentury Russian books than in English or French books from the same period.
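The statistical intuition behind that censorship detector can be sketched in a few lines of code. All of the corpus counts below are invented for illustration, and the functions are hypothetical simplifications; the team’s actual analysis rests on vastly larger data and more careful statistics:

```python
# Toy sketch of the "abnormal absence" idea: a term that is far rarer in one
# corpus than in comparable corpora is a candidate censorship flag.
# These counts are made up for illustration only.

def relative_frequency(term_count, total_words):
    """Fraction of all words in a corpus that are the given term."""
    return term_count / total_words

def suppression_ratio(freq_reference, freq_suspect):
    """How many times rarer a term is in the suspect corpus
    than in the reference corpus."""
    return freq_reference / freq_suspect if freq_suspect else float("inf")

# Hypothetical midcentury book counts for "Trotsky" (illustration only):
english = relative_frequency(1200, 100_000_000)  # English-language corpus
russian = relative_frequency(15, 80_000_000)     # Russian-language corpus

ratio = suppression_ratio(english, russian)
print(f"'Trotsky' is {ratio:.0f}x rarer in the Russian corpus")
```

A term whose suppression ratio towers over those of comparable words in the same corpus stands out as a statistical anomaly—exactly the kind of pattern the researchers proposed hunting for.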
The project is undoubtedly a great service to researchers and the casually curious public. But serving academia probably wasn’t Google’s only motive. Remember Larry Page’s declaration that he wanted to create a machine “that can understand anything,” which some people might call artificial intelligence? In Google’s approach to creating intelligence, the key is data, and the 5 million digitized books contain an awful lot of it. To grow your artificial intelligence, you need to keep it well fed.
To get a sense of how this works, consider Google Translate, which can now do a passable job translating automatically among nearly sixty languages. You might imagine that Translate was built with a really big, really sophisticated set of translation dictionaries, but you’d be wrong. Instead, Google’s engineers took a probabilistic approach: They built software that could identify which words tended to appear in connection with which, and then sought out large chunks of data that were available in multiple languages to train the software on. One of the largest chunks was patent and trademark filings, which are useful because they all say the same thing, they’re in the public domain, and they have to be filed globally in scores of different languages. Set loose on a hundred thousand patent applications in English and French, Translate could learn which words and phrases in one language tended to correspond to which in the other.
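The co-occurrence trick can be sketched as a toy program. The four-line “corpus” here is invented, and real systems use far more sophisticated alignment models trained on millions of documents, but the core move is the same: count which foreign words keep showing up alongside which English ones.

```python
# Minimal sketch of co-occurrence-based translation: no dictionary, just
# counting which French words appear in sentences aligned with each
# English word. The corpus is invented for illustration.
from collections import defaultdict

def cooccurrence_table(aligned_pairs):
    """For each English word, count the French words that appear in
    the aligned translation of any sentence containing it."""
    counts = defaultdict(lambda: defaultdict(int))
    for english, french in aligned_pairs:
        for e in english.lower().split():
            for f in french.lower().split():
                counts[e][f] += 1
    return counts

def best_guess(counts, english_word):
    """The French word most often seen alongside the English one."""
    candidates = counts[english_word]
    return max(candidates, key=candidates.get)

corpus = [
    ("the invention", "l'invention"),
    ("the device", "le dispositif"),
    ("the new device", "le nouveau dispositif"),
    ("this device", "ce dispositif"),
]

table = cooccurrence_table(corpus)
print(best_guess(table, "device"))  # prints "dispositif"
```

With only four sentence pairs the guesses are fragile, but scale the same counting up to a hundred thousand patent filings and the statistics begin to pin each word to its counterpart.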
What Translate is doing with foreign languages Google aims to do with just about everything. Cofounder Sergey Brin has expressed his interest in plumbing genetic data. Google Voice captures millions of minutes of human speech, which engineers are hoping they can use to build the next generation of speech recognition software. Google Research has captured most of the scholarly articles in the world. And of course, Google’s search users pour billions of queries into the machine every day, which provide another rich vein of cultural information. If you had a secret plan to vacuum up an entire civilization’s data and use it to build artificial intelligence, you couldn’t do a whole lot better.
As Google’s protobrain grows in sophistication, it’ll open up remarkable new possibilities. Researchers in Indonesia can benefit from the latest papers out of Stanford (and vice versa) without waiting for translation. Within a few years, it may be possible to have an automatically translated voice conversation with someone speaking a different language, opening up whole new channels of cross-cultural communication and understanding.
But as these systems become increasingly “intelligent,” they also become harder to control and understand. It’s not quite right to say they take on a life of their own—ultimately, they’re still just code. But they reach a level of complexity at which even their programmers can’t fully explain any given output.
This is already true to a degree with Google’s search algorithm. Even to its engineers, the workings of the algorithm are somewhat mysterious. “If they opened up the mechanics,” says search expert Danny Sullivan, “you still wouldn’t understand it. Google could tell you all two hundred signals it uses and what the code is and you wouldn’t know what to do with them.” The core software engine of Google search is hundreds of thousands of lines of code. According to one Google employee I talked to who had spoken to the search team, “The team tweaks and tunes, they don’t really know what works or why it works, they just look at the result.”
Google promises that it doesn’t stack the deck in favor of its own products. But the more complex and “intelligent” the system gets, the harder that will be to verify. Pinpointing where bias or error lives in a human brain is difficult or impossible—there are just too many neurons and connections to narrow it down to a single malfunctioning chunk of tissue. And as we rely more on intelligent systems like Google’s, their opacity could cause real problems—like the still-mysterious machine-driven “flash crash” that sent the Dow down 600 points in a few minutes on May 6, 2010.
In a provocative article in
Supercomputer inventor Danny Hillis once said that the greatest achievement of human technology is tools that allow us to create more than we understand. That’s true, but the same trait is also the source of our greatest disasters. The more the code driving personalization comes to resemble the complexity of human cognition, the harder it’ll be to understand why or how it’s making the decisions it makes. A simple coded rule that bars people from one group or class from certain kinds of access is easy to spot, but when the same action is the result of a swirling mass of correlations in a global supercomputer, it’s a trickier problem. And the result is that it’s harder to hold these systems and their tenders accountable for their actions.