would indicate. She’s a “local maximum”: Though there are people whose posts you’re far more interested in, it’s her posts that you see.
In part, this feedback effect is due to what venture capitalist Matt Cohler calls the local-maximum problem. Cohler was an early employee at Facebook, and he’s widely considered one of Silicon Valley’s smartest thinkers on the social Web.
The local-maximum problem, he explains to me, shows up any time you’re trying to optimize something. Say you’re trying to write a simple set of instructions to help a blind person who’s lost in the Sierra Nevadas find his way to the highest peak. “Feel around you to see if you’re surrounded by downward-sloping land,” you say. “If you’re not, move in a direction that’s higher, and repeat.”
Programmers face problems like this all the time. What link is the best result for the search term “fish”? Which picture can Facebook show you to increase the likelihood that you’ll start a photo-surfing binge? The directions sound pretty obvious—you just tweak and tune in one direction or another until you’re in the sweet spot. But there’s a problem with these hill-climbing instructions: They’re as likely to strand you in the foothills—a local maximum—as they are to guide you to the apex of Mount Whitney.
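To see how those instructions go wrong, here is a minimal sketch (my own toy example, not anything from Cohler or Facebook) of a climber that follows them on an invented landscape with a small foothill and a much taller summit. Started on the wrong side of the range, it stops on the foothill and declares victory.

```python
# A toy illustration of the local-maximum problem: a greedy climber on an
# invented one-dimensional landscape with a foothill and a taller summit.

def elevation(x):
    # Two peaks: a foothill near x = 2 and a taller summit near x = 8.
    return max(0.0, 3 - (x - 2) ** 2) + max(0.0, 10 - (x - 8) ** 2)

def hill_climb(x, step=0.1):
    """Move in whichever direction is higher; stop when neither one is."""
    while True:
        here, left, right = elevation(x), elevation(x - step), elevation(x + step)
        if left <= here >= right:
            return x                      # surrounded by downward slopes
        x = x - step if left > right else x + step

print(hill_climb(1.0))  # stops near x = 2: the foothill, a local maximum
print(hill_climb(6.5))  # stops near x = 8: the actual summit
```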
This isn’t exactly harmful, but in the filter bubble, the same phenomenon can happen with any person or topic. I find it hard not to click on articles about gadgets, though I don’t actually think they’re that important. Personalized filters play to the most compulsive parts of you, creating “compulsive media” to get you to click more. The technology mostly can’t distinguish compulsion from general interest—and if you’re generating page views that can be sold to advertisers, it might not care.
The faster the system learns from you, the more likely you are to get trapped in a kind of identity cascade, in which a small initial action—clicking on a link about gardening or anarchy or Ozzy Osbourne—indicates that you’re a person who likes those kinds of things. This in turn supplies you with more information on the topic, which you’re more inclined to click on because the topic has now been primed for you.
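A toy simulation makes the cascade concrete. In the sketch below (invented numbers, not any real recommender), the filter shows topics in proportion to your past clicks and you click about half of whatever it shows; a single stray click on gardening is enough to tilt the odds of everything you see afterward.

```python
# A toy "identity cascade" (invented, not any real recommender): topics are
# shown in proportion to past clicks, and the reader clicks roughly half of
# whatever is shown, so one stray early click compounds over time.
import random

topics = ["news", "sports", "gardening", "music"]
clicks = {topic: 1 for topic in topics}   # start with no real signal
clicks["gardening"] += 1                  # a single stray gardening click

for _ in range(1000):
    # Show whichever topic the filter currently thinks you prefer.
    shown = random.choices(topics, weights=[clicks[t] for t in topics])[0]
    if random.random() < 0.5:             # you click about half of what you see
        clicks[shown] += 1

# More often than not, the head-start topic ends up with the largest count,
# even though the only real difference was that one early click.
print(clicks)
```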
Especially once the second click has occurred, your brain is in on the act as well. Our brains act to reduce cognitive dissonance in a strange but compelling kind of unlogic—“Why would I have done that if I weren’t the kind of person who does that?” The rationalization makes the next click that much more likely.
And that’s when these loops are relatively benign. Sometimes they’re not.
We know what happens when teachers think students are dumb: They get dumber. In an experiment done before the advent of ethics boards, teachers were given test results that supposedly indicated the IQ and aptitude of students entering their classes. They weren’t told, however, that the results had been randomly redistributed among students. After a year, the students who the teachers had been told were bright made big gains in IQ. The students who the teachers had been told were below average had no such improvement.
So what happens when the Internet thinks you’re dumb? Personalization based on perceived IQ isn’t such a far-fetched scenario—Google Docs even offers a helpful tool for automatically checking the grade level of written text. If your education level isn’t already available through a data broker like Acxiom, it’s easy enough for anyone with access to a few of your e-mails or Facebook posts to infer. Users whose writing indicates college-level literacy might be shown more sophisticated articles, while users who write at a lower grade level might be served simpler fare.
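Whatever Google Docs does under the hood, the standard way to score the grade level of a passage is a readability formula such as Flesch-Kincaid, which needs nothing more than counts of sentences, words, and syllables. A rough sketch, with a deliberately crude syllable counter:

```python
# A rough sketch of automatic grade-level scoring using the Flesch-Kincaid
# grade formula. The syllable counter is a crude heuristic; real tools are
# considerably more careful about edge cases.
import re

def count_syllables(word):
    # Approximate: count runs of vowels, subtract one for a trailing silent
    # "e", and never report fewer than one syllable.
    groups = re.findall(r"[aeiouy]+", word.lower())
    count = len(groups)
    if word.lower().endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def grade_level(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Flesch-Kincaid grade = 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

print(round(grade_level("The cat sat on the mat. It was warm."), 1))
print(round(grade_level(
    "Personalized filtration systems recalibrate editorial priorities "
    "according to inferred characteristics of individual readers."), 1))
```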
In a broadcast world, everyone is expected to read or process information at about the same level. In the filter bubble, there’s no need for that expectation. On one hand, this could be great—vast groups of people who have given up on reading because the newspaper goes over their heads may finally connect with written content. But without pressure to improve, it’s also possible to get stuck in a grade-three world for a long time.
Incidents and Adventures
In some cases, letting algorithms make decisions about what we see and what opportunities we’re offered gives us fairer results. A computer can be made blind to race and gender in ways that humans usually can’t. But that’s true only if the relevant algorithms are designed with care and acuity. Otherwise, they’re likely to simply reflect the social mores of the culture they’re processing—a regression to the social norm.
In some cases, algorithmic sorting based on personal data can be even more discriminatory than people would be.
The online movie rental Web site Netflix is powered by an algorithm called CineMatch. To start, it was pretty simple. If I had rented the first movie in a series, CineMatch could look at what other people who had rented that movie went on to watch, and suggest those titles to me.
This technique is called kNN (k-nearest-neighbor), and using it CineMatch got pretty good at figuring out what movies people wanted to watch based on what movies they’d rented and how many stars (out of five) they’d given the movies they’d seen. By 2006, CineMatch could predict within one star how much a given user would like any movie from Netflix’s vast hundred-thousand-film emporium. Already CineMatch was better at making recommendations than most humans; a human video clerk would never think to suggest many of the unlikely pairings the algorithm turned up.
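The passage describes CineMatch only in outline, but the k-nearest-neighbor idea itself is simple enough to sketch. In the toy version below (users, movies, and star ratings are all invented), a person’s likely rating for a movie she hasn’t seen is the similarity-weighted average of the ratings given by the users whose past ratings most resemble hers.

```python
# A generic sketch of k-nearest-neighbor (kNN) rating prediction in the
# spirit of the passage. Users, movies, and star ratings are invented;
# Netflix's actual system was far more elaborate.
from math import sqrt

ratings = {  # user -> {movie: stars out of five}
    "ann":  {"A": 5, "B": 4, "C": 1},
    "ben":  {"A": 4, "B": 5, "C": 2, "D": 5},
    "cara": {"A": 1, "B": 2, "C": 5, "D": 1},
    "dev":  {"A": 5, "B": 5, "D": 4},
}

def similarity(u, v):
    """How close two users are on the movies they have both rated."""
    shared = set(ratings[u]) & set(ratings[v])
    if not shared:
        return 0.0
    distance = sqrt(sum((ratings[u][m] - ratings[v][m]) ** 2 for m in shared))
    return 1.0 / (1.0 + distance)

def predict(user, movie, k=2):
    """Similarity-weighted average of the k nearest raters of this movie."""
    neighbors = sorted(
        (similarity(user, other), ratings[other][movie])
        for other in ratings
        if other != user and movie in ratings[other]
    )[-k:]
    if not neighbors:
        return None
    weight = sum(s for s, _ in neighbors)
    return sum(s * r for s, r in neighbors) / weight if weight else None

print(predict("ann", "D"))  # about 4.4 stars: ann's tastes track ben's and dev's
```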
But Reed Hastings, Netflix’s CEO, wasn’t satisfied. “Right now, we’re driving the Model-T version of what’s possible,” he told a reporter in 2006. On October 2, 2006, an announcement went up on the Netflix Web site: “We’re interested, to the tune of $1 million.” Netflix had posted an enormous swath of data—reviews, rental records, and other information from its user database, scrubbed of anything that would obviously identify a specific user. And now the company was willing to give $1 million to the person or team that could beat CineMatch’s predictions by more than 10 percent. Like the Longitude Prize, the Netflix Challenge was open to everyone. “All you need is a PC and some great insight,” Hastings declared to the press.
After nine months, about eighteen thousand teams from more than 150 countries were competing, using ideas from machine learning, neural networks, collaborative filtering, and data mining. Usually, contestants in high-stakes contests operate in secret. But Netflix encouraged the competing groups to communicate with one another and built a message board where they could coordinate around common obstacles. Read through the message board, and you get a visceral sense of the challenges that bedeviled the contestants during the three-year quest for a better algorithm. Overfitting comes up again and again.
There are two challenges in building pattern-finding algorithms. One is finding the patterns that really are there in all the noise. The other problem is the opposite: seeing patterns in the data that aren’t actually there. That second failure is what overfitting means.
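A standard classroom demonstration (not anything from the contest boards) shows what that second failure looks like: fit both a straight line and a high-degree polynomial to a handful of noisy points drawn from a simple trend, then score each fit on fresh points it never saw.

```python
# A standard overfitting demonstration: fit a line and a degree-7 polynomial
# to eight noisy training points, then evaluate both on fresh test points
# drawn from the same simple underlying trend. Numbers here are invented.
import numpy as np

rng = np.random.default_rng(0)

def true_trend(x):
    return 2 * x + 1          # the simple pattern hiding in the noise

x_train = np.linspace(0, 1, 8)
y_train = true_trend(x_train) + rng.normal(0, 0.3, x_train.size)
x_test = np.linspace(0, 1, 50)
y_test = true_trend(x_test) + rng.normal(0, 0.3, x_test.size)

for degree in (1, 7):
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares fit
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train error {train_err:.3f}, test error {test_err:.3f}")

# The degree-7 curve threads every training point (train error near zero) but
# swings between them, so it typically does worse on points it has never
# seen: it has "found" patterns that were really just noise.
```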
Where movies are concerned, the dangers of overfitting are relatively small—many analog movie watchers have been led to believe that because they liked the first two films in a series, they were bound to love the third, and the worst that came of it was a wasted evening.
The term