would indicate. She’s a “local maximum”: Though there are people whose posts you’re far more interested in, it’s her posts that you see.

In part, this feedback effect is due to what venture capitalist Matt Cohler calls the local-maximum problem. Cohler was an early employee at Facebook, and he’s widely considered one of Silicon Valley’s smartest thinkers on the social Web.

The local-maximum problem, he explains to me, shows up any time you’re trying to optimize something. Say you’re trying to write a simple set of instructions to help a blind person who’s lost in the Sierra Nevadas find his way to the highest peak. “Feel around you to see if you’re surrounded by downward-sloping land,” you say. “If you’re not, move in a direction that’s higher, and repeat.”

Programmers face problems like this all the time. What link is the best result for the search term “fish”? Which picture can Facebook show you to increase the likelihood that you’ll start a photo-surfing binge? The directions sound pretty obvious—you just tweak and tune in one direction or another until you’re in the sweet spot. But there’s a problem with these hill-climbing instructions: They’re as likely to end you up in the foothills—the local maximum—as they are to guide you to the apex of Mount Whitney.
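To make those hill-climbing instructions concrete, here is a minimal sketch in Python. The two-peak terrain and the step size are invented for illustration; this isn’t anyone’s production recommender, just the greedy rule from the instructions above.

```python
# A minimal sketch of the hill-climbing instructions above. The terrain and
# step size are invented; the point is that the greedy rule stops at whatever
# peak it happens to reach first.
import math

def terrain(x):
    # Two peaks: a small foothill near x = 2 and a much taller peak near x = 8.
    return 3 * math.exp(-(x - 2) ** 2) + 10 * math.exp(-(x - 8) ** 2)

def hill_climb(x, step=0.1):
    """Move in whichever direction is higher; stop when neither side is."""
    while True:
        left, right = terrain(x - step), terrain(x + step)
        if left <= terrain(x) >= right:
            return x                       # surrounded by downward-sloping land
        x = x - step if left > right else x + step

print(hill_climb(1.0))   # starts in the foothills, stops near x = 2
print(hill_climb(6.0))   # starts near the big peak, stops near x = 8
```

Started in the foothills, the walker settles on the small peak near x = 2 and never finds the much taller one near x = 8: it has climbed to a local maximum and stopped.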

This isn’t exactly harmful, but in the filter bubble, the same phenomenon can happen with any person or topic. I find it hard not to click on articles about gadgets, though I don’t actually think they’re that important. Personalized filters play to the most compulsive parts of you, creating “compulsive media” to get you to click more. The technology mostly can’t distinguish compulsion from general interest—and if you’re generating page views that can be sold to advertisers, it might not care.

The faster the system learns from you, the more likely it is that you’ll get trapped in a kind of identity cascade, in which a small initial action—clicking on a link about gardening or anarchy or Ozzy Osbourne—signals that you’re a person who likes those kinds of things. That signal in turn supplies you with more information on the topic, which you’re more inclined to click on because the topic has now been primed for you.

Especially once the second click has occurred, your brain is in on the act as well. Our brains act to reduce cognitive dissonance in a strange but compelling kind of unlogic—“Why would I have done x if I weren’t a person who does x—therefore I must be a person who does x.” Each click you take in this loop is another action to self-justify—“Boy, I guess I just really love ‘Crazy Train.’” When you use a recursive process that feeds on itself, Cohler tells me, “You’re going to end up down a deep and narrow path.” The reverb drowns out the tune. If identity loops aren’t counteracted through randomness and serendipity, you could end up stuck in the foothills of your identity, far away from the high peaks in the distance.
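Cohler’s loop can be sketched as a toy simulation. Every number below is invented purely for illustration and doesn’t come from any real recommender; the point is only that a small nudge, compounded by feedback, takes over.

```python
# A toy simulation of the identity loop described above, with invented numbers.
# Each click on a topic nudges up the share of the feed devoted to that topic,
# which produces more clicks, which nudges the share up further.
share = 0.05          # fraction of the feed about, say, Ozzy Osbourne
click_rate = 0.3      # chance I click any item on that topic
feed_size = 100       # items shown per day

for day in range(1, 31):
    clicks = click_rate * share * feed_size        # expected clicks today
    share = min(1.0, share + 0.01 * clicks)        # each click boosts the topic's share
    if day % 10 == 0:
        print(f"day {day:2d}: {share:.0%} of the feed")

# A 5 percent interest snowballs until the topic crowds out everything else.
```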

And that’s when these loops are relatively benign. Sometimes they’re not.

We know what happens when teachers think students are dumb: They get dumber. In an experiment done before the advent of ethics boards, teachers were given test results that supposedly indicated the IQ and aptitude of students entering their classes. They weren’t told, however, that the results had been randomly redistributed among students. After a year, the students who the teachers had been told were bright made big gains in IQ. The students who the teachers had been told were below average had no such improvement.

So what happens when the Internet thinks you’re dumb? Personalization based on perceived IQ isn’t such a far-fetched scenario—Google Docs even offers a helpful tool for automatically checking the grade level of written text. If your education level isn’t already available through a data broker like Acxiom, it’s easy enough for anyone with access to a few e-mails or Facebook posts to infer. Users whose writing indicates college-level literacy might see more articles from the New Yorker; users with only basic writing skills might see more from the New York Post.
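A grade-level check of the kind described here doesn’t require anything exotic; a standard readability formula such as Flesch-Kincaid will do. The sketch below uses that formula with a crude vowel-group syllable counter, and it is not how Google Docs or any other product actually computes its score.

```python
# A rough sketch of a grade-level check using the standard Flesch-Kincaid grade
# formula. The syllable counter is a crude approximation (it counts vowel groups),
# not what any particular product uses.
import re

def count_syllables(word):
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def grade_level(text):
    """Flesch-Kincaid grade: 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

print(grade_level("The cat sat on the mat. The sun was warm."))
print(grade_level("Personalized filters perpetuate epistemological asymmetries "
                  "across heterogeneous informational communities."))
# The second sentence scores many grade levels above the first; a personalizer
# could use exactly this kind of signal to route the New Yorker to one reader
# and the New York Post to another.
```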

In a broadcast world, everyone is expected to read or process information at about the same level. In the filter bubble, there’s no need for that expectation. On one hand, this could be great—vast groups of people who have given up on reading because the newspaper goes over their heads may finally connect with written content. But without pressure to improve, it’s also possible to get stuck in a grade-three world for a long time.

Incidents and Adventures

In some cases, letting algorithms make decisions about what we see and what opportunities we’re offered gives us fairer results. A computer can be made blind to race and gender in ways that humans usually can’t be. But that’s true only if the relevant algorithms are designed with care and acuity. Otherwise, they’re likely to simply reflect the social mores of the culture they’re processing—a regression to the social norm.

In some cases, algorithmic sorting based on personal data can be even more discriminatory than people would be. For example, software that helps companies sift through resumes for talent might “learn” by looking at which of its recommended employees are actually hired. If nine white candidates in a row are chosen, it might determine that the company isn’t interested in hiring black people and exclude them from future searches. “In many ways,” writes NYU sociologist Dalton Conley, “such network-based categorizations are more insidious than the hackneyed groupings based on race, class, gender, religion, or any other demographic characteristic.” Among programmers, this kind of error has a name. It’s called overfitting.
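The resume example can be made concrete with a toy screener. The data and the scoring rule below are made up for illustration and don’t describe any real hiring product; they just show how a model that learns only from past hiring outcomes reproduces whatever pattern those outcomes contain.

```python
# A toy version of the resume-screening example above, with made-up data.
# A screener that "learns" only from which recommended candidates were hired
# ends up scoring candidates by how much they resemble past hires.
from collections import Counter

past_hires = [{"race": "white"} for _ in range(9)]   # nine hires in a row, as in the text

def learned_weights(feature, hires):
    """Fraction of past hires sharing each value of `feature`."""
    counts = Counter(h[feature] for h in hires)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

weights = learned_weights("race", past_hires)        # {'white': 1.0}

def score(candidate):
    # Groups absent from past hires get a score of zero: the model has learned
    # the company's hiring pattern, not the candidates' actual qualifications.
    return weights.get(candidate["race"], 0.0)

print(score({"race": "white"}))   # 1.0
print(score({"race": "black"}))   # 0.0, effectively excluded from future searches
```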

The online movie rental Web site Netflix is powered by an algorithm called CineMatch. To start, it was pretty simple. If I had rented the first movie in the Lord of the Rings trilogy, let’s say, Netflix could look up what other movies Lord of the Rings watchers had rented. If many of them had rented Star Wars, it’d be highly likely that I would want to rent it, too.

This technique is called kNN (k-nearest-neighbor), and using it CineMatch got pretty good at figuring out what movies people wanted to watch based on what movies they’d rented and how many stars (out of five) they’d given the movies they’d seen. By 2006, CineMatch could predict within one star how much a given user would like any movie from Netflix’s vast hundred-thousand-film emporium. Already CineMatch was better at making recommendations than most humans. A human video clerk would never think to suggest Silence of the Lambs to a fan of The Wizard of Oz, but CineMatch knew people who liked one usually liked the other.
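Here is a minimal sketch of the nearest-neighbor idea, with invented users, titles, and ratings; CineMatch itself was far more elaborate. The prediction for a movie I haven’t seen is a similarity-weighted average of the ratings given by the people whose tastes most resemble mine.

```python
# A minimal k-nearest-neighbor sketch with invented users, titles, and ratings;
# this is an illustration of the technique, not Netflix's actual CineMatch code.
import math

ratings = {
    "alice": {"Fellowship of the Ring": 5, "Star Wars": 5, "Wizard of Oz": 2},
    "bob":   {"Fellowship of the Ring": 4, "Star Wars": 4},
    "carol": {"Wizard of Oz": 5, "Silence of the Lambs": 4},
    "me":    {"Fellowship of the Ring": 5},
}

def similarity(a, b):
    """Cosine similarity over the movies two users have both rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)

def predict(user, movie, k=2):
    """Average the movie's ratings from the k raters most similar to `user`."""
    candidates = [(similarity(ratings[user], r), name)
                  for name, r in ratings.items()
                  if name != user and movie in r]
    neighbors = sorted(candidates)[-k:]              # the k most similar raters
    weight = sum(sim for sim, _ in neighbors)
    if weight == 0:
        return None                                  # nobody comparable has seen it
    return sum(sim * ratings[name][movie] for sim, name in neighbors) / weight

print(predict("me", "Star Wars"))   # leans on alice and bob, who rate like "me"
```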

But Reed Hastings, Netflix’s CEO, wasn’t satisfied. “Right now, we’re driving the Model-T version of what’s possible,” he told a reporter in 2006. On October 2, 2006, an announcement went up on the Netflix Web site: “We’re interested, to the tune of $1 million.” Netflix had posted an enormous swath of data—reviews, rental records, and other information from its user database, scrubbed of anything that would obviously identify a specific user. And now the company was willing to give $1 million to the person or team who beat CineMatch by more than 10 percent. Like the longitude prize, the Netflix Challenge was open to everyone. “All you need is a PC and some great insight,” Hastings declared in the New York Times.

After nine months, about eighteen thousand teams from more than 150 countries were competing, using ideas from machine learning, neural networks, collaborative filtering, and data mining. Usually, contestants in high-stakes contests operate in secret. But Netflix encouraged the competing groups to communicate with one another and built a message board where they could coordinate around common obstacles. Read through the message board, and you get a visceral sense of the challenges that bedeviled the contestants during the three-year quest for a better algorithm. Overfitting comes up again and again.

There are two challenges in building pattern-finding algorithms. One is finding the patterns that really are there in all the noise. The other is the opposite: not finding patterns in the data that aren’t actually there. The pattern behind “1, 2, 3” could be “add one to the previous number” (next comes 4) or “after the first two terms, each number is the sum of the two before it” (next comes 5). You don’t know for sure until you get more data. And if you leap to conclusions, you’re overfitting.
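The “1, 2, 3” ambiguity can be written out directly: both rules fit the observed numbers perfectly, and only more data tells them apart.

```python
# Two different rules fit the observed numbers "1, 2, 3" perfectly; committing
# to either one after only three observations is overfitting.
def add_one(seq):
    """Hypothesis A: each term is one more than the previous term."""
    return seq + [seq[-1] + 1]

def sum_of_previous_two(seq):
    """Hypothesis B: after the first two terms, each term is the sum of the two before it."""
    return seq + [seq[-1] + seq[-2]]

observed = [1, 2, 3]    # both hypotheses explain these three numbers exactly

for rule in (add_one, sum_of_previous_two):
    extended = list(observed)
    for _ in range(3):
        extended = rule(extended)
    print(f"{rule.__name__:20s} predicts {extended}")

# add_one              predicts [1, 2, 3, 4, 5, 6]
# sum_of_previous_two  predicts [1, 2, 3, 5, 8, 13]
```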

Where movies are concerned, the dangers of overfitting are relatively small—many analog movie watchers have been led to believe that because they liked The Godfather and The Godfather: Part II, they’ll like The Godfather: Part III. But the overfitting problem gets to one of the central, irreducible problems of the filter bubble: Overfitting and stereotyping are synonyms.

The term stereotyping (which in this sense comes from Walter Lippmann,
