It was around that time that I started programming. The library had a book about Perl.2 So I taught myself Perl, and soon I was making websites for local businesses. That was my first tech job, I guess.
Did you go on to study computer science in college?
Yeah, I went to college in 1999. At the time, the dot-com boom was going strong. There was a lot of optimism in my undergrad class. The computer science major was bigger than it had been in previous years.
I definitely felt behind my peers. I had always been at the top of my class in math and science, but I didn’t have a whole lot of programming experience. The homework was hard.
When you’re first learning programming and something goes wrong, you don’t really know how to tell where it’s going wrong or why. It’s an intuition you have to develop over time. You learn where to look or what to push on to figure out why this particular piece of code isn’t working the way you expected it to work. And, even for experienced programmers, you never know how long that process is going to take. Sometimes you figure it out in a few minutes, sometimes it’s a few hours. Sometimes you never figure it out, and you have to start over from scratch.
What happened after you graduated college?
As I mentioned, I started college in 1999, during the dot-com boom. By the time I graduated in 2003, the bubble had popped. Given what I was hearing about the job market, I decided to go to grad school.
Did you want to become an academic?
I wasn’t sure. The actual experience of being a Ph.D. student was definitely hard for me. I felt again like, Oh my God, these other people are so much smarter than me.
When it came time to identify a research topic and write a thesis proposal, I really struggled. I think that was the hardest part of the whole process. I didn’t have a lot of academics among my family or friends. I didn’t know where to start. By the time I got through the thesis proposal, I was drained. The whole thing had left me feeling pretty burned-out—maybe about as burned-out as I’ve ever been.
My adviser had a very large stable of grad students. One summer, she didn’t have funding for all of us, so I ended up working with a different professor on a research project that eventually became a startup: something called reCAPTCHA.
Tell us about that.
Anybody who’s been around the internet for long enough has seen a CAPTCHA. These days, it’s the little thing that pops up with a checkbox that says “I am not a robot.” And sometimes it asks you to prove it by clicking images that have a taxi or a traffic light or whatever.
The professor I was working for invented the original CAPTCHA for Yahoo. Back in the day, Yahoo had a bunch of people signing up for free email accounts and then using them to send spam. The CAPTCHA was supposed to put a check on that.
The idea was that you’d display this distorted text and tell the user to type it. A computer could generate these tests very easily and know what the right answer was. But at the time it was hard for computers to read the distorted text. So the CAPTCHA prevented people from writing programs to automatically create a hundred thousand Yahoo email accounts for sending spam.
CAPTCHAs started to get used everywhere on the web. At some point we did the math and figured out, “Wow, people are filling in millions, maybe billions of these a day. They are collectively wasting a huge chunk of time typing in these obnoxious characters. Why don’t we try to do some good for the world and use CAPTCHAs to digitize books?”
How?
It’s the same idea as the original CAPTCHA. But instead of displaying random words, you’re displaying words from old books or newspapers or magazines that optical character recognition software has trouble reading. So we get humans to read the words and tell us what they are.
We would display two words. One was a word that we actually knew. The other word was taken from a scanned book, and maybe we had some guesses. We would use the word we knew to confirm that the person was actually a human. Then, assuming that they passed the first word, we would count their answer for the other word as a vote for the correct spelling of that word.
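As a rough illustration of the two-word scheme just described, here is a minimal Python sketch. It is not the actual reCAPTCHA implementation. The names `Challenge`, `VoteStore`, and `submit` are hypothetical, and the case-insensitive comparison is an assumption made for the example. An answer for the unknown word is counted as a vote only when the known word was typed correctly.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Challenge:
    """One two-word challenge (hypothetical structure)."""
    control_answer: str  # the word whose spelling is already known
    unknown_id: str      # illustrative identifier for the scanned word


@dataclass
class VoteStore:
    """Tallies human answers for each unknown scanned word."""
    votes: Dict[str, Counter] = field(default_factory=dict)

    def record(self, unknown_id: str, answer: str) -> None:
        # Each accepted answer counts as one vote for that spelling.
        self.votes.setdefault(unknown_id, Counter())[answer] += 1

    def leading_answer(self, unknown_id: str) -> Optional[Tuple[str, int]]:
        # Return the most-voted spelling and its count, or None if no votes.
        tally = self.votes.get(unknown_id)
        if not tally:
            return None
        return tally.most_common(1)[0]


def normalize(text: str) -> str:
    # Assumption: ignore surrounding whitespace and letter case.
    return text.strip().lower()


def submit(challenge: Challenge, control_typed: str,
           unknown_typed: str, store: VoteStore) -> bool:
    """Return True if the user passed the known word.

    Only then is the answer for the unknown word recorded as a vote.
    """
    if normalize(control_typed) != normalize(challenge.control_answer):
        return False  # failed the known word: treat as not human, no vote
    store.record(challenge.unknown_id, normalize(unknown_typed))
    return True
```

In this sketch, a caller would create one `VoteStore`, pass each user's two typed words to `submit`, and later read `leading_answer` to see which spelling has the most votes so far.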
If they happened to agree with the optical character recognition software, then great, it was probably right. If they disagreed, then maybe you send the word out to a couple more people and try to get some agreement on what the word is.
Where would you get the scanned books or newspapers or magazines?
Well, our idea was that there must be places that have old works that they want to digitize. We could partner with them. And that became the business model. It started out as an academic research project but, by the end of that summer—this was 2007—we had decided to make it into an actual company.
We ended up getting a contract with The New York Times. We started digitizing old years of the paper that were in the public domain. So we started with 1922 or 1923 and then kept going backward. We went in reverse chronological order because obviously the older scans were harder to read.
That was a really fun project to work on. We were a small team, maybe six people, and we never had an office. It felt like being at another university research lab.
Then, in 2009, we found out that Google was considering acquiring us.
Into the Mothership
Why did Google want reCAPTCHA?
They wanted to use it to digitize Google Books.
At that point, Google had been scanning books in the public