There’s a function that’s being optimized—which is, at some level, what a neural net is doing.9 But it’s not really AI.
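To make that concrete in the simplest possible terms, here is a rough, purely illustrative sketch (toy data, nothing from any real system): the “function being optimized” is just a loss that a training loop pushes downhill.

```python
# Illustrative sketch only: "the function being optimized" is a loss.
# Gradient descent nudges a single weight to minimize squared error on toy data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)  # true slope is 3

w = 0.0    # model: y_hat = w * x
lr = 0.1
for step in range(200):
    y_hat = w * x
    loss = np.mean((y_hat - y) ** 2)     # the function being optimized
    grad = np.mean(2 * (y_hat - y) * x)  # d(loss)/dw
    w -= lr * grad

print(w)  # ends up near 3 -- the optimizer did its job, nothing more
```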
I think one of the big tensions in data science that is going to unfold in the next ten years involves companies like SoFi, or Earnest, or pretty much any company whose shtick is, “We’re using big data technology and machine learning to do better credit score assessments.”10
I actually think this is going to be a huge point of contention moving forward. I talked to a guy who used to work for one of these companies. Not one of the ones I mentioned, a different one. And one of their shticks was, “Oh, we’re going to use social media data to figure out if you’re a great credit risk or not.” And people are like, “Oh, are they going to look at my Facebook posts to see whether I’ve been drinking out late on a Saturday night? Is that going to affect my credit score?”
And I can tell you exactly what happened, and why they actually killed that. It’s because with your social media profile, they know your name, they know the names of your friends, and they can tell if you’re black or not. They can tell how wealthy you are, they can tell if you’re a credit risk. That’s the shtick.
And my consistent point of view is that any of these companies should be presumed to be incredibly racist unless they present you with mountains of evidence otherwise. Anybody that says, “We’re an AI company that’s making smarter loans”: racist. Absolutely, 100 percent.
I was actually floored, during a recent Super Bowl, when I saw this SoFi ad that said, “We discriminate.” I was just sitting there watching this game, like, I cannot believe it—it’s either they don’t know, which is terrifying, or they know and they don’t give a shit, which is also terrifying.
I don’t know how a court case like that would work out, but I can tell you that in the next ten years, there’s going to be a court case about it. And I would not be surprised if SoFi lost for discrimination. And in general, I think it’s going to be an increasingly important question how we handle protected classes generally, and maybe race specifically, in data science models of this type.11 Because otherwise it’s like, okay, you can’t directly model whether a person is black. Can you use their zip code? Can you use the racial demographics of the zip code? Can you use things that correlate with the racial demographics of their zip code? And at what level do you draw the line?
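One rough way to see how far that chain of correlates goes, sketched here with synthetic data and hypothetical column names (not anything an actual lender uses): measure how well each candidate feature predicts the protected attribute you just removed. Correlates of correlates still show up this way.

```python
# Illustrative sketch only (synthetic data, hypothetical column names).
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 5000
race = rng.integers(0, 2, n)                                       # protected attribute (binary, toy)
zip_pct_minority = 0.3 + 0.5 * race + rng.normal(0, 0.1, n)        # zip-level demographic: direct correlate
distance_to_branch = 2.0 * zip_pct_minority + rng.normal(0, 1, n)  # correlate of the correlate
dti = rng.normal(0.35, 0.1, n)                                     # debt-to-income: unrelated here

features = pd.DataFrame({
    "zip_pct_minority": zip_pct_minority,
    "distance_to_branch": distance_to_branch,
    "dti": dti,
})

# Correlation of each feature with the protected attribute: where do you draw the line?
print(features.corrwith(pd.Series(race)).abs().sort_values(ascending=False))
```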
And we know what we’re doing for mortgage lending—and the answer there is, frankly, a little bit offensive—which is that we don’t give a shit where your house is. We just lend. That’s what Rocket Mortgage does.12 It’s a fucking app, and you’re like, “How can I get a million-dollar loan with an app?” And the answer is that they legally can’t tell where your house is. And the algorithm that you use to do mortgages has to be vetted by a federal agency.
That’s an extreme, but that might be the extreme we go down, where every single time anybody gets assessed for anything, the actual algorithm and the inputs are assessed by a federal regulator. So maybe that’s going to be what happens. I actually view it a lot like the debates around divestment. You can say, “Okay, we don’t want to invest in any oil companies,” but then do you want to invest in things that are positively correlated with oil companies, like oil field services companies? What about things that in general have some degree of correlation? How much is enough?
I think it’s the same thing where it’s like, okay, you can’t look at race, but can you look at correlates of race? Can you look at correlates of correlates of race? How far do you go down before you say, “Okay, that’s okay to look at”?

I’m reminded a bit of Cathy O’Neil’s book Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy [2016]. One of her arguments, which it seems like you’re echoing, is that the popular perception is that algorithms provide a more objective, more complete view of reality, but that they often just reinforce existing inequities.
That’s right. And the part that I find offensive as a mathematician is the idea that somehow the machines are doing something wrong. We as a society have not chosen to optimize for the thing that we’re telling the machine to optimize for. That’s what it means for the machine to be doing illegal things. The machine isn’t doing anything wrong, and the algorithms are not doing anything wrong. It’s just that they’re literally amoral, and if we told them the things that are okay to optimize against, they would optimize against those instead. It’s a frightening, almost Black Mirror–esque view of reality that comes from the machines, because a lot of them are completely stripped of—not to sound too Trumpian—liberal pieties. It’s completely stripped.

They’re not “politically correct.”
They are massively not politically correct, and it’s disturbing. You can load in tons and tons of demographic data, and it’s disturbing when you see percent black in a zip code and percent Hispanic in a zip code be more important than borrower debt-to-income ratio when you run a credit model. When you see something like that, you’re like, Ooh, that’s not good. Because the frightening thing is that even if you remove those specific variables, if the signal is there, you’re going to find correlates with it all the time, and you either need to have a regulator that says, “You can use these variables, you can’t use these variables,” or, I don’t know, we need to change the law.
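Here is a rough sketch of what that kind of audit looks like, again with synthetic data and made-up column names: fit a model on outcomes that already encode the neighborhood-demographics pattern, then rank the features it leans on. If the demographic columns beat debt-to-income, that’s the warning sign described above.

```python
# Illustrative sketch (synthetic data, hypothetical feature names): audit a credit model by
# checking whether zip-level demographics outrank debt-to-income in feature importance.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
n = 10_000
pct_black_zip = rng.uniform(0, 1, n)
pct_hispanic_zip = rng.uniform(0, 1, n)
dti = rng.normal(0.35, 0.1, n)

# Toy world where historical defaults track neighborhood demographics more than DTI --
# exactly the pattern the model will then "learn" and reproduce.
default_prob = 0.1 + 0.3 * pct_black_zip + 0.2 * pct_hispanic_zip + 0.1 * dti
defaulted = rng.uniform(0, 1, n) < default_prob

X = pd.DataFrame({
    "pct_black_zip": pct_black_zip,
    "pct_hispanic_zip": pct_hispanic_zip,
    "dti": dti,
})
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, defaulted)

# If the demographic columns land on top here, that's the "Ooh, that's not good" moment.
print(pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False))
```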
As a data scientist I