For what now seem like obvious reasons the baseball offense was more interesting to James than the other two potentially big fields of research, fielding and pitching. Hitting statistics were abundant and had, for James, the powers of language. They were, in his Teutonic coinage, “imagenumbers.” Literary material. When you read them, they called to mind pictures. “Let us start with the number 191 in the hit column,” he wrote,
and with the assertion that it is not possible for a flake (I would hope that no one reading this book doesn’t know what a flake is) to get 191 hits in a season. It is possible for a bastard to do this. It is possible for a warthog to do this. It is possible for many people whom you would not want to marry your sister to do this. But to get 191 hits in a season demands (or seems to demand, which is as good for the drama) a consistency, a day-in, day-out devotion, a self-discipline, a willingness to play with pain and (to some degree) a predisposition to the team game which is wholly inconsistent with flakiness. It is entirely possible, on the other hand, for a flake to hit 48 homers. Hitting 48 homers is something done by large, slow men three-quarters thespian….
James was an aesthete. He was also a pragmatist: he had happened upon something broken and wanted to fix it. But he could only fix what he had the tools to fix. The power of statistical analysis depends on sample size: the larger the pile of data the analyst has to work with, the more confidently he can draw specific conclusions about it. A right-handed hitter who has gone two for ten against left-handed pitching cannot as reliably be predicted to hit .200 against lefties as a hitter who has gone 200 for 1,000. The offensive statistics available to James in 1978 were sufficiently comprehensive to support specific, meaningful conclusions. Offense he could fix. He couldn’t fix fielding because, as he had explained in his first Abstract, the data needed for a meaningful appraisal of fielding weren’t available. Pitching didn’t need to be fixed. Or, at any rate, James didn’t think it did.
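The sample-size point can be made concrete with a little arithmetic. The sketch below is an illustration, not anything James published: it treats each at bat as an independent trial (a simplifying assumption) and uses the standard binomial standard error to compare a .200 average built on ten at bats with one built on a thousand. The function name is hypothetical.

```python
import math


def batting_average_standard_error(hits: int, at_bats: int) -> float:
    """Binomial standard error of an observed batting average.

    Assumes every at bat is an independent trial with the same
    chance of a hit, which is a simplification for illustration.
    """
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    average = hits / at_bats
    return math.sqrt(average * (1 - average) / at_bats)


# The two hitters from the text: same .200 average, very different samples.
small_sample = batting_average_standard_error(2, 10)
large_sample = batting_average_standard_error(200, 1000)

print(f"2 for 10:      .200 +/- {small_sample:.3f}")
print(f"200 for 1,000: .200 +/- {large_sample:.3f}")
```

Under these assumptions the uncertainty around the two-for-ten hitter is ten times that of the 200-for-1,000 hitter, which is why the second can be predicted so much more reliably.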
In 1979, in the third, now annual, Baseball Abstract, James wrote, “a hitter should be measured by his success in that which he is trying to do, and that which he is trying to do is create runs. It is startling, when you think about it, how much confusion there is about this. I find it remarkable that, in listing offenses, the league will list first—meaning best—not the team which scored the most runs, but the team with the highest batting average. It should be obvious that the purpose of an offense is not to compile a high batting average.” Because it was not obvious, at least to the people who ran baseball, James smelled a huge opportunity. How did runs score? “We can’t directly see how many runs each player creates,” he wrote, “but we can see how many runs each team creates.”
He set out to build a model to predict how many runs a team would score, given its number of walks, hits, stolen bases, etc. He’d dig out the numbers for, say, the 1975 Red Sox. (Walks by individual players were still hard to find in 1975, thanks to Henry Chadwick, but team totals were available.) He could also find out how many runs the 1975 Red Sox scored. What he needed to determine was the relative importance to the team’s scoring of the various things Red Sox players did at the plate and on the base paths—that is, assign weights to outs, walks, steals, singles, doubles, etc. There was nothing elegant or principled in the way he went about solving the problem. He simply tried out various equations on the right side of the equals sign until he found one that gave him the team run totals on the left side. The first version of what James called his “Runs Created” formula looked like this:
Runs Created = (Hits + Walks) × Total Bases / (At Bats + Walks)
Crude as it was, the equation could fairly be described as a scientific hypothesis: a model that would predict the number of runs a team would score given its walks, steals, singles, doubles, etc. You could plug actual numbers from past seasons into the right side and see if they gave you the runs the team scored that season. James was, in a sense, trying to predict the past. If the actual number of runs scored by the 1975 Boston Red Sox differed dramatically from the predicted number, his model was clearly false. If they were identical, James was probably onto something. As it turned out, James was onto something. His model came far closer, year in and year out, to describing the run totals of every big league baseball team than anything the teams themselves had come up with.
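The test described above, plugging past totals into the formula and comparing the result with the runs actually scored, can be sketched in a few lines. This is only an illustration of the idea, assuming the formula exactly as quoted in the text. The function names are hypothetical, and the team totals are invented placeholders, not real season statistics.

```python
def runs_created(hits: int, walks: int, total_bases: int, at_bats: int) -> float:
    """First version of Runs Created as quoted in the text:
    (Hits + Walks) x Total Bases / (At Bats + Walks).
    """
    opportunities = at_bats + walks
    if opportunities <= 0:
        raise ValueError("at_bats + walks must be positive")
    return (hits + walks) * total_bases / opportunities


def percent_error(predicted: float, actual: float) -> float:
    """Signed miss of the prediction, as a percentage of actual runs."""
    return 100.0 * (predicted - actual) / actual


# Placeholder totals invented for illustration; NOT real season data.
team_seasons = [
    {"team": "Team A", "hits": 1500, "walks": 500,
     "total_bases": 2300, "at_bats": 5500, "runs": 760},
    {"team": "Team B", "hits": 1400, "walks": 450,
     "total_bases": 2000, "at_bats": 5450, "runs": 640},
]

# "Predicting the past": compare each modeled total with the runs scored.
for season in team_seasons:
    predicted = runs_created(
        season["hits"], season["walks"],
        season["total_bases"], season["at_bats"],
    )
    miss = percent_error(predicted, season["runs"])
    print(f"{season['team']}: predicted {predicted:.0f}, "
          f"actual {season['runs']}, miss {miss:+.1f}%")
```

A model that missed badly across many real seasons would have to be thrown out; one that tracked them closely would suggest the weights were roughly right.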
That, in turn, implied that professional baseball people had a false view of their offenses. It implied, specifically, that they didn’t place enough value on walks and extra base hits, which featured prominently in the “Runs Created” model, and placed too much value on batting average and stolen bases, which James didn’t even bother to include. It implied that sacrifices of any sort were aptly named, as they made no contribution whatsoever. That is: outs were more precious than baseball people believed, or seemed to believe. Not all baseball people, of course. The Jamesean analysis was consistent with an approach to the game championed most vocally by the former manager of the Baltimore Orioles, Earl Weaver. Weaver designed his offenses to maximize the chances of a