The genesis of this project was in reading “The New Bill James Historical Baseball Abstract” in 2001. I was familiar with James’ work, but his “Win Shares” system, which distills all of a particular player’s contributions into a single number, opened my eyes to the potential of deep statistical analysis.
Beyond Win Shares itself, a few other things caught my attention in that book. The first was using “runs saved” to evaluate a pitcher (more on that later). The second was that, once he had Win Shares, James looked at the final numbers from several angles to rate players. Not just total Win Shares, but Win Shares per 162 games, top 3 seasons, top 5 consecutive seasons … all these went into the evaluation.
The third nugget was, in doing detailed comparisons of players, he would sometimes run through their hitting stats and compare the offense they generated to the league average, i.e. what a typical team would do. So Rogers Hornsby generated as much offense in 1921 as an average 1921 team would produce in 33 games; while Eddie Collins, in 1916, generated 35 games’ worth despite having less impressive statistics. That was the thunderbolt for me – it occurred to me that expressing their statistics in that way would be almost as accurate as Win Shares, and could be applied to other sports, too. I could compare Walter Payton to what an average NFL team did in 1984, or Michael Jordan to an average NBA team from 1991.
I started with baseball, using James’ work as a touchstone, checking every so often that the numbers I was generating moved in parallel to his. But I went off the reservation pretty early (I can’t let go of RBI, sorry), and after 100 or so players, I was ready to adapt Apples & Oranges, as I began to call it, to other sports. There were surprises along the way, but mostly I was heartened by how consistently it seemed to work as I moved to cricket, football, basketball, etc.
Baseball players are evaluated on two offensive and two defensive components. Offense 1 is total bases: singles, doubles times 2, home runs times 4, etc. We also count walks, steals, hit by pitch and sacrifice bunts. We subtract caught stealing and grounding into a double play. Then the whole mess is divided by the league average for a particular season.
Here are Babe Ruth’s numbers for 1920:
We don’t have GDP (grounded into double play) data for 1920. Anyway, that’s 546 total bases. The typical American League team in 1920 had 16.30 per game (not counting HBP and SH, which are bonus categories). Dividing 546 by 16.30 gives us 33.50 games’ worth.
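For anyone who wants to check the arithmetic, here is a minimal Python sketch of Offense 1. The function names are mine, and the component figures for Ruth are an assumption: they are his 1920 line as listed in standard references, not numbers taken from this page, though they do add up to the 546 quoted above. Triples at 3 bases are implied by the “etc.”

```python
def offense_1_bases(singles, doubles, triples, home_runs, walks, steals,
                    hit_by_pitch, sac_bunts, caught_stealing, gdp=0):
    """Total bases plus the extras, minus the outs made on the bases."""
    hit_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    extras = walks + steals + hit_by_pitch + sac_bunts
    return hit_bases + extras - caught_stealing - gdp


def games_worth(player_total, league_per_game):
    """How many games an average team would need to produce this total."""
    return player_total / league_per_game


# Assumed 1920 line for Ruth (from standard references, not from the text).
# GDP is left at 0 because it was not recorded in 1920.
ruth_bases = offense_1_bases(singles=73, doubles=36, triples=9, home_runs=54,
                             walks=150, steals=14, hit_by_pitch=3,
                             sac_bunts=5, caught_stealing=14)

# 16.30 is the 1920 American League per-game figure quoted in the text.
ruth_offense_1 = games_worth(ruth_bases, 16.30)

print(ruth_bases)                 # 546
print(f"{ruth_offense_1:.2f}")    # 33.50
```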
Offense 2 is much simpler. We average the player’s runs scored and RBI, and divide by the league scoring average. For the Bambino in 1920, that’s 158 and 137, respectively, giving us an average of 147.50. Divide by 4.76 runs per game to get 30.99 games’ worth of runs. Now average with the first number to get a total offense of 32.24.
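The same calculation as a short Python sketch, using only the figures quoted above. The function name is mine, not part of any published tool.

```python
def offense_2(runs, rbi, league_runs_per_game):
    """Average of runs scored and RBI, expressed in league-average games."""
    return (runs + rbi) / 2 / league_runs_per_game


ruth_offense_1 = 546 / 16.30               # Offense 1, from the text
ruth_offense_2 = offense_2(158, 137, 4.76)

# Total offense is the plain average of the two components.
ruth_offense = (ruth_offense_1 + ruth_offense_2) / 2

print(f"{ruth_offense_2:.2f}")   # 30.99
print(f"{ruth_offense:.2f}")     # 32.24
```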
Defense 1 is outs. Putouts by catchers are adjusted for team strikeouts, and first basemen are adjusted (heavily) by infield assists. Infield putouts (including pitchers) and all assists count as half an out each. Outfield putouts and strikeouts are a full out, and catchers get a bonus for preventing stolen bases. Errors, wild pitches and passed balls each take away an out.
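A rough Python sketch of the basic out weights only. It deliberately leaves out the catcher and first-base adjustments, the stolen-base bonus, and the step that turns outs into games’ worth, since none of those are spelled out here. The class and field names are my own, and the two sample lines are made-up numbers, not real players.

```python
from dataclasses import dataclass


@dataclass
class FieldingLine:
    infield_putouts: int = 0    # includes pitchers' putouts
    outfield_putouts: int = 0
    assists: int = 0
    strikeouts: int = 0         # strikeouts thrown by a pitcher
    errors: int = 0
    wild_pitches: int = 0
    passed_balls: int = 0


def raw_out_credit(line):
    """Unadjusted outs: half credit for infield putouts and assists, full credit elsewhere."""
    half_outs = 0.5 * (line.infield_putouts + line.assists)
    full_outs = line.outfield_putouts + line.strikeouts
    penalties = line.errors + line.wild_pitches + line.passed_balls
    return half_outs + full_outs - penalties


# Made-up lines: the same 300 putouts are worth twice as much in the outfield.
infielder = FieldingLine(infield_putouts=300)
outfielder = FieldingLine(outfield_putouts=300)

print(raw_out_credit(infielder))    # 150.0
print(raw_out_credit(outfielder))   # 300.0
```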
Defense 2 is for catchers and pitchers only. Pitchers are credited for each run they allow below 1.5 times the league average for the number of innings they pitch. Catchers get a small share of the team’s ERA, or cERA if available, prorated to the number of games caught. All other players get a zero in this spot. Take the average of Defense 1 and 2, then average that number with the offensive total to get the total for that season. For Ruth, that’s 9.30 on Defense 1, 0.11 on Defense 2 (he pitched a little), yielding 4.70 on defense and 18.47 all around. Then add up every season to get the career total.
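Here is a hedged Python sketch of those two steps. The season total uses Ruth’s 1920 figures from the text. The pitcher credit is only my reading of the 1.5-times rule: it assumes the league average is scaled to the pitcher’s innings at nine innings per game, and it stops at runs, because the conversion from credited runs to games’ worth is not described here. Both function names are mine.

```python
def pitcher_run_credit(runs_allowed, innings, league_runs_per_game,
                       innings_per_game=9.0):
    """Runs below 1.5x what a league-average staff allows in the same innings.

    Assumption: the league average is prorated by innings / 9.
    """
    league_expected = league_runs_per_game * innings / innings_per_game
    return 1.5 * league_expected - runs_allowed


def season_total(offense_1, offense_2, defense_1, defense_2=0.0):
    """Average the two halves of offense and defense, then average those."""
    offense = (offense_1 + offense_2) / 2
    defense = (defense_1 + defense_2) / 2
    return (offense + defense) / 2


# Ruth, 1920: the offensive components as computed from the quoted figures,
# plus 9.30 on Defense 1 and 0.11 on Defense 2.
ruth_1920 = season_total(546 / 16.30, 147.5 / 4.76, 9.30, 0.11)
print(f"{ruth_1920:.2f}")   # 18.47
```

A career total is then just the sum of `season_total` over every season.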
So let’s start where I started, with the catchers:
|Catchers||Games||Total||Per 160||Sqr Root||Sum||Off||Def|
The total is the average of the total offensive and defensive games’ worth that the player earned in his career. This number is then divided by the games played and multiplied by 160 to give us a 160-game average. That number is added to the square root of the total to give us the final rating (in bold). The last two columns are the offensive and defensive ratings per 160 games. In general, because they play so many games, baseball players have higher totals and lower 160-game averages than other athletes. For other sports, the average is quite a bit higher than the square root of the total. In baseball, they’re almost even. But that bold number tends to come out in the same ranges.
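As a quick check, here is the rating formula as a small Python sketch, run against the Home Run Baker row from the third base table (1,600 games, 141.46 total). The function name is mine.

```python
import math


def final_rating(total, games, per=160):
    """Return (per-160 average, square root of total, their sum)."""
    per_160 = total / games * per
    root = math.sqrt(total)
    return per_160, root, per_160 + root


per_160, root, rating = final_rating(total=141.46, games=1600)

print(f"{per_160:.2f}")   # 14.15
print(f"{root:.2f}")      # 11.89
print(f"{rating:.2f}")    # 26.04
```

The square root rewards career length but with diminishing returns, while the per-160 term rewards rate of production, which is why short-career and long-career players can land in the same range.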
Johnny Bench rates a little low defensively because his pitching staff was mediocre, driving down his cERA, and he played a few hundred games at positions other than catcher. Defense is influenced most by playing a lot at a particular position, rather than by how well you played it.
Campanella’s career is truncated at both ends: by the color line at the beginning and by a car crash at the end. In a fairer world, he’d rate higher. He’s dead even per game with Bench.
|First Base||Games||Total||Per 160||Sqr Root||Sum||Off||Def|
Because the Red Sox wasted Babe Ruth’s first five years as a pitcher, Gehrig ends up with the highest offensive average of the players I’ve rated. Once he stopped pitching, Ruth was at 30.81.
We see the effect of the designated hitter rule here. Ortiz is the obvious example, but there are others. Eddie Murray was a good fielder who DH’d a lot. Frank Thomas was a lousy fielder who DH’d a lot. In both cases it cost them something like half a point on the final rating.
Hank Greenberg only played 1,417 games, due to World War II. Some guys are hurt by a short career, some guys are helped. It helped Greenberg; I don’t think he sustains those numbers over 2,200 games.
I can’t explain how McGwire does so well defensively other than to say he had a lot of balls hit his way.
|Second Base||Games||Total||Per 160||Sqr Root||Sum||Off||Def|
If you’ve studied “The New Historical Abstract,” then you know this is where I part ways with the guy who inspired me. Hornsby and Lajoie over Morgan …
There are four guys over 30.00 among second basemen, which has emerged as the dividing line between the immortals and the merely great. Or maybe it’s 29.50. Not sure yet. Anyway, four over 30, and all within a point-and-a-half of each other, is unusual.
Apples & Oranges rates second basemen, as a group, over shortstops in the field. Shortstop is the more difficult position, but second basemen are more productive because of all the double plays they turn. In fact, I give zero extra credit for double plays. Just getting an assist and a putout on the same play is enough to push them ahead.
|Third Base||Games||Total||Per 160||Sqr Root||Sum||Off||Def|
|Home Run Baker||1600||141.46||14.15||11.89||26.04||23.91||4.39|
Apples & Oranges doesn’t rate third basemen much higher than first basemen on defense. It makes a certain amount of sense; once you get past the fact that it’s more difficult to play third, you see that the number of plays they make is pretty similar. Brett and Molitor have their numbers depressed even further because they DH’d so much.
I was shocked at how low Boggs ended up. He deserves his own writeup, but for now I’ll just point out two things. First, his numbers fell off a cliff after 1991, his eighth year in the league. We remember early Boggs hitting doubles off the Green Monster, not the guy slapping singles for the Yankees. Second, a huge part of his value was in his walks, and I have a feeling that there are diminishing returns on very high OBP players when it comes to scoring runs. I’d love to really go in-depth on the issue one day.
|Shortstop||Games||Total||Per 160||Sqr Root||Sum||Off||Def|
|Cal Ripken Jr.||3029||233.87||12.35||15.29||27.65||18.97||5.74|
|Pee Wee Reese||2210||178.88||12.95||13.37||26.33||19.46||6.45|
The great shortstops, unlike the great second basemen, tend not to be all-rounders … other than Wagner, of course. The next three players after him are great hitters who ended up moving to easier positions, and Jeter should have moved. Ripken was a natural third baseman.
Ozzie’s the only real glove-first guy on the list. Most other defensive specialists lose value at the end of their careers because they can’t stay in the lineup. But even for him, defense is less than half of his offensive value. Defensive production is shared too evenly for stars to dominate like they do on offense.
Wagner could play anywhere and that dilutes his defense. Just as a shortstop, he rates at a stratospheric 7.27. Robin Yount, on the other hand, helped himself by moving to center field, which, like second base, is more productive than shortstop.
|Left Field||Games||Total||Per 160||Sqr Root||Sum||Off||Def|
Bonds … yeah, I know. Whatever he did to his body, the results are there on the field. The home runs counted, the games he helped win counted, and the ERAs he inflated counted. Same goes for McGwire and Sosa.
Pete Rose is hard to place, since he played all over the field. He had the most games at first base, but all but two of those came in the second half of his long career, after he left the Reds. He essentially played a whole normal career before that. Of the other positions, left field is the one he played the most.
Williams lost 4¾ seasons to military service. With those games, he’d probably be neck-and-neck with Bonds, maybe even ahead.
Al Simmons leads in defense because he played a good deal of center field. Henderson’s the top “true” left fielder.
|Center Field||Games||Total||Per 160||Sqr Root||Sum||Off||Def|
|Ken Griffey Jr.||2689||240.27||14.30||15.50||29.80||22.39||6.21|
I was mildly surprised that Cobb came in ahead of Mays. Maybe if we had caught-stealing data for Cobb’s career that would be different, but I think his numbers would hold steady relative to the league average. Could it be that we underrate Cobb as a ballplayer because he was such a terrible person?
It’s harder to make the wartime case for Joe DiMaggio than it is for Ted Williams. With an extra 450 games in his prime, DiMaggio probably passes Mantle, but no more.
I mentioned that Apples & Oranges rates center field ahead of shortstop defensively, mostly because an outfield putout counts twice as much as an infield putout. Why? Two reasons. First, infield outs generally take two people, one to field and make the throw, the other to get the out, so they split credit. Second, the stakes in the outfield are higher. If an infielder screws up, there’s a man on first and the players on base advance one base. If an outfielder misses one, it’s at least a double, and all the runners score.
|Right Field||Games||Total||Per 160||Sqr Root||Sum||Off||Def|
If Ruth’s numbers look off, it’s because he started as a pitcher early in his career and, as I explain below, pitcher starts count double in determining averages. That’s also why his defensive average is so high.
Roberto Clemente and Paul Waner, both right fielders for the Pirates, are almost exactly even. Makes me wonder about Ralph Kiner …
|Starting Pitchers||Games||Total||Per 160||Sqr Root||Sum||Off||Def|
Starting pitchers forced me into my first judgment call. On one hand, they’re massively valuable on a per-game basis; more so than any other group of players in any sport. On the other hand, they can’t go every game, needing three or four games off for every one they pitch. If I rated them like everyone else, they’d end up with final values in the 40s. But cutting their scores by three-quarters seemed an extreme way to balance the scales, especially given that I’m comparing them, among others, to NFL players who play 16 times a year and get a week off between games.
The compromise I came up with was to count every pitching start as two games when determining the 160-game averages. It’s not a number grounded in any scientific formulation, just one that I found gives us the best balance between their game-to-game production and their general unavailability. Games started are on the individual player pages.
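In code, the adjustment looks something like the Python sketch below. It assumes that “counting a start as two games” means adding games started to games played in the denominator, since each start is already counted once as a game. The function name is mine, and the sample pitcher is made up purely to show the size of the effect.

```python
def per_160_average(total, games, starts=0, per=160):
    """Per-160 average with each pitching start counted as two games."""
    effective_games = games + starts   # starts are already in `games` once
    return total / effective_games * per


# Made-up pitcher: 600 games, 500 of them starts, 200.00 career total.
unadjusted = per_160_average(total=200.0, games=600)
adjusted = per_160_average(total=200.0, games=600, starts=500)

print(f"{unadjusted:.2f}")   # 53.33
print(f"{adjusted:.2f}")     # 29.09
```

For a position player, `starts` stays at zero and the formula collapses to the ordinary 160-game average.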
|Relief Pitchers||Games||Total||Per 160||Sqr Root||Sum||Off||Def|
Baseball statisticians twist themselves into knots trying to find extra value for relievers in “high-leverage” situations. I don’t … but even if you do, it doesn’t make up for the plain fact that relievers don’t do very much when compared to starters or regular position players. Pitching the ninth inning with a lead isn’t that valuable, beyond serving as a security blanket for the manager.
As I write this, the Tampa Bay Rays are engaged in a high-profile experiment to have relievers begin the game, then bring in the “starter” to take care of innings 2 through 6 or whatever. I don’t think it’ll mean much … it will just expose the lie of the high-leverage situation. Pitch Sergio Romo in the first, sixth or ninth and it won’t matter … he’ll give you the same mediocre innings.
Eckersley rates as high as he does because he started 361 games in the first half of his career. (But Wilhelm started only 52 games and still comes out ahead …)
Quisenberry gets dinged for an extremely low strikeout total. He had great control and did a lot of things well to keep the score down, but ultimately relied on his defense to get almost all his outs.
|19th Century||Position||Games||Total||Per 160||Sqr Root||Sum||Off||Def|
|Old Hoss Radbourn||P||653||100.06||13.90||10.00||23.90||7.81||19.98|
This selection of 19th century stars is here mostly for context. Anson and Delahanty have the numbers to be included with the modern players (as Billy Hamilton and Kid Nichols were), but the rest of them don’t really measure up.
Radbourn and Spalding, like a lot of pitchers who played before 1890, put up extremely high single-season numbers but burned out quickly. I treated them exactly like modern starters, counting each start double for the purpose of figuring out the averages.
To come: I have very rough estimates on some Negro League stars based on published information. I feel the need to include them, but I still lack some critical details, especially league-wide numbers for context. Seamheads.com is doing some excellent work in that regard, and it may be that soon I’ll be able to fine-tune the spreadsheets into something in which I have enough confidence to publish.
To do list: