Wednesday, June 18, 2014

Fixing Run Differential, Part 1

Run differential (total runs scored minus runs allowed) is often cited as a better indicator of overall team performance than actual winning percentage.  In fact, there is a statistical basis for this statement, and the resulting equation that approximates winning percentage is the Pythagorean Win-Loss formula (where RS is runs scored and RA is runs allowed):
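For reference, the Pythagorean expectation is usually written as shown here. The exponent is an assumption: the classic form uses 2, while Baseball-Reference uses 1.83. Both put the 2012 Orioles at 82-80, but only 1.83 reproduces the 48-22 estimate quoted for the 2014 A's later in this post (an exponent of 2 gives roughly 49-21).

```latex
\text{W\%} = \frac{RS^{x}}{RS^{x} + RA^{x}},
\qquad x = 2 \ \text{(classic)} \quad \text{or} \quad x \approx 1.83
```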
In many cases, this formula does an excellent job of assessing the most probable record for a team, given a number of runs scored and runs allowed, especially if the run differential is low.  The reason for this is that close games (extra-inning games, 1-run games) are pretty close to a 50/50 coin flip, and the Pythagorean winning percentage for RS close to RA is approximately 50%.  For example, the 2012 Baltimore Orioles had only a +7 run differential (712 RS and 705 RA), but managed to finish with a 93-69 record.  Their overall record was most certainly helped by their 29-9 mark in 1-run games, the highest single-season winning percentage in 1-run games (0.763) in the Expansion Era (1961-present) by any team.  [Interestingly, the Orioles franchise has the three highest single-season winning percentages in 1-run games in this era: 1970 (0.727), 1981 (0.750), and 2012 (0.763).]  Based on their run differential, however, their projected record using the Pythagorean Win-Loss formula was 82-80.  If we just replace the 29-9 record in 1-run games with 19-19 (a more probable outcome statistically), the Orioles' record would have been 83-79, very close to the expected value.

The flaw in run differential as a metric, in my opinion, is that it often overestimates how good a team with a large run differential really is.  This is because it only takes into account overall run scoring, not the game-to-game distribution.  Let's take an extreme example in which a team scores 14 total runs over the course of 2 games.  If the team scores 0 in the first game and 14 in the second game, the expected number of wins is approximately 1, because the 0-run game is certainly a loss and the 14-run game is almost certainly a win.  However, if the team scores 7 runs in each game, the expected number of wins is much higher.  From 2010-2014, teams that scored exactly 7 runs won approximately 83% of the time.  Therefore, the expected number of wins over this two-game stretch is about 1.66.
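Written out, the two cases compare as follows. This is only a worked version of the arithmetic above, taking the single-game win rates from the table later in this post and assuming the 14-run game falls under the "13+" row (0.993).

```latex
\begin{aligned}
E[\text{wins} \mid 0 \text{ and } 14 \text{ runs}] &= 0.000 + 0.993 = 0.993 \\
E[\text{wins} \mid 7 \text{ and } 7 \text{ runs}]  &= 0.829 + 0.829 = 1.658
\end{aligned}
```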

So if run distribution matters, what can we do about it?  It occurred to me that instead of using overall run scoring, I could develop a metric that uses game-by-game run scoring (and run prevention) to estimate wins.  First, we need to develop a table that gives winning percentage for each value of runs scored (or allowed) in a single game.  To do this, I used the Baseball-Reference.com Play Index "Situational Records" tool for all games since the beginning of the 2010 season.  Here is the breakdown for each case:

Runs scored in game    W-L%
0                      0.000
1                      0.100
2                      0.248
3                      0.392
4                      0.559
5                      0.652
6                      0.755
7                      0.829
8                      0.873
9                      0.907
10                     0.937
11                     0.970
12                     0.981
13+                    0.993

To get the same table for runs allowed, we can simply take 1 minus the W-L% column above, since every game in which one team scores a given number of runs is a game in which its opponent allows that many.  (So scoring 3 runs in a game gives you a 39.2% chance to win, whereas allowing exactly 3 runs in a game gives you a 60.8% chance to win.)  From this table, it is evident that run scoring, at least on a per-game basis, has diminishing returns.  After about 6 runs scored, each additional run contributes less and less to the overall chance of winning.
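One way to see the diminishing returns is to look at how much win probability each additional run adds. The sketch below simply takes differences between consecutive rows of the table; the names `WIN_PCT_BY_RUNS` and `marginal` are made up for illustration, and the last entry compares the "13+" row with the 12-run row.

```python
# Single-game win rate by runs scored, copied from the table above.
# Index = runs scored; the last entry (index 13) stands for "13+".
WIN_PCT_BY_RUNS = [0.000, 0.100, 0.248, 0.392, 0.559, 0.652, 0.755,
                   0.829, 0.873, 0.907, 0.937, 0.970, 0.981, 0.993]

# Win probability added by the k-th run: p(k) - p(k-1).
marginal = [round(after - before, 3)
            for before, after in zip(WIN_PCT_BY_RUNS, WIN_PCT_BY_RUNS[1:])]

for runs, gain in enumerate(marginal, start=1):
    label = "13+" if runs == 13 else str(runs)
    print(f"run {label:>3}: +{gain:.3f}")
```

In this table the fourth run is worth the most (+0.167), and every run from the eighth onward adds less than 0.05.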

Using the table above, we can compute an expected win total based on run distribution.  Let's say that a team plays 6 games and scores 2, 5, 3, 8, 1, and 4 runs in those games.  Their expected win total for those games would be (0.248 + 0.652 + 0.392 + 0.873 + 0.100 + 0.559) = 2.824 wins.  In a similar way, we could compute the expected number of wins based on the number of runs allowed in those 6 games.  To get an overall expected number of wins (based on run scoring and prevention), we could take the average of the two.
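The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the exact spreadsheet behind the numbers in this post: the function names are invented, any game with 13 or more runs uses the "13+" row, and the runs-allowed list in the example is hypothetical because the post only gives the runs scored.

```python
# Single-game win rate by runs scored, from the table above.
# Index = runs scored; index 13 stands for "13+".
WIN_PCT_BY_RUNS = [0.000, 0.100, 0.248, 0.392, 0.559, 0.652, 0.755,
                   0.829, 0.873, 0.907, 0.937, 0.970, 0.981, 0.993]


def win_prob_scoring(runs):
    """Chance of winning a game in which a team scores `runs` runs."""
    return WIN_PCT_BY_RUNS[min(runs, 13)]


def win_prob_allowing(runs):
    """Chance of winning a game in which a team allows `runs` runs."""
    return 1.0 - win_prob_scoring(runs)


def expected_wins(runs_scored, runs_allowed):
    """Return (wins from scoring, wins from prevention, average of the two)."""
    offense = sum(win_prob_scoring(r) for r in runs_scored)
    defense = sum(win_prob_allowing(r) for r in runs_allowed)
    return offense, defense, (offense + defense) / 2


scored = [2, 5, 3, 8, 1, 4]    # the six-game example from the text
allowed = [3, 4, 6, 2, 5, 0]   # hypothetical values, for illustration only

offense, defense, overall = expected_wins(scored, allowed)
print(f"scoring: {offense:.3f}, prevention: {defense:.3f}, overall: {overall:.3f}")
```

For the six-game example, the scoring side comes out to the same 2.824 wins as the hand calculation.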

Based on the methodology above, we can compute an expected record for the 2012 Orioles.  The number of wins based on run scoring is 83.9 and the number of wins based on run prevention is 81.5, for an overall average of 82.7 wins, which is very close to the Pythagorean win total (82).  This makes sense given the reasoning above regarding Pythagorean Win-Loss for small run differentials.

Now let's consider the 2014 Oakland Athletics, who have a 42-28 record and a +126 run differential (359 RS and 233 RA).  The Pythagorean Win-Loss formula estimates their record to be 48-22, or 6 games better than their actual record.  However, it is my conjecture that this estimate inflates the win total due to a large number of blowout wins (18 wins by at least 5 runs).  Based on my formula, the 2014 A's expected number of wins is 40.5.
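The Pythagorean estimates quoted in this post can be checked with a short script. The function name and the default exponent of 1.83 (the value Baseball-Reference uses) are assumptions; the post does not state which exponent was used, but 1.83 matches the 82-80 and 48-22 figures, while the classic exponent of 2 gives the A's about one more win.

```python
def pythagorean_wins(rs, ra, games, exponent=1.83):
    """Expected wins over `games` games from total runs scored and allowed.

    The default exponent of 1.83 is an assumption; the classic formula uses 2.
    """
    win_pct = rs ** exponent / (rs ** exponent + ra ** exponent)
    return win_pct * games


# 2012 Orioles: 712 RS, 705 RA over 162 games.
orioles = pythagorean_wins(712, 705, 162)

# 2014 A's through 70 games (42-28): 359 RS, 233 RA.
athletics = pythagorean_wins(359, 233, 70)
athletics_classic = pythagorean_wins(359, 233, 70, exponent=2)

print(f"2012 Orioles: {orioles:.1f} wins")
print(f"2014 A's (exponent 1.83): {athletics:.1f} wins")
print(f"2014 A's (exponent 2): {athletics_classic:.1f} wins")
```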

In my next post, I will give expected records for all 30 teams based on this formula.  If you have any suggestions or comments about this post, feel free to leave them below.

1 comment:

  1. This is like comparing OPS to wOBA. Sure, one is better, but it's also impossible to compute without a spreadsheet.
