Wednesday, October 7, 2015

Settling the NL Cy Young Debate

Let me start by saying that I don't think there is a bad choice this year among the top three candidates: Clayton Kershaw, Zack Greinke, and Jake Arrieta.  I have seen good arguments made for all three, but most of the "all-encompassing" stats have some major flaws.  Jonah Keri nicely summarizes these in his piece on Grantland.  His choice of the Baseball Prospectus stat called DRA (Deserved Run Average) is a fine one, as it doesn't attempt to quantify how good a pitcher actually is (as FanGraphs WAR tends to do), but rather how well a pitcher has actually performed based on results.  I have been working on a comparable (albeit much less complicated) stat for a while now that accomplishes a similar goal.  My statistic (which for now I will simply call Adjusted ERA) uses HRs, hits, walks, and a 4th category called EBA (extra bases added) to quantify pitcher performance on an ERA scale.  Similar to Bill James' Component ERA, this stat attempts to remove "cluster luck" (the sequencing of hits and other events) from ERA and give a better indication of how the pitcher performed.  In my version, the EBA component consists of several events (excluding hits and walks) that allow runners to take an extra base (stolen bases and wild pitches, for example) or remove runners that are already on base (pickoffs and double plays, for example).

The top 15 qualified starters in Adjusted ERA for 2015 are:


Based on this, I would have to give a very slight edge to Greinke.  Greinke's ability to suppress extra bases (especially in the running game) gave him the slight edge over Arrieta.


Monday, September 14, 2015

80 Game Score Droughts

As Tristan mentioned in his Geeky Stat of the Day, Rich Hill just posted a Game Score of 84 and hadn't posted a Game Score of at least 80 since June 7, 2007, a span of 3020 days.  This drought seems pretty long (over 8 years!), but is it the longest?

So far, I have found 2 that are longer since 1914:

Si Johnson had a Game Score of 84 (9 IP, 1 H, 2 BB, 1 K) on May 18, 1933 against the Boston Braves as a member of the Cincinnati Reds.  His next Game Score of at least 80 came 3302 days later (9.05 years) against his previous team (the Reds) as a member of the Philadelphia Phillies.  In this game, he pitched a 10-inning shutout, giving up 5 hits and finishing with a Game Score of 82.

Rip Collins had the second longest span at 3259 days (8.93 years).  The first game was on July 11, 1921 when he shut out the White Sox (9 IP, 5 H, 3 BB, 6 K) as a member of the New York Yankees and the second game came on June 13, 1930 when he shut out the Red Sox (11 IP, 4 H, 0 BB, 2 K) as a member of the St. Louis Browns.
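For anyone who wants to check the day counts, here is a minimal Python sketch using the two Rip Collins dates above.  The function name `drought` is just an illustrative choice, and dividing by 365 (rather than 365.25) is an assumption that happens to reproduce the 8.93 figure.

```python
from datetime import date

def drought(first: date, second: date) -> tuple[int, float]:
    """Return the gap between two games in days and in (365-day) years."""
    days = (second - first).days
    # Assumption: years are days / 365, which matches the figures quoted above.
    return days, round(days / 365, 2)

# Rip Collins: July 11, 1921 shutout to June 13, 1930 shutout
print(drought(date(1921, 7, 11), date(1930, 6, 13)))  # (3259, 8.93)
```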

If we change the search to most starts between 80 Game Score games instead of most days, we get the following top 5:
  1. Steve Trout -- 190 starts (1979-1987)
  2. Bump Hadley -- 185 starts (1933-1941)
  3. Livan Hernandez -- 177 starts (2004-2010)
  4. Greg Maddux -- 176 starts (2001-2006)
  5. Tim Wakefield -- 168 starts (1998-2005)
Yes, Greg Maddux was at the end of his career and his game on August 13th, 2006 was his last game with a game score of at least 80.  Interestingly enough, Tim Wakefield's drought came in the middle of his career.  He had 5 games from '95-'98 and 5 games from '05-'08 with a Game Score of 80 or more, but none in between.
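The "most starts between" search can be sketched roughly as follows.  This is only an illustration: the list of Game Scores is made up, the function name is hypothetical, and the counting convention (the start that ends the drought is counted) is an assumption that may differ from how the totals above were counted.

```python
def longest_start_drought(game_scores, threshold=80):
    """Largest number of starts from one qualifying start to the next.

    game_scores: a pitcher's Game Scores in chronological order.
    Assumption: the gap is the difference in start index, so the
    start that ends the drought is included in the count.
    """
    longest = 0
    last_hit = None  # index of the most recent start at or above threshold
    for i, score in enumerate(game_scores):
        if score >= threshold:
            if last_hit is not None:
                longest = max(longest, i - last_hit)
            last_hit = i
    return longest

# Hypothetical career log, not real data
scores = [84, 55, 60, 82, 40, 45, 50, 70, 81]
print(longest_start_drought(scores))  # 5
```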

If we cheat a little bit and change the criteria to a Game Score of 79 or more, then we have a runaway winner.  Socks Seibold had a Game Score of 79 in one of only 3 appearances in 1916 (September 24th) and then had a Game Score of 81 on June 11, 1931, a whopping 5373 days (14.72 years!) later.  His streak was aided by his absence from the league from 1920-1928, but that's still an incredible number (and a great name as well).

Thursday, September 3, 2015

Will there ever be a 5 homer game?

As many people probably know, the 4 home run game is one of the rarest feats in baseball.  There have been 16 4-homer games in baseball history, the last of which was Josh Hamilton's monster performance in 2012.  Here are the last 14 instances (since 1914) using the Baseball-Reference Play Index.  In all but one of these games, the player at least had a chance to hit a 5th homer.  Only Carlos Delgado's  4 HR game in 2003 came in exactly 4 plate appearances.  Will we ever see a player hit 5 homers in a game?  I crunched the numbers to find out our chances.

In order to spare people from all of the gory math details, I will try to simplify the explanation.  Let's start with a single player in a single game.  The important numbers we need to know are a player's average home run rate (HR per Plate Appearance) and how many plate appearances he received in the game.  For simplicity, we will assume that each plate appearance is an independent event, i.e. that the likelihood of a home run in one trip to the plate does not affect the next trip to the plate.  With these two numbers, we can determine the probability that he hits any given number of home runs in those plate appearances using the binomial distribution.  For example, let's assume that a player hits a homer in 3% of his plate appearances (the average for the league is currently about 2.8%).  If he gets exactly 4 plate appearances in a single game, his chances are approximately 1 in 1.23 million to homer in all 4 trips to the plate.  However, even giving him just one more chance (4 homers in 5 plate appearances) drastically improves his odds.  This somewhat average power hitter now has a 1 in 253 thousand chance to hit 4 homers in that single game.  If we give him 6 chances, the probability again improves to 1 in 86,400.  
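For readers who do want a little of the math, the numbers above can be reproduced with a few lines of Python.  This is a minimal sketch under the same independence assumption.  The function name `prob_at_least` is my own, and the quoted odds for 5 and 6 plate appearances appear to correspond to "at least 4" homers rather than "exactly 4".

```python
from math import comb

def prob_at_least(k, n, p):
    """P(at least k homers in n independent PAs with per-PA homer rate p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# 3% home run rate, 4 or more homers in 4, 5, and 6 plate appearances
for n in (4, 5, 6):
    odds = 1 / prob_at_least(4, n, 0.03)
    print(f"4+ HR in {n} PA: about 1 in {odds:,.0f}")
```

The three lines printed come out to roughly 1 in 1.23 million, 1 in 253 thousand, and 1 in 86,400.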

If you thought those odds were bad, now consider the same player hitting 5 homers in 5 plate appearances.  The chances of that happening are 1 in 41.2 million.  But what about a hitter with more power?  A player with a 6% home run rate (for example, a guy like Mike Trout this season) has a 1 in 1.29 million chance to hit 5 homers in 5 chances, which is nearly the same chance that our "average" player has to hit 4 in 4 chances.  The two main conclusions we can make are:
  1. A higher home run rate dramatically increases a player's chances of hitting 5 homers in a game.  A great power hitter (6% home run rate) is 32 times as likely as an average power hitter (3% home run rate) to hit 5 HR in a game.
  2. An extra plate appearance dramatically increases a player's chances of hitting 5 homers in a game.  Hitting 5 homers in a game with 6 PAs is 5.85 times as likely as hitting 5 homers in a game with only 5 PAs.
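Both ratios fall straight out of the binomial formula.  As a short worked check (assuming exactly 5 plate appearances for the first ratio and a 3% home run rate, written as p, for the second):

```latex
\frac{P(\text{5 HR in 5 PA} \mid p = 0.06)}{P(\text{5 HR in 5 PA} \mid p = 0.03)}
  = \left(\frac{0.06}{0.03}\right)^{5} = 2^{5} = 32

\frac{P(\text{at least 5 HR in 6 PA})}{P(\text{5 HR in 5 PA})}
  = \frac{6\,p^{5}(1-p) + p^{6}}{p^{5}}
  = 6(1-p) + p = 6(0.97) + 0.03 = 5.85
```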
In order to find our chances of any player hitting 5 home runs, we need to consider the league as a whole.  We cannot simply use the average home run rate and assume that every game played in a season features 9 "average" players on each team.  This would actually underestimate our true chances of seeing a 5 HR game.  Likewise, we cannot assume that the average player gets exactly 5 plate appearances in a game.  In both cases, we must consider a distribution of different numbers.  I used the top 250 hitters in terms of total plate appearances to find the distribution of home run rates throughout baseball, ranging from Giancarlo Stanton and Nelson Cruz to Michael Bourn and Eric Sogard.  In addition, I found the distribution of number of plate appearances per game.  It turns out that requiring at least 5 plate appearances eliminates almost 75% of all plate appearances, since players most often get only 4 in a game (58% of all PAs come in 4 PA games, and 16.6% come in games with 3 PAs or fewer).
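To give a rough idea of how the two distributions combine, here is a hedged Python sketch.  It is not the formula used for the estimates in this post.  The player list, the plate appearance shares, and the function names are all hypothetical placeholders, chosen only to show why mixing different home run rates beats using one league average.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(at least k homers in n independent PAs with per-PA homer rate p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def expected_games(k, players, pa_share):
    """Expected number of player-games per season with at least k homers.

    players:  list of (hr_rate, games_played) pairs
    pa_share: {plate appearances in a game: share of games with that many}
    """
    total = 0.0
    for hr_rate, games in players:
        # Games with fewer than k PAs contribute zero automatically.
        per_game = sum(share * prob_at_least(k, n, hr_rate)
                       for n, share in pa_share.items())
        total += games * per_game
    return total

# Hypothetical inputs, NOT the data behind this post
players = [(0.06, 150), (0.03, 150), (0.01, 150)]
pa_share = {3: 0.20, 4: 0.55, 5: 0.22, 6: 0.03}

mixed = expected_games(5, players, pa_share)
average_rate = sum(rate for rate, _ in players) / len(players)
flat = expected_games(5, [(average_rate, 450)], pa_share)
print(mixed > flat)  # True for these inputs: the sluggers dominate the total
```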

Using all of this data, I created a formula to compute several different estimates given the current offensive environment in baseball:
  • On average, we should see 9.63 3-homer games per year
  • A 4 HR game should happen on average every 5.1 years (19.7% probability)
  • A 5 HR game should happen on average every 405 years (0.247% probability)
So you're saying there's a chance?  Yes, every 405 years seems like a longshot but not everything happens at regular intervals.  We could get lucky.  Using these numbers, there is about a 10% chance that it happens in the next 40 years.  I just hope I am alive when it finally does.
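The 40-year figure comes from treating each season as an independent trial with the same yearly probability.  A minimal sketch, assuming the 1-in-405 yearly estimate stays constant (the function name is illustrative):

```python
def chance_within(years, annual_prob):
    """Probability of at least one occurrence in the given number of seasons."""
    # Assumption: seasons are independent and the yearly probability is fixed.
    return 1 - (1 - annual_prob) ** years

print(f"{chance_within(40, 1 / 405):.1%}")  # 9.4%
```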

Tuesday, September 1, 2015

Fantasy Focus Podcast Question

Inspired by the Phillies' winningest pitcher only having 6 victories this year (Cole Hamels, now a member of the Rangers), the following question was asked:

Which team's winningest pitcher had the lowest season win total, and how many wins did he have?

If we restrict the search to only starting pitchers (those who started at least 60% of the games they played), the answer for the lowest win total is 6, which has happened 3 times:

  • The 1957 Kansas City Athletics had 2 starters with 6 wins, Ned Garver (6-13) and Alex Kellner (6-5).  This result, however, is dubious for several reasons.  Based on the search criteria, 5 pitchers who started at least 7 games are eliminated because they also appeared frequently as relievers.  In fact, 4 of these 5 pitchers had at least 7 wins [Tom Morgan (9-7), Jack Urban (7-4), Virgil Trucks (9-7), and Wally Burnette (7-12)].  Also, the Athletics only played in 153 games that year, giving them fewer chances to get wins than the current 162 game schedule.
  • The 1997 Oakland Athletics (featuring the Bash Brothers) had one starter with 6 wins, and that was Ariel Prieto (6-8).  That A's team was pretty bad (65 wins as a team), but the biggest contributing factor was probably that they had 9 starters make at least 10 starts and no starter with more than 24 starts.  I'm sure their 5.48 team ERA didn't exactly help either.  The team leader in wins was actually reliever Aaron Small, who had 9 wins and 5 losses.
  • The 2012 Colorado Rockies had only one starter with 6 wins as well, and that was Jeff Francis (6-7).  Just like the '97 A's, the Rockies were a bad team (64 wins) with a terrible staff ERA (5.22).  Coincidentally, they also had exactly 9 pitchers with 10 or more starts (including a 49-year-old Jamie Moyer) and no starter with more than 24 starts.  The leader on the team in wins was actually Rex Brothers, who finished with an 8-2 record from the bullpen.
If we change the search to all pitchers (starters and relievers), the answer is 7, which has happened 4 times (although the 2015 Phillies have a decent shot at taking this record):
  • The 1981 New York Mets had two pitchers with 7 wins-- starter Pat Zachry (7-14) and closer Neil Allen (7-6).  However, this was a strike-shortened season so it really should not count.
  • The 1987 Cleveland Indians had three pitchers with 7 wins-- the Candy Man Tom Candiotti (7-18), 48-year-old knuckleballer Phil Niekro (7-11), and reliever Scott Bailes (7-8).
  • The 1996 Detroit Tigers had one pitcher (Omar Olivares) with 7 wins.  I'd say that's not too bad considering the team only won 53 games all season.
  • The 2013 Houston Astros, also historically bad with 51 total wins, had one pitcher (starter Jordan Lyles) with 7 wins.

Monday, August 31, 2015

Daily Fantasy Baseball, Part I

When playing daily fantasy baseball, the most important concept is value.  Using the restriction of a salary cap, the objective is to assemble a lineup that scores the most points.  However, the players that score the most points per game are not necessarily the best players to use in your lineup.  In general, you want to look at a player's production relative to his price when determining his actual value to your lineup.  Obviously, player salaries fluctuate based on past performance, matchup, ballpark, position in the batting order, and several other factors.  Therefore, it is not sufficient to compare a player's average points scored (in all games) to his current salary.  Rather, to get a better picture, we must examine his performance in each game relative to his salary for only that game.  (As far as I can tell, no free website keeps track of this type of data and compiles it for you.)

I have been keeping DraftKings point totals and salary data all season long and have written a few programs to analyze all of it.  In doing so, I have developed a few metrics to measure a player's profitability.  First, what do I mean by profit?  In economics, profit is total revenue minus total cost.  In DFS, we have an easy way to measure cost (the player's salary), but revenue requires one extra step.  In order to equate points to a dollar value, we have to know, on average, how much a single DFS point is worth.  Based on my analysis, I have concluded that:
  • On average, it costs about $517 in salary per point for pitchers
  • On average, it costs about $569 in salary per point for hitters
This tells us that pitchers are approximately 10% more valuable than hitters since it takes about 10% less salary on average to score 1 point.  The profit formulas then become:
  • For pitchers:     (points scored) x ($517) - (salary)
  • For hitters:        (points scored) x ($569) - (salary)
Using these formulas, I have created a database of every player's game logs including salary and points scored.  I calculated total profit over the course of the season, average profit per game, percent of games played in which the player returned a profit, and a metric I like to call the GPP index.  This index heavily weights performances that are outliers in the positive direction, i.e. when a player returns 2 or 3 times the point total that one would expect based on his salary.  This, in general, is what will win you GPP matchups: outstanding performances by players who are bargains based on their price.
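As a rough illustration of the profit formulas and the first three summary numbers, here is a minimal Python sketch.  The game log is made up, the function and key names are hypothetical, and the GPP index is left out because its weighting is not spelled out here.

```python
# Salary dollars per point, taken from the averages quoted above
DOLLARS_PER_POINT = {"pitcher": 517, "hitter": 569}

def game_profit(points, salary, role):
    """Profit for one game: (points scored) x (dollars per point) - (salary)."""
    return points * DOLLARS_PER_POINT[role] - salary

def summarize(game_log, role):
    """Season totals from a list of (points, salary) pairs, one per game."""
    profits = [game_profit(points, salary, role) for points, salary in game_log]
    total = sum(profits)
    return {
        "total_profit": total,
        "profit_per_game": total / len(profits),
        "pct_profitable": sum(p > 0 for p in profits) / len(profits),
    }

# Hypothetical three-game log for a hitter, not real data
log = [(10.0, 4000), (3.0, 4200), (12.0, 3900)]
print(summarize(log, "hitter"))
```

For this made-up log the hitter returns a total profit of $2,125 and is profitable in 2 of his 3 games.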

The top 25 hitters in profit per game are listed below:

Top DraftKings DFS players in profit per game
This analysis shows us that the most profitable hitters are a combination of high-priced superstars (Bryce Harper, Paul Goldschmidt, Josh Donaldson), breakout players having career years (A.J. Pollock, Lorenzo Cain, Manny Machado), and undervalued, cheaper role players (Chris Colabello, Billy Burns, Eddie Rosario).  In terms of GPP value, though, the picture is somewhat different.  Joey Votto, for example, is not a great GPP play because he is consistently good without the upside.  He is profitable in fewer than 40% of his games (relatively low) and rarely provides an outstanding performance, giving him a GPP rank of 119 despite being the 22nd most profitable player.  Chris Colabello, on the other hand, has returned a profit in 57% of his games and comes at a relatively cheap price.  He also often returns 2 or 3 times his salary and is the highest ranked GPP player so far this season.

I have a lot more DFS data to share (including analysis of pitchers), but I wanted to put this out there and see if there is any response for more.  Part II may be coming soon if the demand is there.