Thursday, June 12, 2014

Alphabet Triplets

Most of you may know me as the mysterious "intern" of the now-defunct Baseball Today podcast.  I would often try to answer the "Ridiculous Question of the Day" by doing some research and writing small programs to sift through tons of play-by-play data.  One of the common questions I get is:  "What is the most difficult question that you answered?"  Well, it turns out that it was a question that never made it on to the podcast.  Frankly, it may have even been a little too ridiculous.

It all started with a question that was answered on the podcast.  A listener wanted to know: "When was the last time a starting lineup featured every letter of the alphabet at least once?"  On the podcast, Mark Simon called on me and talented wordsmith/blogger Diane Firstman of the VORG to answer the question.  We both provided answers that were discussed on the show, but it was a follow-up exchange on Twitter with a fan of the podcast that really intrigued me.  Since it happened (relatively) often that a 9-man lineup featured every letter of the alphabet, we came up with a more ridiculous version:

What is the minimum number of players (using any names in baseball history, not just lineups) needed to cover the entire alphabet?

Obviously, since a single team's lineup contained every letter, the answer is 9 or fewer.  I also realized that the answer was more than 1, since no player's name contains every letter of the alphabet.  In fact, the greatest number of unique letters in a single player name is 15, and this rare distinction belongs to only one player in the RetroSheet database.  That player is Washington Fulmer, who only appeared in a single game for the 1875 Brooklyn Atlantics.  It also seemed unlikely that it would only take two player names to get the entire alphabet.  For example, a search for a name containing all 11 letters that do not appear in Mr. Fulmer's name (b, c, d, j, k, p, q, v, x, y, z) yields zero results.  I quickly wrote a program to count the number of unique letters in each name in the database and I got results ranging from 3 letters (Al Hall and C.C. Lee) to 15 letters (Washington Fulmer), with the average being 8.38 unique letters.  Based on this, I figured the answer would be 3 or 4; it was just a matter of finding the right players.
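A minimal sketch of that kind of unique-letter count might look like the following. The choice of Python and the function name `unique_letters` are assumptions for illustration (the original program is not shown), and the sketch only counts the plain letters a through z.

```python
import string

def unique_letters(name):
    """Return the set of distinct a-z letters in a name (case and punctuation ignored)."""
    return {ch for ch in name.lower() if ch in string.ascii_lowercase}

fulmer = unique_letters("Washington Fulmer")
print(len(fulmer))                                    # distinct letters in the name
print(sorted(set(string.ascii_lowercase) - fulmer))   # letters the name lacks
print(len(unique_letters("Al Hall")), len(unique_letters("C.C. Lee")))
```

Run on the names quoted above, this reports 15 letters for Washington Fulmer, the 11 missing letters listed in the text, and 3 letters each for Al Hall and C.C. Lee.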

After a decent amount of searching through names, I found that using 4 names was definitely possible.  I stumbled upon a trio of names - Felix Mackiewicz, Joseph Quinn, and Hy Vandenberg - that was only missing the letter "t."  Therefore, any player with the letter "t" in his name (a list that would include 6,711 names) would complete the alphabet.  Armed with this knowledge, I believed that I could find 3 players to cover the entire alphabet (an "Alphabet Triplet"), but it would take a lot of searching.  Once I found one, I thought, I still would not be satisfied.  I wanted to find them all.  How many of these Alphabet Triplets exist?
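Checking which letters a group of names still lacks is a small set operation. Here is a hedged sketch in Python; the helper name `missing_letters` is hypothetical, and Washington Fulmer is used as the fourth name only because his name happens to contain a "t."

```python
import string

ALPHABET = set(string.ascii_lowercase)

def missing_letters(names):
    """Return the sorted a-z letters that appear in none of the given names."""
    covered = set()
    for name in names:
        covered.update(ch for ch in name.lower() if ch in ALPHABET)
    return sorted(ALPHABET - covered)

trio = ["Felix Mackiewicz", "Joseph Quinn", "Hy Vandenberg"]
print(missing_letters(trio))                          # ['t']
print(missing_letters(trio + ["Washington Fulmer"]))  # []
```

An empty result means the group covers the whole alphabet, which is exactly the test an Alphabet Triplet has to pass with only three names.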

If I was going to search for the elusive Alphabet Triplets, I certainly didn't want to use a brute-force approach, and I would never find them all by searching manually, one by one.  In fact, given the number of player names in the directory (18,174), the number of possible combinations of any 3 players was over 1 trillion.  ONE TRILLION.  I didn't have that kind of time.  I wrote a small piece of code to select random three-player combinations and check the number of unique letters.  I then executed the code for a short period of time and I was able to check almost 44,000 combinations per second, which was good, but not good enough.  In fact, if I had to check every combination, it would take the program a full 263 days to complete!
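The arithmetic behind those two figures can be reproduced in a few lines of Python. This is only a back-of-the-envelope check using the numbers quoted above (18,174 names and roughly 44,000 checks per second); the variable names are mine.

```python
from math import comb

players = 18_174   # names in the directory, as quoted above
rate = 44_000      # combinations checked per second, as quoted above

triples = comb(players, 3)   # unordered groups of 3 distinct players
days = triples / rate / 86_400

print(f"{triples:,} possible three-player combinations")
print(f"about {days:.0f} days at {rate:,} checks per second")
```

The count comes out just over one trillion, and dividing by the measured rate gives about 263 days.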

In an effort to reduce the overall number of combinations I needed to check, I developed a strategy of targeting the least common letters first.  Of the 18,174 names in the database, here are the 8 least common letters (by number of names with at least one instance of the letter):

Q:  147 players (0.8% of all players)
X:  323 players (1.8%)
Z:  1194 players (6.6%)
V:  1854 players (10.2%)
F:  2073 players (11.4%)
P:  2407 players (13.2%)
W:  2684 players (14.8%)
J:  3459 players (19.0%)
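A table like this one can be built by counting, for each letter, how many names contain it at least once. The sketch below assumes Python and uses a toy sample of four names that appear in this post rather than the real database, so its counts are not the ones in the table.

```python
from collections import Counter
import string

def letter_name_counts(names):
    """For each letter, count the names that contain it at least once."""
    counts = Counter()
    for name in names:
        # A set ensures a name counts once per letter, however often the letter repeats.
        counts.update({ch for ch in name.lower() if ch in string.ascii_lowercase})
    return counts

# Toy sample only; the real run would use all 18,174 names.
sample = ["Omar Vizquel", "Xavier Hernandez", "Paxton Crawford", "Joseph Quinn"]
counts = letter_name_counts(sample)

# List the rarest letters first, breaking ties alphabetically.
for letter, n in sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))[:8]:
    print(f"{letter.upper()}:  {n} players ({n / len(sample):.1%})")
```

Letters that appear in no name at all are simply absent from the counter, which would need separate handling on a small sample but should not matter for a database of this size.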

I decided to focus on the 4 rarest letters (Q, X, Z, and V) and to employ some mathematical tricks.  Given a group of 3 players and 4 letters, there are a limited number of ways to distribute the letters.  One possibility is that one player has all 4 letters, in which case the other 2 players could be any players.  However, a search for a player with Q, X, Z, and V returns no results, so we can eliminate this case.  The second possibility is that one player has 3 of the 4 letters, one player has the 4th letter, and the third player is any player.  Breaking it down for the individual letters gives the following triplet possibilities, with the number of players in each list in parentheses:

{QXZ player (0), V player (1854), any player (18174)}
{QXV player (0), Z player (1194), any player (18174)}
{QZV player (13), X player (323), any player (18174)}
{XZV player (1), Q player (147), any player (18174)}

The first two possibilities can be eliminated since there are no QXZ players and no QXV players.  There are, however, 13 QZV players (including one of my all-time favorites, Omar Vizquel) and 1 XZV player (Xavier Hernandez).
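One way to build lists such as "QZV players" is to encode each name's letters as a 26-bit number and test for the required bits. The following is a sketch under that assumption, not the original program; `letter_mask` and `players_with` are hypothetical helper names, and the sample is a handful of names from this post.

```python
def letter_mask(text):
    """Encode the distinct a-z letters of text as a 26-bit integer (bit 0 = a)."""
    mask = 0
    for ch in text.lower():
        if "a" <= ch <= "z":
            mask |= 1 << (ord(ch) - ord("a"))
    return mask

def players_with(names, letters):
    """Return the names that contain every one of the given letters."""
    need = letter_mask(letters)
    return [name for name in names if (letter_mask(name) & need) == need]

sample = ["Omar Vizquel", "Xavier Hernandez", "Paxton Crawford", "Joseph Quinn"]
print(players_with(sample, "qzv"))   # ['Omar Vizquel']
print(players_with(sample, "xzv"))   # ['Xavier Hernandez']
print(players_with(sample, "qxzv"))  # []
```

With masks in hand, checking a candidate triplet is a matter of OR-ing three integers and comparing the result with the mask for all 26 letters, which is cheap enough to repeat millions of times.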

In a similar way, one player can have 2 of the 4 letters, another player can have the other 2, and the third can be any player:

{QX player (2), ZV player (148), any player (18174)}
{QZ player (32), XV player (29), any player (18174)}
{QV player (17), XZ player (24), any player (18174)}

Finally, the last possibility is that one player has 2 of the 4 letters and the other 2 players each have one of the remaining two letters, giving these possible triplets:

{QX player (2), Z player (1194), V player (1854)}
{QZ player (32), X player (323), V player (1854)}
{QV player (17), X player (323), Z player (1194)}
{XZ player (24), Q player (147), V player (1854)}
{XV player (29), Q player (147), Z player (1194)}
{ZV player (148), Q player (147), X player (323)}
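Multiplying out the list sizes in parentheses shows how much smaller this search space is than the full trillion. The Python sketch below just does that arithmetic with the counts given above. Because the lists overlap (a QZV player is also a QZ player, for instance), some combinations are counted more than once, so the total is an upper bound on the work rather than a count of distinct triplets.

```python
ANY = 18_174  # any player in the database

# (first list, second list, third list) sizes for each case described above
cases = [
    # one player has 3 of the 4 letters
    (0, 1854, ANY), (0, 1194, ANY), (13, 323, ANY), (1, 147, ANY),
    # two players have 2 letters each
    (2, 148, ANY), (32, 29, ANY), (17, 24, ANY),
    # one player has 2 letters, the other two have 1 each
    (2, 1194, 1854), (32, 323, 1854), (17, 323, 1194),
    (24, 147, 1854), (29, 147, 1194), (148, 147, 323),
]

total = sum(a * b * c for a, b, c in cases)
minutes = total / 44_000 / 60   # at roughly 44,000 checks per second

print(f"{total:,} combinations to check")
print(f"about {minutes:.0f} minutes")
```

The total is a little over 157 million, which works out to roughly an hour at the measured rate.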

Combining all of these lists gives approximately 157 million combinations.  Given that the program I wrote can check about 44,000 every second, this should take only 1 hour!  That's much better than having to wait 263 days.  I let the program run, and when I came back, I had my complete list of 20 Alphabet Triplets:

Anthony Vasquez, Paxton Crawford, Jack Billingham
Esmerling Vasquez, Paxton Crawford, Johnny Blatnik
Esmerling Vasquez, Paxton Crawford, Jerry Buchek
Esmerling Vasquez, Paxton Crawford, John Buckley
Esmerling Vasquez, Paxton Crawford, Johnny Grabowski
Esmerling Vasquez, Paxton Crawford, Herby Jackson
Esmerling Vasquez, Paxton Crawford, John Kirby
Esmerling Vasquez, Paxton Crawford, Johnny Kucab
Jorge Vasquez, Paxton Crawford, Harry Kimberlin
Jorge Vasquez, Felix Doubront, Matthew Cepicky
Guillermo Velasquez, Paxton Crawford, Johnny Blatnik
Guillermo Velasquez, Paxton Crawford, Jerry Buchek
Guillermo Velasquez, Paxton Crawford, John Buckley
Guillermo Velasquez, Paxton Crawford, Johnny Grabowski
Guillermo Velasquez, Paxton Crawford, Herby Jackson
Guillermo Velasquez, Paxton Crawford, John Kirby
Guillermo Velasquez, Paxton Crawford, Johnny Kucab
Omar Vizquel, Paxton Crawford, Johnny Grabowski
Mox McQuery, Steve Filipowicz, John Brackenridge
Jeffrey Marquez, Alex Garbowski, Don Pavletich

(Note:  When I first ran the program, I got 23 results, but that list included managers and umpires in the RetroSheet database who never actually played.)

One final note:
Paxton Crawford is definitely the MVP of the Alphabet Triplets, as he appears in 17 of the 20 listed above.  Interestingly, he is the only player in history with a name that includes X, F, W and P (4 of the 7 rarest letters).

Now that's ridiculous!
