Wednesday, February 7, 2007

Why does the AL dominate the NL?

Which league is better, and why? This question seems to have been discussed ad infinitum over the past year or so, prompted mostly by the AL's uber-pwnage of the NL in interleague games the last two years. The first part of my question, "which league is better?", is not particularly hotly debated. The AL is clearly better, as evidenced by its 262-241 record versus the NL in 2005-2006. Indeed, the AL's Runs Scored and Runs Allowed in interleague play the past few years imply a winning percentage of 53-54% versus the NL. The bigger question is "why is the AL so much better?" Is it the pitching, the hitting, or a combination of both? In my previous posts on team-level projections, you may recall that I found the offensive level in the NL to be comparable or even superior to the AL's, but that the AL held a large advantage in both starting pitching and relieving.
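For readers who want to check the arithmetic, here is a minimal Python sketch of the two winning percentages mentioned above. The actual percentage comes straight from the 262-241 record. For the run-based figure I am assuming the standard Pythagorean expectation with an exponent of 2, which may not be the exact formula behind the 53-54% number, and the run totals passed to it are hypothetical placeholders rather than real interleague data.

```python
def actual_win_pct(wins, losses):
    """Plain winning percentage from a win-loss record."""
    return wins / (wins + losses)


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean expectation: RS^x / (RS^x + RA^x).

    The exponent of 2 is the classic choice and is an assumption here.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# The AL's interleague record versus the NL in 2005-2006 (from the post).
al_actual = actual_win_pct(262, 241)
print(f"AL actual interleague win pct: {al_actual:.3f}")

# HYPOTHETICAL run totals, chosen only to illustrate the formula.
al_implied = pythagorean_win_pct(runs_scored=1070, runs_allowed=1000)
print(f"Implied win pct from hypothetical runs: {al_implied:.3f}")
```

With those placeholder totals, outscoring the opposition by about 7% is enough to land in the 53-54% range.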

To quantify those findings a bit further: to compare AL and NL offenses, I take my NL projections and add 0.5 runs per game (RPG), because we will be replacing the pitcher's 0.200/0.200/0.200 line with some sort of average hitter. In doing so, I find the average AL starting lineup is projected to score 5.24 RPG, versus a corresponding NL average of 5.34 RPG. (Before my dear readers send me emails informing me that these league averages are altogether too high, please remember that these estimates assume that the starting eight or nine play every inning of every game.) Additionally, I found that the quality of projected NL benches was slightly superior to that of AL benches. By contrast, AL and NL rotations came in at 4.40 and 4.58 RPG allowed, respectively, with the bullpen difference being even larger (3.77 to 4.13). Could the AL's dominance truly be a function of the pitching only?

To answer that, I first decided to take a quick trip in the time machine to the beginning of 2006. After all, 2007 hasn't happened yet, so my projections for the upcoming season are not particularly informative about what has already happened. What I find is that in preseason 2006, my predictions of league offensive strength were exactly reversed: 5.33 RPG for the AL, and 5.23 RPG for the NL. So, using the exact same projection system, it looks like the NL has gained offensive strength this offseason, at the expense of the AL. In contrast, the bullpen projections were 4.05 and 4.21 RPG for the AL and NL, a much smaller gap than this year (note: I did not do preseason projections of rotation strength last year, and the defensive projections both years give the AL a slight 0.03 RPG edge). So, the take-home message is that as of preseason 2006, I expected the AL to have a small advantage on both the offensive and pitching sides of the ball.
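The year-over-year comparison is easy to lose track of in prose, so here is a small Python sketch that lines the numbers up. The RPG values are the projections quoted in this post; the dictionary layout and the helper name nl_minus_al are just my own illustration.

```python
# Projected runs scored per game (offense), by preseason and league.
offense = {
    2006: {"AL": 5.33, "NL": 5.23},
    2007: {"AL": 5.24, "NL": 5.34},
}

# Projected runs allowed per game by bullpens, by preseason and league.
bullpen = {
    2006: {"AL": 4.05, "NL": 4.21},
    2007: {"AL": 3.77, "NL": 4.13},
}


def nl_minus_al(table, year):
    """NL value minus AL value for one preseason, rounded to 0.01 RPG."""
    return round(table[year]["NL"] - table[year]["AL"], 2)


# Positive offense gap: NL lineups project to score more than AL lineups.
# Positive bullpen gap: NL bullpens project to allow more than AL bullpens.
for year in (2006, 2007):
    print(year, "offense gap:", nl_minus_al(offense, year),
          "bullpen gap:", nl_minus_al(bullpen, year))

offense_shift = round(nl_minus_al(offense, 2007) - nl_minus_al(offense, 2006), 2)
bullpen_shift = round(nl_minus_al(bullpen, 2007) - nl_minus_al(bullpen, 2006), 2)
print("offense shift toward NL:", offense_shift)
print("bullpen shift toward AL:", bullpen_shift)
```

Both shifts come out to roughly 0.2 RPG: the NL lineups gained about that much relative to the AL, and the AL bullpens widened their edge by about the same amount.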

Can we achieve an independent confirmation of these results, using the actual 2006 seasonal data? Well, at this point I should mention that there has been a lot of very good work already done on this topic, using more advanced statistical methods than I will use here. For example, Mitchel Lichtman did a three-part study in July 2006, the conclusion of which was that the AL's advantage was actually all on the hitting side. His conclusions certainly do not mesh with what I have seen here, and while I believe his statistical methods were rigorous, his interpretation of the data required certain assumptions that may not be valid. See this blog conversation for more. In January of this year, John Walsh wrote an article on The Hardball Times website in which he found that hitters switching from the AL to the NL experienced a 0.029 OPS boost, which one can attribute to the lower quality of pitching in the NL. The magnitude of this effect was estimated at 50-60 runs, which is about 60% of the effect needed to explain the AL's advantage. In this analysis, then, it's the AL pitching, not the hitting, that dominates.

I am going to propose an extremely simple test that I think will lead to an interesting prediction at the end. To truly determine where the league dominance lies and why, we need at least two metrics. One metric tells us which league is better. We can use interleague records, and the message here is that the AL is superior. A second metric is needed to determine whether the hitting or pitching contributes more to this dominance, and I propose here that a simple metric is the runs per game (RPG) difference between the AL and NL for a given year. Over the past 13 years, the AL RPG scoring rate has been ~0.35 RPG higher than the NL's, on average. This is due, of course, to the presence of the DH in the AL. However, the differential has been as high as 0.71 RPG in 1996, and as low as 0.16 RPG in 2001. When the differential is below average, we can infer that either the AL pitchers or the NL hitters are winning the war. Think about it: for the RPG differential to fall, something is either causing the RPG to drop in the AL (i.e. the pitching) or the RPG to rise in the NL (i.e. the hitting). Conversely, an above-average RPG differential means the reverse (NL pitching dominance, or AL hitting dominance). Combining the two metrics, then, tells us who is dominating and why. This is summarized in the table below:

                      | RPG differential below average | RPG differential above average
AL wins interleague   | AL pitching dominates          | AL hitting dominates
NL wins interleague   | NL hitting dominates           | NL pitching dominates

Still with me? Okay. The observed RPG differential in 2006 was... 0.21 RPG, which is well below the ~0.35 RPG average and quite close to the 13-year low of 0.16 RPG. To me, this lends credence to the theory that AL pitching is currently dominant.
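The two-metric test boils down to a simple lookup, sketched here in Python. The 0.35 RPG average and the 2006 inputs come from this post; the function name dominant_factor and its string labels are hypothetical, and the sketch makes no attempt to say how far from average the differential must be before the call is meaningful.

```python
AVERAGE_DIFFERENTIAL = 0.35  # AL minus NL RPG, 13-year average quoted in the post


def dominant_factor(interleague_winner, rpg_differential,
                    average=AVERAGE_DIFFERENTIAL):
    """Return which side of the ball is inferred to drive the dominance.

    interleague_winner: "AL" or "NL", the league with the better interleague record.
    rpg_differential:   AL RPG minus NL RPG for the full season.
    """
    if interleague_winner not in ("AL", "NL"):
        raise ValueError("interleague_winner must be 'AL' or 'NL'")

    below_average = rpg_differential < average
    if interleague_winner == "AL":
        # A small gap means AL pitchers are suppressing AL scoring.
        return "AL pitching" if below_average else "AL hitting"
    # A small gap with the NL winning means NL hitters are raising NL scoring.
    return "NL hitting" if below_average else "NL pitching"


# 2006: the AL won interleague play and the differential was 0.21 RPG.
print(dominant_factor("AL", 0.21))
```

For the 2006 inputs this lands in the upper-left cell of the table, AL pitching.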

What can our PECOTA projections tell us then about what to expect for 2007's RPG differential? Well, we already know that the offensive RPG estimates have moved 0.2 runs to favor a higher RPG environment in the NL (that is, the NL should score 0.2 RPG more next year than last year, relative to the AL). Additionally, the bullpen projections have moved to favor a lower run environment in the AL relative to the NL. Both of these metrics are moving in the same direction, favoring a lower RPG environment in the AL relative to the NL. Put those together, and it is not unrealistic to imagine that the 0.21 RPG differential between the AL and the NL could disappear completely, creating a unique condition in which the NL outscores the AL. At the very least, I expect the RPG differential to break through its previous low of 0.16 RPG. For my two readers, I encourage you to bookmark this post, so that I can be summarily taunted when this prediction turns out to be wrong.
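To put a rough number on that prediction, here is a back-of-the-envelope Python sketch. It makes a big simplifying assumption that is mine alone: that the 0.2 RPG swing in the offensive projections carries over one-for-one to the observed league differential. The bullpen shift is deliberately left out, since I have no principled way to convert it to a full-season RPG effect.

```python
observed_2006 = 0.21   # AL minus NL RPG in 2006 (from the post)
previous_low = 0.16    # lowest differential of the past 13 years, in 2001
offense_shift = 0.20   # projected swing toward the NL, preseason 2006 to 2007

# ASSUMPTION: the projected offensive swing subtracts directly from the
# observed differential. This is a naive illustration, not a forecast model.
naive_2007 = round(observed_2006 - offense_shift, 2)

print("naive 2007 differential:", naive_2007)
print("breaks previous low:", naive_2007 < previous_low)
print("NL outscores AL:", naive_2007 < 0)
```

Under that assumption the differential all but vanishes, and any additional push from the AL bullpens would tip it negative.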


JLo said...

I WILL bookmark this page and there is going to be an abundance of taunting if you ARE wrong!!!

Sal said...

That looks like a MATLAB plot.

EA, given the sample size of interleague play, what kind of error bars are we looking at here? Is it possible that the highs and lows are within our noise detection?

EA said...

Hey Sal, thanks for the comments. With regard to the runs per game delta, that is for the entire season (including interleague play). Still, I don't really know what the noise in this metric is with respect to PRESEASON projections, since I have only two years of projections. Additionally, all these projections have a very strong component of "looking in the rearview mirror". The projections of AL pitching dominance are based on the fact that so many AL pitchers broke out last year, which I wouldn't have predicted in preseason 2006. It's very likely that a number of key pitchers in the AL will get hurt this year, and we won't see nearly the discrepancy I am expecting. My prediction is more of a wondering than anything. I am curious to see what happens, but I think a huge talent differential would have to exist for the NL to outscore the AL.