3-D Baseball: Evaluating Pitchers with FIP, Part I

Lately, I've found myself discussing the merits of FIP and other defense-independent pitching statistics quite a bit, so I've decided to compile the contents of my various posts on the subject here to address some of the more common objections people raise. A lot of people, it seems, are reluctant to consider FIP as a tool to evaluate how well a pitcher pitched or how good a season he had, and feel that it is primarily a projection tool that does not evaluate what has already happened very well. There is a perception that FIP throws out events and is therefore unfit for evaluating what happened, or that it inappropriately favours strikeouts over other outs.

These ideas are generally based on an incomplete understanding of either what FIP does or of what other stats do. That misunderstanding has to be addressed before a decision can be made on whether or how much to consider FIP et al. in evaluating pitcher performance. FIP is not:

-a projection
-a measure of who struck out the most hitters
-an arbitrary compilation of a few stats that someone was overly enamored with
-a stat that pretends balls in play never happened
-created by an agent (ok, hopefully no one reading this blog needs to be assured of that one)
-a comprehensive stat that tells you everything about how well someone pitched

The first place most people get hung up is the handling of strikeouts and balls in play (BIP). Some people think FIP only counts strikeouts and throws out all BIP events as if they don't exist. This is not exactly what is happening. FIP appears to throw out BIP events because it presents the values of other events relative to the value of a ball in play. The formula for FIP is:

(13*HR+3*BB-2*K)/IP + C

where C is a constant that shifts FIP so it is centered around the same mean as ERA. Most people notice that balls in play are not included in the formula and assume that they are simply ignored. Where do the weights for the other events come from, though? We can start with the following linear weights values for each event as published at TangoTiger.net:

HR		1.40
3B		1.03
2B		0.75
1B		0.46
BB		0.30
out		-0.27

FIP and other defense-independent measures lump balls in play together and use their average value rather than valuing each one by its outcome. FIP lumps all non-HR balls in play into a single group (though some measures, like tRA, split them into multiple groups). So what is the average value of a BIP according to the above table? To answer, we need to know how frequently balls in play become singles, doubles, triples, and outs. Then we multiply each frequency by the value of that event and sum the products to get the average value of a ball in play. This comes out to about -0.04. So the values of each event in FIP are as follows, where "out" now means a strikeout, the one kind of out that is not a ball in play:

HR		1.40
BB		0.30
out		-0.27
BIP		-0.04
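The averaging step described above can be sketched in a few lines of Python. The linear weights come from the first table; the outcome shares are my own round numbers, picked only to illustrate the arithmetic, and are not measured league frequencies. The function name is likewise made up for this example.

```python
# Linear weights (runs per event) from the first table above.
LINEAR_WEIGHTS = {"1B": 0.46, "2B": 0.75, "3B": 1.03, "out": -0.27}

# ASSUMPTION: illustrative shares of non-HR balls in play by outcome.
# These are round numbers chosen for the example, not real league data.
BIP_SHARES = {"1B": 0.220, "2B": 0.063, "3B": 0.007, "out": 0.710}


def average_bip_value(weights, shares):
    """Frequency-weighted average run value of a ball in play."""
    total_share = sum(shares.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("outcome shares must sum to 1")
    return sum(weights[event] * share for event, share in shares.items())


avg_bip = average_bip_value(LINEAR_WEIGHTS, BIP_SHARES)
print(f"average BIP value: {avg_bip:.3f}")
```

With these assumed shares the average lands close to the -0.04 quoted above. Plug in real league frequencies and the result will move a little, but the method is the same.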

Next, we determine the value of each event relative to the value of a BIP by taking the difference between each value and -.04:

HR		1.44
BB		0.34
out		-0.23
BIP		0
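For anyone who wants to check the subtraction, here is a minimal sketch in Python. The values are the ones from the tables above; the function name is hypothetical and only there for illustration.

```python
# Run values from the table above; "BIP" is the averaged ball-in-play value.
RUN_VALUES = {"HR": 1.40, "BB": 0.30, "out": -0.27, "BIP": -0.04}


def relative_to_baseline(values, baseline="BIP"):
    """Re-express every run value as a difference from the baseline event."""
    base = values[baseline]
    # Round to two decimals to match the precision of the tables.
    return {event: round(value - base, 2) for event, value in values.items()}


relative = relative_to_baseline(RUN_VALUES)
for event, value in relative.items():
    print(f"{event}\t{value:+.2f}")
```

The ball in play comes out at exactly zero by construction, which is why it drops out of the FIP formula.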

These are the values of each event used in FIP. The formula above puts these weights on a per-9-innings scale: each weight is multiplied by 9, and the weighted sum is then divided by IP. Rounded to whole numbers, that gives you 13, 3, -2, and 0 as the weights. So BIP are still there, just hidden in the formula. Their value is used to determine the weights given to every other event.
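To make the scaling concrete, here is a rough Python sketch that rebuilds the formula's weights from the relative values and then computes FIP. Everything beyond the weights is an assumption on my part. The league totals, the league ERA, and the pitcher's line are invented numbers. Setting C so that league FIP equals league ERA is one straightforward reading of "centered around the same mean as ERA", not necessarily how any particular site computes it.

```python
from dataclasses import dataclass

# Relative run values from the table above; "K" is the strikeout (the "out" row).
RELATIVE_VALUES = {"HR": 1.44, "BB": 0.34, "K": -0.23, "BIP": 0.0}

# Multiply by 9 for a per-9-innings scale, then round to whole numbers.
FIP_WEIGHTS = {event: round(9 * value) for event, value in RELATIVE_VALUES.items()}


@dataclass
class Totals:
    hr: int
    bb: int
    k: int
    ip: float  # true decimal innings (200 1/3 is 200.333, not box-score "200.1")


def fip_core(t):
    """The weighted sum divided by IP, before the constant C is added."""
    w = FIP_WEIGHTS
    return (w["HR"] * t.hr + w["BB"] * t.bb + w["K"] * t.k) / t.ip


def fip_constant(league, league_era):
    """ASSUMPTION: choose C so that league-wide FIP equals league ERA."""
    return league_era - fip_core(league)


def fip(t, constant):
    return fip_core(t) + constant


# HYPOTHETICAL league totals and ERA, invented for this example.
LEAGUE = Totals(hr=5000, bb=16000, k=33000, ip=43000.0)
LEAGUE_ERA = 4.30
C = fip_constant(LEAGUE, LEAGUE_ERA)

# HYPOTHETICAL pitcher line.
pitcher = Totals(hr=20, bb=50, k=200, ip=200.0)
print(FIP_WEIGHTS)
print(f"C = {C:.2f}, FIP = {fip(pitcher, C):.2f}")
```

Note that the BIP weight is carried through as a zero rather than deleted, which is the whole point: balls in play are in the formula, they just contribute nothing relative to themselves.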

So that is what FIP is: a measure that regresses the value of all balls in play completely to the league average and then weights the value of other events relative to that value. As the argument goes, a ground ball to the shortstop is as good as a strikeout...unless the shortstop doesn't field it for an out. FIP gives the pitcher credit for that ground ball, as well as every other ball in play, based on how likely it is to become an out or a hit or whatever, not based on whether a fielder got to it and converted an out or not. This is where many fans who have gotten past the first issue begin to take umbrage. Why should we regress any of the outcomes if we only want to assess what actually happened? Isn't the point of regression to form some sort of projection rather than an assessment?

Not always, no. If the point were to project, the regression and compilation of the stat would be significantly different. In this case, the point of regression is not to project future value, but to determine the value of past events. This is actually much more commonplace than most people realize; most stats regress factors to the mean, and what's more, for the most part they do so in a way that does not reflect value. WHIP regresses the value of all baserunners to the same value, so a HR is worth the same as a walk. It also regresses all sequencing and timing of events completely to the mean, as do opponent AVG, OBP, and SLG (all of which regress the value of events to some arbitrary value, i.e. all times on base or all bases to 1). All of these, along with ERA, regress defensive support completely to the mean, and ERA regresses bullpen support on inherited runners to the mean (meaning that the runs saved or cost for the pitcher by the defense and the bullpen are assumed to be average, and thus the actual outcomes of these effects don't have to be accounted for). ERA, while it doesn't regress sequencing of events, does regress leverage/situation and distribution across games. Just about all stats regress quality of opponents. Stats that aren't park adjusted regress park effects. That doesn't make any of these "projection" stats.
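The WHIP point is easy to see with two made-up pitching lines. In this sketch (the numbers are hypothetical and the function names are mine), the two pitchers allow the same number of baserunners in the same innings, but one of them gives up ten more home runs and ten fewer walks. WHIP cannot tell them apart, while the FIP formula from earlier (shown here without its constant C) can.

```python
def whip(hits, walks, innings):
    """Walks plus hits per inning pitched; a home run counts as just one hit."""
    return (hits + walks) / innings


def fip_core(home_runs, walks, strikeouts, innings):
    """The FIP formula without its constant C."""
    return (13 * home_runs + 3 * walks - 2 * strikeouts) / innings


# HYPOTHETICAL lines: pitcher A trades ten of pitcher B's walks for home runs.
pitcher_a = {"hits": 180, "home_runs": 25, "walks": 40, "strikeouts": 150, "innings": 200.0}
pitcher_b = {"hits": 170, "home_runs": 15, "walks": 50, "strikeouts": 150, "innings": 200.0}

for name, p in (("A", pitcher_a), ("B", pitcher_b)):
    w = whip(p["hits"], p["walks"], p["innings"])
    f = fip_core(p["home_runs"], p["walks"], p["strikeouts"], p["innings"])
    print(f"pitcher {name}: WHIP {w:.2f}, FIP before constant {f:.3f}")
```

Both lines produce the same WHIP, yet the home-run-heavy line is half a run per nine innings worse by the FIP weights.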

No stat considers all factors un-regressed. The factors FIP chooses to regress and leave un-regressed are designed specifically to model value produced by the pitcher. That is not true for most other stats. So why would you hold it against FIP that it regresses factors when it does so logically and empirically but not against other stats that do so mostly arbitrarily?

You can't throw out FIP or any other defense-independent stat just because it regresses the value of events unless you are going to throw everything else out the window with it. Are the outcomes of all singles the same, or of all doubles, triples, or home runs? Of course not, but when you look at a pitcher's opponent batting line, the values of these events are all regressed to their average value. Is a run in the bottom of the 9th of a tie game worth the same as a run in a blowout? Again, no, but ERA regresses all runs to the same value, or rather, all earned runs to the same value. ERA also regresses all unearned runs to the same value of zero. These things don't bother most fans because they can accept the idea that sequencing or timing of events might not reflect how well a pitcher actually pitched even though they are reflected in the outcome of what happened. This is the same concept as FIP uses: you decide what you want to look at with your stat, and then you decide where you want to stick to strictly measuring the outcomes and where you want to regress the results.

This is the first half of a two-part article. Continue reading here: Evaluating Pitchers with FIP, Part II

1 comment:

jinaz said...: Just wanted to say how much I enjoyed this piece. I've been using FIP for years and I felt like I understood it reasonably well. I've even built a series of pitcher WAR estimates on it. But I understand it better now after reading this. It's the best explanation of where the coefficients come from that I've seen, including Tango's original development articles. :)
-j; October 10, 2009 at 6:37 PM
