1-D Baseball? On the DH and Fielding

Let me begin this discussion by saying I am not a fan of the DH. Maybe it's because I grew up a National League fan, or maybe I'm just surrounded by too many people who think, quite loudly, that it's stupid, but for whatever reason, I just don't care for it (and for those who feel the need to ask, yes, I enjoy watching pitchers try to hit a few times a game). I am, however, a lot more ambivalent toward the idea than I used to be. One of the reasons I've softened my position is that the argument that it creates one-dimensional players, so often the rhetorical linchpin of those who rail against the position, rings a bit hollow to me. Off the top of your head, how many players have ascended to one-dimensional (meaning just offense, I guess, though I generally think of offense as being multi-dimensional) greatness through the grace of the DH? There's Edgar Martinez (though he did play nearly 5000 innings in the field) and Frank Thomas (over 8000 innings in the field), and then maybe Harold Baines as far as players who were primarily DH over their careers.

Now, how many offense-only stars can you think of who were stuck in the field somewhere? There's Gary Sheffield, Manny Ramirez, Mark McGwire, Jim Thome, Richie Allen, Harmon Killebrew, Willie McCovey, etc. Generally, if you can hit well enough to pass as a DH in the Majors, especially a regular DH, you can find a spot on a team with or without the DH, even if you field like William Taft. There were and are one-dimensional players without the DH, and the DH hasn't created an abundance of players suited only for DH. Corollary to this, of course, is that had the Mariners lacked the position of DH, they'd have found a place for someone like Edgar to play, and he'd have still been a stud.

This got me thinking: if I can't think of that many great hitters who made their careers primarily as a DH, how much is the DH even used as its own position isolated from the field? A lot of teams rotate fielders into and out of the DH slot throughout the season, or use it to spell fielders for a night. Not that many teams use it to completely quarantine the iron glove of a big bat. Just how much is the DH its own singular position, and how much is it an amalgamation of other positions drifting between the field and DH?

I looked at all AL teams this decade, and for each team, I looked at every player who had at least 1 PA at DH for that team in that season. Then, for each player, I looked at how many PAs he had at DH compared to how many PAs he had total that season. From these figures, I calculated the percentage of his team's PAs at DH each player had, and the percentage of each player's PAs that came at DH. For each team, the total percentage of PAs its DHs spent DHing as opposed to in the field was found by multiplying these two percentages for each player and then summing the products. The following table demonstrates this process for the 2000 Angels:

batter     PA_DH  teamPA    perc  playerPA   perc2    term
gil-b001       6     674   0.89%       343   1.75%   0.02%
palmo001      57     674   8.46%       296  19.26%   1.63%
baugj001       2     674   0.30%        23   8.70%   0.03%
glaut001      12     674   1.78%       678   1.77%   0.03%
gantr001      33     674   4.90%       103  32.04%   1.57%
spies001     192     674  28.49%       345  55.65%  15.85%
salmt001     145     674  21.51%       680  21.32%   4.59%
walbm001       2     674   0.30%       155   1.29%   0.00%
erstd001      97     674  14.39%       747  12.99%   1.87%
andeg001      41     674   6.08%       681   6.02%   0.37%
cleme001      21     674   3.12%        80  26.25%   0.82%
vaugm001      58     674   8.61%       712   8.15%   0.70%
molib001       8     674   1.19%       513   1.56%   0.02%
TOTAL                                                 27.49%


In the above table, PA_DH is the number of PAs each hitter had at DH, teamPA is the total number of PAs the team had at DH that season, and playerPA is the total number of PAs the hitter had that season at all positions. perc is PA_DH/teamPA, perc2 is PA_DH/playerPA, and term is perc*perc2.
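If you want to reproduce the calculation, here is a minimal Python sketch of the weighting described above, run on the 2000 Angels rows from the table. The `DHStint` record and `dh_isolation` function are names made up for this illustration. The sketch also assumes that teamPA equals the sum of PA_DH over the listed players. That holds here because every player with a PA at DH is listed, and the rows sum to 674.

```python
from dataclasses import dataclass


@dataclass
class DHStint:
    batter: str    # Retrosheet-style player ID, as in the table
    pa_dh: int     # PA_DH: this player's PAs at DH for the team
    pa_total: int  # playerPA: this player's PAs that season at all positions


def dh_isolation(stints: list) -> float:
    """Sum of perc * perc2 over a team's DHs (the TOTAL row in the table)."""
    team_pa_dh = sum(s.pa_dh for s in stints)  # teamPA: the team's PAs at DH
    total = 0.0
    for s in stints:
        perc = s.pa_dh / team_pa_dh   # player's share of the team's DH PAs
        perc2 = s.pa_dh / s.pa_total  # share of the player's PAs that came at DH
        total += perc * perc2         # the "term" column
    return total


angels_2000 = [
    DHStint("gil-b001", 6, 343),
    DHStint("palmo001", 57, 296),
    DHStint("baugj001", 2, 23),
    DHStint("glaut001", 12, 678),
    DHStint("gantr001", 33, 103),
    DHStint("spies001", 192, 345),
    DHStint("salmt001", 145, 680),
    DHStint("walbm001", 2, 155),
    DHStint("erstd001", 97, 747),
    DHStint("andeg001", 41, 681),
    DHStint("cleme001", 21, 80),
    DHStint("vaugm001", 58, 712),
    DHStint("molib001", 8, 513),
]

print(f"{dh_isolation(angels_2000):.2%}")  # 27.49%
```

Each player's DH share is weighted by his share of the team's DH PAs. A part-timer like walbm001 therefore barely moves the total, while Spiezio's 192 PAs account for more than half of it.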

In this case, the Angels used mostly fielders rotated into the DH slot. Only Scott Spiezio had more than half his PAs come at DH, and he took fewer than 30% of the team's PAs at DH. The average DH for the Angels in 2000 took only 27% of his PAs at DH and over 70% of his PAs as a fielder (or PH).

Which teams used their DHs most exclusively as non-fielders, and which teams' DHs spent the most time in the field? Here are the 5 highest and lowest percentages of the decade, along with the AL average:

year  team    perc
2000  SEA   92.94%
2007  TOR   92.76%
2000  TOR   91.04%
2003  SEA   90.15%
2005  BOS   88.94%
...
lgAVG       55.73%
...
2005  ANA   26.53%
2005  TOR   26.09%
2000  KCA   25.00%
2001  KCA   23.37%
2002  KCA   19.68%


A few teams (Seattle and Toronto twice each) had their DH, on average, take over 90% of his PAs at DH. At the other end, the three lowest (all Royals teams) had their DH average a quarter or less of his PAs at DH. The average AL team this decade had its DH take roughly 56% of his PAs at DH and 44% of his PAs at other positions. While there were a handful of players who were almost exclusively DHs for their teams, most DHs this decade were splitting time pretty evenly between the field and DH. The DH who only hits and doesn't play the field, at least on a regular basis, is pretty uncommon in today's game.

I guess that's good news for those worried about the DH ruining the game by creating a class of one-dimensional players. That's just not what's happening, by and large, and when there are players who almost exclusively DH, they tend to be good enough hitters that teams have made room in the field for their type in the past anyway. Teams continue to make room for equally bad fielders if they can hit. These players may be one-dimensional, but so were countless players before them who hit a ton and couldn't field and still played. That's not an artifact of the DH.


The following tables have the information from the above tables for all AL teams this decade:

all players by team
team totals



Pitching to Contact and FIP

Last month, I wrote a two-part primer on defense independent pitching stats, with a heavy focus on FIP. FIP has been a somewhat hot topic lately following the Cy Young voting, where the pitchers who dominated the defense-independent metrics won out in both leagues, and one of the two specifically cited FIP as a focus of his pitching strategy. One oft-discussed issue is the idea that pitchers whose ERAs far outstrip their FIPs because of good defensive support are simply pitching to their defense, and should be rewarded for that. After all, why push so hard for strikeouts when you know that your defense is going to convert a lot of outs when you do put the ball in play? Should a pitcher (a purely hypothetical pitcher, of course) who only strikes out 6.7 per 9 have that held against him when compared to a pitcher who strikes out 9.8 per 9 or one who strikes out 10.4 per 9, when the 6.7 per 9 pitcher is pitching to a defense that better handles balls in play?

The first thing to note is that FIP is not dependent on strikeouts in the way most people think. As we learned in last month's primer, FIP depends on 4 different values, one of which is strikeouts and another of which is balls in play. Both are important to FIP, as are BB and HR. FIP is a balance of all 4 areas for a pitcher, with each area weighted according to its observed value in actual MLB games. It does not favour strikeouts as a style and, in fact, does not favour them at all when they come with high walk and HR totals. It is fully possible to have a poor FIP as a high-strikeout pitcher.

FIP, in general, does not favour strikeouts in any way other than how their observed value relates to the other 3 areas included in the formula. Now that that is out of the way, FIP does overvalue strikeouts if a team's defense is very good, but only slightly, and, conversely, it actually undervalues strikeouts if a team has poor defense. To see how this works, let's revisit the table of values for the 4 types of events covered by FIP:

event  run value
HR       1.40
BB       0.30
out     -0.27
BIP     -0.04


Remember that the coefficients used by FIP are derived by subtracting the value of a BIP from the value of each other event and multiplying by 9. FIP depends on the values of these 4 types of events being close to those in the above table. A team defense that differs significantly from average has no effect on the value of HR, BB, or SO (strictly speaking, that's not true, because those values are all sensitive to the run environment, so when a team allows more or fewer runs, those values will change slightly, but for our purposes, we'll ignore that). Defense does, however, have an obvious effect on the value of a ball in play. With a good defense, a BIP will be worth less (to the offense) than -.04 runs. With a bad defense, a BIP will be worth more. So the coefficients in FIP are in fact slightly off when a defense is not close to average, because FIP is tuned to fit the average value of a ball in play.
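The coefficient derivation is simple enough to write out. This Python sketch uses the rounded run values from the table and treats the table's "out" row as the value of a strikeout, which is an assumption on my part. `fip_coefficients` is a made-up helper name.

```python
# Run values per event from the table above (runs, from the offense's view).
# Assumption: the "out" row is used as the value of a strikeout.
RUN_VALUES = {"HR": 1.40, "BB": 0.30, "SO": -0.27}
AVERAGE_BIP = -0.04


def fip_coefficients(bip_value: float = AVERAGE_BIP) -> dict:
    """Each coefficient is (event value - BIP value) * 9, rounded to 2 places."""
    return {event: round((value - bip_value) * 9, 2)
            for event, value in RUN_VALUES.items()}


print(fip_coefficients())       # {'HR': 12.96, 'BB': 3.06, 'SO': -2.07}
print(fip_coefficients(-0.06))  # {'HR': 13.14, 'BB': 3.24, 'SO': -1.89}
```

The average-defense output rounds to the familiar 13, 3, and -2. A cheaper BIP (-.06) makes every other event relatively more expensive for the pitcher and makes a strikeout relatively less valuable. Figures quoted elsewhere in the post differ from these in the second decimal because of rounding in the inputs.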

Here enters the beauty of FIP. Because of how the coefficients are derived, the formula can be easily tuned to fit any level of team defense. FIP, it turns out, is not wrong for pitchers in front of good or bad defenses, it just has to be tuned differently. All we have to do is recalculate the value of a ball in play and re-derive the coefficients.

Imagine a pitcher had as good a defense as he could possibly have. Say, for example, he had the 2009 Mariners' defense. This defense was worth 85 runs above average, or roughly .02 runs per BIP. An average BIP is worth -.04 runs, so an average BIP allowed in front of this defense is worth -.06 runs (remember that lower is better for the defense). We recalculate our coefficients and get:

13.13*HR + 3.23*BB - 1.90*SO*

Strikeouts did indeed lose value, and walks and home runs both became more costly. How much difference does this make, though? Let's revisit our hypothetical 6.7 strikeout pitcher. Let's say he also walks 2.1 per 9 (after adding in HBP and subtracting out IBB) and allows .33 HR per 9.

Using the traditionally derived coefficients, he'll have an FIP of about 2.84. Keeping in mind that we also need to calculate a new league constant to scale FIP to ERA since we changed the coefficients, we find that he would have an FIP of about 2.79 using our defense-specific coefficients. Traditional FIP underestimated him by about .05 ER/9, or about 1.2 runs per 200 innings, compared to defense-sensitive FIP.
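Here is one way to sketch the rescaling step in Python. It assumes the usual FIP form: coefficient-weighted HR, BB, and SO divided by IP, plus a constant chosen so that a league-average line comes out exactly at league ERA. `Line`, `raw_fip`, `fip_constant`, and `fip` are hypothetical names. The post doesn't give league totals, so the usage below computes only the pitcher-specific part of each FIP.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """A pitching line; bb is walks + HBP - IBB, as in the post."""
    ip: float
    hr: float
    bb: float
    so: float


# Strikeout weight stored as a negative number
TRADITIONAL = {"HR": 13.0, "BB": 3.0, "SO": -2.0}
DEFENSE_85 = {"HR": 13.13, "BB": 3.23, "SO": -1.90}  # +85-run defense


def raw_fip(p: Line, coefs: dict) -> float:
    """Pitcher-specific part of FIP in runs per 9 (the x9 is in the coefficients)."""
    return (coefs["HR"] * p.hr + coefs["BB"] * p.bb + coefs["SO"] * p.so) / p.ip


def fip_constant(league: Line, league_era: float, coefs: dict) -> float:
    """Constant that makes the league's own FIP equal league ERA."""
    return league_era - raw_fip(league, coefs)


def fip(p: Line, coefs: dict, constant: float) -> float:
    return raw_fip(p, coefs) + constant


per_nine = Line(ip=9, hr=0.33, bb=2.1, so=6.7)   # the hypothetical pitcher
print(round(raw_fip(per_nine, TRADITIONAL), 3))  # -0.312
print(round(raw_fip(per_nine, DEFENSE_85), 3))   # -0.179
```

His pitcher-specific part rises by about .13 under the defense-specific weights, yet his FIP falls by .05. The new constant must therefore drop by roughly .18. That happens because the heavier BB and HR weights raise the league's total more than they raise his, given his low walk and home run rates.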

Since we are already adjusting for defense in our calculations, we can go a step further in incorporating defensive context into our valuation. The league average ERA this year was 4.32, but we know that that won't be the case given a +85 run defense. A league average pitcher, given +85 defense, will have about a 3.84 ERA. Using that figure, we can recalculate our constant for FIP and calculate a new number that is an estimate of actual ERA, not of ERA minus defensive support. This means that we would expect our 2.79 FIP pitcher to have an actual ERA of about 2.31. This is, of course, not as valuable as an ERA of 2.31 in front of an average defense, so we have to account for that as well. If average is 3.84, then a replacement level starter (using a .380 winning percentage as replacement level) will have an ERA right at 5.

A 2.31 ERA is good for a .749 winning percentage against a league average 4.32 ERA. Our replacement level pitcher, who is normally .380, is not .380 against the league with his +85 defense, however. He is .437. That means that our pitcher is worth about 6.9 WAR per 200 innings. Using his traditional FIP, we would give him a .679 winning percentage over a .380 replacement level, which comes out to 6.6 WAR per 200 innings. Our hypothetical** pitcher actually gained .3 wins once we considered the nuances of pitching to contact in front of a stellar defense. That's actually quite a bit. It's worth over a million dollars to the pitcher on the open market.
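The winning percentage and WAR arithmetic can be sketched the same way. The post doesn't say which runs-to-wins estimator it used. This sketch assumes Pythagenpat with the commonly used exponent parameter of .287, which is my assumption, and treats ERA as runs allowed. With those choices, a 2.31 ERA against a 4.32 league comes out at about .746, close to but not exactly the .749 above. The wins step is just the winning-percentage gap times innings/9. Both function names are hypothetical.

```python
def pythagenpat_wpct(runs_scored: float, runs_allowed: float, z: float = 0.287) -> float:
    """Pythagenpat: the exponent is (total runs per game, both teams) ** z."""
    exponent = (runs_scored + runs_allowed) ** z
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def wins_above(wpct: float, baseline_wpct: float, innings: float = 200.0) -> float:
    """Wins above a baseline, counting every 9 innings as one game."""
    return (wpct - baseline_wpct) * innings / 9.0


# The pitcher's 2.31 ERA (treated as runs allowed) against a 4.32 league
print(round(pythagenpat_wpct(4.32, 2.31), 3))
# Winning percentages quoted in the post
print(round(wins_above(0.749, 0.437), 1))  # 6.9
print(round(wins_above(0.679, 0.380), 1))  # 6.6
```

The .3-win difference discussed above is simply the gap between those last two outputs.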

At the beginning of this article, I said that the traditional coefficients were only slightly off with an extreme defense. Here, we find that they can be off by as much as .3 wins if we take a Cy Young caliber contact pitcher and put him in front of the best defense on the planet. Can we really write off .3 wins as slight enough to use traditional FIP as a stand-in for defense-sensitive FIP if we want to capture the value of pitchers separate from, but in the context of, their defense?

If the value were ever really that high, I'd say no. It isn't, though, at least not if what we want to measure is how defense affects a pitcher's approach. Everything we plugged into the calculations above was a purely after-the-fact measurement, but the only things a pitcher can leverage in adjusting his approach are expected values. That means that if our +85 defense only projects to be worth 60 runs a year going forward (I'm making that number up for illustration purposes), then the pitcher can only leverage about 70% of those 85 runs by adjusting his approach. Even though the defense ended up saving 85 runs, there is no way the pitcher could have leveraged the 25 they saved over their projection without knowing in advance that they would outperform the projection (which, by the loose definition of a projection, you can't). He also can't leverage his full home run rate, which in this case is probably at least to some extent anomalous. If he knew ahead of time that he would only allow .33 HR per 9 while giving up that much contact, he could leverage contact quite a bit (the .3 wins arrived at above being "quite a bit" in this case), but knowing only his projected home run rate, he can only leverage up to his projection, not beyond.

For these reasons, our hypothetical pitcher is never going to actually be undervalued by .3 wins per 200 innings using traditional FIP just because he pitches to contact, even if we give him by far the best defense in baseball.

This also means that just because a pitcher's ERA is better than his FIP, even if that difference is because of defensive support, it does not mean the pitcher was utilizing a better defense if his team defense was not far above average overall. Let's create a new hypothetical pitcher who has the same FIP as the one above and an ERA in the 2.2s, but whose team defense we measure to be a bit below average. In this case, we don't know that the difference between the pitcher's FIP and ERA is because the pitcher got better than average defensive support, but he might have. Let's assume that he did. Does he get credit for pitching to contact and using that good defensive support? In this case, no, because whatever his defensive support ends up being, we expect it to be below average, so deciding to pitch to contact is a bad choice. In terms of how this pitcher can leverage his defense, strikeouts are actually slightly underrated (slightly enough that we can basically ignore it, but they are underrated) even though the pitcher's ERA over-credits him for good defensive support, because he has no way to leverage that defensive support based on decisions about pitching approach made before the fact. Our pitcher is now far overrated by ERA because of defensive support and not at all underrated by FIP because of an ability to leverage good defense.

We return to our initial question about using FIP for pitchers who receive good defensive support: should a pitcher be punished for pitching to contact in order to leverage a good defense? The answer is somewhat complicated. No, a pitcher should not be punished for leveraging good defense if he is doing it properly, but FIP can be tweaked to account for that pretty easily, because the method for deriving the coefficients lends itself perfectly to adjusting the formula for differing values of balls in play. Traditional FIP and defense-sensitive FIP track very closely together, though, to the point that the difference is mostly negligible and is no reason to abandon traditional FIP in almost any conceivable case. Even where defense-sensitive FIP differs a bit from traditional FIP, traditional FIP still captures the context of pitching to a defense, while still separating out the actual value of the defense, better than ERA does (note how much closer defense-sensitive FIP, after we recalculated the coefficients to take defensive context into account, was to the traditional measure than it was to the predicted ERA once we also added in the value of the defense). Furthermore, you can't tell whether a pitcher even had the opportunity to properly leverage his defensive support just by comparing FIP to ERA, even if you assume that the difference is due to defensive support. A pitcher with a 2.8 FIP and a 2.2 ERA, even assuming that his ERA includes a lot of defensive support, did not necessarily ever have the opportunity to leverage that support by choosing to pitch to contact. In fact, the degree to which a pitcher can leverage his defense has nothing to do with his defensive support itself. It depends on the projected value of his defensive support before the fact and on his expected rates of HR, BB, and SO given a certain pitching strategy.


*NOTE: You won't be able to exactly replicate any of these values from the numbers given here because of rounding discrepancies, so if you are trying to work through the math on your own and find some differences, that is probably why.

**NOTE: This pitcher truly is hypothetical. Don't believe me? What real life pitcher threw in a 4.32 ERA league in 2009? That's mostly why the value doesn't match up at all with the pitcher you looked up, by the way.

Did Wainwright and Carpenter Split the Vote? and other Cy Young Stories

Congrats to Tim Lincecum, who was awarded his second Cy Young in as many seasons yesterday. His selection is intriguing for a number of reasons, in particular the apparent shift it signals in the BBWAA. For the first time ever in a non-shortened season, the voters have dipped their minimum win threshold to as low as 15 to reward a starter for simply out-pitching everyone else in the league. And to top it off, this came a day after resoundingly favouring Zack Greinke's superiority in spite of his only 16 wins.

This shift comes in part, I'd imagine, because of the shifting electorate. Keith Law and Will Carroll, both internet-based writers, voted for the first time. They turned in the only two ballots that didn't have the same three names as everyone else's. However, Carroll still voted Wainwright #1, and that's just 2 guys. There's more to it. Maybe there are other new voters in the newspaper ranks, but I suspect some of the old guard is changing as well. There have always been writers who have been more open to more detailed and less haphazard analysis, but with the advent of FanGraphs and other stat sites, good numbers are easier to come by than ever before, all gathered in one hugely popular place that most voters have probably heard of and even visited. Now those voters are better armed than ever before. There are also those who look at ever-shrinking win totals and the endangerment of the 20-game winner and ask whether starting pitchers still have the same ability to influence wins and losses in this age of extensive bullpens and high-powered, deficit-erasing offenses. Those voters have begun to question whether a 15-7 record is really the best measure any longer.

I'd love to write about all that, because it's really the most interesting thing about this vote to me. I'd love to, but, unfortunately, I won't. Two paragraphs are all I can spare right now because, as interesting as that is, everyone I encounter is so damn fixated on painting controversy all over the award.

Maybe I hang around too many Cardinal fans (being one myself). Maybe I shouldn't go near message boards this time of year. Whatever it is, I have heard and read way too much from people who are flat out angry (example of actual message board topic: "Simple Poll: If you could punch Keith Law in the face, would you?"; I'd link it, but it seems to have been deleted). On the one hand, many people are now concluding, as many in the analytical world did long ago, that these awards are meaningless. On the other, rather than just ignoring them, they are yelling, sending hate mail to voters, and publicizing outlets where others can do the same. So how should you, as a reasonable person, address these concerns?

For a lot of people, you shouldn't even bother. Just let them be angry and don't worry about throwing yourself in their path. Some people really do want explanations, though, and don't understand how it worked out this way, or really think some unfortunate circumstances robbed their guy of a rightful award. The first, and probably most basic, thing I want to address is the idea that the two St. Louis candidates split the vote, essentially giving Lincecum the award by default. This idea was floated before the winner was announced and has been repeated a lot since (it was even discussed on MLB.com's live award show, which, by the way, had to be the most anticlimactic announcement I've ever heard: long corny segment with Captain Morgan mascot, cut to anchor desk, small, forced laugh at Captain Morgan mascot, and, out of nowhere, with no build-up, simply stated as if it were casual discourse, "Tim Lincecum wins the Cy Young." Fine).

This concept has no basis in reality, however. I've yet to see any evidence that voters vote regionally as the theorists claim, but even if they did, this outcome would be impossible. Imagine, for a moment, that voting were entirely provincial, and voters split their allegiance within their own division and voted purely for their guy. What would happen? The NL West has 10 votes, so they all go to Lincecum. Let's assume the worst-case scenario for the St. Louis two and split the 12 NL Central votes evenly between them. Of course, the 12 Central voters also put the one they didn't vote for 2nd. Assuming the same three are on every ballot, and the West voters are equally split between Carp and Wainwright for 2nd and 3rd, that leaves us with the following scenario:


Pitcher     First  Second  Third  Total
Lincecum       10       0     12     62
Carpenter       6      11      5     68
Wainwright      6      11      5     68


This is the most favourable a purely regional split can be to Lincecum, with the other two dividing everything as evenly as possible. Assuming the East splits its votes among all 3, there is no way for Lincecum to win a vote where all three are considered equal and the deciding factor is regional splitting of votes, because the Central division has more voters. Whichever of Carp and Wainwright can draw the extra first or second place vote from the East would win.
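To check the point totals, here is a small Python tally of the regional scenario. It assumes 5-3-1 points for first, second, and third place, which is what the table's totals imply (10 x 5 + 12 x 1 = 62 for Lincecum). The `tally` helper is hypothetical, and the four ballot groups are just the scenario above written out.

```python
POINTS = (5, 3, 1)  # points for first, second, and third place


def tally(ballot_groups):
    """Sum points over (ranked ballot, number of voters) pairs."""
    totals = {}
    for ballot, voters in ballot_groups:
        for points, name in zip(POINTS, ballot):
            totals[name] = totals.get(name, 0) + points * voters
    return totals


scenario = [
    (("Lincecum", "Carpenter", "Wainwright"), 5),  # half of the NL West
    (("Lincecum", "Wainwright", "Carpenter"), 5),  # the other half
    (("Carpenter", "Wainwright", "Lincecum"), 6),  # half of the NL Central
    (("Wainwright", "Carpenter", "Lincecum"), 6),  # the other half
]

print(tally(scenario))  # {'Lincecum': 62, 'Carpenter': 68, 'Wainwright': 68}
```

The same function can be reused to try other splits of the East's ten ballots.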

The three-deep ballot makes such a splitting of votes impossible. If voters are really split between Wainwright and Carpenter because of their regional proximity, then those voters simply put the other guy second and push Lincecum to third, and he makes up no ground by the other two splitting the first place votes.

Of course, we also know that voters don't vote purely regionally. Without comprehensive data on who voted what, it's hard to say exactly how much regionalism appears to affect voting, but I doubt it's at all a significant factor, and anyway, even if it were, it wouldn't matter. The current ballot makes the issue of splitting votes moot.

There's also the issue of fans who feel that the ballots should not go 3 deep, and the most first place votes should win. The basic idea is actually to prevent splitting the vote (so don't let anyone try to use both arguments on you; if they are concerned about splitting of the vote, then extending the ballot past 1 name is essential to deal with that concern). This year, there was likely some splitting of the first place votes along many lines-not regional, but based on evaluation methods. Lincecum and Carpenter split the votes of those who looked past wins. Wainwright and Lincecum split the votes of those who considered IP an important factor. Etc. Lincecum and Carpenter were superior to Wainwright in a lot of important stats, but similar to each other rate-wise, so they split a lot of votes between them. There were more people who voted that felt that Wainwright was the best pitcher, but there were also more people who voted who felt that Lincecum was better than Wainwright than felt Wainwright was better than Lincecum. If you asked the voters, instead of picking one, to choose between those two, Lincecum would win, so is going only by first place votes a better representation of the sentiment of the voters? Should the winner be the guy who got 1 additional first place vote when it's also true that an additional 6 voters felt that guy was the worst of the top 3 candidates?

In practice, it is rare that the candidate with the most first place votes doesn't win. The only time it happens is when there is a group of candidates whom a lot of voters feel are better than the one who gets the most first place votes, but who split the first place votes among themselves. The last time it happened, Ivan Rodriguez won the 1999 AL MVP while Pedro Martinez carried more first place votes. The voters who weighed pitchers more heavily had only one candidate, so Martinez carried all the votes of a minority. The majority of voters considered Pedro's season short of several non-pitchers, but their first place votes were split between multiple candidates. The only other time it happened with the Cy Young, Tom Glavine beat out Trevor Hoffman in 1998 despite Hoffman having 2 more first place votes. Hoffman carried every vote from the voters who weighed relief efforts relatively more than the other voters, whereas those who felt Hoffman's contributions in limited innings fell short of a plethora of other candidates split their first place votes between several starters. As a whole, the voters felt Glavine was better than Hoffman, just as had been the case with Pudge in 1999.

There is also the issue of one-name-only voting discouraging voters from putting down the name they truly feel is deserving. Would all of the voters who felt Kevin Brown was the best pitcher in the NL in 1998 have written his name down if they also thought Glavine was easily better than Hoffman, but that Hoffman would win the award over both Glavine and Brown if they didn't put their support behind the guy they thought could win the award? Having only one name per ballot asks voters to choose between voting for whoever they think gives the most deserving candidate the best chance to win and voting for who they think is the best pitcher. This also means that it's impossible to know for sure who would have had the most first place votes under a one-name-ballot system. Would any voters who went Carpenter-Lincecum have looked at Carp's innings, thought, no way he gets more votes than Lincecum, and then put down Lincecum's name as the most deserving candidate between him and Wainwright? No way to know, but it's feasible, so it's impossible to consider Wainwright a guaranteed winner under that system anyway. You can't collect the votes under one set of rules and then change the rules to decide what the votes really said. Of course, this is seldom if ever raised as an issue except by someone complaining that it robbed his/her favoured candidate. People seem to be searching for arguments against the result by attacking the method rather than arguing against the method itself.

What really seems to set people off is the matter of Vazquez and Haren being on ballots at all. People immediately began deploring their least favourite voters, thinking it could only have been that idiot who did this. THT noted that Jon Heyman blasted "dumb sportswriters" on Twitter for their omission of Carpenter, only to later back off and apologize, as commenter Zach Sanders wrote, once he found out that one of them was Will Carroll. Even other BBWAA voters, including ones who consider themselves statistically versed (one of whom cited day/night splits as a reason for voting Carpenter over Lincecum), have spoken out against their colleagues here and implied that they don't belong in the process.

I don't really know what to say to someone so adamantly opposed to differing opinions, except to show them how it is reasonable to think Dan Haren and Javier Vazquez were in a class with the other top pitchers in the league (besides Lincecum, anyway). I privately wrote up my picks with some explanation nearly a month ago, and to be honest, I'm shocked at the uproar against anyone for putting one of those two in the top 3. I rated them as much closer to Wainwright/Carpenter than either of those guys was to Lincecum, so outrage over 2 votes for Haren or Vazquez over Carpenter or Wainwright, alongside total apathy toward Wainwright getting 12 first place votes, baffles me. It really does. I don't have the most faith in fans or writers when it comes to analyzing players, but I just don't understand the venom. Here's what I wrote last month:


NL Cy Young:

1. Tim Lincecum
2. Chris Carpenter
3. Javier Vazquez

Despite what all the pundits are saying about this being as tight as a race could possibly be (and maybe they're right as far as the actual voting goes, I don't know), the NL race also has a clear winner to me, although not as clear as in the AL, and then a pack bunched behind him. Carpenter may have made a run at Lincecum if he hadn't gotten hurt, but with a difference of over 30 IP between them, Lincecum pulls well ahead.

I'm sure you want to know how I can put Vazquez ahead of Wainwright. There were 4 pitchers I considered behind Lincecum: Carpenter, Wainwright, Vazquez, and Haren. For all of them (and Lincecum), I looked at their pitching from several perspectives of run prevention, including both traditional perspectives (ERA and RA) and defense-independent perspectives (FIP from FanGraphs, xFIP from THT, and tRA and tRA* from StatCorner). For each perspective, I considered pure rate production (an expected winning percentage unweighted by IP) as well as total production (W% converted to wins based on IP, both above average and above replacement). ERA, RA, and FIP were also park-adjusted using Baseball-Reference's multi-year pitcher park factor for each player's home park (the others already have adjustments that neutralize park to a large extent). I also looked at WPA and WPA/LI with the wins above average figures.

Overall, the things that stuck out were:

-Lincecum was clearly ahead of the pack whether I looked at rate production, wins above average, or wins above replacement.

-Carpenter's rate production was closest to Lincecum's. He was dead even rate-wise with Vazquez in just the defense-independent stats, but he pulled away in the traditional stats. However, Carpenter's win value production, especially above replacement (IP have more weight above replacement than above average), fell back to the pack. He was about even with Wainwright in WAR production, just behind Vazquez and Haren, and either slightly ahead of or slightly behind Vazquez for second in WAA depending on how I averaged the different perspectives.

-Haren picked up the most ground from the park adjustment. Pitching in Chase Field probably hurt him quite a bit in most people's eyes because his raw stats aren't as good as he actually pitched.

-Wainwright, and this is the kicker, was consistently at the back of the pack no matter what I looked at. Rate-wise production, WAR, and WAA (both with and without the WPA stats) all had him at the bottom of the group. He was really good this year, but if you look at how well he pitched and not just at his traditional numbers, he wasn't quite as good as the best pitchers in the league. When everything I was using to evaluate each pitcher pointed to Vazquez having pitched better than Wainwright, I just couldn't write him in ahead of Vazquez.


And that's all I can really say to people who question selections of Vazquez or Haren as damning one to unworthy idiocy. When I look at each pitcher's production, I just don't see it that way. It's certainly close enough that it shouldn't incite this kind of reaction. Also, looking back on my work now (not what I wrote above, but where I worked it out), I may have given too much consideration to rate production. I'm still undecided on how to handle "best" in terms of rate (regressed to some extent for those with lower playing time) vs. pure value. Point being, I can easily see Vazquez ahead of Carpenter for second. In fact, if you are most concerned with WAR, I can see any of them ahead of Carpenter. But uproar over Vazquez or Haren over Carpenter and apathy toward Wainwright ahead of everybody? I can't see it. If you are willing to concede that methods can legitimately differ enough to produce any order of Lincecum, Carpenter, and Wainwright, how can a method that puts Vazquez or Haren in the top three be so far off as to be grounds for expulsion from the process?


A Turning Point

Game 3, series tied at one game apiece.

Bottom of the second. Bases loaded. Shane Victorino at the plate. Andy Pettitte pitching, and on the ropes.

This single at-bat, which began with Pettitte’s 51st pitch of the game, may have been the moment that turned the World Series in favor of the Yankees and lost it for the Phillies.

Let’s review this a bit. By Game 3, Joe Girardi had already announced his intention to go the rest of the Series pitching his key starters on three days’ rest. It was a gamble that the pundits were not only debating but berating Girardi about long before the Series played itself out. Some were already arguing that if this gambit failed, Girardi would lose his job in the off-season.

And so, as this game opened, every interested eye was focused on the oldest member of the three-man rotation their manager had singled out to pitch under these circumstances: C. C. Sabathia, A. J. Burnett, and Andy (the only one with a first name) Pettitte.

After giving up a single on his first pitch to Jimmy Rollins, Pettitte would get Victorino to pop out weakly to third, then strike out Chase Utley and Ryan Howard. On the surface, this would look like a good start to the game for Pettitte. But it took him 24 pitches to get through his first trip to the mound. This trend could not continue if he were to succeed, not just in this game but for a manager who had already told the world that even this aging veteran would pitch next on short rest.

We come now to the second inning. The Phillies are at home, and are looking to press this narrow advantage to its limits. Having split in New York, this is their first Series game in front of their own fans. This inning would give them something to cheer about, and lift their hopes for brighter prospects as the Series continued.

Jayson Werth led off the inning with a home run. Not good for Pettitte, but not the whole story of this at-bat, either. First pitch ball. Second pitch ball. Third pitch ball. 3 and 0 to one of the legitimate power threats in this rich lineup is not a recipe for success. Taking 3 and 0, Jayson times the next pitch and fouls it off. On the 6th pitch of the at-bat, Werth would send the ball out over the wall for the first run of the game. Pitch count now over 30 with no one out in the second.

Pettitte would battle back to strike out Ibanez on four pitches, showing how this veteran, on the road in a big game, was not going to be easily rattled.

Pedro Feliz would come up next, and would double on a hard hit ball deep into the outfield reaches of the park.

Carlos Ruiz would come up next and walk on five pitches.

Had the Phillies won the Series, this next at-bat could have been cited as the turning point. Cole Hamels comes up with a one-run lead, runners on first and second, and one out. He is called on to bunt, sacrificing to put both runners in scoring position with two outs, but with the star shortstop, who had already singled in the game, on deck.

Hamels lays down a decent bunt, about 30 feet out in front of the plate and to the third base side of the pitcher’s mound. Charging in to field the bunt was Andy Pettitte. Charging out to field the bunt came Jorge Posada. In a classic case of “I got it, you take it,” both pulled away from the bunt at the last second assuming the other would field it. The ball rolled untouched while all three runners advanced without a play.

Bases loaded. Jimmy Rollins strolls to the plate.

Pitch one: ball.

Pitch two: ball.

Pitch three: ball.

Pitch four: taken 3 and 0 for a called strike.

Pitch five, the 50th pitch of the game, with one out in the second inning: ball four.

The second run scores while the defense twiddles their thumbs, unable to do a damn thing about it.

Let’s review the second inning: homer, out, double, walk, bunt single, walk with the bases loaded.

Here comes Victorino - in what I believe is the turning point of the Series.

The first two pitches are the key to this at-bat, and in some ways a testimony to the grit and resolve of this sly veteran (who else, after losing control and walking a batter with the bases loaded, would start the next batter with two sliders low and away?).

Both pitches were sliders in the dirt - low and outside.

Neither pitch was even close to the strike zone.

In what is still for me an inexplicable mystery, the count after those two pitches? 0-2.

Yep, Shane Victorino swung meekly and rather sickly at two pitches that not only never threatened the strike zone, but which offered him no hope of even making contact. With a weakening Pettitte fumbling to maintain control of his pitches and the Yankees’ hopes in Game 3, Victorino took him out of the doghouse. With two strikes, he would hit a weak fly ball to left field, deep enough to score another run and give the Phils a 3-0 lead, but one that effectively ended what could have been a back-breaking inning for them. Utley would follow with a strikeout to end the rally completely.

Pettitte would come out again in the third, already having thrown 57 pitches and down 3-0, and find his stuff again. His Yankee teammates would score two runs in the 4th to make it a ballgame again, then three in the 5th to take a lead they would never relinquish.

Pettitte would go six, throwing 104 pitches (only 59 for strikes) and picking up the win.

There is no way to predict what would or could have happened if... But Shane Victorino swung mysteriously at the first two pitches thrown by a man who was on the ropes, who had walked the previous man with the bases loaded, and then ended up trading an out for a run and killing a rally. It is not hard to see how this single at-bat ultimately changed the outlook for both the Phillies and the Yankees, and how it could have gone very differently but for two inexplicable swings and misses. I’m not saying the Phils would have won the Series, or even, for that matter, the game. But for me, this was the turning point in the Series: the point after which, and perhaps even because of which, the Yankees emerged triumphant.

Converting OBP/SLG to wOBA

Statistical analysis has crept into the mainstream consciousness. It is no longer difficult to find fans who look at slash lines before Triple Crown stats or who can properly choose (after looking it up, of course) between Brian Giles' and Carlos Lee's careers at the plate (just try to do it without considering walks). Unfortunately, the face of this new wave of stats has become OPS. As frustrating as it can be that the past few decades of work have been essentially boiled down to the equivalent of rummaging through the pantry, picking out a mishmash of the best ingredients, and throwing them into a casserole rather than looking up a real recipe, it is a step forward. It may be tiring having to explain that, even without considering park effects, Kenny Lofton still hit better than Vinny Castilla, OPS be damned, but it's infinitely better than trying to have a discussion with someone who insists that Gary Gaetti and his 360 HR and 1341 RBI are Lofton's superior. Plus, it's nice to be able to go to a ball game and look up to the scoreboard when an unfamiliar hitter comes up and see something that will give you a better idea of how well he's hit than AVG/HR/RBI.

Luckily for those of us who don't care to stop at OPS, linear weights, et al. have become widely available. Baseball-Reference has Pete Palmer's Batting Runs. Baseball Prospectus has EqA. FanGraphs and StatCorner have wOBA. All these are designed to model runs, not to stab in the dark at them. These are the stats that looked up the recipe first. But can we still make OPS work? The Book says, "for you OPS lovers, you will note that (OBPx2+SLG)/3 is a close approximation of wOBA." The problem with this is that if you run that calculation and read the result on wOBA's scale, you will find a lot of hitters to be much better than you thought, as average for that calculation is a good .030-.035 points higher than for wOBA. So if all you have are a player's OBP and SLG, and you want to know how good a hitter he is, (2OBP+SLG)/3, which I'll shorthand as 2OPS/3, will be on an unfamiliar scale. You know how good a .350 wOBA is, but not how good a .350 2OPS/3 is.

The upside of 2OPS/3 is that it retains the simplicity of OPS. It's easy to calculate, and, because it is still arbitrary, it looks clean and simple. So if you want to use it, know that average (for this decade) is around .365, not .330. That's decent enough, I guess. We can still do better, though. Of course, to be honest, should we? The previous paragraph listed several alternatives that are readily available and, frankly, already do everything we can hope to with OPS, but better. So why bother? There is a reason, after all, that The Book wraps up its lone paragraph on OPS rather succinctly: "This is the last time we will talk about OPS."

But say for some reason all you have are OBP and SLG. Maybe you're at a game and the scoreboard flashes a player's slash line and you want to know how good that is beyond what OPS can tell you (and you brought a calculator or pen and paper, or at least picked up some spare napkins at the concession stand to jot down your calculations on). Maybe you're having a real-life, offline discussion where the stats come up. Maybe you have a projection that only gives you the traditional slash line, or you are looking up some sort of split that isn't presented on the sites that give you linear weights. Whatever the reason, you find yourself trying to decide if a .400/.420 line is better than, worse than, or about the same as a .370/.460 line, and just how much difference there is. How do you do it?

One way is to look at all players who have a specific breakdown of OBP/SLG and see what those players' corresponding wOBA was. You can look at all players who had roughly a .330 OBP and .420 SLG and see that they had, on average, a .326 wOBA. And then you can do the same for every semi-common combination of OBP and SLG. Sound like a plan?
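Here is a minimal Python sketch of that grouping, under a few assumptions of mine rather than the post's: each player-season is a hypothetical (OBP, SLG, wOBA, PA) tuple, each group's averages are weighted by PA, and the rounding to the nearest .010 and the 2500-PA floor are the settings spelled out in the next paragraph. `group_by_rounded_line` is an illustrative name, not anything from a real library.

```python
from collections import defaultdict


def group_by_rounded_line(seasons, min_pa=2500):
    """Group player-seasons by OBP and SLG rounded to the nearest .010.

    seasons: iterable of hypothetical (obp, slg, woba, pa) tuples.
    Returns {(rounded_obp, rounded_slg): (avg_obp, avg_slg, avg_woba, total_pa)}
    for groups with at least min_pa plate appearances. Averages are
    PA-weighted (an assumption; the post doesn't specify the weighting).
    """
    # Running sums per group: obp*pa, slg*pa, woba*pa, pa
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for obp, slg, woba, pa in seasons:
        # round() works on binary floats, so exact .005 ties may not
        # round the way they would by hand.
        key = (round(obp, 2), round(slg, 2))
        s = sums[key]
        s[0] += obp * pa
        s[1] += slg * pa
        s[2] += woba * pa
        s[3] += pa

    groups = {}
    for key, (obp_pa, slg_pa, woba_pa, pa) in sorted(sums.items()):
        if pa >= min_pa:  # drop thinly populated OBP/SLG combinations
            groups[key] = (obp_pa / pa, slg_pa / pa, woba_pa / pa, pa)
    return groups
```

Each surviving key is one row of a table like the one below: the rounded OBP/SLG pair, the group's average OBP, SLG, and wOBA, and its total PA.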

Now take every player-season from 1993-2008, round the OBP and SLG to the nearest hundredth, and group together all player-seasons with the same rounded OBP and SLG. Limit the groups to those with at least 2500 combined PAs. Then you can look at the average wOBAs for all slugging percentages with a set OBP to see how much each point of SLG is worth, and vice versa to determine the value of a point of OBP. You end up with something like this:

wOBA by SLG, OBP=.330
(trOBP/trSLG: the rounded OBP and SLG that define each group; actOBP/actSLG: the group's actual average OBP and SLG; PA: total plate appearances in the group)

trOBP  trSLG  actOBP  actSLG  wOBA      PA
.3300  .3200  .3289   .3192   .2930   2755
.3300  .3300  .3296   .3295   .3007   6635
.3300  .3400  .3299   .3398   .3019   8002
.3300  .3500  .3296   .3499   .3061  15144
.3300  .3600  .3297   .3599   .3077  13466
.3300  .3700  .3297   .3701   .3124  11015
.3300  .3800  .3299   .3800   .3132  16022
.3300  .3900  .3302   .3908   .3181  15032
.3300  .4000  .3300   .3999   .3208  23948
.3300  .4100  .3309   .4101   .3243  19832
.3300  .4200  .3290   .4195   .3256  17874
.3300  .4300  .3301   .4303   .3296  17440
.3300  .4400  .3302   .4393   .3321  16495
.3300  .4500  .3302   .4506   .3364  14757
.3300  .4600  .3298   .4601   .3392  17508
.3300  .4700  .3293   .4701   .3416  10955
.3300  .4800  .3303   .4794   .3476  13828
.3300  .4900  .3306   .4899   .3501   7501
.3300  .5000  .3308   .4987   .3528   6732
.3300  .5100  .3310   .5092   .3571   5783
.3300  .5200  .3309   .5207   .3620   4285
.3300  .5400  .3307   .5392   .3637   3024
.3300  .5500  .3332   .5494   .3751   2722

And so on for every other rounded OBP, and then repeat, grouping by rounded SLG instead of OBP. From here, we can look at how much wOBA changes for each .010 points of SLG when we hold OBP constant and see that .010 points of SLG are worth about .003 points wOBA. Note from the following graph that the relationship is roughly linear:



We can also do the same to see that .010 points of OBP is worth .005-.006 points wOBA (the graph for OBP vs. wOBA when SLG is held constant looks similar, just with a steeper slope). This is pretty close to the 2:1 rule of thumb for OBP:SLG (it's actually around 1.8 by this method). This value is relatively constant whether OBP and SLG are high, low, or average, as illustrated by the following graph:


Again, the graph is similar whether you hold SLG or OBP constant. There is a slight downward trend as SLG and OBP rise, but nothing major.
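To check the per-.010 SLG figure on just the one slice shown in the table, you can fit a straight line to its actSLG and wOBA columns. This sketch uses a plain, unweighted least-squares fit, which is my assumption; the post doesn't say how its slopes were fit, and its .003 figure pools every OBP group, not just .330.

```python
# (actSLG, wOBA) pairs from the "wOBA by SLG, OBP=.330" table above.
OBP_330 = [
    (0.3192, 0.2930), (0.3295, 0.3007), (0.3398, 0.3019), (0.3499, 0.3061),
    (0.3599, 0.3077), (0.3701, 0.3124), (0.3800, 0.3132), (0.3908, 0.3181),
    (0.3999, 0.3208), (0.4101, 0.3243), (0.4195, 0.3256), (0.4303, 0.3296),
    (0.4393, 0.3321), (0.4506, 0.3364), (0.4601, 0.3392), (0.4701, 0.3416),
    (0.4794, 0.3476), (0.4899, 0.3501), (0.4987, 0.3528), (0.5092, 0.3571),
    (0.5207, 0.3620), (0.5392, 0.3637), (0.5494, 0.3751),
]


def ols_slope(points):
    """Unweighted ordinary least-squares slope of y on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


slope = ols_slope(OBP_330)
# wOBA gained per .010 of SLG, holding OBP near .330
print(f"{slope * 0.010:.4f}")
```

On this slice the slope comes out a little over .003 of wOBA per .010 of SLG. Weighting each row by its PA would be a reasonable alternative and could nudge the figure slightly.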

That is the first result to note: .010 points of SLG are worth roughly .003 points wOBA, while .010 points of OBP are worth .005-.006 points wOBA. You can use this rule of thumb to compare two players by taking the differences between their OBPs and SLGs.
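The rule of thumb is just a weighted difference. Here is a minimal sketch, assuming weights of .56 per point of OBP and .31 per point of SLG (inside the ranges above, and the same values used for A and B in the formula that follows), applied to the .400/.420 vs. .370/.460 question from earlier. `woba_gap` is a hypothetical helper name.

```python
# Assumed rule-of-thumb weights: .010 OBP ~ .0056 wOBA, .010 SLG ~ .0031 wOBA.
OBP_WEIGHT = 0.56
SLG_WEIGHT = 0.31


def woba_gap(line_a, line_b):
    """Approximate wOBA difference (a minus b) between two (OBP, SLG) lines."""
    (obp_a, slg_a), (obp_b, slg_b) = line_a, line_b
    return OBP_WEIGHT * (obp_a - obp_b) + SLG_WEIGHT * (slg_a - slg_b)


gap = woba_gap((0.400, 0.420), (0.370, 0.460))
print(f"{gap:+.4f}")
```

The OBP edge (+.030, worth about .017) outweighs the SLG deficit (-.040, worth about .012), so .400/.420 comes out roughly .004 ahead.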

What if you want to actually replicate a wOBA figure, though? This is a bit messier. Really, this isn't worth it unless you just don't have access to wOBA itself for whatever reason. But say you need to do it. We want to complete the formula:

wOBA = A*OBP + B*SLG + C

We already know A and B to be .56 and .31, but we don't yet know C. So we go back to our original table and calculate .56*OBP + .31*SLG for each combination of OBP and SLG, and then subtract that from the wOBA for each combination:

C = wOBA - (A*OBP + B*SLG)
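Computing C for a few rows of the OBP = .330 table takes only a couple of lines. This sketch plugs in the .56 and .31 weights and the table's actual OBP/SLG averages. The choice of rows is mine and purely illustrative.

```python
A, B = 0.56, 0.31  # OBP and SLG weights from the text

# (actOBP, actSLG, wOBA) for a few rows of the OBP = .330 table above
ROWS = [
    (0.3289, 0.3192, 0.2930),
    (0.3290, 0.4195, 0.3256),
    (0.3308, 0.4987, 0.3528),
    (0.3332, 0.5494, 0.3751),
]


def residual_c(obp, slg, woba):
    """C = wOBA - (A*OBP + B*SLG) for one OBP/SLG group."""
    return woba - (A * obp + B * slg)


for obp, slg, woba in ROWS:
    x = A * obp + B * slg
    print(f"x = {x:.4f}   C = {residual_c(obp, slg, woba):.4f}")
```

Even within this one slice, C climbs from about .010 to about .018 as x rises, which is the non-constant behaviour that makes the next step necessary.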

Here, we introduce a problem. C is not really a constant. It changes when OBP and SLG change. Honestly, did we really expect anything related to OPS that wasn't arbitrary to be mathematically simple? Here's what the graph of C looks like for each predicted wOBA:


Lovely. The formula for the line of best fit is printed on the graph. That is our value for C. The x in that equation is really (A*OBP + B*SLG). So our formula for converting OBP and SLG to wOBA is now:

x = .56*OBP + .31*SLG
wOBA = -.53x^2 + 1.35x - .045

This can be combined into one equation with substitution if you prefer, but it looks a bit ugly, so we'll just leave it be for now. This is now to the point where anyone who would care to do the calculation would almost be better off just calculating wOBA directly from raw stats, but whatever. Go only as far into the calculations as you need. If you want to go this far, this is how you do it.
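In code, the substitution is just nesting one step inside the other, so there's no need to write out the expanded equation. Here is a minimal sketch. `estimate_woba` is an illustrative name, and the coefficients are the rounded ones given above.

```python
def estimate_woba(obp, slg):
    """Estimate wOBA from OBP and SLG using the fit in this post:
    x = .56*OBP + .31*SLG, then wOBA = -.53x^2 + 1.35x - .045."""
    x = 0.56 * obp + 0.31 * slg
    return -0.53 * x ** 2 + 1.35 * x - 0.045


# The two slash lines from earlier, plus one near the .330/.420 group.
for obp, slg in [(0.400, 0.420), (0.370, 0.460), (0.330, 0.420)]:
    print(f"{obp:.3f}/{slg:.3f} -> {estimate_woba(obp, slg):.3f}")
```

.400/.420 comes out around .367 and .370/.460 around .362, the same roughly .004 gap the rule of thumb gives. .330/.420 lands near .328, against the .326 group average in the table.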

Does this formula work? For the most part, yeah. The scale and league average match pretty closely with wOBA, and for most players, it works out to be pretty close. This estimate is within .010 points of the actual wOBA for over 95% of player-seasons since 1993. About three quarters of player-seasons are within .005 points wOBA. Half of them are within .0027 points wOBA. The average absolute difference between predicted and actual wOBA is .0036 points. Not bad, especially considering wOBA counts stolen bases and our estimate doesn't.
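The accuracy figures above are simple summaries of the gap between estimated and actual wOBA. Here is a sketch of how you might compute them for your own set of player-seasons by feeding it parallel lists of estimated and actual wOBA. The function and dictionary key names are hypothetical.

```python
from statistics import median


def accuracy_summary(predicted, actual):
    """Summarize how close estimated wOBAs are to actual wOBAs."""
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    n = len(errors)
    return {
        "mean_abs_error": sum(errors) / n,
        "median_abs_error": median(errors),  # "half are within ..."
        "share_within_005": sum(e <= 0.005 for e in errors) / n,
        "share_within_010": sum(e <= 0.010 for e in errors) / n,
    }
```

The median absolute error corresponds to the "half of them are within" figure, and the two shares correspond to the .005 and .010 thresholds.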

Obviously, the players this works most poorly for are those with a large effect from either stolen bases or from intentional walks, as wOBA handles those in a fundamentally different way from OBP and SLG (in that it considers SB/CS at all and that it differentiates IBB from nIBB). For example, the two biggest discrepancies between predicted and actual wOBA were Bonds in 2004 (120 IBB) and Willy Taveras in 2008 (68/75 in SB attempts). Both were over .025 points off.

So there you have it. Can you convert OBP and SLG to a reasonable estimate of wOBA? Yes. Should you? Probably not, unless it's all you have and you really need a reliable way to convert to actual runs. If all you want to do is get an idea of how good someone is at the plate or who is better than whom, there's little point in going all the way through the conversion. But you could do it. So take that, OPS.

Now I just need to convince the scoreboard operators at my local stadium to substitute my definition of OPS for theirs. Then we'll really be in business.

Competitive Balance, Reprise

In the wake of the Yankees' latest World Championship (congrats to the Yankees and their fans), the issue of financial disparity has resurfaced with renewed vigour, with the most common refrain being support for the simplest idea in the public domain: a salary cap. In light of that, I would like to direct all readers of this blog to the report on competitive balance (the subject of my first post on this blog) that MLB commissioned from the independent Blue Ribbon Panel on Baseball Economics (PDF), headed by George Mitchell, at the beginning of this decade. Note that nowhere in the report is any form of a salary cap suggested, and that most of the recommendations have yet to be implemented or even pushed for by MLB.

Selig on Revenue Sharing (circa 2003)

I was reading through old articles today and found the following gem from our favourite current MLB Commissioner (by default, of course, but it's the type of thing he'd want credit for) from 2003:

You wouldn't have seen Anaheim [2002] or Florida [2003] in a World Series if this were 10 years ago. I'm convinced there will be other manifestations of revenue sharing in the future.
-USA Today interview


The above quote is supposed to be about how well revenue sharing is working, but in effect it just demonstrates that the Commissioner's Office either thinks fans are stupid or is itself stupid. Possibly both. The article was written in 2003, so the fact that the Marlins would not have been in a World Series 10 years earlier is unspectacular. Guess what else he could have said? "You wouldn't have seen Arizona in a World Series if this were 10 years ago." That would have been only slightly more disingenuous, just a lot more obvious. No kidding Florida probably wouldn't make the WS in 1993. Nor before. Nor in 1994. None of those have anything to do with revenue sharing, including the detail Selig is attributing to his brainchild.

In fact, the Marlins are a pretty textbook example of what went wrong with revenue sharing. In 1997, they mustered up enough payroll to crack the top third in baseball and won their first World Series. Also in 1997, MLB instituted revenue sharing. The Marlins proceeded to fall immediately to the bottom third of the league in payroll and stayed there continuously through 2003 (including their WS winning 2003 season). They eventually became notorious for pocketing revenue sharing funds rather than investing them into the team and have fielded team payrolls lower than their revenue sharing payments alone.

As for Anaheim, the only two things I can think of are:

1) that Selig assumed most fans didn't know where Anaheim was in proximity to LA (then they went and changed their name and made it obvious how full of shit he was).

or

2) that there was officially no "Anaheim" team ten years earlier at all.

So I guess Selig's goal here is to win by default?

Found some old sketches

I found an old sketchbook of mine from about 8 years back (judging by the dates I wrote down in it) the other day, and among the experiments and quick sketches, I found a couple of baseball related portraits. I remember doing them now that I've seen them again. The Ted Williams sketch was practice for a larger drawing of Williams completing his follow-through (an old gift which has long hung in my dad's pool room), and the Musial was one of my first (and still one of my only) goes with coloured pencils. Not a whole lot, but they took me back. Scans in the full post.






Evaluating Pitchers with FIP, Part II

This is the second half of a two-part article. Read the first half here: Evaluating Pitchers with FIP, Part I

Rather than throw out stats simply because they regress, you should only throw them out if you think what they do and do not regress is wrong (possibly RBI or W-L record for pitchers, for example) or that how they regress events is wrong (possibly WHIP, for example). FIP regresses sequencing/timing, leverage/situation, and distribution of batted balls (including things like how many ground balls happen to find holes or how many find fielders), all of which generally fall under the term "Luck", as well as factors like park effects (unless you park-adjust it, which you can if you want) and quality of opponents. It does not regress defensive support. Your decision on whether or not to consider FIP in evaluating what happened should hinge on whether you feel these decisions offer a useful perspective. Maybe you have a problem with all of these decisions. That would make evaluating pitchers very difficult, however.

If you think sequencing or timing should not in any case be regressed, that also means opponent batting lines are out. If you think leverage or situation should not in any case be regressed, then ERA is out. Perhaps you have an issue with regressing anything termed "Luck", which, in the case of FIP, means if a pitcher gives up a line drive in the gap or one right at a fielder, you don't care if it was because of the pitcher's ability or if it was just random chance, you want to evaluate what happened. If you feel that way, however, you should ask whether you also have a problem with the other "Luck" aspects listed in the above paragraph besides batted ball distribution. They also show up in other stats like ERA and opponent batting lines. Is it luck whether a pitcher allows a single with the bases loaded in the bottom of the 9th of a tie game or in a blowout? The former is obviously much more costly measuring by outcome, but opponent batting lines count them the same, and ERA actually counts the latter as worse (because it will often be counted as 2 ER, while the former can only ever count as 1). ERA might not even count the former case at all if there was an error in the inning that would have been a third out. If a shortstop boots a ball, do you say the pitcher had nothing to do with it, or do you not want to consider luck and look only at what happened? Opponent batting lines ignore the actual outcome and pretend the pitcher got the out as if the error never happened. ERA either ignores the event altogether or pretends the pitcher got the out even though he didn't (he won't get credit for the out in the IP part of ERA, but if it becomes the third out of the inning, he is given credit for getting out of the inning so no further runs are earned; in fact, in such a case, ERA can completely throw out even events like walks and home runs as if they never happened).

Maybe you still want to charge batted ball distribution to the pitcher but not other aspects of "Luck", which is OK as long as you understand what you are doing and that you are still regressing "Luck" in other cases rather than charging the pitcher regardless. That doesn't mean you throw out FIP. You still get the benefits of regressing the kinds of "Luck" you have already chosen to regress by using other stats, and you have the advantage that FIP regresses some of those factors differently and sometimes better. For example, in ignoring distribution and regressing all events to the average value of the event, opponent batting lines assign an arbitrary value to each event. FIP weights each event by its actual run value. You could also do this with opponent batting lines, but you would have to do it yourself.

You also have the other advantages of FIP, particularly that it does not regress defensive support. If a fielder makes a great play, should we credit that value entirely to the pitcher? The pitcher may have had some effect in creating the out, but did the pitcher really create as much value on a diving play in the hole by the shortstop as he does with a strikeout, or does the shortstop create some of that value? Both outs are worth about the same (the strikeout is actually worth slightly more on average, but assume both occur in a situation that gives them the same value), but in one, the pitcher created almost all of the value for the defensive team, while in the other, he shares most of the value with the shortstop. Over the course of the season, some of the deviation from the average value of a ball in play will be due to the pitcher pitching better or worse than average and some will be due to the defense playing better or worse than average. Crediting all of the deviation to either the pitcher or the defense (or rather, all to the pitcher, as ERA, opponent batting lines, etc. do, or none to the pitcher, as FIP does; none of them actually measure defensive value) is wrong, which means you probably shouldn't throw out either type of stat, because both tell you something about how well the pitcher pitched. Which one is less wrong, though? It depends in part on how big your sample is, but probably crediting none of the variation to the pitcher, at least over one season. You are going to lose less by regressing the effect of the pitcher completely to the mean than by regressing the effect of the fielder completely to the mean. Even if you disagree with that, the regression that each does is wrong to some extent, so you shouldn't take one and say it does not measure value because of its regression and take the other and pretend it doesn't also regress.

So in fielding-independent stats, we have a distinct perspective on defensive support that is not provided by traditional stats. We also have a distinct perspective on other issues. Opponent batting lines and FIP both give a sequence-independent perspective, but each perspective takes a different approach in choosing how to group events to regress to the mean value and in how to decide what value to use for those events, as well as how to present that value (opponent batting lines as a pseudo-binomial rate and FIP as a run value rate). ERA and FIP both present a distribution-independent run-value rate that is based on the actual values of events, but each makes different assumptions about what values or factors should or should not be regressed. Some of these assumptions are clearly better in FIP's case (choosing not to regress defensive or bullpen support), some are more grey but still favour FIP (not discarding events that happen after a botched third out), and some simply offer differing perspectives that each have value (choosing to regress the value of events or simply take the outcome, regressing sequencing or not). It is important to consider all of these perspectives in analyzing what happened.

Most people will not consider only one stat in evaluating pitchers because they intuitively understand (even if they aren't aware that this is what's happening) that each stat they look at regresses some factors, usually arbitrarily, and does not measure all value. Many people also accept that some factors should be regressed, or else they wouldn't look at anything related to opponent batting lines (WHIP, AVG/OBP/SLG against, etc.) that regresses the value of each event to some average. They use a combination of stats to get the full picture: opponent production gives a sequence/timing-independent perspective, ERA gives a leverage/situation- and distribution-independent perspective that does not regress sequence, strikeouts and walks give a fielding-independent perspective that doesn't regress defensive support, and wins/losses give a park-independent perspective that does not regress leverage or sequence (though the last one is ignored by a lot of people because it also does a lot of things that it shouldn't, and there are better measures that do the same thing). Each piece gives some part of the picture, but not the full picture. FIP and other stats of its ilk add a fielding-, sequence-, batted-ball-distribution- and leverage-independent perspective, and they give an added dimension to the perspectives considered in other stats. FIP becomes one of many stats to consider to give you a fuller perspective. It is not designed to measure everything or be comprehensive. Sometimes people want to throw it out because they think it claims to be comprehensive, and then label it a failure for not being so. This is a mistake; you would not throw out any stat you do use for such a failure. FIP is just another of those stats to add to the equation. If you don't like how FIP handles some factors, you shouldn't throw out the entire perspective. You just balance its flaws with other perspectives that handle those factors differently, just as you would include FIP to balance the flaws of those perspectives.

At this point, the regression of batted-ball distribution is probably the number one issue taken with FIP. There are a lot of fans who can accept most of the above, and even that the distribution of batted ball locations should be regressed, but not batted ball types. If that describes you, you've probably already accepted fielding-independent measures in general (which is really what all of the above is about; FIP is merely the most familiar of these). This is where you would want to consider a number of different fielding-independent measures, such as DIPS, various forms of tRA, and xFIP. Each handles these factors differently, so if you don't like how FIP regresses batted ball types, perhaps you would prefer something like tRA. Just for bonus coverage, though, we can look a bit at whether FIP's handling of BIP is actually wrong.

A common perception is that FIP regresses batted ball type distribution to the average distribution. It doesn't really, though. It just regresses the aggregate value of all balls in play. There is a clear skill in whether a pitcher tends to allow more fly balls or ground balls that manifests even over a single season, so we probably shouldn't regress ground ball or fly ball tendencies. What is the primary difference between ground balls and fly balls, though? Home runs. So this tendency is accounted for in FIP. Once you remove home runs and only look at balls in play, the difference in the value of a ground ball and a fly ball in play is not that great. The distribution of singles and extra base hits will be different for each, but FIP doesn't care about the breakdown of hits, only their aggregate value, which is fairly similar. Ground balls are worth a bit more than fly balls in play, so extreme fly-ball pitchers may be slightly undervalued by FIP (though I haven't looked at this to see whether it is true). Line drives are another story. Unlike GB/FB tendencies, line drive tendencies for pitchers tend to regress pretty heavily. Their value is also significantly different from that of ground balls and fly balls. So the question is, should we hold the pitcher responsible for high or low LD rates, or should we regress them? Because of the huge value of a line drive, this decision can make a big difference. Again, I suggest considering both perspectives here to some extent, but if we are already conceding the regression of some factors outside a pitcher's control, the tendency of LD rate to regress should be enough that we should at least consider regressing it and leaving only GB/FB tendencies rather than just dismiss FIP for regressing the influence of line drives.

Maybe after all this, you still don't think FIP, et al. provide any value in evaluating pitchers. Maybe you understand that FIP isn't just throwing out events, and that the regression in FIP is a concept that appears in traditional stats as well, but you just don't feel the perspective FIP offers adds anything to the picture. If you have considered all of the above and asked yourself what you think is important and what is not important to evaluating pitchers, you are free to choose whatever stats best represent your perspectives. If you are rejecting defense-independent statistics for the more common reasons based on an incomplete understanding of the numbers, however, you should probably reconsider.

Evaluating Pitchers with FIP, Part I

Lately, I've found myself discussing the merits of FIP and other defense-independent pitching statistics quite a bit, so I've decided to compile the contents of my various posts on the subject here to address generally some of the more common issues people have taken with them. A lot of people, it seems, are reluctant to consider FIP as a tool to evaluate how well a pitcher pitched or how good a season he had, and feel that its primary use is as a projection tool that does not really evaluate what has already happened very well. There is a conception that FIP throws out events and is therefore unfit for use at evaluating what happened, or that it inappropriately favours strikeouts over other outs.

These ideas are generally based on an incomplete understanding of either what FIP does or what other stats do. That has to be addressed before deciding whether, or how much, to consider FIP et al. in evaluating pitcher performance. FIP is not:

-a projection
-a measure of who struck out the most hitters
-an arbitrary compilation of a few stats that someone was overly enamored with
-a stat that pretends balls in play never happened
-created by an agent (ok, hopefully no one reading this blog needs to be assured of that one)
-a comprehensive stat that tells you everything about how well someone pitched

The first place most people get hung up is the handling of strikeouts and balls in play. Some people think FIP counts only strikeouts and throws out all BIP events as if they don't exist. That is not quite what is happening. FIP appears to throw out BIP events because it presents the values of other events relative to the value of a ball in play. The formula for FIP is:

(13*HR+3*BB-2*K)/IP + C

where C is a constant that shifts FIP so it is centered around the same mean as ERA. Most people notice that balls in play are not included in the formula and assume that they are simply ignored. Where do the weights for the other events come from, though? We can start with the following linear weights values for each event as published at TangoTiger.net:

Event   Run value
HR       1.40
3B       1.03
2B       0.75
1B       0.46
BB       0.30
out     -0.27


FIP and other defense-independent measures lump balls in play together and consider their average value rather than take the value of each event based on its outcome. FIP lumps all non-HR balls in play together (though some measures, like tRA, lump them into multiple groups). So what is the average value of a BIP according to the above table? To answer, we need to know how frequently balls in play become singles, doubles, triples, and outs. Then, we multiply the frequencies by the values of each event and sum them for the average value of a ball in play. This comes out to about -.04. So the values of each event in FIP are:


Event   Run value
HR       1.40
BB       0.30
out     -0.27
BIP     -0.04
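The averaging step described above is just a frequency-weighted sum. Here is a minimal Python sketch using the linear weights from the first table. The `BIP_FREQ` frequencies are hypothetical placeholders (the post doesn't list the ones it used); with them the result comes out to about -0.035, in the same neighborhood as the -.04 quoted above, and the exact figure depends on the real league frequencies.

```python
# Linear weights for non-HR ball-in-play outcomes, from the table above.
WEIGHTS = {"1B": 0.46, "2B": 0.75, "3B": 1.03, "out": -0.27}

# Hypothetical share of non-HR balls in play ending in each outcome.
# Illustrative assumptions only (chosen to sum to 1), not league data.
BIP_FREQ = {"1B": 0.215, "2B": 0.068, "3B": 0.007, "out": 0.71}


def average_bip_value(freq: dict, weights: dict) -> float:
    """Frequency-weighted average run value of a ball in play."""
    assert abs(sum(freq.values()) - 1.0) < 1e-9, "frequencies must sum to 1"
    return sum(freq[event] * weights[event] for event in freq)


print(round(average_bip_value(BIP_FREQ, WEIGHTS), 3))  # -0.035
```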


Next, we find the value of each event relative to a BIP by subtracting the BIP value (-.04) from each event's value, which is the same as adding .04:

Event   Value relative to BIP
HR       1.44
BB       0.34
out     -0.23
BIP      0

These are the values of each event used in FIP. The formula above puts these weights on a per-9-innings scale: each weight is multiplied by 9, and the total is divided by IP. Multiplying by 9 and rounding gives 13, 3, -2, and 0 as the weights in the formula (a strikeout takes the weight of an out). So BIP are still there, just hidden in the formula: their value determines the weights given to every other event.
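The subtract-then-scale step can be checked in a few lines. This sketch takes the values from the tables above, treats a strikeout as an out (the `K` key), subtracts the BIP value, multiplies by 9, and rounds; the dictionary names are just illustrative.

```python
# Run values from the tables above; a strikeout is valued as an out.
VALUES = {"HR": 1.40, "BB": 0.30, "K": -0.27, "BIP": -0.04}
BIP_VALUE = -0.04


def fip_weights(values: dict, bip_value: float) -> dict:
    """Value of each event relative to a BIP, scaled by 9 and rounded."""
    return {event: round(9 * (v - bip_value)) for event, v in values.items()}


print(fip_weights(VALUES, BIP_VALUE))  # {'HR': 13, 'BB': 3, 'K': -2, 'BIP': 0}
```

The unrounded products are 12.96, 3.06, and -2.07, so the familiar integer weights are roundings of these relative values.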

So that is what FIP is: a measure that regresses the value of all balls in play completely to the league average and then weights the value of other events relative to that value. As the argument goes, a ground ball to the shortstop is as good as a strikeout...unless the shortstop doesn't field it for an out. FIP gives the pitcher credit for that ground ball, as well as every other ball in play, based on how likely it is to become an out or a hit or whatever, not based on whether a fielder got to it and converted an out or not. This is where many fans who have gotten past the first issue begin to take umbrage. Why should we regress any of the outcomes if we only want to assess what actually happened? Isn't the point of regression to form some sort of projection rather than an assessment?

Not always, no. If the point were to project, the regression and compilation of the stat would be significantly different. Here, the point of regression is not to project future value but to determine the value of past events. This is actually much more commonplace than most people realize; most stats regress factors to the mean, and what's more, most of them do so in a way that does not reflect value. WHIP regresses the value of all baserunners to the same value, so a HR is worth the same as a walk. It also regresses all sequencing and timing of events completely to the mean, as do opponent AVG, OBP, and SLG (all of which also regress the value of events to some arbitrary value, i.e., every time on base or every base counts as 1). All of these, along with ERA, regress defensive support completely to the mean, and ERA regresses bullpen support on inherited runners to the mean (meaning the runs the defense and bullpen save or cost the pitcher are assumed to be average, so their actual effects don't have to be accounted for). ERA, while it doesn't regress sequencing of events, does regress leverage/situation and the distribution of runs across games. Just about all stats regress quality of opponents. Stats that aren't park adjusted regress park effects. None of that makes any of these "projection" stats.
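To see what WHIP's regression of baserunner values hides, here is a hypothetical two-pitcher comparison scored with the linear weights from earlier in the post. The pitchers and their event counts are invented for illustration. Both post a 1.00 WHIP over 40 innings, but pitcher B's baserunners are worth about 26.2 runs to pitcher A's 16.8.

```python
# Linear weights from the TangoTiger.net table earlier in this post.
WEIGHTS = {"HR": 1.40, "3B": 1.03, "2B": 0.75, "1B": 0.46, "BB": 0.30}


def whip(events: dict, ip: float) -> float:
    """Walks plus hits per inning: every baserunner counts as exactly 1."""
    return sum(events.values()) / ip


def baserunner_run_value(events: dict) -> float:
    """Linear-weights value of the same baserunners."""
    return sum(WEIGHTS[event] * count for event, count in events.items())


# Two hypothetical pitchers, 40 innings and 40 baserunners each.
singles_and_walks = {"1B": 30, "BB": 10}
homers_and_walks = {"HR": 10, "1B": 20, "BB": 10}

for name, events in [("A", singles_and_walks), ("B", homers_and_walks)]:
    print(name, whip(events, 40.0), round(baserunner_run_value(events), 1))
```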

No stat considers all factors un-regressed. The factors FIP chooses to regress and leave un-regressed are designed specifically to model value produced by the pitcher. That is not true for most other stats. So why would you hold it against FIP that it regresses factors when it does so logically and empirically but not against other stats that do so mostly arbitrarily?

You can't throw out FIP or any other defense-independent stat just because it regresses the value of events unless you are going to throw everything else out the window with it. Are the outcomes of all singles the same, or of all doubles, triples, or home runs? Of course not, but when you look at a pitcher's opponent batting line, the values of these events are all regressed to their average value. Is a run in the bottom of the 9th of a tie game worth the same as a run in a blowout? Again, no, but ERA regresses all runs to the same value, or rather, all earned runs to the same value. ERA also regresses all unearned runs to the same value of zero. These things don't bother most fans because they can accept the idea that sequencing or timing of events might not reflect how well a pitcher actually pitched even though they are reflected in the outcome of what happened. This is the same concept as FIP uses: you decide what you want to look at with your stat, and then you decide where you want to stick to strictly measuring the outcomes and where you want to regress the results.
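To make the ground-ball-to-the-shortstop point concrete, here is a minimal sketch of the FIP formula above applied to two hypothetical pitchers who differ only in how many of their balls in play fell for hits. The `PitchingLine` type and `fip_constant` helper are illustrative names, and the league totals and ERA are invented. Choosing C so that league-wide FIP equals league ERA is one common way to center FIP on ERA, as described above. IP is treated as true innings (200 1/3, not 200.1).

```python
from dataclasses import dataclass


@dataclass
class PitchingLine:
    ip: float      # true innings, e.g. 200 1/3 IP is 200.333..., not 200.1
    hr: int
    bb: int
    k: int
    bip_hits: int  # non-HR hits allowed; FIP never reads this field


def fip(line: PitchingLine, c: float) -> float:
    """(13*HR + 3*BB - 2*K)/IP + C."""
    return (13 * line.hr + 3 * line.bb - 2 * line.k) / line.ip + c


def fip_constant(league: PitchingLine, league_era: float) -> float:
    """Pick C so that the league as a whole has FIP equal to its ERA."""
    return league_era - (13 * league.hr + 3 * league.bb - 2 * league.k) / league.ip


# Hypothetical league totals and ERA, for illustration only.
LEAGUE = PitchingLine(ip=43000.0, hr=5000, bb=16000, k=31000, bip_hits=40000)
LEAGUE_ERA = 4.30
C = fip_constant(LEAGUE, LEAGUE_ERA)

# Identical HR, BB, K, and IP, but very different results on balls in play.
lucky = PitchingLine(ip=180.0, hr=18, bb=50, k=150, bip_hits=140)
unlucky = PitchingLine(ip=180.0, hr=18, bb=50, k=150, bip_hits=190)

print(round(fip(lucky, C), 2), round(fip(unlucky, C), 2))  # identical values
```

Both pitchers get the same FIP because every ball in play is credited at its average value, no matter whether the fielders turned it into an out.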


This is the first half of a two-part article. Continue reading here: Evaluating Pitchers with FIP, Part II



Joe Mauer and Stealing Signs

There has been a minor sensation across the web today over a video a fan posted of Joe Mauer supposedly stealing signs and relaying them to the batter by touching either the earhole on his helmet or the front of his face. (That's a YouTube video, so chances are MLB will have it removed before long. If it's down when you follow the link: someone videotaped his/her TV screen during a 6th-inning plate appearance by Jason Kubel with Mauer on second base, and inserted comments explaining why the author is sure Mauer was stealing signs and relaying them to Kubel.) Of course, the discussion has mostly centered on whether stealing signs is cheating, whether Detroit should bean Mauer, and the ethics of the game in general. That's not to say those questions aren't interesting, but I think another issue is being overlooked. Regardless of what stealing signs means to you, does the video even make a good case that Mauer was in fact stealing signs? In case you can't see the video by now: whoever posted it seems very sure of him-/herself, and sure that the video, along with the explanatory comments, provides clear evidence of sign stealing, even going so far as to point out at one point that sign stealing is usually not so blatant.

There's really nothing in the video that is blatant evidence of sign stealing, though. Whoever made the video arrives at his/her conclusion based both on evidence that just isn't in the video and on a series of assumptions and logical jumps that make little sense and don't point to the clear conclusions the author of the comments seems to see.

For one, we are expected to see how Mauer is picking up the signs by watching the signs the catcher puts down. Watch the catcher put down 2 fingers for a curveball, the video tells us. We can't actually see this, both because we don't know the sequence or indicator the Tigers are using with a runner on second and because the video's resolution is too poor to see the signs anyway. I assume the person who posted the video could see the signs before recording and uploading killed the quality, so we just have to take his/her word for what the signs were. I am guessing this person is just telling us the first sign in the sequence, since he/she doesn't know the pattern Detroit uses either. The problem is, this is clearly wrong. It's possible that Detroit just uses the first sign, but there is no reason to assume so, and checking what the pitches actually were shows that the person is reading the signs wrong.

The first pitch, that 2-for-a-curveball, was not a curveball. It didn't move like a curveball, the radar reading was fast for a curveball, and, lo and behold, Pitch F/X says there's no way it was a curveball. It was in fact a change-up that moved nothing like Verlander's curve, judging by both the GameDay algorithm and the pitch's recorded flight path. So the video is already off to a bad start: the sign the person sees the catcher give, which we are assured is what Mauer is seeing and tipping to Kubel, is wrong (and if 2 really means change-up rather than curve, the video is wrong later, because there are 2-for-a-curveballs that really are curves).

This might not seem that important. After all, just because the person who posted the video can't read the signs properly doesn't mean Mauer can't. But Mauer relays the signs to Kubel very early in the catcher's sequence, early enough that the first sign would almost have to be the real one. Since I suspect the video-maker is reading the first sign, and that reading is wrong, it seems unlikely that Mauer could relay the pitch to Kubel that early in the sequence even if he did know the pattern.

Even if the real sign is the first one, Mauer gives his sign for one pitch after clearly missing the beginning of the sequence of signs the catcher puts down. He is checking the fielder positioning as the catcher begins giving signs but gives his signal anyway. No matter what the pattern is, if you miss the beginning of the sequence, you can't be sure you got the right sign. Then, there are other pitches where Mauer is looking straight toward the plate the whole time but relays nothing. For one of these, the explanation given is that the catcher either changed the signs or that Verlander just threw what he wanted without a sign. The latter explanation is unlikely, as the catcher gave the signs just like for every other pitch, they didn't do this at any other point in the at bat, and the catcher responded to the curve in the dirt like he knew exactly what pitch it was. The former explanation makes even less sense because it requires that:

-the catcher changed signs without notifying Verlander (possible if they worked out alternate sequences in advance with a signal to switch because they suspected sign stealing was likely, but that makes it less believable that Mauer could keep stealing signs, especially if they noticed he was doing it)
-they suddenly switched back to the signs that Mauer was supposedly stealing for no apparent reason, or Mauer knew for certain the new sequence after seeing it once.

On another pitch, the explanation for why we don't see Mauer give the sign is that he was out of frame, but that if we watch Kubel's eyes, he looks to Mauer for the sign (another example of evidence that isn't in the video, as the resolution is too poor for us to see this as well). However, Mauer is out of frame only very early in the shot, earlier than the catcher begins giving signs in any shot where the signs can be seen. Mauer is in frame for the entire window in which, based on the timing of every other pitch he signaled in the at bat, we would expect to see him relay the sign, and he gives no sign.

During one catcher sequence, Mauer gives Kubel both signs. He touches his earhole and then moves his hand to wipe the front of his face.

One bit of evidence is that Kubel steps out after he gets the sign for a curveball from Mauer, as if his stepping out will make the Tigers change the pitch they want to throw. He does not do this for any other curve signal Mauer gives.

We are told that everybody knows what is going on. Every time the camera cuts to a new person, that person knows what is going on. After Kubel eventually hits a sac fly, it's supposedly clear that the Twins knew Mauer was stealing signs because they give Kubel and the runner who scored from third high fives in the dugout for driving in and scoring a run. Michael Cuddyer knew as he stood on deck. Of course, Leyland and Laird both knew as well. But Laird's reaction to knowing someone is stealing signs is, when he starts putting down signs and Mauer gives no signal, to stop and start over? (He does this in the video, accompanied by the explanation that he doesn't want to give a sign because he knows Mauer will steal it.) Is he trying to make sure Mauer gets it? The video-maker cites Laird's stopping and restarting as evidence that he knows Mauer is stealing signs. If he even suspected Mauer were stealing signs, he would visit the mound and change the sequence with Verlander, not keep putting down the same signs and looking frustrated.

Mauer was only the second Twin to reach second in the game, and the first was two batters earlier. Mauer would have had no chance to study the sequence the Tigers were using for that game. Unless Span picked up the pattern in 4 pitches and told Mauer what it was as he passed him at the plate before Mauer's at bat without Laird noticing, Mauer would have had to know that the Tigers were using the same sequence every game to confidently give a sign on the first pitch. This is not a safe assumption for an important late-season game against a division rival.

At one point, we are told that a pitch must be a fastball because the sign was given too quickly to tip. It wasn't; the camera was just on Ron Gardenhire for a few seconds while the signs were given. And even if the sign were abnormally quick, that wouldn't indicate a fastball. It would suggest the catcher was just telling the pitcher to stick with the pitch he had called before time was called. For what it's worth, this pitch was probably a change-up, not a fastball. It moved like a change and was much slower than Verlander's fastball. It may have been a slow two-seamer, though.

The video is full of claims explaining why what we are seeing has to mean Mauer is stealing signs. For the most part, this evidence is faulty. It is not nearly as clear-cut as the author of the comments claims, and much of it is just jumping to conclusions without any logical justification. The video fails to explain why, if Mauer knows the signs, he fails to relay them multiple times. It fails to explain why, if Detroit knew exactly what Mauer was doing (part of the evidence relies on the assumption that Laird's actions must be his reaction to knowing Mauer is stealing signs), they didn't meet on the mound and change the signs. There is no attempt to reconcile how unlikely it is that Mauer could learn the Tigers' sequence so quickly, nor how he could relay a sign without watching the sequence. There is no explanation of why Mauer gives Kubel both signs before one pitch (in fact, this is simply cited as evidence that Mauer knew which sign was given). Actions that don't logically lead to the conclusion of sign stealing, like Laird going through the signs multiple times, are attributed to Mauer's sign stealing anyway.

Joe Mauer may or may not have been relaying signs, but it is unlikely that he knew the signs Detroit was using, and the video certainly presents no good evidence that he did. The more likely scenario is that Verlander, not Laird, was tipping the pitches to Mauer. Verlander was pitching out of the windup, which is unusual with men on base, and he holds his glove high enough that Mauer may have been able to see his grip over his shoulder. If Verlander is used to regripping early from the windup, Mauer may have been able to see his grip on some of the pitches. On others, Mauer probably didn't get as good a view, or Verlander didn't change his grip early enough for him to relay the pitch, which is more likely than Mauer missing a sign if he knew the sequence. He may also have given both signs on one pitch because Verlander regripped and then immediately changed to another grip. After that point in the at bat, there is no evidence on the video of Mauer relaying signals (though there is only one pitch where the video shows him at the right moment to catch it if he were), so Mauer may have figured Verlander was onto him after the dummy regrip and given up on relaying pitches. Heck, maybe Mauer was just adjusting his helmet and scratching his face. Seeing him do that a few times in an at bat isn't clear evidence that it's anything more than coincidence or a habit of Mauer's, especially since he does both at one point, does one at a time when he probably didn't see the sign, and does neither multiple times when he did.

Fans too frequently take what they are told as fact without checking its validity first. It happens when broadcasters cite statistics or rules from memory and fans repeat them in arguments. It happens when someone cites something incorrect in a discussion and whoever heard it takes their word for it. It happens when analysts make offhand remarks and people mistake them for researched analysis. The responses to this video are just another example of this tendency. This habit gets in the way of good baseball discussion, because not enough people make sure they have the facts straight before they run headlong into debate. How are we ever going to learn anything from each other if we can't at least do that?