3-D Baseball: November 2009

Pitching to Contact and FIP

Last month, I wrote a two-part primer on defense independent pitching stats, with a heavy focus on FIP. Lately, FIP is a somewhat hot topic following the Cy Young voting, where the pitchers who dominated the defense-independent metrics won out in both leagues, and one of the two specifically cited FIP as a focus of his pitching strategy. One of the issues that is oft-discussed is the concept that pitchers whose ERAs far outstrip their FIPs because of good defensive support are simply pitching to their defense, and should be rewarded for that. After all, why push so hard for strikeouts when you know that your defense is going to convert a lot of outs when you do put the ball in play? Should a pitcher (a purely hypothetical pitcher, of course) who only strikes out 6.7 per 9 have that held against him when compared to a pitcher who strikes out 9.8 per 9 or one who strikes out 10.4 per 9 when the 6.7 per 9 pitcher is pitching to a defense that better handles balls in play?

The first thing to note is that FIP is not dependent on strikeouts in the way most people think. As we learned in last month's primer, FIP is dependent on 4 different values, one of which is strikeouts, and another of which is balls in play. Both are important to FIP, as are BB and HR. FIP is a balance of all 4 areas for a pitcher, with each area weighted according to its observed value in actual MLB games. It does not favour strikeouts as a style, and in fact, does not favour them at all when they come with high walk and HR totals. It is fully possible to have a poor FIP as a high-strikeout pitcher.

FIP, in general, does not favour strikeouts in any way other than how their observed value relates to the other 3 areas included in the formula. Now that that is out of the way, FIP does overvalue strikeouts if a team's defense is very good, but only slightly, and, conversely, it actually undervalues strikeouts if a team has poor defense. To see how this works, let's revisit the table of values for the 4 types of events covered by FIP:

HR		1.40
BB		0.30
out		-0.27
BIP		-0.04

Remember that the coefficients used by FIP are derived by subtracting the value of a BIP from the value of each other event and multiplying by 9. FIP is dependent on the value of these 4 types of events being close to those in the above table. When team defense differs significantly from average, it has no effect on the value of HR, BB, or SO (actually, that's not true, because those values are all sensitive to the run-environment, so when a team allows more or fewer runs, those values will change slightly, but for our purposes, we'll ignore that). Defense does, however, have an obvious effect on the value of a ball in play. With a good defense, a BIP will be worth less (to the offense) than -.04. With a bad defense, a BIP will be worth more. So the coefficients in FIP are in fact slightly off when a defense is not close to average, because FIP is tuned to fit the average value of a ball in play.
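As a rough sketch of the derivation just described, the snippet below subtracts the BIP value from each of the other run values in the table and multiplies by 9. It treats the table's "out" row as the strikeout value; that mapping and the function name are assumptions made for illustration, not something spelled out above.

```python
# Run values per event, copied from the table above.
# Assumption: the "out" row is used as the value of a strikeout (SO).
RUN_VALUES = {"HR": 1.40, "BB": 0.30, "SO": -0.27, "BIP": -0.04}

def fip_coefficients(run_values):
    """Subtract the BIP value from every other event and multiply by 9."""
    bip = run_values["BIP"]
    return {
        event: round(9 * (value - bip), 2)
        for event, value in run_values.items()
        if event != "BIP"
    }

coeffs = fip_coefficients(RUN_VALUES)
print(coeffs)  # HR, BB and SO coefficients before any further rounding
```

These come out close to the familiar 13, 3, and -2 once rounded to whole numbers.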

Here enters the beauty of FIP. Because of how the coefficients are derived, the formula can be easily tuned to fit any level of team defense. FIP, it turns out, is not wrong for pitchers in front of good or bad defenses, it just has to be tuned differently. All we have to do is recalculate the value of a ball in play and re-derive the coefficients.

Imagine a pitcher had as good a defense as he could possibly have. Say, for example, he had the 2009 Mariners' defense. This defense was worth 85 runs above average, or roughly .02 runs per BIP. An average BIP is worth -.04 runs, so an average BIP allowed in front of this defense is worth -.06 runs (remember that lower is better for the defense). We recalculate our coefficients and get:

13.13*HR + 3.23*BB - 1.90*SO *
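The re-derivation can be sketched the same way: shift the BIP value by the defense's runs saved per BIP, then recompute. This version uses the rounded inputs quoted in the text (.02 runs per BIP, and the table's "out" row as the strikeout value, which is an assumption), so it lands a hundredth away from the coefficients shown, in line with the rounding note at the end of the article.

```python
# Run values from the earlier table; "SO" uses the table's "out" row (assumption).
RUN_VALUES = {"HR": 1.40, "BB": 0.30, "SO": -0.27}
AVERAGE_BIP_VALUE = -0.04

def defense_adjusted_coefficients(defense_runs_per_bip):
    """Re-derive the coefficients for a defense saving the given runs per BIP."""
    # A better defense makes each ball in play worth less to the offense.
    bip_value = AVERAGE_BIP_VALUE - defense_runs_per_bip
    return {
        event: round(9 * (value - bip_value), 2)
        for event, value in RUN_VALUES.items()
    }

adjusted = defense_adjusted_coefficients(0.02)  # roughly the 2009 Mariners figure
print(adjusted)
```

Passing a negative number for a poor defense moves the coefficients the other way, which is the sense in which traditional FIP slightly undervalues strikeouts in front of bad fielders.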

Strikeouts did indeed lose value, and walks and home runs both became more costly. How much difference does this make, though? Let's revisit our hypothetical 6.7 strikeout pitcher. Let's say he also walks 2.1 per 9 (after adding in HBP and subtracting out IBB) and allows .33 HR per 9.

Using the traditionally derived coefficients, he'll have an FIP of about 2.84. Keeping in mind that we also need to calculate a new league constant to scale FIP to ERA since we changed the coefficients, we find that he would have an FIP of about 2.79 using our defense-specific coefficients. Traditional FIP underestimated him by about .05 ER/9, or about 1.2 runs per 200 innings, compared to defense-sensitive FIP.
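A minimal sketch of the comparison, working from per-9 rates. The league constants are not given in the text, so the function leaves the constant as a parameter and the example only shows the part of FIP that comes before it. The traditional coefficients used here are the commonly rounded 13, 3, and -2, and the function name is hypothetical.

```python
def fip_from_rates(hr9, bb9, so9, coeffs, constant=0.0):
    """FIP from per-9 rates; the coefficients are already scaled per 9 innings."""
    c_hr, c_bb, c_so = coeffs
    return (c_hr * hr9 + c_bb * bb9 + c_so * so9) / 9 + constant

# The hypothetical pitcher: .33 HR, 2.1 BB and 6.7 SO per 9.
HR9, BB9, SO9 = 0.33, 2.1, 6.7

TRADITIONAL = (13.0, 3.0, -2.0)         # commonly rounded FIP coefficients
DEFENSE_SPECIFIC = (13.13, 3.23, -1.90)  # coefficients re-derived above

trad_component = fip_from_rates(HR9, BB9, SO9, TRADITIONAL)
adj_component = fip_from_rates(HR9, BB9, SO9, DEFENSE_SPECIFIC)
print(round(trad_component, 3), round(adj_component, 3))
```

The two components differ by more than the final FIPs do because each set of coefficients needs its own league constant, which is why the constant has to be recalculated before the 2.84 and 2.79 figures can be compared.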

Since we are already adjusting for defense in our calculations, we can go a step further in incorporating defensive context into our valuation. The league average ERA this year was 4.32, but we know that that won't be the case given a +85 run defense. A league average pitcher, given +85 defense, will have about a 3.84 ERA. Using that figure, we can recalculate our constant for FIP and calculate a new number that is an estimate of actual ERA, not of ERA minus defensive support. This means that we would expect our 2.79 FIP pitcher to have an actual ERA of about 2.31. This is, of course, not as valuable as an ERA of 2.31 in front of an average defense, so we have to account for that as well. If average is 3.84, then a replacement level starter (using a .380 winning percentage as replacement level) will have an ERA right at 5.

A 2.31 ERA is good for a .749 winning percentage against a league average 4.32 ERA. Our replacement level pitcher, who is normally .380, is not .380 against the league with his +85 defense, however. He is .437. That means that our pitcher is worth about 6.9 WAR per 200 innings. Using his traditional FIP, we would give him a .679 winning percentage over a .380 replacement level, which comes out to 6.6 WAR per 200 innings. Our hypothetical** pitcher actually gained .3 wins once we considered the nuances of pitching to contact in front of a stellar defense. That's actually quite a bit. It's worth over a million dollars to the pitcher on the open market.
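The WAR figures in the paragraph above are consistent with taking the gap between the pitcher's winning percentage and the replacement-level winning percentage and multiplying by the number of 9-inning games in 200 innings. That relationship is inferred from the numbers rather than stated in the text, so treat the sketch below as an assumption about the arithmetic.

```python
def war_per_innings(win_pct, replacement_win_pct, innings=200):
    """Wins above replacement over a number of innings, at 9 innings per game."""
    games = innings / 9
    return (win_pct - replacement_win_pct) * games

# Defense-sensitive valuation: .749 against a .437 replacement level.
defense_sensitive = war_per_innings(0.749, 0.437)
# Traditional FIP valuation: .679 against the usual .380 replacement level.
traditional = war_per_innings(0.679, 0.380)

print(round(defense_sensitive, 1), round(traditional, 1))
print(round(defense_sensitive - traditional, 1))
```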

At the beginning of this article, I said that the traditional coefficients were only slightly off with an extreme defense. Here, we find that they can be off by as much as .3 wins if we take a Cy Young caliber contact pitcher and put him in front of the best defense on the planet. Can we really write off .3 wins as slight enough to use traditional FIP as a stand-in for defense-sensitive FIP if we want to capture the value of pitchers separate from, but in the context of, their defense?

If the value were ever really that high, I'd say no. It isn't, though, at least not if what we want to measure is how defense affects a pitcher's approach. Everything we plugged into the calculations above was a purely after-the-fact measurement, but the only things a pitcher can leverage in adjusting his approach are expected values. That means that if our +85 defense only projects to be worth 60 runs a year going forward (I'm making that number up for illustration purposes), then the pitcher can only leverage about 70% of those 85 runs by adjusting his approach. Even though the defense ended up saving 85 runs, there is no way the pitcher could have leveraged the 25 they saved over their projection without knowing they would outperform the projection in advance (which, by the loose definition of a projection, you can't). He also can't leverage his full home run rate, which in this case is probably at least to some extent anomalous. If he knew ahead of time that he would only allow .33 HR per 9 giving up that much contact, he could leverage contact quite a bit (the .3 wins arrived at above being "quite a bit" in this case), but only knowing his projected home run rate, he can only leverage up to his projection, not beyond.

For these reasons, our hypothetical pitcher is never going to actually be undervalued by .3 wins per 200 innings using traditional FIP just because he pitches to contact, even if we give him by far the best defense in baseball.

This also means that just because a pitcher's ERA is better than his FIP, even if that difference is because of defensive support, it does not mean the pitcher was utilizing a better defense if his team defense was not far above average overall. Let's create a new hypothetical pitcher who has the same FIP as the one above and an ERA in the 2.2s, but whose team defense we measure to be a bit below average. In this case, we don't know that the difference between the pitcher's FIP and ERA is because the pitcher got better than average defensive support, but he might have. Let's assume that he did. Does he get credit for pitching to contact and using that good defensive support? In this case, no, because whatever his defensive support ends up being, we expect it to be below average, so deciding to pitch to contact is a bad choice. In terms of how this pitcher can leverage his defense, strikeouts are actually slightly underrated (slightly enough that we can basically ignore it, but they are underrated) even though the pitcher's ERA over-credits him for good defensive support, because he has no way to leverage that defensive support based on decisions about pitching approach made before the fact. Our pitcher is now far overrated by ERA because of defensive support and not at all underrated by FIP because of an ability to leverage good defense.

We return to our initial question about using FIP for pitchers who receive good defensive support: should a pitcher be punished for pitching to contact in order to leverage a good defense? The answer is somewhat complicated. No, a pitcher should not be punished for leveraging good defense if he is doing it properly, but FIP can actually be tweaked to account for that pretty easily because the methodology for deriving the coefficients lends itself perfectly to adjusting the formula for differing values of balls in play. Traditional FIP and defense-sensitive FIP track very closely together, though, to the point that the difference is mostly negligible and is not a reason to abandon FIP in almost any conceivable case. Even in cases where defense-sensitive FIP is a bit off from FIP, FIP will still capture the context of pitching to defense, while still separating out the actual value of the defense, better than ERA (note how much closer defense-sensitive FIP, after we recalculated the coefficients to take defensive context into account, was to the traditional measure than it was to the predicted ERA once we also added in the value of the defense). Furthermore, you can't tell if a pitcher even had the opportunity to properly leverage his defensive support just by comparing FIP to ERA, even if you assume that the difference is due to defensive support. A pitcher with a 2.8 FIP and a 2.2 ERA, even assuming that his ERA includes a lot of defensive support, did not necessarily ever have the opportunity to leverage that support by choosing to pitch to contact. In fact, the degree to which a pitcher can leverage his defense has nothing to do with his defensive support itself, but with the projected value of his defensive support before the fact and with his expected rates of HR, BB, and SO given a certain pitching strategy.

*NOTE: You won't be able to exactly replicate any of these values from the numbers given here because of rounding discrepancies, so if you are trying to work through the math on your own and find some differences, that is probably why.

**NOTE: This pitcher truly is hypothetical. Don't believe me? What real life pitcher threw in a 4.32 ERA league in 2009? That's mostly why the value doesn't match up at all with the pitcher you looked up, by the way.

Did Wainwright and Carpenter Split the Vote? and other Cy Young Stories

Congrats to Tim Lincecum, who was awarded his second Cy Young in as many seasons yesterday. His selection is intriguing for a number of reasons, in particular the apparent shift it signals in the BBWAA. For the first time ever in a non-shortened season, the voters have dipped their minimum win threshold to as low as 15 to reward a starter for simply out-pitching everyone else in the league. And to top it off, this came a day after resoundingly favouring Zack Greinke's superiority in spite of his only 16 wins.

This shift comes in part, I'd imagine, because of the shifting electorate. Keith Law and Will Carroll, both internet based writers, voted for the first time. Theirs were the only ballots that didn't have the same three names as everyone else. However, Carroll still voted Wainwright #1, and that's just 2 guys. There's more to it. Maybe there are other new voters in the newspaper ranks as well, but I suspect some of the older guard is changing too. There have always been writers who have been more open to more detailed and less haphazard analysis, but with the advent of FanGraphs and other stat sites, good numbers are easier to come by than ever before, all gathered in one hugely popular place that most voters have probably heard of and even visited. Now those voters are better armed than ever before. There are also those who look at ever-shrinking win totals and the endangerment of the 20-game winner and ask if starting pitchers still have the same power to influence wins and losses in this age of extensive bullpens and high-powered, deficit-erasing offenses. Those voters have begun to question if 15-7 is really the best measure any longer.

I'd love to write about all that, because it's really the most interesting thing about this vote to me. I'd love to, but, unfortunately, I won't. Two paragraphs are all I can spare right now because, as interesting as that is, everyone I encounter is so damn fixated on painting controversy all over the award.

Maybe I hang around too many Cardinal fans (being one myself). Maybe I shouldn't go near message boards this time of year. Whatever it is, I have heard and read way too much from people who are flat out angry (example of actual message board topic: "Simple Poll: If you could punch Keith Law in the face, would you?" - I'd link it, but it seems to have been deleted). On the one hand, many people are now concluding, as many in the analytical world did long ago, that these awards are meaningless. On the other, rather than just ignoring them, they are yelling and sending hate mail to voters and publicizing outlets where others can do the same. So how should you, as a reasonable person, address these concerns?

For a lot of people, you shouldn't even bother. Just let them be angry and don't worry about throwing yourself in their path. Some people really do want explanations though, and don't understand how it worked out this way, or really think some unfortunate circumstances robbed their guy of a rightful award. The first, and probably most basic, thing I want to address is the idea that the two St. Louis candidates split the vote, essentially giving Lincecum the award by default. This idea was floated around before the winner was announced and has been repeated a lot since (it was even discussed on MLB.com's live award show, which, by the way, had to be the most anticlimactic announcement I've ever heard: long corny segment with Captain Morgan mascot, cut to anchor desk, small, forced laugh at Captain Morgan mascot, and, out of nowhere, with no build-up, simply stated as if it were casual discourse, "Tim Lincecum wins the Cy Young." Fine).

This concept has no basis in reality, however. I've yet to see any evidence that voters vote regionally as the theorists claim, but even if they did, this would be impossible. Imagine, for a moment, that voting were entirely provincial, and voters split their allegiance within their own division and voted purely for their guy. What would happen? The NL West has 10 votes, so they all go to Lincecum. Let's assume worst case scenario for the St. Louis two and split the 12 NL Central votes evenly between them. Of course, the 12 Central voters also put the one they didn't vote for 2nd. Assuming the same three are all on each ballot, and the West voters are equally split between Carp and Wainwright for 2nd and 3rd, that leaves us with the following scenario:

	First	Second	Third	Total
Lincecum	10	0	12	62
Carpenter	6	11	5	68
Wainwright	6	11	5	68
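The totals in the table follow from weighting first, second, and third place votes at 5, 3, and 1 points. That weighting is not stated above; it is the one that reproduces the Total column, and the sketch below simply checks the arithmetic.

```python
# Points for first, second and third place on a three-deep ballot.
# Assumption: 5-3-1, which reproduces the Total column in the table above.
POINTS = (5, 3, 1)

# (first, second, third) place votes under the purely regional scenario.
ballots = {
    "Lincecum": (10, 0, 12),
    "Carpenter": (6, 11, 5),
    "Wainwright": (6, 11, 5),
}

def total_points(votes, points=POINTS):
    """Weighted point total for one candidate."""
    return sum(count * weight for count, weight in zip(votes, points))

totals = {name: total_points(votes) for name, votes in ballots.items()}
print(totals)
```

Each candidate appears on all 22 West and Central ballots here, so the only way to gain ground is to be ranked higher, not to have rivals divide the first place votes.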

This is the most favourable a split of purely regional voting can go to Lincecum, where the other two divide everything as evenly as possible. Assuming the East splits its votes between all 3, there is no way for Lincecum to win a vote where all three are considered equal with the deciding factor being regional splitting of votes, because the Central division has more voters. Whichever of Carp and Wainwright can draw the extra first or second place vote from the East would win.

The three-deep ballot makes such a splitting of votes impossible. If voters are really split between Wainwright and Carpenter because of their regional proximity, then those voters simply put the other guy second and push Lincecum to third, and he makes up no ground by the other two splitting the first place votes.

Of course, we also know that voters don't vote purely regionally. Without comprehensive data on who voted what, it's hard to say exactly how much regionalism appears to affect voting, but I doubt it's at all a significant factor, and anyway, even if it were, it wouldn't matter. The current ballot makes the issue of splitting votes moot.

There's also the issue of fans who feel that the ballots should not go 3 deep, and the most first place votes should win. The basic idea of the deeper ballot is actually to prevent splitting the vote (so don't let anyone try to use both arguments on you; if they are concerned about splitting of the vote, then extending the ballot past 1 name is essential to deal with that concern). This year, there was likely some splitting of the first place votes along many lines, not regional, but based on evaluation methods. Lincecum and Carpenter split the votes of those who looked past wins. Wainwright and Lincecum split the votes of those who considered IP an important factor. Etc. Lincecum and Carpenter were superior to Wainwright in a lot of important stats, but similar to each other rate-wise, so they split a lot of votes between them. More voters felt Wainwright was the best pitcher than felt that way about either of the others, but more voters also felt that Lincecum was better than Wainwright than felt Wainwright was better than Lincecum. If you asked the voters, instead of picking one, to choose between those two, Lincecum would win, so is going only by first place votes a better representation of the sentiment of the voters? Should the winner be the guy who got 1 additional first place vote when it's also true that an additional 6 voters felt that guy was the worst of the top 3 candidates?

In practice, it is rare that the most first place votes don't win. The only time it happens is when there is a group of candidates who a lot of voters feel are better than the one who gets the most first place votes, but who split the first place votes among themselves. The last time it happened, Ivan Rodriguez won the 1999 AL MVP while Pedro Martinez carried more first place votes. The voters who considered pitchers more heavily had only one candidate, so Martinez carried all the votes of a minority. The majority of voters considered Pedro's season short of several non-pitchers, but their first place votes were split between multiple candidates. The only other time it happened with the Cy Young, Tom Glavine beat out Trevor Hoffman in 1998 despite Hoffman having 2 more first place votes. Hoffman carried every vote from the voters who weighed relief efforts relatively more than the other voters, whereas those who felt Hoffman's contributions in limited innings fell short of a plethora of other candidates split their first place votes between several starters. As a whole, the voters felt Glavine was better than Hoffman, just as had been the case with Pudge in 1999.

There is also the issue of one-name only voting discouraging voters from putting down the name they truly feel is deserving. Would all of the voters who felt Kevin Brown was the best pitcher in the NL in 1998 have written his name down if they also thought Glavine was easily better than Hoffman, but that Hoffman would win the award over both Glavine and Brown if they didn't put their support into the guy they thought could win the Award? Having only one name per ballot asks voters to choose between voting based on who they think will give the award to the most deserving candidate and voting for who they think is the best pitcher. This also means that it's impossible to know for sure who would have had the most first place votes under a one-name-ballot system. Would any voters who went Carpenter-Lincecum have looked at Carp's innings, thought, no way he gets more votes than Lincecum, and then put down Lincecum's name as the most deserving candidate between him and Wainwright? No way to know, but it's feasible, so it's impossible to consider Wainwright a guaranteed winner under that system anyway. You can't collect the votes under one set of rules and then change the rules to decide what the votes really said. Of course, this is seldom if ever raised as an issue except when done so by someone complaining that it robbed his/her favoured candidate. People seem to be searching for arguments against the result by attacking the method rather than arguing against the method itself.

What really seems to set people off is the matter of Vazquez and Haren being on ballots at all. People immediately began deploring their least favourite voters, thinking it could only have been that idiot who did this. THT noted that Jon Heyman blasted "dumb sportswriters" on his Twitter for their omission of Carpenter, only to later back off and apologize, as commenter Zach Sanders wrote, once he found out that one of them was Will Carroll. Even other BBWAA voters, even ones who consider themselves statistically versed (including one who cited day/night splits as a reason for voting Carpenter over Lincecum), have spoken out against their colleagues here and implied that they don't belong in the process.

I don't really know what to say to someone so adamantly opposed to differing opinions, except to show them how it is reasonable to think Dan Haren and Javier Vazquez were in a class with the other top pitchers in the league (besides Lincecum, anyway). I wrote privately my picks with some explanation nearly a month ago, and to be honest, I'm shocked at the uproar against anyone for putting one of those two in the top 3. I rated them as much closer to Wainwright/Carpenter than either of those guys were to Lincecum, so outrage for 2 votes for Haren or Vazquez over Carpenter or Wainwright and nothing but total apathy toward Wainwright getting 12 first place votes baffles me. It really does. I don't have the most faith in fans or writers when it comes to analyzing players, but I just don't understand the venom. Here's what I wrote last month:

NL Cy Young:

1. Tim Lincecum
2. Chris Carpenter
3. Javier Vazquez

Despite what all the pundits are saying about this being as tight as a race could possibly be (and maybe they're right as far as the actual voting goes, I don't know), the NL race also has a clear winner to me, although not as clear as in the AL, and then a pack bunched behind him. Carpenter may have made a run at Lincecum if he hadn't gotten hurt, but with over a 30 IP difference between them, Lincecum pulls well ahead.

I'm sure you want to know how I can put Vazquez ahead of Wainwright. There were 4 pitchers I considered behind Lincecum: Carpenter, Wainwright, Vazquez, and Haren. For all of them (and Lincecum), I looked at their pitching from several perspectives of run prevention, including both traditional perspectives (ERA and RA) and defense-independent perspectives (FIP from FanGraphs, xFIP from THT, and tRA and tRA* from StatCorner). For each perspective, I considered pure rate production (an expected winning percentage unweighted by IP) as well as total production (W% converted to wins based on IP, both above average and above replacement). ERA, RA, and FIP were also park-adjusted using Baseball-Reference's multi-year pitcher park factor for each player's home park (the others already have adjustments that neutralize park to a large extent). I also looked at WPA and WPA/LI with the wins above average figures.

Overall, the things that stuck out were:

-Lincecum was clearly ahead of the pack whether I looked at rate production, wins above average, or wins above replacement.

-Carpenter's rate production was closest to Lincecum's. He was dead even rate-wise with Vazquez in just the defense-independent stats, but he pulled away in the traditional stats. However, Carpenter's win value production, especially above replacement (IP have more weight above replacement than above average), fell back to the pack. He was about even with Wainwright in WAR production, just behind Vazquez and Haren, and either slightly ahead of or slightly behind Vazquez for second in WAA depending on how I averaged the different perspectives.

-Haren picked up the most ground from the park adjustment. Pitching in Chase Field probably hurt him quite a bit in most people's eyes because his raw stats aren't as good as he actually pitched.

-Wainwright, and this is the kicker, was consistently at the back of the pack no matter what I looked at. Rate-wise, WAR, WAA (both with and without the WPA stats) all had his production rated at the bottom of the group. He was really good this year, but if you just look at how well he pitched and not just at his traditional numbers, he wasn't quite as good as the best pitchers in the league. When everything I was using to evaluate each pitcher was pointing to Vazquez having pitched better than Wainwright, I just couldn't write him in ahead of Vazquez.

And that's all I can really say to people who question selections of Vazquez or Haren as damning one to unworthy idiocy. When I look at each pitcher's production, I just don't see it that way. It's certainly close enough that it shouldn't incite this kind of reaction. Also, looking back on my work now (not what I wrote above, but where I worked it out), I may have given too much consideration to rate production. I'm still undecided on how to handle "best" in terms of rate (regressed to some extent for those with lower playing time) vs. pure value. Point being, I can easily see Vazquez ahead of Carpenter for second. In fact, if you are most concerned with WAR, I can see any of them ahead of Carpenter. But uproar over Vazquez or Haren over Carpenter and apathy toward Wainwright ahead of everybody? I can't see it. If you are willing to concede that methods can legitimately differ enough to produce any order of Lincecum, Carpenter, and Wainwright, how can a method that puts Vazquez or Haren in the top three be so far off as to be grounds for expulsion from the process?


A Turning Point

Game 3, series tied at one game apiece.

Bottom of the second. Bases loaded. Shane Victorino at the plate. Andy Pettitte pitching, and on the ropes.

This single at-bat, which began with Pettitte’s 51st pitch of the game, may have been the moment that turned the World Series in favor of the Yankees and lost it for the Phillies.

Let’s review this a bit. By game three, Joe Girardi had already announced his intention to go the rest of the Series pitching his key starters on three days’ rest. It was a gamble that the pundits were not only debating, but berating Girardi about, long before the Series would play itself out. Some were already arguing that if this gambit failed, Girardi would lose his job in the off-season.

And so, as this game opened, every interested eye was focused on the performance of the oldest of the three-man group of C. C. Sabathia, A. J. Burnett, and Andy (the only one with a first name) Pettitte singled out by their manager to perform under these circumstances.

After giving up a single on his first pitch to Jimmy Rollins, Pettitte would get Victorino to pop out weakly to third, then strike out Chase Utley and Ryan Howard. On the surface, this would look like a good start to the game for Pettitte. But it took him 24 pitches to get through his first trip to the mound. This trend could not continue if he were to succeed not just in this game, but for a manager who had already told the world that even this aging veteran would pitch next on short rest.

We come now to the second inning. The Phillies are at home, and are looking to press this narrow advantage to its limits. Having split in New York, this is their first Series game in front of their own fans. This inning would give them something to cheer about, and lift their hopes for brighter prospects as the Series continued.

Jayson Werth led off the inning with a home run. Not good for Pettitte, but not the whole story of this at-bat, either. First pitch ball. Second pitch ball. Third pitch ball. 3 and 0 to one of the legitimate power threats in this rich lineup is not a recipe for success. Taking 3 and 0, Jayson times the next pitch and fouls it off. On the 6th pitch of the at-bat, Werth would send the ball out over the wall for the first run of the game. Pitch count now over 30 with no one out in the second.

Pettitte would battle back to strike out Ibanez on four pitches, showing how this veteran, on the road in a big game, was not going to be easily rattled.

Pedro Feliz would come up next, and would double on a hard hit ball deep into the outfield reaches of the park.

Carlos Ruiz would come up next and walk on 5 pitches.

Had the Phillies won the Series, this next at-bat could have been cited as the turning point. Cole Hamels comes up with a one run lead, runners on first and second, and one out. He is called on to bunt, sacrificing both runners into scoring position with two out, but with the star shortstop on deck who had already singled in the game.

Hamels lays down a decent bunt, about 30 feet out in front of the plate and to the third base side of the pitcher’s mound. Charging in to field the bunt was Andy Pettitte. Charging out to field the bunt came Jorge Posada. In a classic case of “I got it, you take it,” both pulled away from the bunt at the last second assuming the other would field it. The ball rolled untouched while all three runners advanced without a play.

Bases loaded. Jimmy Rollins strolls to the plate.

Pitch one: ball.

Pitch two: ball.

Pitch three: ball.

Pitch four: taken 3 and 0 for a called strike.

Pitch 5 - the 50th pitch of the game, with one out in the second inning: ball four.

The second run scores while the defense twiddles their thumbs, unable to do a damn thing about it.

Let’s review the second inning: homer, out, double, walk, bunt single, walk with the bases loaded.

Here comes Victorino - in what I believe is the turning point of the Series.

The first two pitches are the key to this at-bat, and in some ways a testimony to the grit and resolve of this sly veteran (who else, after losing control and walking a batter with the bases loaded, would start the next batter with two sliders low and away?).

Both pitches were sliders in the dirt - low and outside.

Neither pitch was even close to the strike zone.

In what is still for me an inexplicable mystery, the count after those two pitches? 0-2.

Yep, Shane Victorino swung meekly and rather sickly at two pitches that not only never threatened the strike zone, but which offered him no hope of even making contact. With a weakening Pettitte fumbling to maintain control of his pitches and the Yankees’ hopes in game 3, Victorino let him off the hook. With two strikes, he would hit a weak fly ball to left field, deep enough to score another run to give the Phils a 3-0 lead, but effectively end what could have been a back-breaking inning for the Yankees. Utley would follow with a strikeout to completely end the rally.

Pettitte would come out again in the third, already having thrown 57 pitches and down 3-0, and find his stuff again. His Yankee teammates would score two runs in the 4th to make it a ball-game again, then three in the 5th to take a lead they would never relinquish.

Pettitte would go six, throwing 104 pitches (only 59 for strikes) and picking up the win.

There is no way to predict what would or could have happened if... But Shane Victorino swung mysteriously at the first two pitches thrown by a man who was on the ropes, who had walked the previous man with the bases loaded, and then ended up trading an out for a run and killing a rally. It is not hard to see how this single at-bat ultimately changed the outlook for both the Phillies and the Yankees, and could have gone very differently but for two inexplicable swings and misses. I’m not saying the Phils would have won the Series, or even for that matter the game. But for me, this was the turning point in the Series - the point after which, and perhaps even because of which, the Yankees withdrew triumphant.

Converting OBP/SLG to wOBA

Statistical analysis has crept into the mainstream consciousness. It is no longer difficult to find fans who look at slash lines before Triple Crown stats or who can properly choose (after looking it up, of course) between Brian Giles' and Carlos Lee's careers at the plate (just try to do it without considering walks). Unfortunately, the face of this new wave of stats has become OPS. As frustrating as it can be that the past few decades of work have been essentially boiled down to the equivalent of rummaging through the pantry, picking out a mishmash of the best ingredients, and throwing them into a casserole rather than looking up a real recipe, it is a step forward. It may be tiring having to explain that, even without considering park effects, Kenny Lofton still hit better than Vinny Castilla, OPS be damned, but it's infinitely better than trying to have a discussion with someone who insists that Gary Gaetti and his 360 HR and 1341 RBI are Lofton's superior. Plus, it's nice to be able to go to a ball game and look up to the scoreboard when an unfamiliar hitter comes up and see something that will give you a better idea of how well he's hit than AVG/HR/RBI.

Luckily for those of us who don't care to stop at OPS, linear weights, et al. have become widely available. Baseball-Reference has Pete Palmer's Batting Runs. Baseball Prospectus has EqA. FanGraphs and StatCorner have wOBA. All these are designed to model runs, not to stab in the dark at them. These are the stats that looked up the recipe first. But can we still make OPS work? The Book says, "for you OPS lovers, you will note that (OBPx2+SLG)/3 is a close approximation of wOBA." The problem with this is that if you run that calculation with the idea of wOBA for scale, you will find a lot of hitters to be much better than you thought, as average for that calculation is a good .030-.035 points higher than for wOBA. So if all you have are a player's OBP and SLG, and you want to know how good a hitter he is, that calculation (call it 2OPS/3 for short) will be on an unfamiliar scale. You know how good a .350 wOBA is, but not how good a .350 2OPS/3 is.
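To put a number on that scale gap, here is a quick Python sketch that runs The Book's approximation on a .330 OBP/.420 SLG hitter, a group that (as shown further down) averaged a .326 wOBA. The function name is just something I made up for illustration.

```python
def two_obp_plus_slg_over_3(obp, slg):
    """The Book's quick approximation of wOBA: (OBP x 2 + SLG) / 3."""
    return (2 * obp + slg) / 3

approx = two_obp_plus_slg_over_3(0.330, 0.420)
actual_woba = 0.326  # average wOBA for roughly .330/.420 hitters, per the text

print(f"approximation: {approx:.3f}")
print(f"gap vs. wOBA:  {approx - actual_woba:.3f}")
```

The gap lands inside the .030-.035 range, which is why the raw approximation reads high to anyone used to the wOBA scale.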

The upside of 2OPS/3 is that it retains the simplicity of OPS. It's easy to calculate, and, because it is still arbitrary, it looks clean and simple. So if you want to use it, know that average (for this decade) is around .365, not .330. That's decent enough, I guess. We can still do better, though. Of course, to be honest, should we? The previous paragraph listed several alternatives that are readily available and, frankly, already do everything we can hope to with OPS, but better. So why bother? There is a reason, after all, that The Book wraps up its lone paragraph on OPS rather succinctly: "This is the last time we will talk about OPS."

But say for some reason all you have are OBP and SLG. Maybe you're at a game and the scoreboard flashes a player's slash line and you want to know how good that is beyond what OPS can tell you (and you brought a calculator or pen and paper, or at least picked up some spare napkins at the concession stand to jot down your calculations on). Maybe you're having a real-life, offline discussion where the stats come up. Maybe you have a projection that only gives you the traditional slash line, or you are looking up some sort of split that isn't presented on the sites that give you linear weights. Whatever the reason, you find yourself trying to decide if a .400/.420 line is better than, worse than, or about the same as a .370/.460 line, and just how much difference there is. How do you do it?

One way is to look at all players who have a specific breakdown of OBP/SLG and see what those players' corresponding wOBA was. You can look at all players who had roughly a .330 OBP and .420 SLG and see that they had, on average, a .326 wOBA. And then you can do the same for every semi-common combination of OBP and SLG. Sound like a plan?

Now take every player-season from 1993-2008, round the OBP and SLG to the nearest hundredth, and group together all player-seasons with the same rounded OBP and SLG (labelled trOBP and trSLG below). Keep only those combinations with at least 2500 combined PAs. Then, you can look at the average wOBAs for all slugging percentages with a set OBP to see how much each point of SLG is worth and vice versa to determine the value of a point of OBP. You end up with something like this:

wOBA by SLG, OBP=.330

trOBP	trSLG	actOBP	actSLG	wOBA	PA
.3300	.3200	.3289	.3192	.2930	2755
.3300	.3300	.3296	.3295	.3007	6635
.3300	.3400	.3299	.3398	.3019	8002
.3300	.3500	.3296	.3499	.3061	15144
.3300	.3600	.3297	.3599	.3077	13466
.3300	.3700	.3297	.3701	.3124	11015
.3300	.3800	.3299	.3800	.3132	16022
.3300	.3900	.3302	.3908	.3181	15032
.3300	.4000	.3300	.3999	.3208	23948
.3300	.4100	.3309	.4101	.3243	19832
.3300	.4200	.3290	.4195	.3256	17874
.3300	.4300	.3301	.4303	.3296	17440
.3300	.4400	.3302	.4393	.3321	16495
.3300	.4500	.3302	.4506	.3364	14757
.3300	.4600	.3298	.4601	.3392	17508
.3300	.4700	.3293	.4701	.3416	10955
.3300	.4800	.3303	.4794	.3476	13828
.3300	.4900	.3306	.4899	.3501	7501
.3300	.5000	.3308	.4987	.3528	6732
.3300	.5100	.3310	.5092	.3571	5783
.3300	.5200	.3309	.5207	.3620	4285
.3300	.5400	.3307	.5392	.3637	3024
.3300	.5500	.3332	.5494	.3751	2722
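For anyone who wants to build a table like this themselves, the grouping step might look something like the following Python sketch. The input layout (a list of `(obp, slg, woba, pa)` tuples), the function name, and the choice to weight the averages by PA are all my assumptions for illustration, and the three demo seasons are made up.

```python
from collections import defaultdict

def group_seasons(seasons, min_pa=2500):
    """Group (obp, slg, woba, pa) tuples by OBP and SLG rounded to .01.

    Returns {(rounded_obp, rounded_slg): (mean_obp, mean_slg, mean_woba, total_pa)}
    using PA-weighted means, keeping only groups with at least min_pa PAs.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for obp, slg, woba, pa in seasons:
        key = (round(obp, 2), round(slg, 2))
        t = totals[key]
        t[0] += obp * pa
        t[1] += slg * pa
        t[2] += woba * pa
        t[3] += pa
    return {
        key: (o / pa, s / pa, w / pa, pa)
        for key, (o, s, w, pa) in totals.items()
        if pa >= min_pa
    }

# Made-up player-seasons, purely for illustration.
demo = [
    (0.3289, 0.4195, 0.325, 1500),
    (0.3312, 0.4210, 0.327, 1500),
    (0.3300, 0.5500, 0.375, 600),  # this group falls short of 2500 PA and is dropped
]
groups = group_seasons(demo)
for key, (obp, slg, woba, pa) in sorted(groups.items()):
    print(key, f"{obp:.4f} {slg:.4f} {woba:.4f} {pa}")
```

Each surviving key corresponds to one row of the table: the rounded OBP/SLG pair, the actual averages within the group, and the combined PA.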

And so on for every other rounded OBP, and then repeat grouping by rounded SLG instead of OBP. From here, we can look at how much wOBA changed for each .010 points of SLG when we hold OBP constant and see that .010 points of SLG is worth about .003 points wOBA. Note from the following graph that the relationship is roughly linear:

We can also do the same to see that .010 points of OBP is worth .005-.006 points wOBA (the graph for OBP vs. wOBA when SLG is held constant looks similar, just with a steeper slope). This is pretty close to the 2:1 rule of thumb for OBP:SLG (it's actually around 1.8 by this method). This value is relatively constant whether OBP and SLG are high, low, or average, as illustrated by the following graph:

Again, the graph is similar whether you hold SLG or OBP constant. There is a slight downward trend as SLG and OBP rise, but nothing major.
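If you want to check the SLG slope yourself, a plain least-squares fit on the OBP=.330 table above is enough. This is only a sketch on that one slice (unweighted, which is my simplification), so don't expect it to reproduce the overall figure exactly.

```python
# (actSLG, wOBA) pairs copied from the OBP=.330 table above.
rows = [
    (.3192, .2930), (.3295, .3007), (.3398, .3019), (.3499, .3061),
    (.3599, .3077), (.3701, .3124), (.3800, .3132), (.3908, .3181),
    (.3999, .3208), (.4101, .3243), (.4195, .3256), (.4303, .3296),
    (.4393, .3321), (.4506, .3364), (.4601, .3392), (.4701, .3416),
    (.4794, .3476), (.4899, .3501), (.4987, .3528), (.5092, .3571),
    (.5207, .3620), (.5392, .3637), (.5494, .3751),
]

def ols_slope(points):
    """Ordinary least-squares slope of y on x (unweighted)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

slope = ols_slope(rows)
print(f"wOBA per .010 points of SLG: {slope * 0.010:.4f}")
```

On this slice the slope comes out in the neighborhood of .003 points of wOBA per .010 points of SLG, in line with the rule of thumb.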

That is the first result to note: .010 points of SLG are worth roughly .003 points wOBA, while .010 points of OBP are worth .005-.006 points wOBA. You can use this rule of thumb to compare two players by taking the differences between their OBPs and SLGs.
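As a worked example, here is the rule of thumb applied to the .400/.420 vs. .370/.460 question from earlier, as a small Python sketch. I'm using .56 and .31 per point of OBP and SLG (the values plugged into the formula below); the function name is made up.

```python
A = 0.56  # wOBA per point of OBP (about .0056 per .010)
B = 0.31  # wOBA per point of SLG (about .0031 per .010)

def woba_gap(obp1, slg1, obp2, slg2):
    """Estimated wOBA difference, player 1 minus player 2."""
    return A * (obp1 - obp2) + B * (slg1 - slg2)

gap = woba_gap(0.400, 0.420, 0.370, 0.460)
print(f"estimated wOBA gap: {gap:+.4f}")
```

The .030 edge in OBP is worth about .017 and the .040 deficit in SLG costs about .012, so the .400/.420 line comes out roughly .004 points of wOBA ahead.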

What if you want to actually replicate a wOBA figure, though? This is a bit messier. Really, this isn't worth it unless you just don't have access to wOBA itself for whatever reason. But say you need to do it. We want to complete the formula:

wOBA = A*OBP + B*SLG + C

We already know A and B to be .56 and .31, but we don't yet know C. So we go back to our original table and calculate .56*OBP + .31*SLG for each combination of OBP and SLG, and then subtract that from the wOBA for each combination:

C = wOBA - (A*OBP + B*SLG)

Here, we introduce a problem. C is not really a constant. It changes when OBP and SLG change. Honestly, did we really expect anything related to OPS that wasn't arbitrary to be mathematically simple? Here's what the graph of C looks like for each predicted wOBA:

Lovely. The formula for the line of best fit is printed on the graph. That is our value for C. The x in that equation is really (A*OBP + B*SLG). So our formula for converting OBP and SLG to wOBA is now:

x = .56*OBP + .31*SLG
wOBA = -.53x^2 + 1.35x - .045

This can be combined into one equation with substitution if you prefer, but it looks a bit ugly, so we'll just leave it be for now. This is now to the point where anyone who would care to do the calculation would almost be better off just calculating wOBA directly from raw stats, but whatever. Go only as far into the calculations as you need. If you want to go this far, this is how you do it.
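In code, the two-step version is about as painless as it gets. Here is a minimal Python sketch using the coefficients above; the function name is my own, and the .330/.420 check value is the group average quoted earlier.

```python
def estimate_woba(obp, slg):
    """Estimate wOBA from OBP and SLG with the two-step formula above."""
    x = 0.56 * obp + 0.31 * slg
    return -0.53 * x ** 2 + 1.35 * x - 0.045

est = estimate_woba(0.330, 0.420)
print(f"estimated wOBA: {est:.4f}  (group average from the table: .326)")
```

For a .330/.420 hitter this gives an estimate within about .002 of the .326 average seen for that group.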

Does this formula work? For the most part, yeah. The scale and league average match pretty closely with wOBA, and for most players, it works out to be pretty close. This estimate is within .010 points of the actual wOBA for over 95% of player seasons since 1993. About three quarters of player seasons are within .005 points wOBA. Half of them are within .0027 points wOBA. The average absolute difference between predicted and actual wOBA is .0036 points. Not bad, especially considering wOBA counts stolen bases and our estimate doesn't.
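The same kind of check is easy to run on whatever data you have. The Python sketch below computes the share within .010 and the average absolute difference for a handful of rows from the OBP=.330 table. Those rows are group averages rather than individual player seasons, so this only illustrates the calculation and will not reproduce the figures above.

```python
def estimate_woba(obp, slg):
    """Two-step OBP/SLG -> wOBA estimate from the formula above."""
    x = 0.56 * obp + 0.31 * slg
    return -0.53 * x ** 2 + 1.35 * x - 0.045

# (actOBP, actSLG, wOBA) for a few rows of the OBP=.330 table.
sample = [
    (.3289, .3192, .2930),
    (.3290, .4195, .3256),
    (.3308, .4987, .3528),
    (.3307, .5392, .3637),
    (.3332, .5494, .3751),
]

errors = [abs(estimate_woba(obp, slg) - woba) for obp, slg, woba in sample]
share_within_010 = sum(e <= 0.010 for e in errors) / len(errors)
mean_abs_error = sum(errors) / len(errors)

print(f"share within .010: {share_within_010:.0%}")
print(f"mean abs error:    {mean_abs_error:.4f}")
```

Swap in real player seasons for `sample` and the same few lines give you the percentages and average miss for your own data set.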

Obviously, the players this works most poorly for are those with a large effect from either stolen bases or from intentional walks, as wOBA handles those in a fundamentally different way from OBP and SLG (in that it considers SB/CS at all and that it differentiates IBB from nIBB). For example, the two biggest discrepancies between predicted and actual wOBA were Bonds in 2004 (120 IBB) and Willy Taveras in 2008 (68/75 in SB attempts). Both were over .025 points off.

So there you have it. Can you convert OBP and SLG to a reasonable estimate of wOBA? Yes. Should you? Probably not, unless it's all you have and you really need a reliable way to convert to actual runs. If all you want to do is get an idea of how good someone is at the plate or who is better than whom, there's little point in going all the way through the conversion. But you could do it. So take that, OPS.

Now I just need to convince the scoreboard operators at my local stadium to substitute my definition of OPS for theirs. Then we'll really be in business.

Competitive Balance, Reprise

In the wake of the Yankees' latest World Championship (congrats to the Yankees and their fans), the issue of financial disparity has resurfaced with renewed vigour, with the most common refrain being in support of the simplest idea available in the public domain: a salary cap. In light of that, I would like to direct all readers of this blog to the report on competitive balance (the subject of my first post on this blog) that MLB commissioned of the independent Blue Ribbon Panel on Baseball Economics, headed by George Mitchell, at the beginning of this decade. Note that nowhere in the report is any form of a salary cap suggested, and that most of the recommendations have yet to be implemented or even pushed for by MLB.
