Baseball Statistics Are Crap

October is almost upon us, which means that we've been subjected to a bunch of long segments on Mike & Mike about baseball. These serve to remind me just how little use I have for baseball, and baseball statistics.

I've long thought that baseball fans are stat-obsessed dorks, but my opinion changed somewhat when I started learning the definitions of those statistics. Now, I think they're foolish stat-obsessed dorks.

It's a shame, because baseball is one of very few sports where you have a chance of doing meaningful statistical analysis, owing to the approximately three billion games played every season. If nothing else, they have large sample sizes. Which makes it almost tragic that the core statistics of baseball are such... bullshit.

This was brought back to my mind recently during the mini-kerfuffle over CC Sabathia's near no-hitter a few weeks ago. Watching the highlights as a sensible person, I would say "close, but not quite." A guy hit the ball with the bat, and ended up safe at first. Your no-hitter is over.

But no-- it's all up to the official scorer, whoever that is. If the guy hit the ball, but somebody "should've" caught it, then it's an error, not a hit, and the "no-hitter" would be preserved. So the whole thing turns into a debate over whether Sabathia bobbled the ball enough for it to count as an error or not.

This sort of nonsense carries over into the core statistics of the game.

Take "batting average," for example. You'd think there would be nothing more objective and unambiguous than that: You look at how often a player came to the plate, and how many times he reached base safely, and the ratio of those is your batting average.

You'd be wrong to think that, though. That statistic exists, but it's "on base percentage." "Batting average" doesn't include plays judged to be errors, and doesn't include walks. If a batter reaches base on a walk, that at-bat doesn't even go into the denominator-- it's like it never happened.
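To make the difference concrete, here is a minimal Python sketch of the two definitions as they are conventionally stated: batting average is hits over at-bats, and on-base percentage is (hits + walks + hit-by-pitch) over (at-bats + walks + hit-by-pitch + sacrifice flies). The function names and the sample season line are made up for illustration and do not come from any real player or library.

```python
from typing import Optional


def batting_average(hits: int, at_bats: int) -> Optional[float]:
    """Hits divided by official at-bats; walks never enter the denominator."""
    if at_bats == 0:
        return None  # undefined with no at-bats
    return hits / at_bats


def on_base_percentage(hits: int, walks: int, hit_by_pitch: int,
                       at_bats: int, sac_flies: int) -> Optional[float]:
    """(H + BB + HBP) / (AB + BB + HBP + SF), the conventional OBP formula."""
    denominator = at_bats + walks + hit_by_pitch + sac_flies
    if denominator == 0:
        return None
    return (hits + walks + hit_by_pitch) / denominator


# Hypothetical season line, invented for this example.
season = {"hits": 150, "walks": 70, "hit_by_pitch": 5,
          "at_bats": 500, "sac_flies": 5}

avg = batting_average(season["hits"], season["at_bats"])
obp = on_base_percentage(**season)
print(f"AVG {avg:.3f}  OBP {obp:.3f}")
```

Note that reaching base on an error is scored as an at-bat without a hit, so under these formulas it lowers both numbers rather than raising them.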

Similar foolishness afflicts pretty much every statistic that gets talked about on ESPN. The whole concept of "wins" and "losses" for a pitcher is a little silly, given that a minimum of eight other people are involved in the game, but if you wanted to keep track of the number of times a team won when a given guy was on the mound in the first inning, that might mean something. That's not what they do, though-- the actual process by which the "winning" and "losing" pitchers are determined is completely insane.

And don't talk to me about "saves." Or "Earned Run Average."

The striking thing about this is how central these statistics are to discussions of baseball. You get bullshit statistics in other sports, but they're usually somewhat peripheral-- in baseball, the core statistics of the sport are all bullshit.

Basketball has three somewhat major stats that involve the same kind of dubious judgment calls as the core baseball stats: assists, steals, and blocks. The real core stats, though, are absolutely unambiguous: when a shot goes up, either it goes in, and somebody is credited with points, or it misses, and somebody is credited with a rebound. There's no leeway for the official scorer to insert himself, no "well, the opposing center really should've blocked that shot, so those points don't count toward your season total."

Pro football has the lowest bullshit level of any major sport. The only commonly-cited statistic that involves a judgment call by the official statistician is "sacks." Everything else is clear and unambiguous-- somebody threw the ball, somebody caught the ball, somebody ran with the ball. No bullshit.

Weirdly, the only major American sport to rival baseball in bullshit stat-geekery is college football, with its ludicrous power rankings. But that's a rant for another time.


You should try checking out either "Baseball Between the Numbers" or "The Book". These are written by baseball stat-obsessed dorks who have basically come to the same conclusion you have: that the oft-reported stats are essentially worthless. They still get reported, however, because they've been around long enough that just about everyone knows what they mean. You can't talk to a casual baseball fan about things like VORP (value over replacement player), or WAR (wins above replacement), or wOBA, or any number of new stats that people have been putting forward in the last 20 or so years, because these are only well known amongst the most obsessed of stat-obsessed dorks. Many of these are actually useful when talking about something such as when to intentionally walk a hitter, or when to attempt a sacrifice bunt, or when just debating who was a better hitter.

There really is a lot of good statistical research going on in baseball. It just doesn't get very much face time.

Check out the official definition of "on base percentage". You would think that the figure would be the number of times the player makes it to base, by whatever means, divided by the number of at-bats.

Nope. A player who gets on base every single time he comes to bat would have an OBP of less than 100%.

By Swingin' Amis (not verified) on 26 Sep 2008 #permalink

Pro football has the lowest bullshit level of any major sport. The only commonly-cited statistic that involves a judgment call by the official statistician is "sacks." Everything else is clear and unambiguous-- somebody threw the ball, somebody caught the ball, somebody ran with the ball. No bullshit.

Not quite. What about plays involving whether or not the player had "control" of the football? Or whether his knee touched the ground before he dropped the ball? Those may or may not be catches, depending on the judgement of the refs. Even throwing is sometimes not clear. Was the ball thrown legally? Was the QB's arm moving forward as he got hit or not? Was he in the pocket when he grounded the ball or not? Or did he even ground the ball, or could there have been a receiver who potentially could have caught it?

Those stat obsessed dorks are the people MOST aware of and annoyed by the pointlessness of statistics like batting average, pitching wins/losses, RBI, etc. No one doing meaningful analysis cares about those at all, nor have they for quite a while.

I totally agree with Ike. Baseball fans who are statistically oriented know how terrible the 'traditional' stats are, but they are historical and people in the baseball media and the average fans know what they mean, and so they have a fundamental trust of them, and hence distrust of the others. They have "truthiness", one might say. The level of analysis that has gone into baseball and finding stats which are good predictors of team success, and assessing individual contributions to success is truly staggering and fascinating (to me any way, huge baseball fan and stat nerd I am...)

Amis, what exactly are you talking about? If one gets on base every time one comes to bat, then your on base percent is indeed 100%. The only exception I can think of is if you got on base in the course of producing an out (like on a fielder's choice), and clearly you don't deserve credit for getting on base because you got someone else in better position OFF base.

As an aside, I'm a long time reader and first time commenter, so thanks for all the great writing (except your hostility to baseball. Boo on that)

By Jonathan S. (not verified) on 26 Sep 2008 #permalink

I've sometimes seen top 3 starters' ERA's, bullpen ERA's, scoring avg's etc used to predict outcomes. Never quite bothered to follow up and find out if they are right at least a little bit more than half the time.

Yeah, wow have you missed a big part of baseball analysis over the last 20 years. Starting with Bill James, moving on to Rob Neyer, Baseball Prospectus, Baseball Primer, The Hardball Times, and a hell of a lot more, baseball stats analysis is hell and gone beyond where you seem to think it is.

By Johnny Chimpo (not verified) on 26 Sep 2008 #permalink

You should read the history of baseball statistics in the Total Baseball Encyclopedia (it's in the 8th edition--it might also be in other editions). That article will tell you why batting average is defined the way it is, and why your interpretation of it ("You look at how often a player came to the plate, and how many times he reached base safely, and the ratio of those is your batting average") is just completely wrong. Incidentally, as noted above the ratio you're interested in is the OBP, conveniently located in almost every table of stats these days (see Manny Ramirez, for example).

what about quality starts????

Those stat obsessed dorks are the people MOST aware of and annoyed by the pointlessness of statistics like batting average, pitching wins/losses, RBI, etc. No one doing meaningful analysis cares about those at all, nor have they for quite a while.

The really hard-core stat nerds use more sophisticated analyses, but the general broadcast community still subjects us to hours and hours of garbage numbers that don't measure anything sensible. I'm not talking about the really hard-core people, I'm talking about SportsCenter.

What about plays involving whether or not the player had "control" of the football? Or whether his knee touched the ground before he dropped the ball? Those may or may not be catches, depending on the judgement of the refs. Even throwing is sometimes not clear. Was the ball thrown legally? Was the QB's arm moving forward as he got hit or not? Was he in the pocket when he grounded the ball or not? Or did he even ground the ball, or could there have been a receiver who potentially could have caught it?

I'm not talking about referee judgement calls, I'm talking about things that are determined completely external to the game. I have no objection to, say, strikeouts as a pitching statistic, even though those depend on a call by the umpire. The call is made on the field, and is an unambiguous part of the game, in the same way that referee calls in football become part of the game.

There is another bullshit stat in football that you failed to mention: tackles. Often there are three guys converging on a ballcarrier and only one of them gets credited with the tackle, the other two with an assist. It's totally a judgment call who really brought down the carrier.

I'm not talking about the really hard-core people, I'm talking about SportsCenter.

Certainly, it's true that SC caters to the slowest sort of baseball fan -- John Kruk and Steve Phillips work as "analysts," after all -- but the Baseball Prospectus crowd has made inroads. When I watch the Mets on SNY, OBP is included (admittedly right after RBIs), and you hear people talking about OPS (OBP + SLG) on SC now. But really, complaining about the general state of baseball statistical analysis seems odd when every politico in the country is reading Nate Silver's election breakdowns. Silver's claim to fame? He's the mind behind BP's remarkably accurate PECOTA player evaluation system.
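For readers who haven't met OPS before, the "OBP + SLG" shorthand above can be sketched in a few lines of Python. This is only an illustration under the conventional definitions: slugging percentage is total bases over at-bats, and on-base percentage is (hits + walks + hit-by-pitch) over (at-bats + walks + hit-by-pitch + sacrifice flies). The function names and the sample numbers are hypothetical.

```python
def slugging(singles: int, doubles: int, triples: int,
             home_runs: int, at_bats: int) -> float:
    """Total bases per at-bat (SLG)."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def on_base(hits: int, walks: int, hit_by_pitch: int,
            at_bats: int, sac_flies: int) -> float:
    """Conventional OBP: times on base over qualifying plate appearances."""
    return (hits + walks + hit_by_pitch) / (
        at_bats + walks + hit_by_pitch + sac_flies)


def ops(obp: float, slg: float) -> float:
    """OPS is simply the sum of the two rates."""
    return obp + slg


# Hypothetical season: 150 hits in 500 at-bats, split by hit type.
singles, doubles, triples, home_runs = 90, 30, 5, 25
hits = singles + doubles + triples + home_runs

slg = slugging(singles, doubles, triples, home_runs, at_bats=500)
obp = on_base(hits, walks=70, hit_by_pitch=5, at_bats=500, sac_flies=5)
print(f"OBP {obp:.3f}  SLG {slg:.3f}  OPS {ops(obp, slg):.3f}")
```

Adding two rates with different denominators is a little rough as arithmetic, which is part of why the stat is usually treated as a quick summary rather than a precise measure.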

Passing yards in football make no sense. They should tell me how many yards that the QB threw the football for a completion. Instead, they tell me how far the receiver got from the line of scrimmage. So a pass where the ball traveled 20 yards through the air where the receiver was tackled immediately will give the passer fewer passing yards than a 5 yard screen pass when the receiver breaks through the secondary for a huge gain. It makes very little sense to give the QB credit for the work of his receiver (yards after reception).

There are more in-depth and useful stats making the rounds in basketball as well -- efficiency, plus/minus, adjusted plus/minus, true shooting percentage, wins produced, etc. Are there "stat geek" type stats going around in football too?

One of the things about football (and basketball, to a lesser extent) is that there are many important things that aren't represented by stats at all. Blocking is the biggest thing that comes to mind in football. A receiver that does a good job of spreading the field to set up a run play won't get statistical credit, either. Defense is hard to quantify for both sports. In baseball, just about everything can have some sort of statistic attached to it, even if it's based on a judgment call.

...
At least sport stats in general and baseball stats specifically are applied over a wide population and over those "approximately three billion games played every season."

While they may not be comparable across years, decades, seasons, or even ballparks, they do have some comparative value. Obviously they are not the be all and end all.

But my nine-year-old was able to follow his favorite players' wins and BAs. So to me, and him, they had great value. Even if I do know about Bill James, et al.

...tom...
.

You've reminded me of a bit of pointless baseball trivia I heard a couple of weeks ago on ESPN. While discussing the no-hitter Zambrano pitched for the Cubs, the caption below the screen read something to the effect of, "Zambrano first pitcher with last name starting with 'Z' to pitch a no-hitter."

The idea that baseball statistics are less meaningful than other sports seems backward. Yes, they are inherently flawed. The percentage of plays recorded as an error is very small however, and often loses its impact over the course of a season.

OBP is much more important than batting average, and wins/losses and ERA are only meaningful as secondary indicators of a pitcher's value. But the relevant stats are slowly becoming more mainstream and are far and away superior to those of any other sport in terms of evaluating players (which is why they are kept).

In other sports, you can gameplan for a team's strength or weakness and bury certain players in the statbook. In baseball, there is a great deal of independence between each pitch and individuals can be assigned responsibilities for the outcomes. No sport other than baseball can rival this independence of events.

A receiver on an awful team will see his stats go up because his team needs to throw more often and against a softer defense in the 4th quarter of blowout games. That doesn't happen in baseball, teams don't give up easy singles when they are ahead by 5 to avoid the chance of that guy hitting a home run.

A quick look seems to indicate that between 1-2% of plate appearances result in the baserunner reaching by error. I think that gets muddled in with other variances during serious evaluations.

Have you ever tried to decode the QB rating? I hear QB ratings every Sunday, and no one even knows what it means, much less whether it makes any sense.

If you cared about baseball and statistics at all, it's trivial these days to find good stats for baseball. If you don't like baseball, that's fine, but don't denigrate the good work that has been done over the past 30 years by people who do like baseball. Yes, I've spent my time as a stats-obsessed dork (though we prefer the term "stats-drunk computer nerd"), so maybe I take it a little personally, but baseball has been way out ahead of the other major sports in statistical progress. Even before the web existed, play by play data for baseball games was available to anyone who wanted to analyze it; check out www.retrosheet.org to get some sense of the history. (Disclaimer: I was a board member for Retrosheet for 10 years.)

... but if you wanted to keep track of the number of times a team won when a given guy was on the mound in the first inning, that might mean something. That's not what they do, though ...

Oh, they keep track of that also. And they keep track of who the official scorer was if you wanted to look it up for that game that concerns you.

One of the most impressive pitching performances I remember watching (on TV, unfortunately) was a "perfect one hitter". The pitcher gave up a single in an early inning, but erased that runner with a double play. No walks, no errors, 27 batters faced, no runs, no runners left on base.

By CCPhysicist (not verified) on 26 Sep 2008 #permalink

But no-- it's all up to the official scorer, whoever that is. If the guy hit the ball, but somebody "should've" caught it, then it's an error, not a hit, and the "no-hitter" would be preserved. So the whole thing turns into a debate over whether Sabathia bobbled the ball enough for it to count as an error or not.

Lots of statistics come down to judgment calls. I know you try to exclude outliers in your experiments; do you have a completely objective and unbiased way to do that? In my field, everything we calculate statistics over is a judgment call, so we just measure how much the different judges agree with each other. You just have to hope the judgment calls average out over time, and since they play 162 games, that seems likely.

(Granted, this is a bit different: since a no-hitter is at stake, the official scorer is probably more likely to score it an error. On the other hand, whether the game is called a no-hitter or not doesn't have much effect on any stats that really matter.)

First, it is not entirely clear that judgement calls made by refs in real time during the game are different from judgement calls made by scoring officials in real time during the game. A judgement about whether an error was made and one about whether a catch was juggled before the receiver stepped out of bounds seem pretty analogous.

Second, however, I would think that variation in these judgements would contribute to random, rather than systematic, error. Therefore, over time, players should be credited when they shouldn't and not credited when they should have been, a roughly equal number of times. The main point of accumulating statistics is to judge players' skill relative to each other. In my mind, a person who is credited with a hit 35% of the time is a better player than one who is credited with a hit 15% of the time. Of course, the old statistics were problematic because the data violated the assumption that observations are independent of each other. The new statistics seem to be taking into account that other players on their team, the skill level of an opposing player, and so forth make a difference in one's own success on the field.

By The Dude Abides (not verified) on 27 Sep 2008 #permalink

You will no doubt be pleased to know that cricket batting averages are totally sensible: total career runs divided by number of times the player got out. Both of these numbers are instantly derived from match records. Some runs in cricket are known as "extras", and not credited to a batsman's score (though they're credited to the team's score), but this determination is made on the ground by the umpire at the time.

And the numbers are sensible: Bradman with an average of nearly 100 was clearly a freak; modern players who manage averages in the 50s or low 60s (Tendulkar, Steve Waugh etc) are the game's top batsmen.
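The cricket definition described above is simple enough to write down directly. The following is a minimal Python sketch, with a hypothetical function name. The Bradman figures used in the example (6,996 Test runs, 70 dismissals) are the widely cited career totals rather than anything stated in this thread.

```python
from typing import Optional


def cricket_batting_average(runs: int, dismissals: int) -> Optional[float]:
    """Career runs divided by the number of times the batsman was out.

    Not-out innings add runs without adding a dismissal, and extras are
    never credited to the batsman, so they do not appear here at all.
    """
    if dismissals == 0:
        return None  # a batsman who has never been out has no average
    return runs / dismissals


# Bradman's commonly quoted Test career totals (an outside figure,
# included only to show the scale mentioned in the comment).
bradman = cricket_batting_average(runs=6996, dismissals=70)
print(round(bradman, 2))  # 99.94
```

Both inputs come straight from the scorecard, which is the point being made: nobody has to decide after the fact whether a run "should" have counted.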

By Michael Norrish (not verified) on 28 Sep 2008 #permalink

"Batting average" doesn't include plays judged to be errors, and doesn't include walks. If a batter reaches base on a walk, that at-bat doesn't even go into the denominator-- it's like it never happened.

Because it doesn't count as an "at-bat." Neither do sacrifice flies or bunts, and neither do fielder's-choice plays. This all makes perfect sense from the perspective of comparing hitting performance per se, though somehow I'm not surprised that people who don't like and don't understand baseball think it's stupid. *shrug* More peanuts for the rest of us.

By Sven DiMilo (not verified) on 29 Sep 2008 #permalink

What a bunch of rubbish.

"A guy hit the ball with the bat, and ended up safe at first. Your no-hitter is over."

And your oversimplification of the matter is mindboggling. You are speaking from ignorance.
You don't know what a "hit" is.

"If the guy hit the ball, but somebody "should've" caught it, then it's an error, not a hit, and the "no-hitter" would be preserved. So the whole thing turns into a debate over whether Sabathia bobbled the ball enough for it to count as an error or not."

Starting to sound like you know what you are talking about...starting..

"You'd think there would be nothing more objective and unambiguous than that: You look at how often a player came to the plate, and how many times he reached base safely, and the ratio of those is your batting average."

Why would this be logical?
If the hitter gets hit with the first pitch, it isn't like he had an opportunity to get a hit. Counting that would hurt his batting average...and thus walks, hit-by-pitches, and others don't count as at-bats.

"You'd be wrong to think that, though. That statistic exists, but it's "on base percentage." "Batting average" doesn't include plays judged to be errors, and doesn't include walks. If a batter reaches base on a walk, that at-bat doesn't even go into the denominator-- it's like it never happened."

This is why any baseball fan worth his salt uses OBP over batting average.

"The whole concept of "wins" and "losses" for a pitcher is a little silly, given that a minimum of eight other people are involved in the game,"

It's more foolish because it's a counting stat, not a rate (it takes no account of playing time).

"And don't talk to me about "saves." "

So you don't like counting stats? Thank god.

"the core statistics of the sport are all bullshit."

The only statistics that can be deemed "bullshit" are counting statistics.
Read up on OPS+ and tell me it's bullshit.

"Basketball has three somewhat major stats that involve the same kind of dubious judgment calls as the core baseball stats: assists, steals, and blocks. The real core stats, though, are absolutely unambiguous: when a shot goes up, either it goes in, and somebody is credited with points, or it misses, and somebody is credited with a rebound. There's no leeway for the official scorer to insert himself, no "well, the opposing center really should've blocked that shot, so those points don't count toward your season total."."

Wait, so now you like counting stats?
Seems like you just hate baseball because it makes you feel intelligent.

"Pro football has the lowest bullshit level of any major sport. The only commonly-cited statistic that involves a judgment call by the official statistician is "sacks." Everything else is clear and unambiguous-- somebody threw the ball, somebody caught the ball, somebody ran with the ball. No bullshit."

Except for...like...when they rule if he was in bounds or not, if his knee was down or not, if the guy next to him was holding someone and thus his 40-yard run doesn't matter, etc.

The fact that you are a "professor" was the only reason I really had any desire to respond to this ignorant silliness.

Am I the only one laughing at this guy for thinking getting on base on error helps (instead of hurting) your On Base Percentage?

A clueless article written by a clueless man who's so clueless he honestly believes he knows what he's talking about.

There is no bigger BS call in sports than a pass interference call in football. It's a judgement call whose penalty can be anywhere from 1 to 50 yards, depending on where it takes place.