Barry Bonds is on the verge of breaking Hank Aaron’s all-time home run record in Major League Baseball. It’s inevitable. In fact, by the time you read this it might already have happened. But, eh, nobody seems to care much. Well, nobody outside of the Bay Area or Bristol, Connecticut.
This probably stems from the fact that hardly anybody not being paid by Barry seems to like Barry very much and that people think he’s cheated his way to the record through the use of various anabolic steroids, growth hormones (both human and cow apparently), and a couple of BALCO products known as “the cream” and “the clear.”
And of course flaxseed oil.
But other than once for amphetamines, Bonds has never tested positive for any PEDs. Still, this hasn’t ended the speculation.
For a few reasons there is still rampant suspicion that Bonds’ inflated power numbers come from better living through chemistry. First, there is his involvement with BALCO, Victor Conte, and trainer Greg Anderson. Then there’s this. Finally, just look at Barry. There has clearly been a dramatic change in his physique from his earliest days as a skinny kid in a Pittsburgh Pirate uniform to the Giant (both senses) he is today.
To quote Stuart Mackenzie, “Look at the size that boy’s heed. I’m not kidding, it’s like an orange on a toothpick.”
Still most of the reporting remains focused on the feds, the remnants of the BALCO scandal, the toothless investigation by the Mitchell committee, or some other informant du jour. It’s almost as if the media and MLB are both waiting for someone to hand over a smoking gun registered to Barry with his prints on the still warm grip (“Hey look, Godot!”). What there hasn’t been much of is an examination of the probability that what Barry has done was legit from a purely statistical standpoint.
What follows is an attempt to do just that. It’s an analysis of Bonds’ 2001 single-season record-breaking home run mark of 73. It’s long. Sorry. But, statistically, it’s interesting because Bonds really didn’t hit 73 home runs in 2001. Or at least you can almost prove it.
Admittedly that’s a bit of an irresponsible use of the word “prove.” The numbers don’t prove anything. And Bonds actually did hit 73 baseballs that cleared the fence during that season. What the numbers do show is that it was so improbable that it would almost be more rational to believe it didn’t happen.
Chicks Dig the Long Ball
It’s difficult to pinpoint exactly when steroids or other PEDs became enough of a problem to compromise the historical integrity of stats in baseball, but the strike-shortened season of 1994 provides a decent enough breaking point. The numbers from that year really don’t mean much as the season was halted in August. And the following year, players started knocking the ball across zip codes.
In the entire history of baseball from 1876 to 1993, only two players—Babe Ruth in 1927 and Roger Maris in 1961—had ever hit 60 or more home runs in a single season. From 1995 to 2004 it happened six times.
Additionally, from the inception of the game through the 1993 season only 123 players had hit 40 or more home runs in a season. It was actually fewer players because, for example, Babe Ruth did it 11 times. So to be technical it’s 123 player-seasons. From 1995 through the 2004 season, there were 93 player-seasons of 40 or more home runs.
Put another way, the first 100-plus years of baseball had about 125 40-plus home run seasons; at the current rate, the next 100 years will have roughly 900.
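For anyone who wants to check that projection, here is a minimal Python sketch of the arithmetic. It uses only the counts quoted above (123 player-seasons from 1876 through 1993, 93 from 1995 through 2004); the variable names are just illustrative, and it assumes the 1995 to 2004 pace simply continues unchanged.

```python
# Counts of 40-plus home run player-seasons quoted in the post.
pre_count = 123                   # 1876 through 1993
pre_seasons = 1993 - 1876 + 1     # 118 seasons
post_count = 93                   # 1995 through 2004
post_seasons = 2004 - 1995 + 1    # 10 seasons

pre_rate = pre_count / pre_seasons      # player-seasons per year, old era
post_rate = post_count / post_seasons   # player-seasons per year, recent era

# Naive projection: assume the recent pace holds for a full century.
projected_per_century = post_rate * 100

print(f"old era:    {pre_rate:.2f} per season")
print(f"recent era: {post_rate:.2f} per season")
print(f"projected over 100 years: {projected_per_century:.0f}")
```

The straight-line projection lands a little above 900, which is the round figure used here.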
So clearly, despite baseball’s love of its own timelessness, something about the game changed. The ball is juiced. The players are just bigger and stronger. The ballparks are smaller. Expansion has diluted the pitching. In 1993 the Colorado Rockies franchise had its inaugural season, so since then, National League teams have been launching shots in the thin Denver air. There have been all kinds of theorizing, but even every team playing every pitch of every game in Denver probably wouldn’t get you to 900.
What Bonds did, though, was so statistically outlandish that it can’t reasonably be explained by any or all of those (except perhaps *ahem* “bigger and stronger”). To demonstrate this, a little statistics lesson is necessary. It’s pretty straightforward as there are only three things you need to understand: a mean, a standard deviation, and a normal distribution.
If you never took a basic stats class or even went to college but you’ve watched the Price is Right, you’re two-thirds of the way home.
The mean is just the arithmetic average that you learned in grade school. Add up all of your data points, and divide the sum by the number of points there are. Cake.
Standard deviation is more or less a mathematical measure of how “spread out” your data are from the mean. As a quick example, look at these two sets of ten numbers:
A) 25, 28, 29, 21, 20, 17, 29, 33, 24, 24
B) 10, 0, 190, 3, 7, 28, 4, 2, 5, 1
Both sets have the same average of 25, but set B has a much larger standard deviation because the data are much more spread out from that average. If you calculate it out, A has a standard deviation of about 4.6, while set B has one of about 55.5 (both using the population formula; the sample formula gives about 4.9 and 58.5). That’s a relatively large difference.
(The formula for standard deviation is not given here, but for the intellectually inquisitive, it’s not hard to look up. Or if you are even lazier than I am, any stats package or spreadsheet application is likely to be able to do it in a couple of keystrokes).
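If you would rather not look anything up, here is a small Python sketch that runs the two example sets through the standard library’s statistics module. It reports both the population and the sample versions of the standard deviation, since the two formulas give slightly different answers on only ten numbers. The helper name summarize is just made up for this illustration.

```python
import statistics

set_a = [25, 28, 29, 21, 20, 17, 29, 33, 24, 24]
set_b = [10, 0, 190, 3, 7, 28, 4, 2, 5, 1]

def summarize(data):
    """Return (mean, population SD, sample SD) for a list of numbers."""
    return (
        statistics.mean(data),
        statistics.pstdev(data),  # divides by n
        statistics.stdev(data),   # divides by n - 1
    )

for name, data in (("A", set_a), ("B", set_b)):
    mean, pop_sd, sample_sd = summarize(data)
    print(f"Set {name}: mean {mean}, population SD {pop_sd:.1f}, sample SD {sample_sd:.1f}")
```

Either way you compute it, set B comes out more than ten times as spread out as set A.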
The last concept, the normal distribution, is probably more familiarly identifiable as a “bell curve.” If that doesn’t do it for you, either peek down the page or think of the game Plinko from the Price is Right. If you drop hundreds of pucks down the center of the Plinko board, most of them would end up bunched around the middle, some spread to the sides about the middle, and a few would be out toward the tail ends. Each puck would represent a data point, and cumulatively the pucks would look “normally” distributed. There is a mathematical function to describe the normal distribution, but even printing it here would probably put you to sleep. Just know that it is indeed shaped kind of like a bell and it is symmetrical around the middle.
But there is a really important phenomenon of normally distributed data (and also really elegant, given how it combines all three relevant concepts here): About 68% of all your data points (your Plinko pucks) will be within one standard deviation of each side of the mean; and about 95% of your data will be within two standard deviations from the mean.
So look at the graph above. It’s of a normal distribution. The blue represents plus/minus one standard deviation, the pinkish represents plus/minus two. So for normally distributed data, almost everything sits within 2 standard deviations. If you go out to three (the green)? That’s 99.73% of your data. And that’s pretty much everything.
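Those three percentages are not magic numbers; they fall straight out of the normal distribution. As a rough sketch, the share of a normal distribution within k standard deviations of the mean can be computed from the error function in Python’s math module. The function name share_within is my own label, not anything standard.

```python
import math

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.2%}")
```

This only holds for data that really are normally distributed, which is the assumption the rest of the analysis leans on.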
Don’t Know Much About History?
Back to Bonds. Again, he hit 73 home runs at the age of 37. Historically, as most baseball players have gotten into and past their mid-30s, they experience a drop in hitting power. That translates into fewer homers.
So, after going through the data, I found at least 38 baseball players who had three characteristics that made them similar to Bonds.
First, they were sluggers. This was kind of an arbitrary definition. But for these purposes, a slugger was someone who had at least one season of 30 home runs or more. There’s nothing particular to that number other than a 30-home run season is generally considered a solid benchmark of a power hitter.
Second, they played until at least the age of 37. That’s kind of self-explanatory. Bonds turned 37 in his record-setting year, so we are comparing him to players of the same age.
Third, in the season when the player turned 37, he was still an everyday player. Again, another arbitrary benchmark was chosen to define “everyday,” but if the player appeared in around 120 games (about 75% of the season) this was sufficient.
There are a couple of exceptions to these three criteria. Roberto Clemente is included even though his largest single-season home run total was 29. Mickey Mantle retired before age 37, so his HR total from age 36 is used. Also a few players did not play much at age 37 (Ott, Killebrew, Winfield). This is presumably because of injuries. Their numbers from age 36 were also used.
Additionally, with just a couple of exceptions, the players in the data set had a 30-HR season before 1994. This is so that the sample is (hopefully) free from having any players on PEDs in it.
In other words we are trying to get a non-steroid sample, and compare what Bonds did to that to show how statistically absurd his 73 home runs would be if he were indeed clean. Yes, players in the Seventies might have been on speed or cocaine. That’s a flaw that can’t be avoided and whether being wired on blow helps you hit more home runs or not is a separate argument.
The Older We Get, The Better We Were
Just meeting all the above criteria leaves you with a roster that reads like roll call at the Hall of Fame: Willie Mays, Mickey Mantle, Reggie Jackson, Babe Ruth, Ted Williams, Andre Dawson, Ernie Banks, Mike Schmidt, Hank Aaron, etc.
But by the time those guys hit age 37, collectively their power was clearly declining from their career highs. And the average number of home runs hit by the 38 players in the sample during the season they turned 37 was 22.32. For the same data the standard deviation was 9.19.
Just think about this qualitatively for a second. For decades of baseball you’ve got all of these great hitters, just legends. And when they get old, they start hitting 22 home runs, 11 home runs, 29 home runs, etc. The only two players with remotely gaudy numbers at that age were the game’s two greatest home run hitters: Ruth with 41 and Aaron with 47. Then suddenly here comes this one guy and, when he gets to be that old, he hits 73. That’s more than a 50% increase over the next best guy!
Looking at it that way, just on the surface it seems a little outlandish.
Using that mean and standard deviation calculated above—22.32 and 9.19 respectively—and the normal distribution we can calculate precisely how outlandish it is.
Relative to that same data set, Bonds’ home run total of 73 was 5.51 standard deviations from the mean.
Actually that should read: 5.51 STANDARD DEVIATIONS FROM THE MEAN!!!!
So, what does that mean?
Well, remember from the diagram above that more than 99% of your data should be within 3 standard deviations of the mean in a normal distribution. So, based upon the statistics of players who played before the time PEDs were thought to have become a problem in baseball, we could play hundreds and hundreds and hundreds more years of baseball and we would expect to find hardly anyone at age 37 hitting more than the mean plus 3 standard deviations, or about 50 home runs in a season. (This is kind of borne out in that even the greatest home run hitters ever managed only 41 and 47.)
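The numbers in the last few paragraphs are easy to reproduce. Here is a minimal Python sketch that takes the sample mean (22.32) and standard deviation (9.19) as given, since the underlying player data are not listed here, and works out both how far 73 sits from the mean and where the three-standard-deviation ceiling falls. The names are illustrative only.

```python
MEAN_HR = 22.32   # sample mean of age-37 home run totals, from the post
SD_HR = 9.19      # sample standard deviation, from the post
BONDS_HR = 73

def z_score(value, mean, sd):
    """Number of standard deviations that value lies above the mean."""
    return (value - mean) / sd

bonds_z = z_score(BONDS_HR, MEAN_HR, SD_HR)
three_sd_ceiling = MEAN_HR + 3 * SD_HR

print(f"Bonds: {bonds_z:.2f} standard deviations above the mean")
print(f"mean + 3 SD: {three_sd_ceiling:.1f} home runs")
print(f"distance beyond 3 SD: {bonds_z - 3:.2f} standard deviations")
```

The ceiling works out to just under 50 home runs, and 73 sits about two and a half standard deviations past it.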
Bonds is another 2-plus standard deviations away! There simply aren’t outliers that far out. Just liars.
Okay, cheap shot. Yes. But we can put an actual number on the improbability.
Take that 5.51 number (for those of you who have had anything beyond a basic stats class, you might also know this as a Z-score) and compare it to what’s called a standard normal. A standard normal is just a normal distribution with a mean of zero and a standard deviation of one. By using a standard normal distribution and a Z-score you can calculate the probability of what Bonds did. Ready?
It’s so remote it doesn’t even exist.
Sort of. Going up from zero, most standard normal tables stop at Z-scores of 3.89. Huh?
What that means is once you get something that is roughly 4 or more standard deviations from the mean, statistically it’s so close to a zero-probability event that statisticians don’t even bother. Remember that almost all of the data points should be within 3 standard deviations and the farther out you go the smaller the tails get.
Bonds was 5.51. Think about that for a second.
In fact the two textbooks I own with Z-score tables both stopped short of 4. I had to go search online for a table with values large enough.
Once you get to 5.51 standard deviations, the probability of a 37-year-old hitting 73 homers is .000000019 (that’s seven zeros before getting to that one-nine).
To put it another way, from a statistical standpoint, almost 53 million sluggers need to play meaningful baseball through the age of 37 before you would expect to see one guy who hits 73 home runs in a season.
Fifty. Three. Million.
So far, in the 125-year history of the game, there have been about 50 sluggers playing meaningful baseball after age 37.
Again. Fifty-three million. Compare that to fewer than fifty.
If you think Bonds is 1-in-a-million, you are off by a factor of 53.
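You don’t actually need a table to get that probability. As a rough check, the upper-tail area of a standard normal can be computed with the complementary error function in Python’s math module. This sketch assumes the one-tailed probability is the number of interest and that the data are normal, which is the same assumption made throughout; upper_tail is just a name chosen for the example. The table used for the figure above only went to tenths, so it is evaluated at 5.5 as well as 5.51.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

for z in (5.5, 5.51):
    p = upper_tail(z)
    print(f"z = {z}: probability {p:.2e}, or about 1 in {1 / p:,.0f}")
```

At 5.5 this comes out to about .000000019, or roughly 1 in 53 million, matching the table. Using the unrounded 5.51 makes the probability slightly smaller still, on the order of 1 in 56 million.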
May 12, 2007 at 8:45 pm
very convincing, nice post
May 12, 2007 at 8:48 pm
Jeezus! That flaxseed oil is some amazing shite…where can I get me a box?
May 12, 2007 at 8:49 pm
This is a fascinating post, but aren’t there a lot of elite athletes who do things that the stats would say are nearly impossible? Let’s say you performed the same exercise with Babe Ruth’s 1927 season, using only data that was available at the time. What chances would you give him of hitting 60 homers?
May 12, 2007 at 9:45 pm
As a person who understands the full impact of what “Statistically Significant” means, I really appreciate your thoroughness, and your crunching these numbers.
It’d be interesting to run similar comparisons for perceived outlier seasons such as Gary Mathews Jr.’s 2006, or any number of standout seasons from average players.
May 12, 2007 at 9:48 pm
I was thinking along the same lines as the poster “MDS” but with regard to Babe Ruth’s 1920 season…how improbable would it be to hit more home runs than any other team?
Fascinating read though – I’m just not sure what to make of it.
May 12, 2007 at 9:54 pm
Dan: At some point I plan on doing something along the lines of what you suggest with players who had single seasons that were off the charts compared to previous seasons. Brady Anderson’s season where he hit 50 HRs as a leadoff man also comes to mind.
Also I will try to post a link to the data used here. It’s all culled from baseballreference.com.
May 12, 2007 at 10:04 pm
This is a horribly flawed argument. For one thing, you ignore that players as a whole are bigger/stronger today (just like the population), so comparing what player X does at age 37 in 2001 against what player Y did at the same age in 1931 is apples and oranges.
Second, your sample is arbitrary as hell, adding Clemente (even though he didn’t hit enough homers to be in the group) and Mantle (even though you are now comparing his homers at 36, WHEN HE WAS ON THE VERGE OF RETIREMENT DUE TO INJURY, to everyone else at 37).
Third, you present the findings as if they are concrete facts. They are not. Bonds’ season is only 5+ standard deviations away from YOUR SAMPLE GROUP, not from any sort of concrete reality. That would be no different than if you’d created a group of, say, players who spent time in the NL East, lumped in Terry Pendleton and Dave Magadan, and then “concluded” that Bonds’ season was 345.32 standard deviations out.
Fourth, you seem to be in this dreamworld, where only Bonds was juicing. If the vast majority of players (including pitchers) were using PEDs, then his advantage is not this overwhelming amount that people like to claim. Maybe next, you can give us some sort of groundbreaking study about how his taking hGH was worth X numbers of homers that year.
May 12, 2007 at 10:11 pm
The biggest flaw in this is that you use sluggers-who-reached-37 as your baseline. It seems pretty arbitrary to work backwards from a narrow definition of Bonds in his record-breaking season. Of course, a full consideration may make it even look less likely, not more.
The reason that so few players are in his comparables list is because just making it to 37 as a starter is a rarity. If you want to just compare him to sluggers, compare him to EVERY slugger who ever hit 30 HRs in a season; if they didn’t make it to 37, just give them a zero (or whatever they got coming off of the bench). This would probably lower the mean and increase the standard deviation. You can tell me if that makes it more or less likely.
May 12, 2007 at 10:21 pm
First, this sample comes from across the years, not just 1931. So as people simply get bigger and stronger, that should be reflected in the data.
Second, yes, I fully acknowledged some arbitrary definitions and explained them. “Arbitrary as hell” would be grossly inaccurate. Mantle played in 144 games the season he turned 36. And for fun, I ran the numbers without Clemente. Bonds ends up being 5.48 standard deviations from the mean instead of 5.51 and since my z-score table doesn’t have values past the tenths (i.e. 5.4, 5.5, 5.6, etc.) I would have rounded to the exact same value used.
Third, I do not present this as anything more than data and am pretty transparent about my assumptions all the way through. I tried to construct a data set of players that would be comparable to Bonds as a power hitter.
Fourth, that is a remarkably erroneous inference on your part. In no way do I intimate that Bonds is the only player who is “juicing.”
May 12, 2007 at 10:48 pm
I am 100% confident people used and are using PEDs. I’m not sure how your calculations show that PEDs must be the cause for the home runs. First, Tom House admitted to using them in the 60s and labeled their use widespread. Second, other factors have changed in baseball making the home run more likely – smaller ballparks, smaller strike zone, etc.
May 13, 2007 at 3:53 am
I would love to see Brett Boone put through this. The most obvious juicer in this or any era in my opinion. He is Bond’s White Mini-me.
May 13, 2007 at 6:44 am
Interesting analysis… I wrote on the same topic, in a different manner, at my site recently. I contend, however, the true story of Barry’s leap is more complex than we’re normally presented. If you peruse “Game of Shadows” you find the authors arguing that Bonds went on PED sometime during his injury-plagued (still 34 HR/102 games) 1999 season. Yes, he smacked 49 HR, again injury-shortened, while playing in a new, pitcher’s park in 2000. OK. Yet, for an interesting analysis chart his performance against lefties, which always hounded him, and you’ll see it drops from his early-1990s MVP peak before bottoming out at around .200, mind you, during a year he finished second in the MVP voting. Now, look at his numbers v. LHP from 2001-2004, he DESTROYED them. Suddenly, a prime weakness of a great player became an added strength. How do we account for this? I’m more than willing to listen to arguments claiming steroids or HGH led to this extraordinary rise in productivity–far more, I would humbly suggest, than your study of HRs. Increased power even in the late 30s I can reconcile, but suddenly hammering left-handers? Why would that occur, so very late in one’s career? Barry has always employed a highly compact, choked-up power stroke. I tend to think he figured something out in the winter of 2000 and returned to post the most eye-popping numbers (batting-wise) in baseball history. Also, you focus on 73 HR, admittedly, I’m not that impressed, even as it stands as the record. 73 HR in 476 AB, though, is a far different matter. His high strikeout total (93, highest since his 1986 rookie season) seems to indicate he “went for it” that year more than any other–certainly more than his 46 HR/41 SO season of 2004, for example. He’s in a bit of slump now, but he remains (possibly) on course for MVP #8, at the age of forty-three by the end. If he does it, 35-65 odds perhaps, that would have to rank among the most incredible accomplishments for any sports figure. 
As I explained in the post, April 2007 “boosted” him to the top of the best I have ever seen.
May 13, 2007 at 2:21 pm
Just another variable, but “age 37” means something different now from the time Ruth played as well.
Life expectancy at the turn of the 20th century was about 49. Today it’s pushing the mid to high 70s, with general acknowledgement that people, not merely athletes, who would be expected to be more physically able than the average Joe, are healthier, more “youthful,” at later ages.
That said, the whole Bonds-circus makes me wish Griffey would have stayed healthy, so that a likable guy never suspected of using PEDs was breaking the record.
May 13, 2007 at 5:12 pm
That Bonds’ HR total that year differs greatly from what other players his age have achieved is clear. And it is interesting to try to ask the kinds of questions you are of the HR data in order to understand the probability of his 73 HR season. To be able to use standard deviation and then a z-score, t-test, chi-square, or whatever other statistical test for similarity or difference on the set of data you describe is something altogether different. Statistical tests will always give you an answer but there are all sorts of assumptions your data must meet in order for any specific test to be valid, such as whether the data is parametrically vs. non-parametrically distributed, the nature of the sample and population, etc. You assume that your data set, among other things, is normally distributed and I am not so sure that it is. With advances in training, strategy, technology, etc. there seems to be something going on so that where perhaps stats during any single season may be distributed normally (I don’t know, but maybe) stats across years might not be. Who knows. All I learned in my stat classes is that whenever I want to apply a statistical test to data I need to talk to a real stats person because that stuff gets so darn complicated.
May 13, 2007 at 7:53 pm
uh, who established that these numbers come from a normally distributed population? and certainly your sample is hardly large enough to invoke any large number laws of normality….c’mon, you gotta try harder than this.
May 13, 2007 at 8:26 pm
That’s perhaps the biggest flaw in the argument, that this would be normally distributed data. But, having plotted it out, it passes an eyeball test. It’s probably more like a Weibull distribution with a little bit of left skew. Without trying to actually calculate things like kurtosis and the like, qualitatively Bonds’ mark of 73 would still be an outlier several standard deviations from the mean.
Also, once you get above a sample size of 30 error terms become pretty small. For statistical inference in the case of a normal distribution, it might not be ideal, but there just aren’t a plethora of people who play everyday and hit with power at age 37, which is kind of the underlying point of the argument.
As for “gotta try harder”… Uh, nobody is paying me to do this, my man. And what you see above already represents plenty of hours of time.
May 13, 2007 at 9:18 pm
I feel that Bonds did, in fact, at some time use PEDs, and most likely for more than a brief period of time. I’m not convinced by this that his home run total is terribly affected. If I were to have done this research (and I won’t, because I’m just not good enough at stats yet, although I am trying to teach myself), I would have found it more useful to see how many of them eclipsed their previous career high in home runs by as many standard deviations as Barry must have. Because not many players in history have probably bettered their previous seasonal home run high by 50%. So that would be interesting, if you had the time or desire.
May 14, 2007 at 6:41 am
I’m sure you could use a similar “statistical analysis” to prove it was “impossible” for kobe bryant to score 81 points in a game last year or for Roger Clemens to have a career best 1.87 ERA at 42 years old (ok maybe that further proves the power of steroids?) – regardless, whenever there is a special player in any sport his numbers are going to be statistically improbable compared to every other player; that alone is proof of nothing other than that stats can be made to support almost any argument. and regardless of any advantage PEDs may give it’s clear to me that Babe Ruth had more advantage from the Polo Grounds in 1920 & 1921 than Bonds had from any use of PEDs and the advantage Helton has got from Coors Field is far greater than PEDs as well. Ban Coors Field and then you can talk about the sanctity of baseball stats.
May 14, 2007 at 6:44 am
maybe it’s impossible for magic johnson to have 3 times as many triple doubles in the playoffs as any other player ever. it seems quite strange that any one player would have such a huge advantage in any one stat. i dont think you can really compare what players did at old ages before now. it’s like comparing what 80-year-olds are like now to how they were in 1820. things have changed. people know how to take care of themselves better – again – regardless of PEDs – using completely legal substances and better training methods – athletes can perform at a higher level later into their careers now than they could in the past.
i also strongly believe that barry bonds rested on his godlike skill early in his career – it wasnt till late in his career that he finally harnessed his god given skills.
barry bonds is simply the greatest athlete to ever live.
May 14, 2007 at 6:46 am
and seriously think about what i said regarding roger clemens – CAREER BEST ERA at age *42* … how many pitchers in the history of baseball can say that? i would not be at all surprised if clemens was the FIRST. so if you are going to play these types of statistical bullshit games – do it with everyone, buddy.
May 14, 2007 at 6:46 am
also small donkeys live in the goddamn sand !
May 14, 2007 at 10:35 pm
Awesome job. Regarding the 1.87 ERA at age 40+, there’s really no question in my mind that Roger Clemens is suspicious, if not outright obviously guilty. The one thing that I find with steroid arguments is that people don’t accept that there are different LEVELS of dosages. There’s really no question in my mind that the stuff is ALL over baseball and basketball and football. Period. Just run the footage from as recently as 15 years ago; the human race hasn’t evolved THAT much. But science has.
I have spent 20 years on a baseball diamond as a hitter and a pitcher. I’ll tell you, if you don’t have intimate experience with the science of hitting, you can’t comprehend the advantage Bonds has in his so-called ‘legendary batting eye’ due to his enhanced strength.
The reason sluggers used to generally be low-average guys is that they had to use more of their body’s strength to hit a ball out, meaning their head was nearly impossible to hold still. That’s where the strikeouts come from, a mighty swing displacing your swing axis and moving your eyeballs even a fraction of an inch, creating an optical illusion where you swing right through where you perceived the baseball to be.
Barry, with at least double the strength necessary to hit a home run, simply never has to swing hard. He can wait a split second longer thanks to his enhanced bat speed, and when he swings, his eyes don’t move ’cause he ISN’T swinging hard.
Players used to have to swing really hard to hit the ball like Bonds does. But the pitching was nowhere (and i mean NOWHERE) near as good in the 20’s, 30’s, 40’s, or 50’s as it is today. Incomparable. Strides were made in the 60’s and 70’s, but meaningful comparison is only possible starting in the late 1980s.
May 23, 2007 at 12:59 am
I find baseball and statistics both boring as hell. But this is a damn good post and a lot of food for thought. I wish some of those jock-sniffers on the cable networks and sports radio would take as careful a look at stuff like this as you have.
If you are a commentator on radio or cable I’m gonna sound stupid, ain’t I?
June 19, 2007 at 3:01 pm
I think the main arguments to this are the life expectancy thing (37 was considered a lot older back then than now. To make it more accurate wouldn’t you have to use it as a % of life expectancy? 37 is 78 as ___ is to ___ from back then?…); and also how you could use this in all sports anomalies like Wilt’s 100 point game, Brady Anderson’s 50 HR season, the Brett Boone season, the Clemens season, Kobe’s 81 point game, etc.
June 21, 2007 at 2:15 am
I found your article fascinating, but it doesn’t exactly require college level statistics to proclaim Barry Bonds an extreme outlier; he might even be without his power spike post age 37.
However, what percentage of his power spike can be attributed to any PEDs he might be using? This has always been my biggest problem with the whole Barry Bonds situation. He plays in a pretty extreme pitcher’s park on top of his age.
My problem with PEDs (which I discuss on my blog) is that we’re continuing to focus on a problem that has been as solved as it’s going to get. I think MLB made huge strides with the new guidelines on steroid punishments. Bringing more attention to the problem won’t help anybody right now. Mark McGwire can’t even be seen in public now, and he didn’t even do anything against MLB’s policies at the time!
The best solution is for Bud Selig, as the public face of MLB, to apologize to the public (not Jason Giambi) for baseball’s unwillingness to do anything about the steroid problem, say they’re taking steps to resolve it, and let the matter drop.
August 10, 2007 at 4:41 pm
I agree with the 2-3 people who mentioned Babe Ruth’s 1920 and 1921 seasons…why don’t you run those seasons through your little statistical analysis and find out how “impossible” it was that one player would outhomer almost every other TEAM in the league. Hey, maybe Babe was on steroids!
August 10, 2007 at 6:18 pm
Re: Ruth’s 1920 season. There were a series of very good reasons that Ruth started smashing the ball out of the park which could be observed qualitatively. The two most obvious include
1) Ball quality/end of the spitball. Pitchers used to chew tobacco and spit on the ball, which would make it curve erratically (water on the ball) and also make it harder to see (the brown juice). There was also only one ball used per game. After Ray Chapman died in 1920 after being beaned with a ball he couldn’t see, they started using more balls per game, which were thus whiter, easier to hit, and not covered in a foreign substance which made their flight path unpredictable.
2) Swing style. The way baseball was played included a lot of speed-based batting tactics like the “Baltimore Chop” and bunts, a lot of stealing, and very few home runs. The idea was to get on base and move a guy over by sacrifice and trick plays. Ruth came in with an uppercut swing and a combination of right time, right place, and right superhuman freak of a baseball player made him what he was.
You can verify this with simple observation. Not long after Ruth played you started to get other sluggers with a smooth uppercut swing, guys who hit for power instead of relying on speed. So home runs went up. Can anyone point to one easily observable historical fact that made home runs shoot up in the mid-nineties, like I just did for 1920? And don’t say “bigger and stronger,” because vitamins and weight lifting techniques didn’t increase so much during the offseason 1994-5 that everyone became a superhero.
Obviously Bonds was a great player beforehand, with a swing that was a work of art and the plate patience to draw walks and get pitchers mired in ugly counts. If one were to add PEDs to that mix, you might get what Barry became.
August 10, 2007 at 6:20 pm
To the people who want to see Ruth’s numbers run for ’20-’21, please keep in mind that those years coincided with the introduction of the lively ball, so it just isn’t meaningful to compare Ruth to what people had accomplished up until that point in time.
Ruth’s numbers were different than prior players because he played the game differently (he tried for home runs when everyone ‘knew’ it was a sucker bet to swing for the fences) paired with vastly different conditions (i.e. lively ball).
As to the various criticisms leveled about the analysis (e.g. comparing guys from different eras, being 37 in 1920 was different than being 37 in 2000, using Mantle’s stats at 36 instead of 37, is the data truly normal, etc.), it seems to me a bit like focusing on a tree and missing the forest.
The point of the article is that Bonds at 37 performed at a crazy, silly high level, and the extent to which he outperformed other sluggers of historical stature (as well as his mostly younger contemporaries) could easily lead a rational person into thinking he used steroids. And no amount of data slicing/preparation will change that conclusion.
August 10, 2007 at 6:33 pm
Interesting, thoughtful analysis.
I’m fine with the normal assumption, but ideally to demonstrate the (un)likelihood of such an outlier, you’d have to run some kind of regression to account for park effects, increased longevity of modern players, era effects, etc. In light of all that, sure it’ll still be an outlier, but not 5+ standard deviations. This is, of course, if you want to attribute some of Bonds’ performance to cheating. If you simply want to show that, in a vacuum, something is very different from all those other sluggers’ age 37 seasons, then yes, you’ve done that quite well.
August 11, 2007 at 12:04 am
Followed the linky from FJM. Awesome stuff. I’d done an easier lookup by just comparing numbers of homeruns hit by all the game’s top sluggers when they were 36 and 37, in an effort to convince people how fake Barry’s 73 was.
Once I showed them that Ruth and Aaron hit 85 and 87 for those two years, Mike Schmidt hit 72, Raffy – Steroid User – hit 90, and nobody else even reached SIXTY, and Barry hit 119 (!), coins started dropping here and there.
Next time, I’ll just provide them this article. :D
Great stuff.
August 11, 2007 at 12:05 am
Just a quick response to the comment above mine
“you’d have to run some kind of regression to account for park effects”
Barry played in about the most difficult park to hit in. Doesn’t that make his 73 even more ridiculous? I mean, how many homers would he have hit in a single season had he been in Denver? 100+?
August 11, 2007 at 1:04 pm
Building off of post #32 (Paul), another aspect of Bonds’ performance which isn’t talked about much was his level of performance in his late 30’s relative to the rest of his career.
Nobody has an aging pattern even remotely close to Bonds. Bonds’ best 5 consecutive seasons as a player were between the ages of 35-39, and that is by a wide margin, which is no mean feat given that:
a. he was an awesome player prior to 35, and
b. his level of performance was remarkably consistent prior to age 35.
Virtually all players have their peak years occur earlier – typically ages 26-30. I know of no player (and am fairly certain one doesn’t exist) who increased in value as a player as much as Bonds did in their late 30’s. Hell, most players are doing well just to keep their job.
Can anyone cite another player who had a career of significant length (i.e. 10+ seasons) that:
a. had their peak 5 years at ages 35-39, and
b. their peak years were significantly better than their average season outside of the peak years?
August 12, 2007 at 6:57 am
I completely agree with the previous comment. Combine the fact that Bonds’ peak years occurred between 35-39 with the information in Game of Shadows indicating that that was EXACTLY the age that Bonds began juicing (the timetable is detailed in the book with great specificity). I don’t see how a rational person could possibly conclude that Bonds’ jump from great to superhuman isn’t due almost entirely to PEDs.
August 13, 2007 at 1:23 am
Regarding pot #18, Precious Roy is concerned about the data being normal. Worry not, Precious Roy, something called the Central Limit Theorem comes into play. Simply put, it states that regardless of the nature of the actual data, the measurements (mean, std dev, etc) will assume tendencies consistent with a normal distribution. Bottom line: it doesn’t matter how normal the base data is if the sample size is large enough. How big is large enough? Remarkably, at about 25 or larger, tings fit nicely. You’ve done good work, Precious Roy, you need not apologize!
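For anyone who wants to see the Central Limit Theorem at work, here is a minimal sketch using only Python's standard library. Everything in it is an assumption for illustration: the exponential distribution stands in for "non-normal base data," the seed and sample counts are arbitrary, and the helper names (`skewness`, `sample_means`) are made up. It draws heavily skewed data, then averages samples of 25 at a time. Note that the theorem describes the distribution of sample means; the sketch says nothing about any single observation.

```python
import random
import statistics


def skewness(values):
    """Population skewness: 0 for a symmetric distribution such as the normal."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    return sum(((v - mean) / stdev) ** 3 for v in values) / len(values)


def sample_means(sample_size, n_samples, rng):
    """Average `sample_size` exponential draws, repeated `n_samples` times."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


rng = random.Random(2007)  # arbitrary seed so the run is repeatable

# Raw exponential data: strongly right-skewed, clearly not normal.
raw = [rng.expovariate(1.0) for _ in range(20000)]

# Means of samples of 25 (the "large enough" size mentioned in the comment).
means = sample_means(25, 4000, rng)

print("skewness of raw data:    ", round(skewness(raw), 2))
print("skewness of sample means:", round(skewness(means), 2))
```

The sample means come out far less skewed than the raw data, which is the tendency toward normality the theorem describes.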
August 13, 2007 at 1:25 am
And if I could type, I would have said POST # 18, and, in the penultimate sentence, THINGS fit nicely.
August 14, 2007 at 12:29 am
There are a few other things you should be looking at when it comes to B*rry R**ds.
1.) He is at a point where he can no longer work out like he used to, but somehow he continues to get bigger. Logic tells us that older + working out less = getting smaller or fatter, not bigger. But somehow he’s not taking any illegal enhancements?
2.) Even though he’s never tested positive for anything, the tests aren’t as efficient as most would believe them to be. That’s because the drugs update as fast as the tests do. The newer version of the drugs would then be undetectable by any of the newer tests.
3.) B*nds is a cheater, not an idiot. And anyone who’s smart enough would keep his supply updated.
If you wish to continue this, feel free to email me!
August 14, 2007 at 5:16 am
This is an interesting proposal, but statistically it has a slightly flawed analysis. It is impossible to directly compare the numbers of hitters from the 70s and before to Barry Bonds in 2001. As the author notes earlier in the post, the balls are juiced, parks are smaller, bats have changed, pitching strategies have changed; there are so many outside variables affecting a batter’s hitting ability that you need to make sure you include all of them to do a proper analysis.
I would recommend looking at Baseball Prospectus. It’s an online baseball statistics newsletter, and they have a great article with a real statistical list of the greatest single-season and career HR totals. It does a better job of normalizing as many hitting factors across time as possible. It’s interesting.
Also, anabolic steroids were synthesized in the 1930s… so, we don’t know if Mantle was using them or not. But in the end, it doesn’t matter, because they weren’t made illegal until 2005, so even if Barry was juicing when he hit 73, the record should still stand.
August 14, 2007 at 1:57 pm
Oh. Here is the link for that Baseball Prospectus article:
http://www.baseballprospectus.com/article.php?articleid=6454
check it out
August 16, 2007 at 4:37 am
Post #34: Roger Clemens age 35-39.
And nice analysis showing how people could think Bonds took steroids…….What?
Of course people think Bonds took steroids; with what he did, it would be asinine to not consider steroids.
But, how about these other factors: the NL West’s terrible pitching & unbalanced schedule, McCovey Cove being perfect for Bonds’ swing, the smaller strike zone which gives pitchers little room for error, the fact that these days athletes have been getting better later in their age (Jerry Rice, for one, and a guy Bonds worked out with before the 2000 season).
Oh, and he hit 73 in one year, and then his home run totals went back to about his average(during his SanFran tenure).
I think he accidentally took steroids in 2003, like he said. I can think that. You can think he took steroids before. We’re never going to convince the other of our opinion, so how about we stop talking about it?
August 16, 2007 at 4:39 am
When I say getting better, and then cited Jerry Rice, I should’ve said stayed in prime shape, for I know Rice did not in fact get better, but he did stay in prime shape. That was my bad.
August 17, 2007 at 7:12 pm
Posts #’s 41 and 42:
Please, please read “Game of Shadows.” Please.
Also: citing Clemens as an example of someone defying normal aging patterns isn’t very convincing, considering he’s surrounded by his own cloud of steroid suspicion and was (allegedly) named in his teammate Jason Grimsley’s affidavit as a steroid user.
August 17, 2007 at 8:03 pm
With regards to Post #41:
Roger Clemens’ best 5 year stretch is not ages 35-39. This time frame represents a bit of a low point in Roger’s fine career, and contains 2 of his worst seasons (based on ERA+ and innings pitched). It may, in fact, be his WORST 5 year stretch.
Clemens was certainly a better pitcher from ages 25 to 29 than he was from 35 to 39.
As to the external factors you cite as possible causes for Bonds’ improvement (i.e. bad pitching, small strike zone, ball park (note: AT&T is a poor HR park), unbalanced schedule, etc.), why has Bonds seemingly been the only one to benefit (to anything close to the extent he has) from these factors?
Lastly, as to the claim that, “Oh, and he hit 73 in one year, and then his home run totals went back to about his average(during his SanFran tenure).” – I am afraid you are wrong again.
Bonds had the following HR totals ages 35-39: 49, 73, 46, 45, and 45 (average: 51.6 HR). Only once in his career, before or since, did he have 45 or more HR (46 HR in 1993). 1993-99 (SF years), Barry averaged 38.4 HR. Even if you throw out the 73 HR season, he averaged roughly 46 HR between 2000-2004, which is a 20% increase over the balance of his pre-alleged steroid SF career.
In closing, please check stats before making claims. It takes like 5 minutes. They can be found at baseball-reference.com.
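The averages quoted above take only a few lines to check. This is a minimal sketch that uses nothing but the figures given in the comment; the variable names are made up for illustration, and the 38.4 HR baseline for 1993-99 is taken as quoted rather than recomputed.

```python
# HR totals for ages 35-39 (2000-2004), as listed in the comment above.
late_totals = [49, 73, 46, 45, 45]

# 1993-99 San Francisco average, taken as quoted in the comment.
sf_1993_99_avg = 38.4

# Average over all five seasons.
avg_all = sum(late_totals) / len(late_totals)

# Average with the 73 HR season thrown out.
without_73 = [hr for hr in late_totals if hr != 73]
avg_without_73 = sum(without_73) / len(without_73)

# Percentage increase over the 1993-99 baseline.
pct_increase = (avg_without_73 / sf_1993_99_avg - 1) * 100

print(f"five-season average: {avg_all:.1f} HR")
print(f"average without 73:  {avg_without_73:.2f} HR")
print(f"increase vs 1993-99: {pct_increase:.1f}%")
```

The five totals listed average 51.6 HR. With the 73 thrown out, the average is 46.25 HR, about 20% above the quoted 1993-99 figure.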
August 20, 2007 at 7:53 pm
As linked to in #40, BP did a translation that shows that Bonds’s season was not at all unlikely. Given modern playing conditions Ruth & Gehrig would have surpassed his totals and there would have been many 70+ seasons.
The problem with this analysis is that it assumes conditions are the same. Clearly there are many more home runs now than in previous eras.
Any sort of record is always going to be unlikely. The odds of DiMaggio’s hitting streak are much worse than Bonds’ HR record as you calculated. Yet it also happened. Did he also use PEDs?
August 22, 2007 at 11:19 am
To post #45:
Yes, Ruth and Gehrig’s best seasons translate to the equivalent of a 70+ HR season in today’s environment, but those seasons did not occur when they were 37 years old.
The whole premise of the above article was how unlikely Bonds’ season was for a 37-year old.
I don’t think this point can be over-stated.
September 22, 2007 at 7:15 am
It’s not just the HR totals, but the rate at which he was hitting them. We all know that even if his HR totals “only” went up 20%, he was also doing it in a lot fewer at bats because of all the intentional walks. Take his numbers from the ages of 34-39 (leaving off the season he hit 73) and you get one home run for every 9.14 at bats (219 HR in 2,001 at bats; add in his season with 73 homers and the number jumps to one HR for every 8.29 at bats).
Now take his next best five seasons (ages 27, 28, 29, 31 and 32) and you get one HR for every 12.32 at bats (199 HR in 2,452 at bats).
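The two at-bats-per-home-run rates above can be reproduced from the totals given in the comment. This is a minimal sketch; the function name `at_bats_per_hr` is made up for illustration, and the only inputs are the HR and at-bat totals quoted above.

```python
def at_bats_per_hr(home_runs, at_bats):
    """At bats needed per home run; lower means more frequent home runs."""
    return at_bats / home_runs


# Ages 34-39 without the 73 HR season: 219 HR in 2,001 at bats.
late_rate = at_bats_per_hr(219, 2001)

# Next best five seasons (ages 27, 28, 29, 31, 32): 199 HR in 2,452 at bats.
earlier_rate = at_bats_per_hr(199, 2452)

print(f"ages 34-39 (no 73 season): one HR per {late_rate:.2f} at bats")
print(f"next best five seasons:    one HR per {earlier_rate:.2f} at bats")
```

Comparing at bats per home run rather than raw totals keeps the walks, which do not count as at bats, from hiding how often the ball left the park.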
January 31, 2008 at 1:49 pm
I read this article last year and then took steroids for 6 months. Look for me next year in MLB, hitting 60-70 HR!