Scouts versus Stats: A Case Study in the Diffuse Nature of Knowledge
Baseball is a game that lends itself especially well to the compilation of statistics. Almost every important event in baseball is distinct, discrete, and countable, and the cost of capturing the data is very low compared with other sports. If any sport can be understood empirically and scientifically, setting aside pseudo-sports such as poker, it is baseball. Bill James, the movement’s patriarch, coined the term “Sabermetrics” for the compilation and analysis of this data. The best history of the movement is The Numbers Game by New York Times journalist Alan Schwarz.
In that book, Schwarz notes how fans were drawn to the countability of the game and to the elegantly discrete events recorded in the box score. These numbers were tallied at the end of each season, so fans could learn who led the league in any countable category. It took until the 1960s for a scientist named George Lindsey to dig into the data. He discovered many tenets of Sabermetrics that others independently rediscovered two decades later. Lindsey quickly faded into obscurity, or more accurately, never escaped it. It took a bored security guard named Bill James, discovered largely by chance, for such work to reach a large audience.
Sabermetric works typically took the form of challenges to the prevailing wisdom of announcers, journalists, front office executives, and yeoman scouts. Their archetypal claims include:
- On-base percentage and slugging average, or how often you reach base and how much power you hit for, are the key determinants of offensive production.
- Managerial actions and “small ball” strategies are either pointless or deleterious to one’s chances of winning. This includes, but is not limited to, the choice of batting order, the stolen base, the hit-and-run, and the sacrifice bunt.
- “Clutch hitting”, situational hitting, and other subjective factors are minuscule in effect, if they exist as skills at all.
- Statistical performance evaluation is more accurate than the subjective, qualitative measures scouts use.
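The first claim rests on two standard rate statistics. As a minimal illustration, the sketch below computes on-base percentage, (H + BB + HBP) / (AB + BB + HBP + SF), and slugging average, total bases divided by at-bats. The function names and the season line are hypothetical and not drawn from any real player.

```python
def on_base_percentage(h, bb, hbp, ab, sf):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slugging_average(singles, doubles, triples, hr, ab):
    """SLG = total bases / at-bats, weighting each hit by the bases it earns."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab


# Hypothetical season line (not a real player): 150 hits = 100 1B, 30 2B, 5 3B, 15 HR.
obp = on_base_percentage(h=150, bb=80, hbp=5, ab=500, sf=5)
slg = slugging_average(singles=100, doubles=30, triples=5, hr=15, ab=500)
print(f"OBP {obp:.3f}, SLG {slg:.3f}")  # OBP 0.398, SLG 0.470
```

Walks raise OBP but not SLG, which is why a hitter who walks often yet rarely hits for extra bases can look far better by one measure than the other.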
These points of view, spread by the books and websites dedicated to them and boosted by the rise of fantasy sports and companies like STATS, Inc., had encroached upon the consciousness of the mainstream baseball fan by the late 1990s. The watershed event, the publication of Michael Lewis’s Moneyball, came in 2003.
Moneyball follows the history and philosophy of Oakland Athletics General Manager Billy Beane. It portrays him as a nearly psychopathic maven of Sabermetric principles, dedicated to winning despite his franchise’s meager budget. Beane exploits purported market inefficiencies: the undervaluation of players who walked frequently, players with good defensive range, and college players in the amateur draft.
What is most memorable to many is the book’s abrasive attitude toward scouts. They are treated as stubborn, irrational traditionalists who tell teams nothing that statistics could not already tell them. They stupidly scorn certain categories of amateurs and minor leaguers, such as short right-handed pitchers, players with “soft” bodies, and players without plus potential in either speed or power. Beane ignores these biases and concentrates solely on the players’ amateur performance.
“The guy’s an athlete, Billy,” the old scout says. “There’s a lot of upside there.”
“He can’t hit,” says Billy.
“He’s not that bad a hitter,” says the old scout.
“Yeah, what happens when he doesn’t know a fastball is coming?” says Billy.
“He’s a tools guy,” says the old scout defensively. The old scouts aren’t built to argue; they are built to agree. They are part of a tightly woven class of former baseball players. The scout looks left and right for support. It doesn’t arrive.
“But can he hit?” asks Billy.
“He can hit,” says the old scout, unconvincingly.
Paul reads the player’s college batting statistics. They contain a conspicuous lack of extra base hits and walks.
“My only question is,” says Billy, “if he’s that good a hitter why doesn’t he hit better?”
“The swing needs some work. You have to reinvent him. But he can hit.”
“Pro baseball’s not real good at reinventing guys,” says Billy.
The old guard pushed back almost immediately. A counterpart book, taking the opposite perspective, appeared in 2005. Joe Morgan, a Hall of Fame second baseman turned announcer, became known for criticizing Moneyball every chance he had. The anti-intellectual mob seized its moment to subjectively and stubbornly assert itself over the scientific, evidence-based minority favoring truth over tradition.
Then something weird happened.
Nate Silver of Baseball Prospectus began developing a projection system known as PECOTA. The algorithm searches the historical data for the past players most similar to a current player and uses their careers to build an aging profile. Every candidate explanatory variable was considered in order to build the most accurate model of the “shape of the player”. Being a good scientist, Silver included a few variables that the sabermetric community often ignored, such as height and weight. These variables had been dismissed as irrelevant, a vestige of the stubborn opinions of scouts.
Both variables were statistically and practically significant in the model.
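The real PECOTA is considerably more elaborate than anything shown here, and its exact formulas are not reproduced. As a rough sketch of the general “comparable players” idea only, assuming each player is reduced to a handful of features, each feature is scaled by its standard deviation, and similarity is plain Euclidean distance, the code below shows how height and weight can enter alongside performance. All names, features, and numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    age: float
    height_in: float
    weight_lb: float
    obp: float
    slg: float


# Hypothetical feature set: physical attributes sit alongside performance.
FEATURES = ("age", "height_in", "weight_lb", "obp", "slg")


def feature_scales(players):
    """Population standard deviation of each feature (1.0 if a feature never varies)."""
    scales = {}
    for f in FEATURES:
        values = [getattr(p, f) for p in players]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        scales[f] = math.sqrt(variance) or 1.0
    return scales


def distance(a, b, scales):
    """Euclidean distance between two players after dividing each feature by its scale."""
    return math.sqrt(sum(((getattr(a, f) - getattr(b, f)) / scales[f]) ** 2 for f in FEATURES))


def comparables(target, history, k=2):
    """Return the k historical players closest to the target."""
    scales = feature_scales(history + [target])
    return sorted(history, key=lambda p: distance(target, p, scales))[:k]


history = [
    Player("Comp A", 23, 72, 200, 0.380, 0.480),
    Player("Comp B", 23, 72, 205, 0.375, 0.470),
    Player("Comp C", 23, 70, 165, 0.390, 0.360),
    Player("Comp D", 24, 69, 160, 0.385, 0.350),
]
target = Player("Prospect", 23, 70, 163, 0.388, 0.355)
print([p.name for p in comparables(target, history)])  # ['Comp C', 'Comp D']
```

With these invented numbers, the light, high-walk, low-power players come out as the nearest comparables. Removing `weight_lb` from `FEATURES` would change every distance, which is the sense in which a physical attribute can carry weight in a model of this kind.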
A favorite prospect of the sabermetrics crowd in the early 2000s was Jackie Rexrode of the Arizona Diamondbacks. His performance was everything such analysts wanted in a leadoff hitter. He walked frequently, which, coupled with a reasonably high batting average, portended a perennial All-Star candidate once he reached the big leagues. He even had good speed, which, while immaterial to most analysts, was aesthetically pleasing at the top of the order. Scouts couldn’t see what the big deal was, insisting that his frequent walks were merely the result of a decent batting eye. Once he reached a level where pitchers could consistently throw strikes, he would “get the bat knocked out of his hands” when forced to swing.
Jackie Rexrode never made the major leagues, not even with the Oakland A’s.
These two episodes, in which the analysts were so irrefutably and unambiguously wrong about ideas they were very excited about, were not the only sharp objects maiming the hubris of sabermetrics. Its pretensions to science attracted empiricists whose own claims to science were better earned. Major League Equivalences, which attempt to translate the statistics of minor or foreign leagues into their major league equivalents, were derided by Tom Tango on theoretical grounds. These equivalences, developed initially by Bill James and later expanded by others (PECOTA uses a modified form known as Davenport Translations), are a keystone of many statistical projection systems, whether PECOTA or the assumptions that Billy Beane’s then-assistant, Paul DePodesta, made in Moneyball. Elsewhere, sabermetrics drew criticism for its over-reliance on linear regression. A decidedly unprofound realization hit the community: analytics is hard.
Shortly after Moneyball was published, Dayn Perry wrote an article exposing the irrationality of the “controversy”, an early entry in what quickly became a race to state the obvious.
A question that’s sometimes posed goes something like this: “Should you run an organization with scouts or statistics?” My answer is the same it would be if someone asked me: “Beer or tacos?” Both, you fool. Why construct an either-or scenario where none need exist? Heady organizations know they need as much good information as possible before they make critical decisions. Boston under Epstein, for example, is a veritable clearinghouse for disparate ideas and perspectives, and so far it’s working just fine.
Silver himself later acknowledged that earlier sabermetric studies had been over-ambitious.
Being willing to admit when you are wrong, or at least when your knowledge is limited, tends to help one’s credibility when pressing the really important points. This is a little piece of psychology that all good politicians (and all good poker players) recognize. There is, in fact, a sort of feedback mechanism at work here: as sabermetrics moves more comfortably toward the orthodoxy, it can acknowledge more freely those places where it performs imperfectly, just as a standing president with a high popularity rating can withstand a scandal that would kill the careers of a thousand lesser-knowns in the party primaries. That admission, in turn, should help to increase the sympathy that traditionalists have for analysis, enhancing dialogue and pushing both sides toward the center.
Baseball Prospectus co-founder Gary Huckabay wrote an almost defeatist assessment of how an objective history might portray Moneyball.
Think about it—what are the real lessons, the ones that can actually be applied, that one can take from baseball analysis? Let’s go through the biggies:
- OBP is good. [...]
- Don’t slag pitchers’ arms. [...]
Really, those are the two big lessons. There are a ton of other lessons, many of them valuable, many of them related to correcting the dysfunctional behaviors created by managing to baseball’s accounting system, rather than to winning games and championships.
And later in the same article,
Another way to look at the issue brings the point home much more directly—the scouts versus stats battle never really existed, and that the scouts won. People making their living in front offices have played Oracle and IBM to the analysis community’s “open source.” Those companies (and others) are happy to let a self-directed, competent, and uncompensated gaggle of fragrant, bearded unix gurus take time out from watching Mystery Science Theatre 3000 to develop a fantastic piece of software for the masses, then adopt it as their own without having to spend a huge amount of their own resources on the project. In terms of baseball analysis, the front office folks have learned the lessons, at least the most important ones, and have already internalized the key points that can make their clubs better. In short, the real cause of death for baseball analysis is that it just isn’t very difficult to do, particularly if what you want is a 20/80 solution—80 percent of the maximum available benefit for only 20 percent of the investment.
There just wasn’t much benefit from empirical analysis that wasn’t already known through experience. A few inefficiencies required empirics to fix, but scouts were rightly skeptical of the broad assertions made by analysts. Players who walk in the low minors or in college and do nothing else normally don’t develop into big league players. It’s very hard for a short right-handed pitcher to succeed, since it’s hard for him to gain velocity as he matures. The irrational stubbornness of traditional baseball wisdom had, scientifically, absolutely no reason to be right, but it was.
Moreover, other axiomatic pillars of sabermetrics possess scant empirical foundation. The ability to hit in the clutch, clubhouse chemistry, game calling by catchers, and big league coaching have all been frequently marginalized. After several years in which analysts browbeat the notion that such subjective factors matter, empirical evidence is finally beginning to emerge suggesting they may actually exist. In the Hardball Times Baseball Annual 2008, David Gassko of The Hardball Times presented intriguing evidence that the right managers can improve the performance of individual players. Nate Silver proposed a statistically significant metric demonstrating certain players’ ability to hit in the clutch. Bill James wrote a heavily criticized article condemning those who claim we know such subjective factors do not exist. The realization that a lack of statistical significance, even in a multiple regression, does not preclude the existence of subjective factors seriously complicates the answers Sabermetric analysts provide.
The “Scouts versus Stats” debate, and its failure to account for the difficulty of teasing truth out of empirics, illustrates an age-old aphorism: absence of evidence is not evidence of absence. The fact that the data set in hand cannot prove that experiential beliefs are true does not imply that they are false. In a hypothetical vacuum, or in a world where you can assume experience is irrational, the null hypothesis should be that any factor has no effect on another variable. In practice, the null hypothesis must be that the position favored by experience and “common sense” is true. There is no reason to discard evidence simply because it is formed from intuition, experience, and “subjectivity”.
Knowledge is a difficult, diffuse entity, willing to present itself partially in the minds of many and completely in none. While some of its quantifiable, as-yet-unknown truths may surface in empirical work, the truths more difficult to articulate may stay hidden indefinitely simply because there is no reasonable way to measure them. It was perfectly rational for Beane’s scouts to question his “science”. The assumption with which we should enter the discussion is that the intuition-based, subjective evaluation is true, not that it tells us absolutely nothing. In our own lives, we temper the “statistics” we hear with a dose of common sense and skepticism. Why should we discount and ignore such evaluation in theory simply because we don’t have the studies to back it up?
In Blink, Malcolm Gladwell discusses our ability to “think without thinking” and the amazing array of calculations our minds perform instantaneously when presented with a question. He cites several examples in which our conjectural, instantaneous reactions are more accurate than the conclusions of difficult, time-consuming analysis. In Gladwell’s terms, scouts are experts at “thin slicing”: they know exactly what to look for and recognize it when they see it, without a model to tell them.
In many ways, Sabermetrics may be a classic example of the failures of scientism. That we should use science to evaluate a theory whenever we can does not mean that we always can. F.A. Hayek spoke against this type of thinking in the context of economics in his Nobel Prize lecture, “The Pretence of Knowledge”.
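The statistical point can be made concrete with a small power simulation. Under purely hypothetical assumptions (a hitter who succeeds 26 percent of the time normally and 28 percent in the clutch, with 200 plate appearances of each), the sketch below counts how often a standard two-sided, pooled two-proportion z-test at the 5 percent level detects an effect that, by construction, is real. The function names and parameters are illustrative, not taken from any published study.

```python
import math
import random


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z statistic for the difference in success rates (b minus a)."""
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0  # no variation at all, so no evidence either way
    return (hits_b / n_b - hits_a / n_a) / se


def estimate_power(p_normal, p_clutch, n, trials=2000, seed=1):
    """Share of simulated studies in which a real effect reaches |z| > 1.96 (5% two-sided)."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        hits_normal = sum(rng.random() < p_normal for _ in range(n))
        hits_clutch = sum(rng.random() < p_clutch for _ in range(n))
        if abs(two_proportion_z(hits_normal, n, hits_clutch, n)) > 1.96:
            detected += 1
    return detected / trials


# Hypothetical: a genuine two-point clutch edge, 200 plate appearances in each situation.
print(f"estimated power: {estimate_power(0.260, 0.280, n=200):.2f}")
```

Analytically, such a study detects the effect only about 7 percent of the time, barely above the 5 percent false-positive rate, so a “not significant” result says very little about whether the skill exists.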
Unlike the position that exists in the physical sciences, in economics and other disciplines that deal with essentially complex phenomena [such as baseball], the aspects of the events to be accounted for about which we can get quantitative data are necessarily limited and may not include the important ones. While in the physical sciences it is generally assumed, probably with good reason, that any important factor which determines the observed events will itself be directly observable and measurable, in the study of such complex phenomena as the market, which depend on the actions of many individuals, all the circumstances which will determine the outcome of a process, for reasons which I shall explain later, will hardly ever be fully known or measurable. And while in the physical sciences the investigator will be able to measure what, on the basis of a prima facie theory, he thinks important, in the social sciences often that is treated as important which happens to be accessible to measurement. This is sometimes carried to the point where it is demanded that our theories must be formulated in such terms that they refer only to measurable magnitudes.
We know that subjective factors have some effect on the performance of a player. The question is how much. Since Sabermetrics has not been able to empirically reject the traditional null hypothesis with any method that does not rest on the fallacy that “absence of evidence implies evidence of absence”, we must assume that, for example, Jason Varitek’s ability to manage Boston’s pitching staff is an important aspect of its run prevention.
It may well be that the Sabermetric movement only confused itself by attacking scouts. To the informed fan, there was nothing painful or counterintuitive about the idea that it may be easier to hit for power if you weigh more. The more deserving targets were the narrative platitudes of announcers and journalists or, more charitably, of the insiders they selectively quote and interview. The fourth estate has an interest in keeping things interesting for the public, not in identifying which players may help a team win.
It’s not even that surprising how long it took executives to come to terms with a group of arrogant outsiders who thought they knew something without ever having played the game. The three primary concepts of Sabermetrics (on-base percentage is important, a young pitcher’s arm is a terrible thing to waste, and college players were undervalued in the ’90s) were incorporated into the market relatively quickly once they were made apparent.
The prevailing wisdom was true. It is not out of stupidity that people cling to the knowledge of traditions, but because, more often than not, those who insist that they know more than people who spent their lives studying the issues are exactly what they appear to be: arrogant pricks. There’s nothing wrong with looking at the world empirically. In fact, you’ll know more because you did. However, quantitative wisdom begins only with understanding where those numbers falter, and that the truth may be hiding where the numbers can never reach. Knowledge is a fickle, amorphous notion that strives to squirm invisibly out of any top-down definition, model, or generality that claims to understand what is really going on. The subjective opinions of the many, incorporating all the vagueness suggested by the “diffuse nature of knowledge”, are categorically more effective in estimating and predicting than the objective understanding of the few.