by Ron Shandler
January 2009
Ashley-Perry Statistical Axiom #3: Skill in manipulating numbers is a talent, not evidence of divine guidance.
Ashley-Perry Statistical Axiom #5: The product of an arithmetical computation is the answer to an equation; it is not the solution to a problem.
Merkin's Maxim: When in doubt, predict that the present trend will continue.
The quest continues for the most accurate baseball forecasting system.
I've been publishing player projections for more than two decades. During that time, I have been made privy to the work of many fine analysts and many fine forecasting systems. But through all their fine efforts at attempting to predict the future, there have been certain constants. The core of every system has consisted of pretty much the same elements:
- Players will perform within the framework of their past history and/or trends.
- Their skills will develop and decline according to age.
- Their statistics will be shaped by health, expected role and environment.
These are the elements that keep all projections within a range of believability. This is what prevents us from predicting a 40-HR season for Willy Taveras or 40 stolen bases for Adam Dunn. However, within this range of believability is a great black hole where precision seems to disappear. Yes, we know that Alex Rodriguez is a leading power hitter, but whether he is going to hit 40 HRs, or 45, or 35, or even 50, is a mystery.
You see, while all these systems are built upon the same basic elements, they are also constrained by the same limitations. We are all still trying to project...
- a bunch of human beings
- each with their own individual skill sets
- each with their own individual rates of growth and decline
- each with different abilities to resist and recover from injury
- each limited to opportunities determined by other people
- and each generating a group of statistics largely affected by tons of external noise.
As much as we acknowledge these limitations intuitively, we continue to resist them because the game is so darned measurable. The problem is that we do have some success at predicting the future and that limited success whets our desire, luring us into believing that a better, more accurate system awaits just beyond the next revelation. So we work feverishly to try to find the missing link to success, creating vast, complex models that track obscure relationships, and attempt to bring us ever closer to perfection. But for many of us fine analysts, all that work only takes us deeper and deeper into the abyss.
Why? Because perfection is impossible and nobody seems to have a real clear vision of what success is.
....
Selection of the study methodology: Even if a comparative analysis includes all relevant test subjects and somehow finds a study variable that makes sense, there is still a concern about how the study is conducted. Does it use a recognized, statistically valid methodology for validating or discounting variances? Or does it use a faulty system like the ranking methodology used by Elias to determine Type A, B or C free agents? Such a system -- which ironically is the basis for Rotisserie scoring -- distorts the truth because it can magnify tiny differences in the numbers and minimize huge variances.
As such, unless the study uses a proven methodology, it cannot be completely objective.
And bias immediately enters into the picture. You simply cannot trust the results.
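To see how a ranking methodology can magnify tiny differences and minimize huge ones, here is a minimal sketch. The team totals are made-up numbers and `rank_points` is a hypothetical helper, not the Elias formula or any league's actual scoring code; it assumes no ties.

```python
def rank_points(values):
    """Rotisserie-style points: the lowest value earns 1, the highest earns len(values).

    Assumes all values are distinct (ties are not handled in this sketch).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    points = [0] * len(values)
    for place, idx in enumerate(order, start=1):
        points[idx] = place
    return points


# Hypothetical home run totals for four teams (illustrative only).
home_runs = [150, 199, 200, 201]
points = rank_points(home_runs)

for hr, pts in zip(home_runs, points):
    print(f"{hr} HR -> {pts} points")
```

In this example the 49-HR gap between the bottom two teams is worth one point, exactly the same as the 1-HR gap between any of the top three.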
The only legitimate, objective analysis that can filter out the biases is one that is conducted by an independent third party. But the challenge of conducting such a study is finding a level playing field that all participants can agree on. Given that different touts have different goals for their numbers, that playing field might not exist. And even if one should be found, there will undoubtedly be some participants reluctant to run the risk of finishing last, which could skew the results as well.
Other challenges to assessing projections
Ashley-Perry Statistical Axiom #4: Like other occult techniques of divination, the statistical method has a private jargon deliberately contrived to obscure its methods from non-practitioners.
As users of player projections, and in a hurry to make decisions, we want answers, and quickly. We want to find a trusted source, let them do all the heavy lifting, and then partake of the fruits of their labor. The truth is, the greater the perceived weight of that lifting, the greater the perceived credibility of the source. Only the small percentage of users who speak in that "private jargon" can validate the true credibility. The rest of us have to go on the faith that the existence of experts proficient in these 'occult techniques' is proof enough.
Well, so what? That's why we rely on experts in the first place, isn't it? What is the real problem here?
Complexity for complexity's sake
One of the growing themes that I've been writing about the past few years is the embracing of imprecision in our analyses. This seems counter-intuitive given the growth in our knowledge. But the game is played by human beings affected by random, external variables; the thought that we can create complex systems to accurately measure these unpredictable creatures is really what is counter-intuitive.
And so, what ends up happening in this world of growing complexity and precision is that we obsess over hundredths of percentage points and treat minute variances as absolute gospel. To wit...
It has been shown that a simplistic forecasting system that averages the last few seasons with minor adjustments for age is nearly as good as any advanced system. The simple system is called "Marcel" (named after the monkey on the TV show Friends) because any chimp with an Excel spreadsheet can do it. The truth is, if 70% accuracy is the best that we can reasonably expect, Marcel alone gets us to about 65%. All of our advanced systems are fighting for occupation of that last 5%.
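For a sense of how little machinery such a system needs, here is a rough sketch of a Marcel-style calculation: a weighted average of recent seasons, pulled toward the league average, with a small age adjustment. The function name, the 5/4/3 weights, the regression amount, the peak age and the age factors are all illustrative assumptions, not a faithful copy of the published system, and the player data is made up.

```python
def marcel_style_rate(seasons, league_rate, age,
                      weights=(5, 4, 3), regression_pa=1200, peak_age=29):
    """Project a per-opportunity rate (for example, HR per plate appearance).

    seasons: list of (events, opportunities) tuples, most recent season first.
    All default constants are assumptions chosen for illustration.
    """
    # Weighted sum of the last few seasons, most recent weighted heaviest.
    weighted_events = sum(w * ev for w, (ev, _) in zip(weights, seasons))
    weighted_opps = sum(w * opp for w, (_, opp) in zip(weights, seasons))

    # Regress toward the league rate by adding league-average opportunities.
    rate = (weighted_events + regression_pa * league_rate) / (weighted_opps + regression_pa)

    # Minor age adjustment: small bump for young players, small decline for older ones.
    if age > peak_age:
        rate *= 1 - 0.003 * (age - peak_age)
    else:
        rate *= 1 + 0.006 * (peak_age - age)
    return rate


# Hypothetical hitter: 30, 25 and 20 HR in 600 PA over the last three seasons.
history = [(30, 600), (25, 600), (20, 600)]
projected_hr_rate = marcel_style_rate(history, league_rate=0.03, age=29)
print(f"Projected HR over 600 PA: {projected_hr_rate * 600:.1f}")
```

Everything here fits in a few spreadsheet cells, which is the point: the advanced systems have to justify themselves against a baseline this simple.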
Still, those conducting comparative analyses will crow about one system beating another 68% to 67%. This is a level of precision that can often be rendered moot across the entire player pool by a handful of wind-blown home runs and a few seeing-eye singles. Still, there has to be a "winner," right?
But we forget "hard" baseball facts such as:
- The difference between a .250 hitter and a .300 hitter is fewer than 5 hits per month.
- A true .290 hitter can bat .254 one year and .326 the next and still be within a statistically valid range for .290.
- A pitcher allowing 5 runs in 2 innings will see a different ERA impact than one allowing 8 runs in 5 innings, even though, for all intents and purposes, both got rocked.
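The arithmetic behind these three facts can be checked in a few lines. This is a sketch under stated assumptions: 550 at-bats over a six-month season for the first fact, 600 at-bats treated as independent trials with a roughly 95% range (z = 1.96) for the second, and all runs counted as earned for the third. None of those figures come from the text above.

```python
import math

# Fact 1: extra hits per month separating a .300 hitter from a .250 hitter.
AT_BATS = 550   # assumed full-season at-bats
MONTHS = 6      # assumed length of the season
extra_hits_per_month = (0.300 - 0.250) * AT_BATS / MONTHS


# Fact 2: range of observed averages consistent with a true .290 hitter.
def batting_average_range(true_avg, at_bats, z=1.96):
    """Normal approximation to the binomial; z=1.96 gives roughly a 95% range."""
    std_error = math.sqrt(true_avg * (1 - true_avg) / at_bats)
    return true_avg - z * std_error, true_avg + z * std_error


low, high = batting_average_range(0.290, 600)  # 600 AB is an assumption


# Fact 3: ERA for two bad outings (all runs assumed to be earned).
def era(earned_runs, innings):
    return 9 * earned_runs / innings


print(f"Extra hits per month: {extra_hits_per_month:.1f}")
print(f"Range for a true .290 hitter: {low:.3f} to {high:.3f}")
print(f"5 runs in 2 IP: {era(5, 2):.2f} ERA; 8 runs in 5 IP: {era(8, 5):.2f} ERA")
```

Under these assumptions the gap works out to about 4.6 hits per month, the range runs from about .254 to .326, and the two outings produce ERAs of 22.50 and 14.40.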
Gall's Law: A complex system that works is invariably found to have evolved from a simple system that works.
Occam's Razor: When you have two competing theories which make exactly the same predictions, the one that is simpler is preferred.
Those systems that try to impress us with their complexity as proof of their credibility may be no better than a room full of monkeys with spreadsheets. At minimum, they generate projections that are 'close enough' for our player evaluation purposes and yield draft results that are virtually indistinguishable from any simian-driven system.
Married to the model
It's one thing if the model has a name like Christie Brinkley, but quite another if a tout is so betrothed to his forecasting model that "it" becomes more important than the projections.
Whenever I see a tout write, "Well, the model spit out these numbers, but I think it's being overly optimistic," I cringe. Well then, change the numbers! The mindset is that you have to cling to the model, for better or for worse, in order to legitimize it. The only way to change the numbers is to change the model.
On occasion, I will take a look at one of my projections and admit that I think it's wrong. Then I change the numbers. Because, in the end, is the goal to have the best model or to have the best projections?
The comfort zone
Given the variability in player performance, a "real world" forecast should not yield black or white results. Some touts accomplish this by providing forecast ranges, others by providing decile levels. But most end up committing to a single stat line to describe their expectations for the coming year.
In October, reality will be black or white. In March, it's all shades of grey. But it's far easier for fantasy leaguers to draft their teams from blacks and whites, so touts have to commit. Grey is out, even when a projection carries great uncertainty.
One of the best examples from March 2008 was Andruw Jones. This was a hitter coming off a 26-HR, .222 BA season after having gone 41-.262 and 51-.263 in the two years prior. The questions on everyone's mind... Was 2007 an aberration? Would he bounce back? If so, how far would he bounce back?
....
The Hedge
The hedge is used to formally straddle the fence rather than commit to anything, and typically takes place in the player commentary. In that aspect, the hedge might be a good thing because it embraces the "greys."
However, some touts use the commentary as a hedge against the numbers they've committed to, and in doing so, can negatively impact your ability to assess a projection.
...
Andruw Jones actually batted .158 with three home runs in 209 AB in 2008. While nobody came even remotely close to this, the least optimistic projection holds the most fantasy relevance. As noted in the Baseball Forecaster:
"The best projections are often the ones that are just far enough away from the field of expectation to alter decision-making. In other words, it doesn't matter if I project Player X to bat .320 and he only bats .295; it matters that I projected .320 and everyone else projected .280.
"Or, perhaps we should evaluate projections based upon their intrinsic value. For instance, coming into 2008, would it have been more important for me to tell you that Adam Dunn was going to hit 40 HRs or that Juan Pierre would only get 290 at bats? By season's end, the Dunn projection would have been dead-on accurate, but the Pierre projection — even though it was off by 85 AB — would have been far more valuable."
...
Finding relevance
Berkeley's 17th Law: A great many problems do not have accurate answers, but do have approximate answers, from which sensible decisions can be made.
Maybe I'm a bit exasperated by this obsession with prognosticating accuracy because the Baseball Forecaster/HQ projections system is more prone to stray from the norm - by design - and thus potentially fare worse in any comparative analysis. The HQ system is not a computer that just spits out numbers. We don't spend our waking hours tinkering with algorithms so that we can minimize all the mean squared errors. Our model only spits out an objective baseline and then the process becomes hands-on and highly subjective.
From the Projections Notes page at BaseballHQ.com:
"Skills performance baselines are created for every player beginning each fall. The process starts as a 5-year statistical trend analysis and includes all relevant performance data, including major league equivalent minor league stats. The output from this process is a first-pass projection.
"Our computer model then generates a series of flags, highlighting relevant BPI data, such as high workload for pitchers, contact rate and PX levels trending in tandem, xERAs far apart from real ERAs, etc. These flags are examined for every player and subjective adjustments are made to all the baseline projections based on a series of "rules" that have been developed over time."
The end result of this system is not just a set of inert numbers. As mentioned earlier, the commentary that accompanies the numbers is just as vital a part of the "projection," if not more so. Think of it this way... The numbers provide a foundation for our expectations, the "play-by-play," if you will. The commentary, driven by all the BPIs and component skills analysis, provides the "color." Both, in tandem, create the complete picture.
Admittedly, a system with subjective elements tends to give classic sabermetricians fits. But that's okay because, at the end of the day we're still dealing with...
- a bunch of human beings
- each with their own individual skill sets
- each with their own individual rates of growth and decline
- each with different abilities to resist and recover from injury
- each limited to opportunities determined by other people
- and each generating a group of statistics largely affected by tons of external noise.
Now here's the kicker... In the end, my primary goal is not accuracy. My goal is to shape the draft day behavior of fantasy leaguers. For certain players with marked BPI levels or trends, we often publish projections that are not designed to reflect a "most likely case" but rather a "strong enough case to influence your decision-making." Sometimes there are reasons to stray beyond the comfort zone.
.....
Baseball Variation of Harvard Law: Under the most rigorously observed conditions of skill, age, environment, statistical rules and other variables, a ballplayer will perform as he damn well pleases.