
Sunday, June 17, 2018

Battling Expertise with the Power of Ignorance | Articles | Bill James Online


There's a lot to like in this article, not the least of which is this:

But everything depends upon recognizing what you do not know, and this gets back to the Power of Ignorance.   The great mistake that analysts make is that we always want to focus on what we DO know; we want to make inferences based on what we have studied in the past.   We like to do that because, like everyone else, we are trying to purchase credibility based on the work we have done.
Enjoy the rest.

from billjamesonline.com
https://www.billjamesonline.com/article1373/

Battling Expertise with the Power of Ignorance

            Since I am speaking to the statistics department, I will make every effort to make this a statistically-oriented speech discussing specific issues of statistical research.    However, in order not to be misunderstood, there are several other things that I will have to say first to establish parameters, and then I'll get to what you might recognize as relevant issues in five or ten minutes.

            The first thing that I should say is that I have no credentials whatsoever as a mathematician or a statistician.   I have been identified countless times as a statistician, for reasons that I understand, but I have never, ever been self-identified as a statistician, for reasons that I will explain.   I don't really know anything much about the workings or applications of statistical methods.   I could not describe myself as a statistician because I could not meet the standards that the people in this room would expect a professional statistician to meet.  I don't call myself a statistician for the same reason that I don't call myself a plastic surgeon or an auto mechanic:  I am afraid that somebody might ask me to tighten their jawline or to fix their Honda, and I wouldn't have a clue how to do it.   I have always chosen to call myself a writer because, well, hell, anybody can call himself a writer.

            There is more to it than that, actually.   Self-definition is dangerous for a public figure, because it indirectly places limits on what one can attempt within the definition.   Although I often write about baseball history, I don't call myself a historian, either, in part because saying that I am this or that or the other adds limits, but not abilities.   If I call myself a historian, that doesn't make me any better a historian, but am I still allowed to write about the future of the game?   If I call myself a statistical analyst, am I still allowed to propose theories that have nothing to do with statistics?   I have always thought that it was best not to define oneself, but to let the world say about you whatever it is that the world chooses to say.   This is my first reference point for the Power of Ignorance.   By not claiming to know exactly what it is that I am doing, I remain able to attempt whatever it is that I feel like attempting.   It's a great advantage.

            I should say, lest there be misunderstanding about this, that I am in no way in favor of ignorance or against the advance of knowledge.   I have worked my entire life for the advancement of knowledge, trying to increase respect for reason and respect for research in the world of sports.    I am embracing ignorance here in this sense and for this reason:   that we are all, in my view, condemned to float endlessly in a vast sea of un-answered questions and unknown reference points—a Sea of Ignorance, if you will.   The example that I like to use is a chess board.   How many moves ahead can you see on a chess board?   I can see about one move ahead of myself in a chess game.    If you can see 3 or 4 moves ahead on a chess board, you can beat 99% of chess players, and if you could see 7 or 8 moves ahead in a chess game, you would be a world-class chess champion.

            Well, suppose that a chess board was not eight squares wide and eight squares long, but a hundred squares wide and a hundred squares long, with a thousand moving pieces, rather than 32.   How far ahead could you see on a chess board then?     The world is like a chess board that is a million squares wide and a million squares long with hundreds of thousands of moving pieces and hundreds of thousands of different players moving them.    In my view, anyone who imagines that he can anticipate what will happen next, in any area of life, is delusional, and people who think that experts should be able to do this are children and fools.

            If the world were 10% more complicated than the human mind, or even if it were 40% more complicated or ten times as complicated, then the difference between an intelligent person's ability to understand the world and a less intelligent person's ability to understand the world would be very meaningful.   But since the world is billions and billions of times more complicated than the human mind, individual intelligence is almost entirely irrelevant to the understanding of the world.   What is critical to understanding is humility and co-operation.   What is critical to gaining more understanding of the world is to learn to accept and appreciate the vastness of our ignorance, and to understand that one can only survive in a sea of ignorance by working with others to make our small lifeboat a little bit stronger.   Only by embracing the fact of our limitless ignorance can one position oneself to increase the store of knowledge.

            Now, getting finally to statistics.   The way that baseball was understood 35 years ago, and the way that it is understood today, is largely by the interpretations of experts.   I don't in any way want to speak disrespectfully of experts, but experts are people who claim to know things, and who claim to understand how something works.   There are a vast number of things that the experts all know, based on their experience in the game and based on their education by others, older than themselves.   The experts all knew, for example, that the prime of a player's career was ages 28 to 32.   The experts all knew that when there was a runner on first and no one out, the percentage move was to bunt.   The experts all knew that speed was tremendously important, and that the difference between good teams and bad teams was mostly in how they performed in clutch situations.   The experts all knew that a good starting pitcher would draw a few thousand extra fans to the game every time he took the mound.

            Through 1970, through 1975, there was essentially no one in the world who was in the habit of submitting these axioms of expertise to objective test.    When I began writing about baseball in 1975, the first thing I did was to say, "Well, I don't know anything.   I'm not an expert.   But perhaps I could contribute to the conversation by finding a way to take these things that the experts know, and look to see, as best I can, whether they are objectively true."

            If you want to know who I am and what I have done for a living for the last 35 years, I can explain it in one sentence.   My job is to find questions about baseball that have objective answers.    That is all that I do; that is basically all that I have done for the last 35 years.   I listen carefully to what is said to be true about baseball, and I try to find elements in those claims which are capable of objective answer.   For example, when it was suggested that baseball players peaked from ages 28 to 32, we asked, "OK, do players hit more home runs at ages 28 to 32 than at other ages?    How many home runs are hit by players at age 27?  At age 22?  At every other age?  How many doubles are hit, how many games do pitchers win at each age, how many strikeouts do they record, etc.?"

            It turned out in this case that what the experts all knew to be true—that baseball players are in their prime from ages 28 to 32—is just totally, wildly and completely untrue.   It doesn't match the data in any way, shape or form.   27-year-old players hit 68% more home runs in the major leagues than do 32-year-old players—thus, saying that 32-year-old players are in their prime and 27-year-old players are not is preposterous.

            When you formulate a question which has an objective answer and you go and find that answer, you almost always wind up with a set of numbers.   "Numbers", in baseball, are usually referred to as "statistics", even if they are not the kind of numbers that would ordinarily be described as "statistics" in any other area of life.   Because the questions that I asked led to the formation of new statistics, I became known as a statistician.

            It is quite astonishing to me, in retrospect, that no one before me had tried to make a living by doing this.   There was a large community of baseball experts who worked for baseball teams and wrote about baseball, based on this large, shared body of "expertise".    A very, very large percentage of the things that the experts all knew to be true turned out, on examination, to be not true at all.   It is not true that bunting increases either the number of runs that are scored or the expectation of scoring a single run.   It is not true that speed is a key element of successful baseball teams, clutch hitting is either 99 or 100% a chimera, and the identity of the starting pitcher has, except in a very few cases, no detectable impact on the attendance at the game.   I'll deal with a couple of these claims in more detail later on, but when I began to publish articles and later books reporting on research which demonstrated that some of the claims of experts were demonstrably false, this put me at loggerheads with the baseball establishment.

            There was, in the first fifteen years of my career, a great deal of misunderstanding about what I was doing.   People thought—and, indeed, some people still think—that I was trying to supplant the experts, and become an expert myself.   Some people thought that I was anti-expert, or anti-scout.   This was never true.   In fact, I have always had great respect and great admiration for the scouts.   There are a large number of things about baseball that I have no way of studying, no way of knowing based on the records.    I admire the ability of scouts to look at a young hitter, and note things about his swing that may predict whether he will be able to adjust to higher levels of competition.   Having sat next to scouts at hundreds of major league baseball games, I am always astonished by the things that they can see that I would never have seen in a million years had someone not pointed them out to me.   I also admire, and lust after, those really cool radar guns.   The only thing is, not everything the scouts say is the gospel truth.

            In my early career, people would attack me by pointing out that I had no credentials to be considered an expert.   I fell into the habit of saying, "That's right; I don't."

            I want to point out to you in passing that "getting the answers right" had almost nothing to do with the success of my career.   My reputation is based entirely on finding the right questions to ask—that is, on finding questions that have objective answers, but to which no one happens to know the objective answer.   That's what I did 35 years ago; that's what I do now.   When I do that, it makes almost no difference whether I get the answer right, or whether I get it a little bit wrong.   Of course I do my very best to get the answers right, out of pride and caution, but it doesn't actually matter.

            Why?

            Because if I don't get the answer right, somebody else will.   It is called "science."

            Again, I am not qualified to lecture you or anyone else about the scientific method.   In fact, my understanding of the scientific method is very rudimentary, very primitive.   Nonetheless, the scientific method has been the greatest ally of my career.   Basically, what I know about the scientific method would fit onto a bumper sticker, and, that being the case, I might as well read you the bumper sticker.   We design tests to see whether an assertion is compatible or incompatible with the evidence.   When you do that, someone else will always figure out some way to do another test, and a better test.    When that happens, it is my responsibility to acknowledge that the other person's research is better than mine or is an advancement from mine.   What is necessary to the advancement of knowledge, then, is humility—the capacity to recognize that other people have accomplished something that I have not been able to accomplish.   That, then, is the bumper sticker:   what is necessary to the advancement of knowledge is humility.

            When you go to an expert and you say, "I don't think that what you are saying is true," that will be perceived as arrogance.   Who are you to challenge the experts?   But it is not arrogance at all; it is grounded in the understanding that we are all floating in a vast sea of ignorance, and that much of what we all believe to be true will later be shown to be nonsense.   To recognize this is not arrogance; it is humility.

            When I was in Elementary School in the early 1960s, our principal was fond of telling us that, when he was a young man just after World War One, he took a college chemistry class, in which the professor told the students that they were studying science at the ideal time, because all of the important discoveries had been made now.   Everything that there was to be known about chemistry or biology or physics, he suggested, was pretty much known now.

            Well, I call the search for objective knowledge about baseball "sabermetrics", and you would be amazed how common it is for us to hear that everything worth knowing about sabermetrics is known now, and everybody who cares about it knows it.   In reality, nothing has changed, at all; all we have done is to take a few buckets of water out of the ocean of ignorance and move them over into the small pond of real knowledge.   In reality, the ocean of ignorance is larger than it ever was, as it expands on its own.

            Baseball teams play 162 games a year.    I just realized last week that, sometime in the last 20 years, baseball experts have fallen into the habit of saying that a baseball team has about 50 games a year that you are just going to lose no matter what, 50 games a year that you are going to win, and that it is the other 62 games that determine what kind of season you're going to have.   This is not ancient knowledge; this is a fairly new notion.   A more inane analysis would be difficult to conceive of.   First of all, baseball teams do not play one hundred non-competitive games a year, or anything remotely like that.   Baseball teams play about forty non-competitive games in a season, more or less; I would be surprised if any team in the history of major league baseball ever had a hundred games in a season that were decided from the start, games in which the losing team never had a chance after the fourth or fifth inning.   The outcome of most baseball games could be reversed by changing a very small number of events within the game.

            But setting that aside, this relatively new cliché assumes that it is the outcome of the most competitive games that decides whether a team has a great season or a poor season.   In reality, the opposite is true.   The more competitive a game is, the more likely it is that the game will be won by the weaker team.   If the Royals play the Yankees and the score of the game is 12 to 1, it is extremely likely that the Yankees won.   If the score is 4 to 3, it's pretty much a tossup.   The reasons why this is true will be intuitively obvious to those of you who work with statistics for a living.   It is the non-competitive games—the blowouts—that play the largest role in determining what kind of season a team has.   Misinformation about baseball continues to propagate, and will continue to propagate forever more, without regard to the fact that there is now a community of researchers that studies these things.
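
You can see the shape of this with a toy model in Python.   Treat each team's runs in a game as an independent Poisson draw, which is crude but good enough to show the effect; the 5.0 and 4.0 run averages below are invented numbers, not measurements:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000

    strong = rng.poisson(5.0, n)    # runs for the stronger team (invented average)
    weak = rng.poisson(4.0, n)      # runs for the weaker team (invented average)

    margin = np.abs(strong - weak)
    decided = margin > 0            # ignore ties (extra innings) for simplicity
    weak_wins = (weak > strong) & decided

    close = decided & (margin == 1)
    blowout = decided & (margin >= 5)
    print(f"P(weaker team wins | one-run game): {weak_wins[close].mean():.3f}")
    print(f"P(weaker team wins | margin >= 5):  {weak_wins[blowout].mean():.3f}")

The conditional probability in one-run games sits close to a coin flip, while the blowouts go mostly to the stronger team; that is why the blowouts, not the close games, carry the information about team quality.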

            One of the most enduring debates about the application of statistical analysis to baseball has to do with the role of speed on a successful team.   Speed in baseball is tied more closely to stolen bases than to any other statistical category.   By the late 1970s, we had studied the statistics of successful and unsuccessful baseball teams to such an extent that we could place values on each event.     The statistics of baseball teams predict runs scored so reliably that it is extremely easy to see that teams that hit 150 home runs score more runs than teams that hit 140 home runs.   It is easy to see that teams that hit 240 doubles score more runs than teams that hit 230 doubles, and that teams that hit 230 doubles score more runs than teams that hit 220 doubles.

            It is easy to see, in the records of baseball, that teams that draw 550 walks in a season score more runs than teams that draw 540 walks.   The end result of each isolated event is easy to see in the overall mix, so much so that it is very easy to place a value on one single, one double, one triple or one home run.

The only exception to this is that stolen bases appear to be nearly invisible.  Teams that steal bases not only don't score more runs than teams that don't steal bases, they actually score slightly fewer runs—or did, 35 years ago.

Obviously, stolen bases can correlate negatively with runs scored because stolen base attempts can lead either to stolen bases, which are positive, or to runners caught stealing, which are negative.   By contrasting the value of a stolen base with the cost of a runner caught stealing, one can calculate what success percentage is needed to break even.  It turns out that through much of baseball history, teams were attempting to steal bases at a success rate that was actually causing them to score fewer runs than if they had not attempted to steal any bases at all.   Our capacity to misunderstand the world is almost without limit.  In recent decades this has not been true, but even in modern baseball, the actual success rate is so close to the break-even percentage that the runners caught stealing eat up almost all of the value of the stolen base attempts, so that the gain in runs per stolen base attempt is along the lines of one run per 25 attempts.   Stolen bases are essentially irrelevant to successful offenses.    If a baseball team can add a player who hits five extra doubles or a player who steals 50 extra bases, they're usually better off to add the player who hits a handful of doubles.
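
To make the break-even arithmetic concrete, here is a minimal sketch in Python.   The run values are not from this article; they are ballpark linear-weights figures from the sabermetric literature, so treat the exact numbers as assumptions:

    # Ballpark linear-weight run values (assumptions, not from this article):
    # a successful steal is worth roughly +0.20 runs; a caught stealing, about -0.45.
    sb_value, cs_cost = 0.20, 0.45

    # Break-even success rate p* solves: p * sb_value - (1 - p) * cs_cost = 0
    break_even = cs_cost / (sb_value + cs_cost)
    print(f"break-even success rate: {break_even:.0%}")     # about 69%

    # At a success rate just above break-even, the net gain per attempt is tiny:
    p = 0.75
    net = p * sb_value - (1 - p) * cs_cost
    print(f"net runs per attempt at {p:.0%}: {net:+.3f}")   # a few hundredths of a run

This is the sense in which the runners caught stealing "eat up" the steals: at realistic success rates the expected net comes out to roughly one run per 25 or so attempts, as above.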

There are many other ways that one can study the value of a stolen base.   We can calculate the inherent run value—that is, the probable runs scored—when there is a runner on first, no one out, and when there is a runner on second, no one out, etc.

One can create simulations of baseball offenses in which we generate random sequences of events with and without stolen base attempts, and see what the resulting change in runs is when the stolen base attempts are added.
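
A minimal version of that simulation in Python might look like the sketch below.   The event probabilities are invented round numbers, and the baserunning rules are deliberately crude (runners advance one base on a single, two on a double, and only steals of second are attempted), so the output is illustrative, not a measurement:

    import random

    # Toy per-plate-appearance event probabilities (invented round numbers):
    EVENTS = ['out', 'walk', 'single', 'double', 'hr']
    WEIGHTS = [0.68, 0.09, 0.15, 0.05, 0.03]

    def half_inning(steal_rate=0.0, steal_success=0.70):
        """Simulate one half-inning and return the runs scored."""
        bases, outs, runs = [0, 0, 0], 0, 0
        while outs < 3:
            # Maybe attempt a steal of second with a runner on first, second open.
            if bases[0] and not bases[1] and random.random() < steal_rate:
                if random.random() < steal_success:
                    bases[0], bases[1] = 0, 1
                else:
                    bases[0] = 0
                    outs += 1
                    continue
            event = random.choices(EVENTS, weights=WEIGHTS)[0]
            if event == 'out':
                outs += 1
            elif event == 'walk':               # force advances only
                if bases == [1, 1, 1]:
                    runs += 1
                elif bases[0] and bases[1]:
                    bases[2] = 1
                elif bases[0]:
                    bases[1] = 1
                bases[0] = 1
            elif event == 'single':             # everyone moves up one base
                runs += bases[2]
                bases = [1, bases[0], bases[1]]
            elif event == 'double':             # everyone moves up two bases
                runs += bases[1] + bases[2]
                bases = [0, 1, bases[0]]
            else:                               # home run
                runs += 1 + sum(bases)
                bases = [0, 0, 0]
        return runs

    random.seed(2)
    N = 100_000
    for rate in (0.0, 0.3):
        avg = sum(half_inning(steal_rate=rate) for _ in range(N)) / N
        print(f"steal attempt rate {rate:.0%}: {avg:.3f} runs per inning")

With the steal success rate set near the break-even figure, the two printed averages come out close together, which is the article's point in miniature.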

One can evaluate the stolen base attempt with a Markov chain analysis . . . that is, you may be able to do this; I can't, but many other people have.
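
For those who do work with Markov chains, the idea is that the 24 base-out situations form the transient states of an absorbing chain (the third out absorbs), and the expected runs to the end of the inning solve a linear system.   A sketch in Python, reusing the toy event model from the simulation above (the probabilities are still invented numbers):

    import numpy as np
    from itertools import product

    # 24 transient states: (runner on 1st, on 2nd, on 3rd, outs); 3 outs absorbs.
    STATES = [(b1, b2, b3, o) for o in range(3)
              for b1, b2, b3 in product((0, 1), repeat=3)]
    IDX = {s: i for i, s in enumerate(STATES)}

    # Same toy event probabilities as above (invented numbers):
    EVENTS = {'out': 0.68, 'walk': 0.09, 'single': 0.15, 'double': 0.05, 'hr': 0.03}

    def step(state, event):
        """Apply one event; return (next state or None for the third out, runs)."""
        b1, b2, b3, o = state
        if event == 'out':
            return (None if o == 2 else (b1, b2, b3, o + 1)), 0
        if event == 'walk':                        # force advances only
            return (1, 1 if b1 else b2, 1 if b1 and b2 else b3, o), (b1 and b2 and b3)
        if event == 'single':                      # everyone moves up one base
            return (1, b1, b2, o), b3
        if event == 'double':                      # everyone moves up two bases
            return (0, 1, b1, o), b2 + b3
        return (0, 0, 0, o), 1 + b1 + b2 + b3      # home run

    Q = np.zeros((24, 24))   # transition probabilities among transient states
    r = np.zeros(24)         # expected runs scored on a single transition
    for s in STATES:
        for event, p in EVENTS.items():
            nxt, runs = step(s, event)
            r[IDX[s]] += p * runs
            if nxt is not None:
                Q[IDX[s], IDX[nxt]] += p

    # Expected runs to the end of the inning from each state: E = r + Q @ E.
    E = np.linalg.solve(np.eye(24) - Q, r)

    re = {s: E[i] for s, i in IDX.items()}
    gain = re[(0, 1, 0, 0)] - re[(1, 0, 0, 0)]   # successful steal of second, 0 out
    loss = re[(1, 0, 0, 0)] - re[(0, 0, 0, 1)]   # caught stealing: runner erased, one out
    print(f"break-even steal success rate (0 out): {loss / (loss + gain):.0%}")

The vector E here is exactly the "inherent run value" table mentioned earlier: the expected runs with a runner on first and no one out, with a runner on second and no one out, and so on; the break-even rate falls out of three of its entries.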

The thing is that no matter which one of these approaches one takes, one always comes back with the conclusion that stolen bases are essentially irrelevant to a successful offense.   Of course, this does not prove that speed is not important or that speed is not tremendously valuable; it merely demonstrates that stolen base attempts are relatively insignificant.   Speed is not the same as "stolen bases".

When I published research questioning the value of speed in the late 1970s—and other researchers did as well—we were confronted by a barrage of arguments from the experts offering a hundred different reasons why we had to be wrong.    This was entirely appropriate.   It is not the scientific method that, when somebody publishes a few studies concluding that X is untrue, everybody accepts that X is untrue and stops asserting X.   There are a wide variety of reasons why speed could be important, even though stolen bases were not.   It could be, for example, that the value of stolen bases was hidden by a cross-correlation—that is, that as teams got better hitters they had less need to steal bases, thus bad teams stole more bases than good teams, even though the steals themselves were a positive.   Well, yes, that's true, but it's also pretty easy to remove the cross-correlation and study the stolen bases of teams that are otherwise similar, and the conclusion remains that stolen bases tend not to be closely associated with good teams.

One can study the question of speed without looking at stolen bases by looking at other categories of performance that tend to be dependent on speed, such as triples and grounding into double plays, but that leads to the same conclusion:  there is little or no evidence that fast teams tend to be good teams.

One thing that we would hear often, and still hear sometimes, is that the value of speed is that it prevents double plays.   But the value of the stolen base attempt in preventing double plays is accounted for in many or all of the approaches that have already been outlined, so this argument is essentially simply a misunderstanding of how the conclusion was reached.

The central question of analytical research in baseball is "why do teams win?"  What are the actual characteristics of winning teams?    The rest of baseball analysis consists mostly of breaking that question down into a thousand smaller questions.   The most damning fact for speed teams is that there is essentially no correlation between speed and wins.    You can say anything you want to about why speed is important in baseball, but all this accomplishes logically is to make the mystery deeper.    If there are all of these advantages to speed in baseball, then why don't speed teams win?   The fact remains that they don't, but let's move on to another issue.

Our work can be divided into two areas.   One is efforts to answer the question, "What is the relationship between X and Y?",  and the other is efforts to answer the question "How can we measure that?"   The first half of my career was largely devoted to efforts to state in simple formulas the relationships between different things in baseball—the relationship between runs scored and wins, for example, or the relationship between the different types of hit elements and runs scored.   I developed in the years from 1975 to 1990 a large number of heuristic rules for addressing various problems in baseball research.   The two best known of these are the Pythagorean theory of runs to wins, and the Runs Created Formula.

The Pythagorean theory of runs to wins, which I first published in 1977, states that the ratio between a team's wins and losses will be the same as the ratio between the square of their runs scored and the square of their runs allowed.    In other words, if a team scored four runs a game on average and allowed three runs a game, their winning percentage would be about .640, or a ratio of sixteen to nine.  Later research has demonstrated that the Pythagorean theory works better with an exponent other than 2.00, and still later research has demonstrated that it works better still if you modify the exponent for the level of scoring.   Still, those modifications give only tiny gains in accuracy, and the Pythagorean theory is now almost universally understood and widely accepted in baseball.
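
As a formula, with the exponent left adjustable to accommodate those later refinements, a sketch in Python (the 1.83 alternative exponent is a commonly cited secondary result, not from this article):

    def pythag(runs_scored, runs_allowed, exponent=2.0):
        """Expected winning percentage from runs scored and runs allowed."""
        rs, ra = runs_scored ** exponent, runs_allowed ** exponent
        return rs / (rs + ra)

    # James's example: 4 runs scored per game against 3 allowed -> 16 / (16 + 9) = .640
    print(f"{pythag(4, 3):.3f}")                    # 0.640

    # Later refinements: a smaller fixed exponent, or one tied to the scoring level.
    print(f"{pythag(4, 3, exponent=1.83):.3f}")     # a slightly more conservative estimate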

The other heuristic that people know is the Runs Created formula, which states that the number of runs that a team will score can be predicted by the formula (hits plus walks), times total bases, divided by (at bats plus walks).   I introduced this formula in 1978.  The essential question about a hitter is not how many hits he gets, or what his on base percentage is, or his slugging percentage, but how many runs he puts on the scoreboard.   I thus looked for two or three years for the simplest way to estimate how many runs each player had produced, and this was it.  There are now dozens of variations of the Runs Created formula in use, but the simple one from 1978 still works fine.   In 2009, 27 of the 30 teams came within seven games of winning the number of games predicted by the 1977 version of the Pythagorean theory, and 26 of 30 teams came within 5% of scoring the number of runs predicted by the 1978 version of the Runs Created formula.
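
The basic formula, as a sketch in Python (the season line in the example is made up for illustration, not a real team):

    def runs_created(hits, walks, total_bases, at_bats):
        """Basic 1978 Runs Created: (H + BB) * TB / (AB + BB)."""
        return (hits + walks) * total_bases / (at_bats + walks)

    # A made-up but realistic team season line:
    print(round(runs_created(hits=1450, walks=550, total_bases=2300, at_bats=5500)))   # 760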

I developed a lot of other heuristics in those years, many of which still survive, like the Power/Speed Number, Secondary Average, Game Scores, Similarity Scores, and something called the Favorite Toy.   The Favorite Toy is a way of estimating the chance that a player will get 3,000 career hits or hit 500 home runs or some such goal.   The method is so crude and so arbitrary that, at the time I developed it in the early 1970s, I was certain that I would figure out some better way to do this within a few weeks.  It's been almost 40 years, and I never have; the spooky thing about that stupid little formula is that it insists on working, although there are a dozen obvious reasons why it shouldn't.

In the second half of my career, what I have done more of is to figure out ways to define and measure things that people talk about, but which aren't measured because nobody has taken the trouble to figure out how to measure them.   Much of this research is more or less parallel to what a surveyor does.   You know what a surveyor does?  He puts a post in the ground and measures everything from where the post was.   At some point people forget that the starting point of the measurement was entirely arbitrary, and begin to accept the relative nature of the measurements.

Baseball announcers and experts often use terms that have no exact definition, like "manufactured run".    A manufactured run, more or less, is a run that a team scores by putting little parts of a run together, like a walk, a stolen base, a ground out and a single.   A walk and a single don't add up to a run, but when you add in the stolen base and the ground ball moving up the runner, you get a run out of it.   That's a manufactured run.

There is a sort of general agreement about what is a manufactured run, but there is no data because there is no precise operational definition.   My contribution to this discussion has been to make up a specific operational definition that says what is and what is not a manufactured run.   I did this about four years ago, and, to this point, my definition has had no impact whatsoever on the discussion.    But that's just because, at this time, we haven't yet reached the point at which people have stopped focusing on the arbitrary nature of the starting point.

I didn't make up the definition of a manufactured run out of whole cloth.   What I did was, I listened very carefully to what people were saying, identifying the occasions on which people would use the term "manufactured run".   Then I looked back at what had happened, and tried to identify the circumstances that caused people to use the term.   I am not saying that I got it exactly right.   Perhaps I got it 80% right, perhaps less.   But I am saying that, in the long run, people will accept the definition and begin to use the data, simply because a concept is much more useful when it has a specific definition than when it does not.
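
To illustrate what "a specific operational definition" means in practice, here is a toy version in Python.   This is not James's actual rule, which the article does not reproduce; it is a stand-in to show how a fuzzy term becomes a testable predicate:

    def is_manufactured(scoring_events):
        """Toy operational definition (illustrative only, NOT James's rule):
        a run is 'manufactured' if it scored without an extra-base hit and
        used at least one small-ball event to advance the runner."""
        small_ball = {'steal', 'bunt', 'groundout_advance', 'sac_fly'}
        extra_base = {'double', 'triple', 'home_run'}
        events = set(scoring_events)
        return not (events & extra_base) and bool(events & small_ball)

    # The article's example: walk, stolen base, ground out, single -> one run.
    print(is_manufactured(['walk', 'steal', 'groundout_advance', 'single']))  # True
    print(is_manufactured(['double', 'single']))                              # False

Once a predicate like this exists, every run in the play-by-play record can be classified, and the term produces data instead of argument.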

A great deal of my work over the second half of my career has been to replace free-floating concepts with specific definitions—for example, I've made up specific definitions for "bombs"—that is to say, when an intentional walk blows up on a manager.   It's a common expression; it merely occurred to me one day to ask "What exactly does that mean?"   Once you realize that you don't know, then you can write a definition, then you can produce data based on the definition, then you can study the issue.

            Once the data is produced, of course, it becomes a "statistic", and I become known as the person who has invented yet another new statistic.   But is writing definitions really the work of a statistician?   I'll leave that up to you.   Call me whatever you want to call me.

            Probably the most useful thing that I have ever devised, in terms of practical value to real baseball teams, is Similarity Scores.   Similarity Scores consist of an entirely arbitrary set of values used to measure the differences between any two players—so arbitrary, in fact, that I usually choose to re-invent them every time I use them, rather than sticking with any set of values.           

But Similarity Scores are tremendously useful because, in order to study anything in baseball, you need to identify similar players or similar teams.   If you want to know how much money a player should be paid, the first thing you look at is how much money is paid to similar players.   If you are trying to figure out how long a player might last, how many years he has left, the most useful way to study the issue is to identify similar players, and study what happened to them.   If a player has a very poor year, and you are trying to figure out what his chances are of snapping back, it is very useful to be able to find similar players who had bad years at a similar point in their career.   Although it is grounded in nothing—"similarity" is an entirely subjective concept—the method turns out to be of ubiquitous value to real baseball teams facing real life issues.
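
A sketch of the mechanics in Python.   James's published versions start from 1000 and subtract penalties per unit of difference in each category; the penalty weights below are invented for illustration, in keeping with his point that the values are arbitrary anyway:

    def similarity(a, b, weights=None):
        """Start at 1000 and subtract a penalty per unit of difference in
        each category.   The weights are arbitrary, as the article stresses;
        these particular values are invented for illustration."""
        weights = weights or {'hr': 2.0, 'avg': 2000.0, 'sb': 1.0, 'bb': 0.5}
        score = 1000.0
        for stat, w in weights.items():
            score -= w * abs(a[stat] - b[stat])
        return score

    player_a = {'hr': 30, 'avg': 0.285, 'sb': 12, 'bb': 60}
    player_b = {'hr': 24, 'avg': 0.301, 'sb': 5, 'bb': 70}
    print(similarity(player_a, player_b))   # higher score = more similar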

But everything depends upon recognizing what you do not know, and this gets back to the Power of Ignorance.   The great mistake that analysts make is that we always want to focus on what we DO know; we want to make inferences based on what we have studied in the past.   We like to do that because, like everyone else, we are trying to purchase credibility based on the work we have done.

But the problem is, you don't learn anything by focusing on the stuff that you already know.    In order to expand the sphere of what is known about baseball, you have to find a question that has an answer, but you don't know what the answer is.    In other words, you have to learn to identify your own ignorance.   You have to get comfortable with ignorance; you have to learn to embrace your ignorance.   By doing so, you acquire the ability to expand knowledge.

If you take a bad baseball team, a team that makes bad decisions, and you ask, "Why do they do this?" it will never be ignorance that is the culprit.    The problem is not what teams do not know.   The problem is what they do know that isn't true.

I have spent my career battling experts, working with the raw material of ignorance.   This has always worked for me because ignorance is an inexhaustible resource.   We are all so desperate to understand the world that we manufacture misunderstandings by the yard.    Creating knowledge to combat ignorance . . . creating tools with which to study something . . . these are slow and time-consuming activities.   Making superstitious connections is quick and easy.   That sounds judgmental, and it shouldn't.   The reality is that we're not capable of understanding the world, because the world is vastly more complicated than the human mind.    I don't know if that is a complete explanation of myself or not, but it's the best I can do, and I will be happy to take any questions that you may have.


