Of Phrenology and Value Added Learning

A colleague of mine once told me that her intellectual hero had been the late Stephen Jay Gould, the Alexander Agassiz Professor of Zoology at Harvard from 1982 until his death in 2002. She was a biology teacher and admired his fruitful explorations of evolutionary theory in spite of his detractors (who called his idea of punctuated equilibrium “evolution by jerks”). I admired him for the blessed clarity of his writing, which plainly honored Professor Barzun’s dictum that “writing is an act of courtesy” to the reader.

I especially admire The Mismeasure of Man, which I read before the publication of The Bell Curve and reread after it. Gould had personal and public reasons to be wary of the way statistics can be bent to justify false positions or to “enable” people to draw false conclusions. The personal reason: after being diagnosed with cancer, he learned that the median survival for his kind of cancer was eight months after diagnosis. A median says only that half of the patients died sooner and half later; it says nothing about how long the longer-lived half might survive. He survived twenty years, having made a complete recovery, which he discusses in his article “The Median Isn’t the Message.” One of his many public reasons for wariness, handled at length in the book, was that statistical methods could be (mis)used to reinforce bad ideas or to justify bad public policy.

The key exhibit in this discussion was the chapter “The Real Error of Cyril Burt: Factor Analysis and the Reification of Intelligence[1].” It takes the statistical methods and concept-work behind traditional intelligence testing and subjects them to a careful, thorough, and transparent demolition.

The demolition was well deserved. Teachers of a certain age will remember their halcyon pupil days, when IQ scores appeared next to students’ names, branding them like tattoos passed off as birthmarks. Mr. Smith, the math teacher of my own 7th-grade days, was a great inspiration to all of us, who looked forward to the end of the school day because 8th period was when we would get, not have, to go to math class. A gifted teacher, he unfortunately also had a tongue that sometimes led him down questionable paths. One day Student X had to be excused to go to the office. After he left, Mr. Smith said to the rest of us, “You know, you are all in this class because of your brains, but X’s IQ leaves you in the dust. When he grows up, he will do anything he wants.” So he did, though not quite in the way Mr. Smith had in mind. Before he settled into a modestly satisfying professional career in a small city in the Northwest, he had an extraordinary international career as a druggie, in which he smoked, sucked, ingested and injected anything, a living illustration of “Aldous Huxley Told in Gath” or Fear and Loathing in Las Vegas. He pursued part of this first career at the feet of Timothy Leary.

By contrast, my college friend H told me that as a boy he had been classified as a borderline moron[2] by his teachers, who advised his parents not to expect much of him as an adult. Against the expectations of Intelligence Scientists, his intellectual trajectory took him through the Bronx High School of Science[3], a successful undergraduate career in a rigorous pre-medical curriculum, and medical school. His IQ, at the level of a borderline moron in first grade, “increased” to 160 by the time he graduated from Bronx Science, a thing that IQ was not supposed to do.

I would add another thing about IQ: it was never meant to be used as a tool for predicting career trajectories or as a means of making fine distinctions such as the cutoff point above which my classmates took math with Mr. Smith and below which they took it with Mr. Dust. All of us of a certain age have stories like this to tell about IQ tests, but few of us remember any critical examination of the idea behind the tests and labels. Professor Gould therefore did the world of intellect a service with his thorough debunking. Professor Barzun has done likewise in his intellectual-historical writing on the social sciences whose ancestors include phrenology and physiognomy[4]; that writing parallels in some ways Gould’s discussion of craniometry and its intellectual descendants in The Mismeasure of Man.

What we must carry away from this discussion into the contemporary world of education is the need to be very careful in our use of statistics to create categories and make educational decisions; above all, we must not reify statistical products whose underlying reality is fundamentally dubious.

Let us now move to a discussion of how to evaluate teachers and, as part of that discussion, to New York City, home of much that is great and awful in American education. We will go not to the Bronx High School of Science but to the Lab Middle School for Collaborative Studies in Manhattan. There we will find a teacher named Stacey Isaacson, who received degrees from Penn and Columbia and who “had a successful career in advertising and finance before taking [her] teaching job, at half the pay.[5]” Ms. Isaacson received glowing reviews from her principal and glowing tributes from former students. All but one of her students were rated “proficient.” During her maternity leave she came in once a week, voluntarily and without pay.

All the evidence that a normal, rational human being would need to judge the quality of her person and work suggests rather strongly that she is a paragon. Unfortunately, that evidence doesn’t matter. What matters is her value-added learning score, which shows that she is one of New York’s worst teachers. That score means that when she is reviewed for tenure, she will almost certainly not receive it. It will probably soon mean that in layoffs she will be one of the first asked to go. She may beat them to the pink slip, since she will probably have no trouble returning to her career in advertising and finance if she needs to. I think she ought to apply for a teaching position in Finland.

More seriously and more to the point, I think that value-added learning is precisely the same kind of misbegotten mess that Spearman’s g[6] turned out to be, if not a worse one. We cannot even say exactly what value is in this addition: all we can do is infer its existence from the solution to a formula. What is more, IQ is at least a simple quotient; contrast it with Ms. Isaacson’s score, which is the result of applying a formula[7] of astonishing impenetrability. This cloud of sigmas and enigmas, enveloping data about her students and the school where she works, has determined that she is in the 7th percentile of teachers.

The 7th percentile of teachers. What does that mean? I suggest it means no more, and possibly much less, than a college-rating system of suspect reliability means when it says that College X is the 52nd-best in the country. Such a system would mean little even if it had no margin of error, but it turns out that Ms. Isaacson’s score has a margin of error so great that her “actual rating” (whatever that is) could be anywhere from the 1st percentile of teachers to the 52nd. Taken with the needed caution, this helpful rating effectively says that she might be the worst teacher in New York, or better than half of them, or anywhere in between. Unfortunately, the rating is not being taken with the needed caution: the margin of error is ignored when Ms. Isaacson is definitively cast onto the rubbish heap. Where is Finland?

Or, for that matter, to keep things American, where is W. Edwards Deming? Schools should be run on an educational model, not a business model, but Deming became the inspiration for post-WWII Japanese business and its economic miracle by advocating a remarkable series of principles of business management. If the educational leaders in schools are going to be replaced by businessmen (though they shouldn’t be), those replacements might at least consider applying some of Deming’s principles:

  • Drive out fear.
  • Eliminate slogans, exhortations, and targets.
  • Eliminate management by numbers, numerical goals. Substitute leadership.
