**[This is a reworking of an old posting.]**

Much has been made of a study[1] that shows a correlation between the “effectiveness” of teachers as determined by the scores of their students on “value-added metrics” and these students’ success in their later lives as determined by “markers.” This muchness put me in mind—again—of Flannery O’Connor’s remark that “[t]he devil of educationalism that possesses us is the kind that can be cast out only by prayer and fasting.” In my less sanguine moments I lack O’Connor’s optimism and wonder whether even prayer and fasting will always work, though hope springs eternal. I wonder what could possess whole communities of educators to be stunned by a complex statistical study incorporating years of data on millions of students when it concludes that children with good teachers do better than children with bad teachers. One of the devils in the legion seems to be rather dim, but in examining the report I will try to give the devil his due, and to draw attention to some of his partners.

The original report is impressive in its thoroughness and the care with which its authors make and qualify their claims, even to the point of lessening the value of their study. They note, for example, that teachers in the study were not “incentivized based on test scores.” That means they were not plagued by the temptation and results of cheating, teaching to tests, and other “distortions in teacher behavior” that make the basis of value-addition different from what it would be in a population whose members had been “incentivized”—that is, in the real world of Atlanta, Tennessee, and New York. We are expected, nevertheless, to believe a study of “non-incentivized” teachers has something to say to districts whose teachers were looking over their shoulders at the Value-added Reaper as he made his progress through their ranks. Two additional problems with applying this study to teachers in RAT programs are that the use of “value-added metrics” encourages teaching to tests (the most-purchased books in the New York schools are books of preparation for tests) rather than to goals beyond the classroom, and that evidence in research as well as the educational experience of the human race shows that teachers who teach to tests get worse results than teachers who don’t.

The authors of the study caution that some elements of the value-added equation require “observing teachers over many school years” and may not apply in a “high stakes environment with multitasking and imperfect monitoring”—which is, precisely, the kind of environment in which hasty consequential decisions will be made on the basis of imperfect applications of the equation *over the short term*.

They point out as a justification for their aggregate numbers that “observable characteristics are sufficiently rich so that any remaining unobserved heterogeneity is balanced across teachers,” but those who want to use “value-added metrics” to make consequential decisions will be applying the equation to particular individuals without correction for “unobserved heterogeneity.”

They note that their study did not include the effect of peers and of parental investment in value-addition. While everyone agrees that the teacher’s effect on what students learn is pronounced, this seems like a tremendous omission that could have serious undeserved consequences for the teachers whose students’ peers and parents had significant bad effects on the learning for which the teacher is held exclusively responsible.

The authors state that the study’s assumptions “rule out the possibility that teacher quality fluctuates across years.” Can such an assumption be valid? Raise your hand if the quality of your teaching was as good in your first year of work as in your tenth. No hands? Of course not!

Turning from my bright class back to the study, I have some further questions. The study claims that “value added is difficult to predict based on teacher observables.” Does this amazing claim mean that the study advocates using a “metric” of evaluation with no observable connection to the behavior being evaluated? Or does it mean that unlike their students’ work, the teachers’ work cannot be observed, diagnosed, and corrected? What has happened to cause-and-effect and lifelong learning?

I want to understand in non-mathematical terms how “academic aptitude” is factored into the equation so that teachers will not be “penalized” for taking classes of difficult or refractory students. It seems to be a single number (η_i) in the equation, but how is it derived? A lot hangs on the way teachers are “made responsible” for students’ work when their aptitude for it may have a share in the responsibility.
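For readers who want the question in symbols: a generic model of this family (my own simplified sketch, not necessarily the study’s exact specification) might write the test score A of student i in year t, taught by teacher j(i,t), as

```latex
A_{it} = \beta X_{it} + \eta_i + \mu_{j(i,t)} + \varepsilon_{it}
```

where X_{it} collects observable controls, η_i is the student’s fixed aptitude, μ_j is the teacher’s “value-added,” and ε_{it} is noise. The question above amounts to asking how η_i and μ_j are disentangled when the same teacher keeps drawing students whose η_i runs low.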

I would like to know how many years’ value-added ratings they think a teacher should receive before the ratings can be said to reflect his or her actual performance, and I would like to understand the basis for this determination. It is one thing to say that we have some aggregate statistics that show teachers in general have certain effects on their students in the long run, and a rather different thing to say that these statistics can reliably rate individual teachers in one or two goes. This is particularly true given that the authors themselves say some elements of the value-added equation require “observing teachers over many school years.”
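To make the point concrete, here is a toy simulation of my own devising—the particular numbers are illustrative assumptions, not the study’s estimates. Each teacher gets a fixed “true” effect on scores; any one year’s rating is that effect plus classroom-level noise; and we compare how well a one-year rating and a ten-year average track the truth.

```python
import random
import statistics

# Toy model (my own construction, not the study's): each teacher has a
# fixed "true" effect, and each annual value-added estimate is that
# effect plus classroom-level noise. The spreads below are assumptions
# chosen so that one year's noise swamps the signal.

random.seed(42)

N_TEACHERS = 1000
TRUE_SD = 0.10    # spread of true teacher effects (assumed)
NOISE_SD = 0.20   # year-to-year estimation noise (assumed)

def correlation(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

true_effects = [random.gauss(0.0, TRUE_SD) for _ in range(N_TEACHERS)]

def rating(true_effect, years):
    """Average of `years` noisy annual value-added estimates."""
    return statistics.fmean(
        true_effect + random.gauss(0.0, NOISE_SD) for _ in range(years)
    )

one_year = [rating(t, 1) for t in true_effects]
ten_year = [rating(t, 10) for t in true_effects]

print(f"1-year rating vs true effect:  r = {correlation(one_year, true_effects):.2f}")
print(f"10-year rating vs true effect: r = {correlation(ten_year, true_effects):.2f}")

# How often would one year's rating put a genuinely above-average
# teacher in the bottom quartile?
cutoff = sorted(one_year)[N_TEACHERS // 4]
misfires = sum(1 for t, r in zip(true_effects, one_year) if t > 0 and r < cutoff)
above_avg = sum(1 for t in true_effects if t > 0)
print(f"above-average teachers rated bottom-quartile in year one: {misfires}/{above_avg}")
```

Under these assumed spreads the ten-year average tracks the true effect far better than the single-year rating, and a non-trivial share of genuinely above-average teachers land in the bottom quartile after one year—precisely the “cost of errors in personnel decisions” the authors mention.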

Having asked my questions I now make a couple of observations. One of the study’s authors, according to *The New York Times*, says that value-added metrics should be used even though “mistakes will be made” and “despite the uncertainty and disruption involved.” It is disturbing to see someone so fastidious in the drawing of conclusions become so sweeping and remorseless in applying them, particularly when the study itself has just spoken to the need to “weigh the cost of errors in personnel decisions against the mean benefit from improving teacher value-added.”

The problem with basing decisions on “mean benefits” is that they have particular consequences. The authors have said that they think it would be more cost-effective to fire ineffective teachers (even mistakenly ineffective ones) than to give bonuses to effective ones. It is time for people who say stuff like this to start “balancing” cost-effectiveness and ethos-effectiveness. Who is going to be attracted to a profession governed by such principles and assumptions? “Drifters and misfits,” as Hofstadter called them? And if no teacher behavior correlates to “value” “addition,” what prospective teacher will join a profession in which it cannot be said with confidence what he needs to do in order to be successful?

Time for a little exorcism!

Let me end with a note on the ad-hominem stereotype that people who are against value-added “measurement” are unionists, educational bureaucrats, or people with tenure to lose in a change of system. In my twenty-five years as a teacher I have never worked within a tenure-granting system. I have never been in a union shop, nor have I been a member of a teachers’ union. I have never held an administrative position in education except that of Department Head. I have never worked in a teachers’ college. If I am against the kind of practice discussed in this posting, it is not because I have a hidden interest. It is because it seems wrong. I mean both wrong-headed and culpable.

[1] “The Long-Term Impacts of Teachers: Teacher Value-Added and Student Outcomes in Adulthood” by Raj Chetty and John N. Friedman of Harvard and Jonah E. Rockoff of Columbia. NBER Working Paper No. 17699. http://www.nber.org/papers/w17699.pdf