Much has been made of a recent study that shows a correlation between the “effectiveness” of teachers as determined by the scoring of their students on “value-added metrics” and these students’ success in their later lives as determined by “markers.” This muchness put me in mind—again—of Flannery O’Connor’s remark that “[t]he devil of educationalism that possesses us is the kind that can be cast out only by prayer and fasting.” I am not so sanguine as O’Connor: even prayer and fasting don’t often seem to work! I keep wondering what could possess whole communities of people to be stunned by a complex statistical study embodying years of data on millions of students when it concludes that children with good teachers do better than children with bad. One of the devils in the legion seems to be rather dim, but I will try to give the devil his due.
The original report is impressive in its thoroughness and the care with which its authors make and qualify their claims. They note, for example, that teachers in the study were not “incentivized based on test scores,” thereby skirting the effect of cheating, teaching to tests, and other “distortions in teacher behavior” that make the basis of value-addition different from what it would be in a population whose members had been “incentivized,” that is, in the real world of Campbell’s Law. There is no guarantee that this study’s results would hold in a district whose teachers were looking over their shoulders at the Value-added Reaper as he made his progress through their ranks. The twofold problem is that the use of “value-added metrics” encourages teaching to tests (the most-purchased books in the New York schools are books of preparation for tests), and there is evidence in research as well as in the educational experience of the human race that teachers who teach to tests get worse results than teachers who don’t.
They caution that some elements of the value-added equation require “observing teachers over many school years” and may not apply in a “high stakes environment with multitasking and imperfect monitoring”—that is, precisely, the kind of environment in which hasty “consequential decisions” will be made on the basis of imperfect applications of the equation over the short term.
They point out as a justification for their aggregate numbers that “observable characteristics are sufficiently rich so that any remaining unobserved heterogeneity is balanced across teachers,” but those who want to use “value-added metrics” to make consequential decisions will be applying the equation to particular individuals without correction for “unobserved heterogeneity.”
They note that their study did not include the effect of peers and of parental investment in value-addition. While everyone agrees that the teacher’s effect on what students learn is pronounced, this seems like a significant omission that could have serious consequences for the teachers whose students’ peers and parents had a large effect on the learning for which the teacher is held exclusively responsible.
The authors state that the study’s assumptions “rule out the possibility that teacher quality fluctuates across years.” Can this be? Raise your hand if your quality was as good in your first year of teaching as in your tenth.
In addition to what the authors say in qualification and limitation of their results, I have a few questions. They say that “value added is difficult to predict based on teacher observables.” Do the people who want to use value-added metrics as the basis for personnel decisions want to go a step farther and assert that there is nothing observable that a teacher can actually learn or plan to do or avoid that will make a difference in how she or he scores? This seems like a bizarre position for someone who believes in life-long learning.
I want to understand in non-mathematical terms how “academic aptitude” is factored into the equation so that teachers will not be “penalized” for taking classes of difficult or refractory students. It seems to be a single number (ηi) in the equation, but how is it derived?
I would like to know how many years’ value-added ratings they think a teacher should receive before the ratings can be said to reflect his or her actual performance, and I would like to understand the basis for this determination. It is one thing to say that we have some aggregate statistics that show teachers in general have certain effects on their students in the long run, and a rather different thing to say that these statistics can reliably rate individual teachers in one or two goes. This is particularly true given that the authors themselves say some elements of the value-added equation require “observing teachers over many school years.”
Having asked my questions, I now make a couple of observations. One of the study’s authors, according to The New York Times, says that value-added metrics should be used even though “mistakes will be made” and “despite the uncertainty and disruption involved.” It is disturbing to see someone so fastidious in the drawing of conclusions become so sweeping and remorseless in applying them, particularly when the study itself has just spoken to the need to “weigh the cost of errors in personnel decisions against the mean benefit from improving teacher value-added.”
The problem with “mean benefits” is that they have particular consequences. The authors have said that they think it would be more cost-effective to fire ineffective teachers (even mistakenly ineffective ones) than to give bonuses to effective ones. I keep wondering whether this kind of decision-making will be ethos-effective. I keep wondering who is going to be attracted to a profession governed by such principles and assumptions as those that lie behind value-added systems. “Drifters and misfits,” as Hofstadter called them? The authors of the study note that no observable teacher behavior correlates with value addition, so I wonder who will join a profession in which no one can say with confidence what he or she needs to do in order to be successful.
The moral and intellectual world in which the discernment of quality was a matter of finesse or connoisseurship and in which reward and reprobation followed particular deeds or ways of doing things is the same one in which we could say without a quantitative rationalization that the students of good teachers do better than the students of bad. That world is also a place where both teachers and administrators take their duties seriously, including the duty to counsel and correct when needed and to accept counsel and correction when deserved or needed.
It might be worth ending with a note on the stereotype that people who are against value-added “measurement” are unionists, educational bureaucrats, or people with tenure to lose in a change of system. In my twenty-five years as a teacher I have never worked within a tenure-granting system. I have never been in a union shop, nor have I been a member of a teachers’ union. I have never held an administrative position in education except that of Department Head. I have never worked in a teachers’ college. If I am against the kind of practice discussed in this posting, it is not because I have a hidden interest. It is because it seems wrong. I mean both wrong-headed and culpable.