Value Added Metrics and the Bíg, Bláck Blóck

“[VAMs] are not yet, I think, up to the task of being put into, say, an index to make important summative decisions about teachers.”—Morgan S. Polikoff

To sit in solemn silence in a smáll clássróom

While reading the VAM ratings that will séal óne’s dóom,

Awaiting the sensation of a shórt, shárp shóck

From a cheap and chippy chopper on a bíg, bláck blóck.

—with apologies to W. S. Gilbert

Now that a judge has thrown out California’s system of teacher tenure, it becomes more important than ever to ensure that decisions to fire “ineffective” teachers do not become arbitrary and capricious. Unfortunately, decisions based on “value”-“added” “metrics” are likely to be just that. Professor Polikoff, whose statement opens this posting, has carefully studied the correlation of VAMs and teacher effectiveness and found that it is at or near zero. If you don’t want to spend thirty dollars for his study, a YouTube clip is available in which Polikoff discusses his findings.

He notes in that clip that “state tests don’t seem to be sensitive to many of the things we think of as defining quality instruction.” If you have read my last posting, on Chesterton and the Beanstalk, you have a commonsense explanation why not. As it turns out, I can also oblige the Research Enthusiasts, this time with a study of science teaching[1]. This study opens with a bang: “Education reform usually arrives with fanfare, great expectations, and overconfidence. Truth be known, typical education-reform effects tend to be small. Evaluations, if done at all, burst the reform balloon, having difficulty finding effects. After some period of time, enthusiasm and financial support wane. The remnants of reform show but faint traces of the great expectations.” Does that sound familiar to victims, I mean implementers, of NCLB and Race to the Top[2]?

Ruiz-Primo and her colleagues approached assessment by examining its proximity to the student. They rated assessments as having six degrees of separation, as it were, from the class, and they noted whether the assessments rated “declarative knowledge,” “procedural knowledge,” or “strategic knowledge,” i.e., knowledge, skill, and understanding, in an “expanded idea of achievement”[3]. What they found was that “distant” assessments, e.g., state and national tests, were less likely to capture the varieties of learning than were “close” assessments, e.g., science notebooks. As they put it in their conclusions, “close assessments were more sensitive to changes in student performance, whereas proximal [i.e., more distant] assessments did not show as much impact of instruction.” I have argued elsewhere that VAMs based on such assessments measure nothing at all, and that they are arbitrary and capricious in their effects.

When we compare using VAMs to make personnel decisions in education with the methods of Gilbert and Sullivan’s Lord High Executioner, we must regretfully conclude that His Lordship is on a solider footing than the VAM enthusiasts: at least he works for a “more humane Mikado” who “lets the punishment fit the crime.” In the case of VAMs, which bear little relationship to “teacher observables,” the people being punished have not even been detected in a “crime.”



[1] Ruiz-Primo, Maria Araceli et al. “On the Evaluation of Systemic Science Education Reform: Searching for Instructional Sensitivity.” The authors mean sensitivity to the effects of instruction, not sensitivity in instructors or their means. With becoming modesty they note that their conclusions are tentative.

[2] …and new math, and open classrooms, and whole language, and outcome-based education, and mastery learning, and …

[3] p. 374