Exorcise for Health

[This is a reworking of an old posting.]

Much has been made of a study[1] that shows a correlation between the “effectiveness” of teachers as determined by the scores of their students on “value-added metrics” and these students’ success in their later lives as determined by “markers.”  This muchness put me in mind—again—of Flannery O’Connor’s remark that “[t]he devil of educationalism that possesses us is the kind that can be cast out only by prayer and fasting.” In my less sanguine moments I lack O’Connor’s optimism and wonder whether even prayer and fasting will always work, though hope springs eternal. I wonder what could possess whole communities of educators to be stunned by a complex statistical study incorporating years of data on millions of students when it concludes that children with good teachers do better than children with bad teachers. One of the devils in the legion seems to be rather dim, but in examining the report I will try to give the devil his due, and to draw attention to some of his partners.

The original report is impressive in its thoroughness and the care with which its authors make and qualify their claims, even to the point of lessening the value of their study. They note, for example, that teachers in the study were not “incentivized based on test scores.” That means they were not plagued by the temptation and results of cheating, teaching to tests, and other “distortions in teacher behavior” that make the basis of value-addition different from what it would be in a population whose members had been “incentivized”—that is, in the real world of Atlanta, Tennessee, and New York. We are expected, nevertheless, to believe a study of “non-incentivized” teachers has something to say to districts whose teachers were looking over their shoulders at the Value-added Reaper as he made his progress through their ranks. Two additional problems with applying this study to teachers in RAT[1] programs are that the use of “value-added metrics” encourages teaching to tests (the most-purchased books in the New York schools are books of preparation for tests) rather than to goals beyond the classroom, and that evidence in research as well as the educational experience of the human race shows that teachers who teach to tests get worse results than teachers who don’t.

The authors of the study caution that some elements of the value-added equation require “observing teachers over many school years” and may not apply in a “high stakes environment with multitasking and imperfect monitoring”—which is, precisely, the kind of environment in which hasty consequential decisions will be made on the basis of imperfect applications of the equation over the short term.

They point out as a justification for their aggregate numbers that “observable characteristics are sufficiently rich so that any remaining unobserved heterogeneity is balanced across teachers,” but those who want to use “value-added metrics” to make consequential decisions will be applying the equation to particular individuals without correction for “unobserved heterogeneity.”

They note that their study did not include the effect of peers and of parental investment in value-addition. While everyone agrees that the teacher’s effect on what students learn is pronounced, this seems like a tremendous omission that could have serious undeserved consequences for the teachers whose students’ peers and parents had significant bad effects on the learning for which the teacher is held exclusively responsible.

The authors state that the study’s assumptions “rule out the possibility that teacher quality fluctuates across years.” Can such an assumption be valid? Raise your hand if the quality of your teaching was as good in your first year of work as in your tenth. No hands? Of course not!

Turning from my bright class back to the study, I have some further questions. The study claims that “value added is difficult to predict based on teacher observables.” Does this amazing claim mean that the study advocates using a “metric” of evaluation with no observable connection to the behavior being evaluated? Or does it mean that unlike their students’ work, the teachers’ work cannot be observed, diagnosed, and corrected? What has happened to cause-and-effect and lifelong learning?

I want to understand in non-mathematical terms how “academic aptitude” is factored into the equation so that teachers will not be “penalized” for taking classes of difficult or refractory students. It seems to be a single number (ηi) in the equation, but how is it derived? A lot hangs on the way teachers are “made responsible” for students’ work when their aptitude for it may have a share in the responsibility.

I would like to know how many years’ value-added ratings they think a teacher should receive before the ratings can be said to reflect his or her actual performance, and I would like to understand the basis for this determination. It is one thing to say that we have some aggregate statistics that show teachers in general have certain effects on their students in the long run, and a rather different thing to say that these statistics can reliably rate individual teachers in one or two goes. This is particularly true given that the authors themselves say some elements of the value-added equation require “observing teachers over many school years.”

Having asked my questions I now make a couple of observations. One of the study’s authors, according to The New York Times, says that value-added metrics should be used even though “mistakes will be made” and “despite the uncertainty and disruption involved.” It is disturbing to see someone so fastidious in the drawing of conclusions become so sweeping and remorseless in applying them, particularly when the study itself has just spoken to the need to “weigh the cost of errors in personnel decisions against the mean benefit from improving teacher value-added.”

The problem with basing decisions on “mean benefits” is that they have particular consequences. The authors have said that they think it would be more cost-effective to fire ineffective teachers (even mistakenly ineffective ones) than to give bonuses to effective ones. It is time for people who say stuff like this to start “balancing” cost-effectiveness and ethos-effectiveness. Who is going to be attracted to a profession governed by such principles and assumptions? “Drifters and misfits,” as Hofstadter called them? And if no teacher behavior correlates to “value” “addition,” what prospective teacher will join a profession in which it cannot be said with confidence what he needs to do in order to be successful?

Time for a little exorcism!

Let me end with a note on the ad-hominem stereotype that people who are against value-added “measurement” are unionists, educational bureaucrats, or people with tenure to lose in a change of system. In my twenty-five years as a teacher I have never worked within a tenure-granting system.  I have never been in a union shop, nor have I been a member of a teachers’ union. I have never held an administrative position in education except that of Department Head. I have never worked in a teachers’ college. If I am against the kind of practice discussed in this posting, it is not because I have a hidden interest. It is because it seems wrong. I mean both wrong-headed and culpable.


[1] “The Long-term Impacts of Teachers: Teacher Value-added and Student Outcomes in Adulthood” by Raj Chetty, John N. Friedman, and Jonah E. Rockoff of Harvard.

[1] RAce to the Top


Question Time

Most teachers, including me, somehow know that much in teacher training depends on the classroom practicum, where ideally we learn to put into practice what we theorized about in our teacher-prep lessons—or, sadly, where we learn what we should have been taught in those lessons, but were not. A third, even worse possibility is that the practicum is as bad as the classroom studies. My teacher preparation veered between the second and third kinds: though one of my four cooperating teachers was a brilliant model, the other three were absentee landlords. My supervising teacher, a nameless apparition, mysteriously appeared twice during my four months of preparation like the Angel of Bethesda except that she worked no miracles. If she was transparent, it is because she was invisible. Fortunately, I had a lot of compensatory support from my faculty colleagues during my first year of teaching. Two colleagues visited my classroom and commented on my lessons; they and others allowed me to watch their teaching, where I was like a dry sponge in Lake Superior.

A large study[1] released only this week offers the cold comfort of validating my impression, claiming that in general, the preparation of teachers in the U.S. is rather poor[2]. The study is particularly hard on the student teaching programs it reviewed[3], rating only 7% of them as providing “strong support [to student teachers] from program staff and cooperating teachers.” In case you wonder, “strong support” means certifying the quality of the cooperating teachers, requiring supervising teachers to make at least five classroom visits, each with written feedback, and having a clear plan for helping unsuccessful student teachers deal with the bad news they must receive. If you are shocked that only 7% of the teacher-training programs reviewed provided these seemingly sensible and necessary elements of good student teaching support, you clearly do not know what is going on in American teacher education. One wants to ask how things could be so bad.

But there are other questions to ask, not just of the targets of this study, but of the study itself. Why, if the methods it recommends for certifying student teachers work as well as they do, can we not apply those methods to the evaluation of already certified teachers? Why, if administrators would be spread too thin in doing so, could schools not adopt peer-review programs to complement review by administrators?  Why does this study prefer a narrative-based qualitative method for evaluating student teachers but then adopt the discredited “value”-“added” method of evaluating certified teachers and cooperating teachers? How can we be sure that an administrator knows good teaching when he sees it? Sauce for the goose is sauce for the gander, if I may include a non-quantitative consideration in this discussion.


[1] “Teacher Prep Review”: At last, a clear and unpretentious title!

[2] Indeed, many of my American colleagues/friends seem to have regarded it as like the hazing one undergoes in order to join a fraternity.

[3] Standard 14, p. 50


Comparatives, with and without Superlatives

G.K. Chesterton said that, as used in modern times, the term “‘progress’ is a comparative of which we have not settled the superlative.” It would disturb the elegant compression of that line to add “…or, in education, the positive either[1],” but I will risk some inelegance to make the point that educational progressives or reformers often open campaigns whose ultimate objectives and sense of present deficiency are unclear or unsettled.[2]

Take for example the Common Core. Its ultimate objective is universal career and college readiness by the end of Grade 12. It sounds noble, but to any precise meaning of the aim we are far from having settled down. I delight in the thought that Common Core graduates in their millions will have read and understood Chesterton, as the curriculum requires them to do[3]; but I am skeptical that this is what will actually happen. In short, I see here a comparative without a superlative.

Nor is the positive from which we are mandated to progress very clear. Is the reason students need progressive measures that the curriculum they study is now unsatisfactory? That the teachers who teach it are unfit? That the students themselves are feckless? To each of these problems—if they are real—different remedies would need to be applied. A new curriculum will do no good if the teachers are poor and the students unmotivated; while if they are good and motivated, it may be unnecessary. Diane Ravitch has repeatedly pointed out that there is not much study behind the Common Core to determine whether it will do what it is supposed to do.

But I want to talk about personal progress now. At the school where I teach, the teachers work for the most part in shared faculty offices rather than their classrooms. Two gains emerge from this way of working. One is the stronger sense of shared professionalism one finds in a faculty who literally as well as figuratively work together. The other is the constant stream of ideas, suggestions, discussions, resolutions that I encounter when speaking informally to my colleagues there. The likelihood that these kinds of improvement will occur in an atomized school where teachers decamp to their private rooms is far less than in arrangements like my school’s. If it is true that education schools produce undistinguished teachers, how will they learn to be distinguished in an environment with little opportunity to gain from what their colleagues have to offer?

There are times when I miss my privacy, and sometimes, to get it, I will move to a quieter spot for work and study. I also miss the decorative variety that some teachers bring to rooms that have become their turf. But I would hate to give up the flow of ideas about teaching and thoughts about students that a shared workspace brings, though obviously such communication can take place in all kinds of space[4]. And I appreciate that, unlike the grand superlatives discussed above, a highly achievable and notable improvement is taking place in my own particular workspace. I would even go so far as to call it progress.

[1] Richard Hofstadter may have had such vague degrees of comparison in mind when he said that “America was the only country that started with perfection and aspired to progress.”

[2] Charles Sanders Peirce said that “truth is that to which the community ultimately settles down,” but Bertrand Russell was willing to accept some unsettled truths provisionally. The test of whether he should have done so is pragmatic, not ideological.

[3] Or to have read and understood another, comparably challenging, author.

[4] Some of my colleagues and I used to meet for this kind of discussion, and others, in Mr. O’s classroom, or by the door outside it.


Reprise: A Philosophy of Baloney

[Time for another look at this posting of mine from a year ago.]

An old joke has it that when you mate a crocodile with an abalone you get a crock o’ baloney, but surely there must be other ways of producing it: how does such an abundance of baloney come to appear in the field of education? Why are so many educationists also balonists[1]?

One respected philosopher says that a balonist (not his word) is primarily concerned not with telling the truth but with promoting or protecting himself, or with keeping the boat he is on from being rocked. Such a person’s relationship with the truth is therefore accidental and opportunist; it yields truth claims that are phony. One current truth-tussle can illustrate.

Four professors, from Stanford, Cal Berkeley, and the University of Arizona, have been studying “value-added models” (VAMs) of evaluating teachers[2]. Here are just some of the results:

  1. At least seven factors other than the individual teacher figure in students’ success. These include home and community supports and challenges, peer culture and achievement, and of course the specific tests used to “measure” “achievement.”
  2. VAMs are inconsistent. Only 20% of teachers rated at the top or bottom of their district rankings retained those ratings in the following year, and when rated by different tests, 40 – 55% of teachers got “noticeably different scores.”
  3. Teachers’ value-added “performance” is affected by the students assigned to them. One set of figures documents the experience of an English teacher whose rating changed from the first (worst) to the tenth (best) decile from one year to the next. The change was attributable not to his sudden emergence from a vegetative state, but to the fact that his students in the second year numbered fewer English learners, Hispanic students, and low-income students and more students with well-educated parents.
  4. VAMs can’t disentangle these other factors influencing students’ (and “therefore” their teachers’) performance. Take for example an elementary school teacher who had been voted Teacher of the Month and Teacher of the Year in Houston, where her supervisor had rated her as “exceeding expectations.” She was fired as a result of her VAM scores, which showed wide fluctuations across and within subjects. These scores did not correct for her lower value-added in 4th grade, when English learners are mainstreamed in her school district. Take also the VAM scores of teachers that “flip-flopped when they exchanged assignments.” When such stories start to circulate, guess how many teachers will accept assignments to classes with disadvantaged students!

Other ways of evaluating teachers, discussed at length in this article and in passing in these postings, are available and have been shown to work. Why, then, do we see such reliance on VAMs?

One answer is in the nature of a balonist. If his primary purpose is to serve not truth but himself, he does not particularly care what the truth is. Another, in this case, is in the nature of this particular baloney. Though rank and gross in nature, it seems to simplify and explain so much, and to deflect blame so effectively from the balonists using it, that it is irresistible to them. Finally, it jibes with a public tendency to be satisfied with crude methods of identifying and punishing members of undesirable classes of people. A complex problem can be simplified. Villains can be “found” and eliminated. The phoniness of the baloney doesn’t matter. The balonists—say, a cabinet secretary or the superintendent of an urban school district—can be seen as “tackling problems” and “making tough decisions.” What could be more desirable, except the truth?


[1] This term, indispensable when talking about education, can be found in The Didact’s Dictionary. A balonist produces his own hybrid of humbug and b*******.

[2] “Evaluating Teacher Evaluation” by Linda Darling-Hammond et al., Phi Delta Kappan, March 2012. I thought this article well worth the five dollars it cost me to download it.


Objective Nonsense

Much mischief would vanish from educational discourse if the terms objective and subjective vanished first. Since that happy consummation is unlikely to take place, we must be careful how we use them. Currently, usage seems to coalesce along a continuum, on which the “subjective” side is the side of judgment, opinion, emotion, evaluation, flightiness, and interiority, while “objective” refers to measurement, fact, rationality, disinterestedness, groundedness, and consensus.

A moment’s thought will show us that this continuum is not very clear or very helpful. For example, many of us profess the value of statistical significance in scientific studies. Fine, but the .05 level of statistical significance as a gold standard for data is an entirely arbitrary construct—a subjective construct, if you will. I don’t mean to downgrade the validity of the concept of statistical significance, but to suggest that the usual subjective/objective=bad/good opposition is not a very helpful way of analyzing its value. The same goes for “value”-“added” “metrics,” whose sometimes huge “margin of error” is simply ignored by the people using the statistics as a way of evaluating teachers. One case highlighted a teacher whose “metrics,” when taken with the needed caution about margin of error, showed that she might be the worst teacher in New York—or better than half of them. Partisans of the idea that numbers confer precision were not even discomfited by such results: they simply ignored margin of error in using VAMs. Sounds rather arbitrary and flighty to me.

When The New York Times reports that “subjective” evaluations of teachers don’t work in Texas, one must turn over the story a bit to see that the real problem is not “subjectivity.” That is a red herring. The real problem is twofold. Evaluators who are trapped in their offices by balls & chains of paperwork, or who stay there by choice in regal disdain of teachers, are unwilling or unable to get out to the classrooms very often. One solution would be peer evaluation. Another would be to cut the burden of administrative paperwork. Implementing such solutions has nothing to do with replacing “subjectivity” by “objectivity”; rather, it requires replacing an inadequate and arbitrary system of judgment by an adequate one well grounded in good sense. The solution would also require people who see a spade to call it a spade. Gwendolen Fairfax’s superb dodge won’t do[1].

The honest art of judgment lacks the magical appeal of numbers and formulae, but it allows—requires—the people using it to look each other in the eye and themselves in the mirror. That is not a question of objectivity vs. subjectivity; it is a question of intellectual and moral courage.

[1] In The Importance of Being Earnest: GWENDOLEN (satirically). I am glad to say that I have never seen a spade.


From Rapping to Teaching

At St. John’s College, Cambridge, they still say grace in Latin before dinner, a gong signaling the students when they may begin to eat. This is no surprise to one of my seniors, who will be going to another Cambridge college where they say grace in Latin. He is hoping to join one of the choirs there, perhaps even King’s College Choir.

But what moved the singer & rapper Niyi to throw over his musical career to study English and education at St. John’s so that he can become an English teacher? His chief inspiration came from the English teachers he had while studying for his A-levels. And what keeps him going at Cambridge even though, he reports, his first exams were “a disaster”? He counts no fewer than eleven people there ready to help him out, from the woman who makes his bed to the chaplain, to his senior tutor, to his individual teachers.

This is obviously a place that cares, in its way, for the success of its students. While it would probably be impossible to duplicate this level of care in most colleges, it is worth remembering that Niyi thanks people for his success, not software. Something tells me that at the right school he will be a great success as a teacher, and that like the drama teacher in my recent posting on Rooms of Requirement, he will have a classroom in which humane values prevail.


Rooms of Requirement

The dream school would have a “Room of Requirement,” that deus ex camera in the Harry Potter books that provides whatever its users require. It is also the attic of the ages, a repository for everything from Harry’s dangerously annotated potions book to Professor Trelawney’s empty sherry bottles.

But if a school can’t have a Room of Requirement, the next best thing would be a room like my school’s “Drama Storeroom,” presided over discreetly by the drama teacher from a curtained-off desk. As its name suggests, the Drama Storeroom is, yes, a storeroom, but like the Room of Requirement it is also, and maybe more importantly, a refuge. Things in storage are not quite so heaped & crowded that there isn’t room for stuffed furniture, desks, and some equipment.

The last time I had to go in, something I rarely and reluctantly do, I bumped into one of my seniors. He is finished for the year, having taken his IB exams, but he was there to finish the editing of the movie he has been producing and directing for the last two years, which will have its premiere next month. An 11th-grader was on the far side of the room playing his guitar. Two other 11th-graders were half working on their project and half loafing at ease and inviting their souls.

The reason I am reluctant to go in is that the drama teacher is one of those individuals who can be cool without being lax. Her room is therefore an entirely comfortable place for the students who hang out there, whether they have a project, a song to sing, or just the need to decompress. Of course, in the run-up to a play or musical, the Drama Storeroom is a beehive, but even then it remains a room of ease as well as a locus of work.

Such it was recently, when the school musical played to full houses. Book, lyrics, and music were written by the drama teacher; the instrumentals were produced by a graduate; of course the vocals were live. It was a pleasant show, though it entailed a lot of work for the students.

One day one of those students asked to be excused from an essay test, so I wrote the drama teacher saying he was welcome to go if he really had stuff to do, but I didn’t want him to use the preparations as an excuse to cut a class. (I am not cool.) The drama teacher wrote back saying that he was indeed needed, so I in turn said “Fine, just make sure to wag a finger at him if he is being sly.” He ultimately showed up for the essay test, so I guess they had a little chat. He felt no upset or animus towards me, showing that whatever the drama teacher had said, it was just the right thing to say, as usual.

This teacher is clearly a paragon, but if she taught in Florida, or Tennessee, or other RAT[1] states, she would not be evaluated on her knowledge, on her hard work, on her discretion, on her sympathy, on her insistence on high standards, or on her humanity. Instead, she would be evaluated at least partly on how students did on “objective” tests, and these not in drama but in other subjects. Some of the students might not even be her students!

Not far from the drama storeroom is an incommodious but acoustically friendly place that I call the Cave of Music because so many of the school’s instrumentalists use it as a kind of practice space. Most of them are in the Chinese orchestra, but one plays his marimba there. One day he was practicing away, and I realized that the piece he was playing was the praeludium to Bach’s first suite for unaccompanied cello. I couldn’t resist going by, and I found him there. “Can you play that praeludium through?” I asked. “You mean…” and he played the first two measures. I nodded. He then played the entire movement for me with the poise of a polished performer. When he ended I applauded, and he took a bow. Though it was not quite like hearing the late Janos Starker play the same movement, it had its charms. The music that comes out of the Cave is a fine thing, whether Chinese classical music or Bach on the marimba. What the Cave of Music shares with the Drama Storeroom is that it is a humane space for something besides test preparation.

[1] RAce to the Top