Failure by the Numbers

As New York City prepares under a new mayor to make some basic changes in its “system” of holding schools accountable for their students’ performance, a few observations are in order. The first is that the “system” has generally been a failure. This much is admitted even by The New York Times, a qualified admirer of the “system,” whose editors say that only one in four of New York’s students now meets the Regents’ standard of “college readiness,” and this although the “system” has been “in place” for a number of years. Amazingly, they tax the new mayor with “bashing his predecessor” about these results. What should the new mayor have done? Thank him?

I would say not, after reading a report by the city’s own Independent Budget Office on the program. It is true that the IBO issues this faint praise of the “system”: that it “is a significant improvement on accountability methods based solely on standardized test scores.” The “system’s” first problem, of course, is that it used to be the “unimproved” kind—a kind that people of good educational judgment immediately recognized as worthless and harmful, an opinion later confirmed by research. But good educational judgment is what the businessmen and lawyers appointed by the old mayor lacked, as how could they not? And the “unimproved” “system” was never completely replaced.

The second problem is that this “significantly” improved “system” is itself not very impressive. I mean not just in that it has left three-quarters of New York’s students unprepared for college, but that its ostensible basis in statistical rigor ends up not meaning much.

Example 1: In the five years from 2006 to 2011, five “grades” of A to F were issued to nearly a thousand elementary and middle schools. Table 6 of the report shows that fifty-eight percent of these schools received three or more different grades in successive years. This extraordinary volatility of results was due not to successions of genius phases and vegetative states, but, as the report said, to the volatility of the method used to get them. The report specially noted that because of its volatility, the “system” had a very difficult time distinguishing the passing grade of C from the failing grade of D.
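For readers who want to see what a count like the one in Table 6 involves, here is a minimal sketch in Python. The school names and grade histories are hypothetical and are not taken from the IBO report; the function name is my own. It shows only one plausible way of tallying how many schools received three or more different grades.

```python
def share_with_many_grades(history_by_school, threshold=3):
    """Return the fraction of schools with at least `threshold` distinct grades."""
    if not history_by_school:
        return 0.0
    volatile = sum(
        1 for grades in history_by_school.values()
        if len(set(grades)) >= threshold
    )
    return volatile / len(history_by_school)


# Hypothetical five-year grade histories, NOT data from the report.
sample = {
    "School 1": ["A", "C", "B", "A", "D"],  # four distinct grades
    "School 2": ["B", "B", "B", "B", "B"],  # one distinct grade
    "School 3": ["C", "D", "C", "F", "C"],  # three distinct grades
    "School 4": ["A", "B", "A", "B", "A"],  # two distinct grades
}

share = share_with_many_grades(sample)
print(f"{share:.0%} of the sample schools received three or more different grades")
```

In this invented sample, two of the four schools cross the threshold. The report's own figure for the real schools is fifty-eight percent.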

Example 2: The volatility was so extreme in some cases that schools changed by two or more letters in successive years[1]. The “system’s” answer to these changes eventually was to disregard them when the change was for the worse, and to accept them when it was for the better. As the IBO’s report drily notes, “observing such volatility should lower one’s confidence that the measure is capturing systematic rather than spurious differences between schools.” Indeed.

Example 3: The “peer-related” school evaluations, which supposedly rank like schools against like, are based on a formula that weighs the number of black and Hispanic children as 30% of its total and assigns the same weight to students with special needs (as signified by their having an Individual Education Plan (IEP))[2]. Thus are a school’s “peers” found. In the IBO report’s data we find negative correlations between test scores and numbers of students in these two groups. That is, more of these students means lower scores. However, the coefficients of correlation, unlike the weightings in the formula, show that the components have rather different effects on the final numbers: -0.114 for black and Hispanic students, but -0.457 for students with special needs. A couple of things jump out at the reader: -0.114 is a rather weak correlation on which to base nearly a third of a formula that can lead to consequential decisions, and equal weighting of factors with such different correlations seems evidently flawed. What is more, the table from which these data are taken shows that in many of the cells, the numbers lack an acceptable level of statistical significance (16 out of 40, or 40%).
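The mismatch between equal weights and unequal correlations in Example 3 can be put in rough numbers. The sketch below uses only the figures quoted above and in footnote [2]. It assumes, as the report's wording suggests but I cannot confirm, that the coefficients are simple pairwise correlations, so that squaring each one gives the share of variation in scores associated with that factor alone. The variable names are my own.

```python
# Weights of the peer formula, as given in the text and footnote [2].
weights = {
    "black_hispanic": 0.30,
    "special_needs_iep": 0.30,
    "subsidized_lunch": 0.30,
    "english_learners": 0.10,
}

# Correlations with test scores quoted from the IBO report.
correlations = {
    "black_hispanic": -0.114,
    "special_needs_iep": -0.457,
}


def variance_explained(r):
    """Square a correlation coefficient (assumes a simple pairwise correlation)."""
    return r * r


total_weight = sum(weights.values())
r2 = {name: variance_explained(r) for name, r in correlations.items()}
ratio = r2["special_needs_iep"] / r2["black_hispanic"]
nonsignificant_share = 16 / 40  # cells lacking statistical significance

print(f"weights sum to {total_weight:.2f}")
for name, value in r2.items():
    print(f"{name}: weight {weights[name]:.0%}, r squared = {value:.3f}")
print(f"ratio of r squared values: {ratio:.1f}")
print(f"non-significant cells: {nonsignificant_share:.0%}")
```

On that assumption, the two factors carry the same 30% weight, yet one is associated with about 1% of the variation in scores and the other with about 21%, a difference of roughly sixteen to one.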

Example 4: In “peer groups” of schools with like “results” in this demographic crap shoot, particular schools whose statistical “proximity to the horizon” of that group was more than two standard deviations from the mean were ignored as outliers. Teachers of a certain age will remember Jaime Escalante, the gifted Bolivian who taught at Garfield High School in East Los Angeles, helping to turn it for a few years into a comparative powerhouse of college preparation even among students who did not pass the AP calculus test. The difference between judgment and statistics is the difference between the movie Stand and Deliver and the blank screen of an outlier. Let the inquirer who favors judgment ask Escalante for his secret. He said, “The key to my success with youngsters is a very simple and time-honored tradition: hard work for teacher and student alike.”

Speaking of teachers: the last problem, though not the least, with the “value”-“added” teacher “metrics” that go along with the “system’s” school ratings is that “value added is difficult to predict based on teacher observables.” So we have a “system” of school ratings that do not strongly correlate even with the questionable data that they comprise, and we have a system of rating teachers with a weak connection to what they actually do. Bashing seems in order to me.

[1] Atlanta’s schools had such extraordinary changes, too, though we are now learning that those results were not due to statistical volatility.

[2] The other two factors are the number of students eligible for subsidized lunches (30%) and English language learners (10%).


Parents’ Day

A posting managed in the gaps between meetings with parents: I am writing this on Parents’ Day, a Saturday that my school (in Hong Kong) gives over to meetings with students and their parents at which report “cards” are distributed. Actually, they are not cards at all, but a page of scaled summaries and narrative comments by the teachers. Each teacher (like me) has a number of “advisees” whom he monitors, trying to get the big academic and extracurricular picture of each. The advisor hands out the reports at Parents’ Day meetings and is also available as a subject teacher for more detailed conversation on students’ subjects if parents wish it.

(The reports are prepared over a period of weeks, allowing for the accumulation of needed detail and the correction of mistakes. Exams are returned to students beforehand, also with ample time for care and detail in the marking.)

My duties are comparatively light this year: I am teaching only Grade 12, whose exams and reports follow a different schedule to accommodate the demands of exams by the Hong Kong Education Bureau and the International Baccalaureate Organization. I occasionally have a G12 drop-in, but it’s the younger students’ parents who attend these meetings more assiduously.

The usual Chinese practice is for the student and his parents to meet the teacher together. It is a great way to gain some understanding of the family dynamic that lies behind the student’s work. It also provides a chance, if need be, for the parents and teacher to conduct a full-court press on matters of concern. It is rarely or never a time for adversarial contests, confrontations or scenes. If the parents have some doubt about a teacher, they usually seek administrators’ advice in separate meetings, or meet with the teachers privately. And if by the time of Parents’ Day a serious academic or disciplinary issue has already developed, chances are that the parents would already have been called in for a meeting.

Having the meetings on a Saturday makes it more likely that both parents will attend, though some students come with mother or father only. One exception was my first meeting of the morning, with a Grade 10 student and his mother, who had set up a FaceTime link with the father at work in Singapore. That was a first for me: talking to a screen as well as two live collocutors. The meeting was productive and helpful as a result of its being a four-way conversation, but I can see why people who can possibly do so prefer live meetings to e-meetings. (And why teachers and students should prefer live classes to e-classes!)





He who praises everybody, praises nobody.—Samuel Johnson

Since I first heard this line of Dr. Johnson’s, I have felt the rightness of it, and even as a boy I recognized or felt the wrongness of unmerited praise. Dr. Johnson thought that people who gave praise incontinently lowered its value. Some teachers and parents may believe that while this view is true in general, an exception would surely be praise for the child with low self-esteem.

As it turns out, a study reported in Psychological Science claims that extravagant or “inflated” praise is more harmful to children with low self-esteem than to those with high self-esteem. The harm is that the unfortunate children react to the over-praise by shying away from difficult tasks afterwards. The study did not clarify for me whether the kids refused to take risks because they became anxious about losing their praiseworthy status or because they treasured the comments and would rather have them than the satisfaction of a job well done. It was also unclear from the experiment whether extravagant praise over a long period of time has a different and more beneficial effect.

My own practice as a teacher is to acknowledge progress but to give strong or extravagant praise only rarely. I don’t like to become overblown, saying such things as “you are showing such wonderful subject-verb agreement!” or “I’m thrilled that you are using paragraphs!” Descriptions like “workmanlike” or “solid and gets the job done” strike most students as honest and reliable, though the praise-addicts among my students would rather have more and sometimes resent me because they don’t get it.

For many years I used a stamp and an inkpad for summary praise. The stamp showed a fist with an extended thumb. For good work I would stamp “one thumb up,” and for outstanding work I would stamp “two thumbs up.” Thumb-up work would be posted on a special “Good Writing!” bulletin board. Because students believed I did not “praise everybody,” they looked forward to seeing new compositions posted—their own and others’. First-time postings for students who didn’t normally “make it” usually elicited pleasure, sometimes elation: “I’ve always wanted to get on that board!” said one pleased young man.

Since I opened with Dr. Johnson, let me close with him. Of published writing he could be a severe and caustic critic. During the controversy over the “ancient” Ossian “manuscripts” “found” by James Macpherson, in which Johnson claimed that Macpherson had written them himself, Dr. Hugh Blair asked him whether he really thought that any man of the modern age could have written them. JOHNSON: Yes, sir. Many men, many women, and many children. But he zealously promoted writing, too[1]. And with aspiring writers who came to him with the request that he give them literary advice and criticism he was usually gentle, though not dishonest. JOHNSON: I do not say that it cannot become good writing.

This seems like the right touch. One of my students, who would dearly love an IB grade of 7 in English, but who always gets 5s and 6s, came up to me and said, “I guess there is not much chance of my getting a 7.” I replied, “I don’t want to rule it out, but it seems unlikely, though I would be happy to have you prove me wrong.”

[1] There is a wonderful story of his helping Oliver Goldsmith get out of debtor’s prison by personally peddling The Vicar of Wakefield to the London booksellers and hurrying back to Goldsmith with the payment. His inspection of the manuscript is the subject of a famous painting that you can see at Dr. Johnson’s house in London.


Browsing for Teaching (and Otherwise)

This week I report on some reading I did on a couple of current stories related to teaching. There was a lot to read about one of them, the Beacon School lab fire in Manhattan. For those of you who haven’t read about it, a science teacher was doing a demonstration of combustion in a lab when fumes from the “combustion accelerant” (fuel) exploded outward, engulfing a student and critically burning him while injuring a classmate. It turns out that the school’s labs lacked needed safety equipment and procedures, and that this particular demonstration has a history of going dangerously awry.

One experienced science teacher and safety administrator was reported as questioning the idea of using a fuel known to produce inflammable fumes at room temperature for a show in which the students “look, look at the colors.” I like pretty colors as much as the next person, but I’d rather see them on a Matisse or a Helen Frankenthaler than in lab fires that can explode. I wrote a former colleague of mine, Dr. H., a retired chemistry teacher, to ask him about the demonstration. He said there is a safer way to handle these chemicals that involves making aqueous solutions of the salts and igniting them in a procedure he described using a Bunsen burner. He said that then the students can examine the controlled burns with hand-held spectrometers.

But one of the strange things about this demonstration is that there doesn’t seem to be much learning of chemistry involved.  Dr. H’s first assumption was that the fires would be set in order to provide students an opportunity to investigate the ignited chemicals’ properties. I don’t think he even envisioned lab fires as spectator sports. And the Beacon School’s website itself says that it focuses strongly on inquiry-based learning, though no inquiry is intended in this demonstration, which is not even an experiment. But I am not blaming the science teacher, apparently an earnest young woman who could not be expected to have Dr. H’s experience and understanding. I would like to know whether she and teachers like her have the chance to discuss their plans with other, more experienced teachers and to take advantage of this shared experience. One also reads that this demonstration was the object of a federal warning, not as widely circulated as it evidently should have been. Why not?

Properly equipped science laboratories are rather expensive, and I fear that some schools, in a misbegotten effort to save costs, cut corners. We do not know the explanation for Beacon’s unsatisfactory laboratories and must reserve judgment till we do know, but I have another former colleague, the science department chair at a school where he and I taught, who resigned rather than continue to teach in unsafe labs. The school accepted his resignation and did not make the needed improvements in its labs’ safety. So far no one has been caught in a lab fire there, but that is not very reassuring. I guess that New York City will be on the safety violations instantly, but it would obviously have been better to have a more effective pre-accident inspection and safety instruction program.

Speaking of investigations: more indictments and guilty pleas are coming out in the Atlanta cheating scandal. Having commented on it on and off for over three years, I am interested to see how things finally end up; but it looks as if the reprehensible former superintendent of Atlanta’s schools is in trouble. What puzzles me in retrospect is how she attained the great reputation she had before the tarnish started showing on her halo. See my posting linked above for some of my reasons, but consider what Stephen Jay Gould said in warning about being data driven: “If the data appear to be too good to be true, it’s because they probably are.” Look at the incredible improvements reported in Atlanta classrooms, and you will see what I mean—classrooms bursting with little Stakhanovs of learning who needed only a data-driven superintendent to “unleash” their potential, and nothing like the critical eye turned on them or their schools that was turned on the original Stakhanov when his heroic exploits in the mines were reported by Pravda. The wrongness of Atlanta’s data seems obvious now. Why didn’t it then?



Call Center Education vs. “the Living Voice, the Breathing Form, the Expressive Countenance”

As I was beginning this posting, one of my students came to me with a question. He had posted a draft paper online, about which one of my written comments was that “your organization is a bit creaky.” I had not expected that comment to be precisely understood, but I did expect it to spark a discussion, which is what it just did. In that discussion, following his question, I could tailor my remarks to his understanding, and he could then start to remedy the paper’s defect.

In such small matters (as well as large!) can we discern the difference between close learning and distance learning, online learning, call center education, or whatever you want to name it. It is no accident that the desiderata quoted in the title of this posting come to us from John Henry Newman, James Joyce’s choice for best writer in English of the 19th century, and an eminently able defender of liberal education. It is also no accident that corporate boards of directors still meet in person, that masses of people expert in the same things tend to congregate (finance: New York; IT: Silicon Valley; entertainment: Los Angeles), and that the best schools in the U.S. tend to be functional communities.

In a community, people see themselves as capable of communicating in a rich and precise way, tailored, as it were, to their particular concerns as expressed in sessions of question-and-answer or conversation. By contrast, what do people see themselves capable of doing or of getting when they direct questions to call centers? About as much as they would feel able to do in online courses “staffed” by “facilitators” instead of conducted by teachers.