Why Most Published Research Findings Are False

That is the arresting title of a paper by John Ioannidis, an epidemiologist at Stanford. Even more fascinating is Professor Ioannidis’s discovery that of thirty-four randomized controlled studies published in three medical journals and later replicated, the results of forty-one percent “had either been directly contradicted or had their effect sizes significantly downgraded[1].” It turns out that experimental results are subject over time and replication to “the decline effect,” whereby initially astounding or superb results become less remarkable or even unremarkable in successive tests. If all this weren’t enough to induce terminal modesty in experimenters, there is in addition the widespread problem of publication bias and other forms of prejudice that lead experimental research wrongly in expected or desired directions.

I would not want to say that research is useless—perish the thought[2]!—but I would say that it is subject to many of the same kinds of error as “less scientific” ways of thinking, and a few that these less scientific kinds are proof against. William James describes a psychological problem he calls “mental vertigo,” in which the sufferer enthusiastically embraces propositions without any of the caution that a good education and wide experience can confer on someone with a generous natural endowment of common sense and critical intelligence. Because “research” is commonly and incorrectly supposed automatically to provide authoritative results derived by foolproof techniques, it actually can set up credulous people for a case of mental vertigo.

It is therefore unsurprising yet deeply disturbing to contemplate the exuberant and uncritical acceptance accorded by educationists to research findings of dubious sensibility or even doubtful sanity, followed by the findings’ vanishment in the same kind of silence as the kind that greets a faux-pas at a garden party. Do you remember the research showing that students don’t learn any more in enriched classes than in ordinary ones? Do you remember the research showing that teachers have no discernible effect on the learning of the students in their classrooms? Do you remember the research showing that classrooms should not have walls? Do you remember the research ramifying the dominions of the left and right brains? Do you remember the research showing that there is no transfer effect? Do you remember the research endorsing “whole-language” learning? Do you remember the research showing that team-building activities are effective, and the research showing that they are ineffective? Do you remember the research showing how English was supposed to be taught as a second language twenty years ago (and do you remember the whirligig of successive acronyms to describe the students who are learning it)? Do you remember the research showing that everyone should be taught to write as if he or she were a gifted and talented writer? Do you remember the research supporting “New Math” instruction? Once thought earth-shaking, these research results have fallen into oblivion.

A sign in an Austrian restaurant where I used to go during my college years said, “Ve get too soon oldt und too late schmardt,” of which a special case consists in getting excited about experimental results that, older and wiser, we would have laughed out of the room. But some educationists, ostensibly mature, “get never schmardt.” Off they go behind the latest Pied Piper, ready to jump again into the River of Educational Innovation. Unfortunately, they are not just eternally young and foolish; they are also undead, and they keep coming back, compelling teachers to adopt the next fad.

And what will that fad be? America’s dikeless Low Countries of Learning seem to attract their unfair share of inundations, I mean innovations, and I have dealt with a few of them in prior postings. But one potential Eternal Truth of the Year is suggested by a study in which the researchers spent $45,000,000.00 to discover that students can tell a good teacher from a bad one.

I am glad to hear that the expenditure of $45,000,000.00 has ratified a truth that I have known since I was nine years old, for now I can repose in the stability of research results. Or can I? Though the subtitle of the New York Times article in which the study was reported was “Ask the Students,” that is not what the researchers actually did. Instead, they required the students to answer questionnaires by ticking canned comments to indicate whether each comment applied to their teachers. We are assured that a Harvard researcher who has spent ten years refining student surveys is the designer of this one and the author of the potted replies.

That set off warning bells. How do the researchers guarantee that the choices allowed for students’ response are not tendentious? What does the reporter mean by “refine”? Does he mean “get bugs out”? What are these bugs that need ten years to get out, and how do we know that they have been got out? Does he mean “subtilize”? How do we know that crudities and gaps do not remain in the picture the questionnaire draws of the effective teacher? How do we know that it identifies more than—or no more than—a few middling marker-techniques of quality?

This last is important if we are to avoid question-begging. What marks the techniques as effective? If we say that it is their success in “adding value” to teaching and then show a correlation between them and value-addition in order to validate them as components of a questionnaire, we are making a circular argument. It is also important if we are to avoid publication bias, one of the intellectual vices described in the New Yorker article, or if we are to avoid a walk in the Garden of Interlocking Assumptions, where so many of education’s mutually self-confirming studies take us. And it is important if we are to establish a truth that does not wear off.

Roger Bacon said that experiments are necessary because they “put nature to the question.” This remark is usually quoted approvingly even though “put to the question” means “torture.” One thing we should have learned in the seven hundred twenty years since Bacon and the three hundred seventy years since the abolition of the Star Chamber is that torture guarantees only that we will hear the answers we want to hear from the tortured victim. Now, Nature, when questioned experimentally, does not necessarily scream answers on the rack. But as the New Yorker article suggests, sometimes she does when asked in a biased or loaded way. I think it reasonable to assume that the intellectual vices that sometimes vitiate the results of experimental science can also vitiate those of experimental education. This was one reason why Richard Hofstadter was leery of accepting the results of experimental psychology over the “collective experience of the human race,” something that historians, not experimentalists, are qualified by temperament, training, and experience to discover.

Such generalists are also qualified to recognize a good teacher. If school administrators had received a sound liberal arts education completed in a place that requires academic residency with its attendant humanity, instead of training as specialists in educational research (or, more commonly now, business and finance), they could sniff out good and bad teachers. It also helps not to corrupt the process of evaluation. If it is true, as the Times article states, that most teacher evaluations consist in giving full marks with only cursory awareness of what a teacher does in the classroom, then it goes against my own experience, but it also argues, perhaps vainly, for a remedy at the administrative level: reliance on such old-fashioned intellectual virtues, established by the collective experience of the human race, as honesty. In a climate of educational corruption in which single schools can graduate nine valedictorians and teachers can spend all their teaching time prepping their kids for standardized tests, maybe we should not be surprised, though we should be ashamed, that they harbor shoals of perfect teachers when their students cannot muster even average scores on the PISA tests. This is not a problem of bad teaching; this is a problem of educational leadership. Unfortunately, the remedy proposed in “value-added learning” says that these same administrative structures that cannot sniff out a bad teacher will know how to remediate one when he or she is identified by a statistical technique of dubious value. I guess that they will end up doing more termination than remediation if Diane Ravitch’s discussion of New York’s District Two[3] is any guide. Just what is needed to attract the teachers of tomorrow: make schools like Stalin’s Ukraine. Why didn’t we think of that before?

But if, as the Times article suggests, we have the evaluation of teachers by their students to look forward to, we can also expect another instance of the applicability of Campbell’s Law of corruption. We have seen how schools have responded to No Child Left Behind by gaming test preparation. We can therefore anticipate what will happen when Educational Science, with all its intellectual shortcomings, has spoken and the student has been installed as the evaluator of his teachers. To go full circle: it is suggestive that Professor Ioannidis, mentioned at the beginning of this posting, is an epidemiologist.

[1] If you subscribe to the digital edition of The New Yorker, look at page 56 of the December 13, 2010, issue for an article by Jonah Lehrer called “The Truth Wears Off.” Otherwise, ask your subscribing friend to print the article for you.

[2] Consider, for example, Jerome Bruner and Leo Postman’s brilliant experiment using playing-cards with red spades, cited approvingly by Thomas S. Kuhn in The Structure of Scientific Revolutions in his explanation of how shared paradigms (like the Garden of Interlocking Assumptions?) actually shape our perception.

[3] See Chapter 3 of The Death and Life of the Great American School System.
