Most of us have seen descriptions of what the letter grades mean. My own favorites are the simplest: A is excellent; B, good; C, fair; D, poor but passable; F, failing. It might be worth examining our sense of the five grades by trying to understand what they mean in particular cases, especially quizzes and tests. I fear that teachers and other people in the field do not often do so. Since to give a grade is to make a judgment, we should have a sense of particular qualitative elements of each grade, or at least have the sort of expertise and connoisseurship that can explain itself. However we do it, in whatever subject, we should be able to make some distinctions between A work and B.
A good assessment must give us what we need to make these distinctions. If a student of English is to get an A, he or she should do things with a depth, a flair, a thoroughness, a subtlety that a B student doesn’t bring to the task. The assessment should therefore give students the chance to show these qualities. Of course it should allow an A student to demonstrate knowledge, but the knowledge is best set in circumstances that also allow the demonstration of skill or know-how and the display of uncommon understanding. Similar distinctions should be possible between the other grades, or why have them?
(In effect, some teachers and schools do not have them, for everyone who attends their classes gets A’s and B’s. I am speaking now of comprehensive schools, not specialized ones with selective admissions. I would be willing to bet that teachers and administrators at such schools giving mostly A’s and B’s cannot readily tell the difference between excellent or good work and fair work without baloney. Of course it is possible that, like the statistically improbable students of Lake Wobegon, the students at such a school are all above average. It is also unfortunately possible that such places are loci of pedagogical or intellectual scandal, as in New York, some of whose “proficiency” tests for promotion could be passed by random guesswork. But let us assume competence, good will, and normal distribution of aptitude for the sake of this discussion.)
A corollary of this idea is that awarding grades for tests on a numerical continuum may undercut qualitative distinctions between grades. Imagine a multiple-choice test of one hundred questions. In a standard procedure used by many teachers, a student who answers eighty-nine of them correctly will get a B+, while one who answers ninety will get an A-. On what basis can the teacher giving that test assert that the 90 was excellent work but the 89 merely good?
Perhaps the teacher has devised ten questions unlikely to be answered correctly by anyone but an excellent student. That is fine, but it is probably more than many teachers do, and certainly more than the test banks that accompany textbooks and generate multiple-choice and matching questions at random. Furthermore, the teacher who undertakes the extra work of trying to gauge questions to qualitative distinctions could still be undercut by students’ lucky guesses. Where is the distinction between excellent and good then?
It is often hard to make. Let us take a sample question:
The musical composition called “Emperor” is
a. an anthem
b. a concerto
c. a string quartet
d. an aria
e. an opera
The answer to this question, combined with those to ninety-nine others like it on a test, might determine whether a student in, say, a music appreciation class (if such things still exist) was excellent, good, or worse. What makes correct answers to ninety such questions excellent work but correct answers to only eighty-nine merely good work? Put this way, I think the question has no defensible answer unless an almost incredible amount of forethought went into the design of the test. (Now, professional test-writers make two or three times as much as professional teachers, but I wonder if even they give this kind of thought to the questions they devise.)
We must also ask: How is guesswork discounted? If this test is like most multiple-choice tests given in class, the answer is: not at all. But let us say that on this test the teacher subtracts .2% from the total of right answers for each wrong answer. A student might have reasoned that no opera would have the title Emperor without an article, that an aria is named after its first line, which Emperor is unlikely to be, and that an anthem would name something or someone more particular, leaving answers b and c. If test-taking is like gambling—and many courses in “test-taking skills” make it so—the student has increased his odds of a right answer from .2 to .5, making a guess worth the chance. What does that guessed right answer—in effect a coin toss—have to do with music appreciation or even just musical knowledge? A further problem with this question is that the most obvious answer, b, is not the only correct one, since that name is given not just to the piano concerto by Beethoven but also to a string quartet by Haydn.
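The arithmetic of that gamble is easy to check. The following sketch (the function name and parameters are mine, invented for illustration) computes the expected score of a guess under the penalty scheme described above:

```python
def expected_guess_value(options_left, penalty=0.2, point=1.0):
    """Expected score from guessing among options_left equally likely
    choices, where a right answer earns `point` and each wrong answer
    costs `penalty` points."""
    p_right = 1.0 / options_left
    return p_right * point - (1.0 - p_right) * penalty

# Blind guess among all five options under a .2-point penalty:
# 0.2 * 1 - 0.8 * 0.2 = 0.04, so even a blind guess has positive value.
blind = expected_guess_value(5)

# After eliminating three options, as the student above did:
# 0.5 * 1 - 0.5 * 0.2 = 0.4 points per guess, so guessing clearly pays.
informed = expected_guess_value(2)
```

Note that a .2-point penalty does not even make blind guessing a break-even bet: for a five-option question, the standard correction for guessing deducts 1/(5−1) = .25 points per wrong answer, which sets the expected value of a blind guess to zero.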
If the student had been asked a short-answer question such as Identify a work containing variations on a theme, and name its composer, the student might have written “Haydn’s ‘Emperor’ quartet,” and the teacher could have been confident that the student knew her stuff. The problem of bad questions would have been sidestepped, and guesswork eliminated. (Of course, the teacher would still have had to know that both Beethoven and Haydn wrote compositions called “Emperor,” and he would need a reason for supposing that this fact was somehow representative of the knowledge to be tested and therefore worth including.)
But the fundamental question remains. Does such a test offer a way to distinguish between excellent work and good? I have my doubts. It might at least certify that a mind can retain factual detail, which is important; but where is the assessment of skill and understanding?
The problem demands a solution, and I hope to touch on some in future postings.
The idea for this example comes from Professor Barzun’s discussion of Banesh Hoffmann’s book The Myth of Measurement.