Rick Hess’s Mistake: Failure of Test-and-Punish Is Not Limited to a Few Districts That Have Disappointed

Frederick M. Hess, the director of education policy studies at the American Enterprise Institute, has always been a corporate education reform kind of guy. That is why Hess’s honest analysis this week of the ultimate fraud of a succession of school district miracles—Washington, D.C.’s test score and graduation rate miracle under Michelle Rhee and those who followed her, Alonzo Crim’s Atlanta in the 1980s, Houston’s Texas Miracle under Rod Paige, Arne Duncan’s Chicago, and Beverly Hall’s Atlanta—is so refreshingly candid.

In all of these cases, as Hess points out, there was “a remarkable dearth of attention paid to ensuring that the metrics (were) actually valid and reliable.” Second, it was “tempting for civic leaders and national advocates to accept happy success stories at face value—especially when they (were) fronted by a charismatic superintendent.” And finally, “reformers and reporters (made) things worse with their lust for ‘celebrity superintendents’ and ‘model systems.’ Their fascination nurtur(ed) an echo chamber in which a handful of leaders (got) exalted, often for too-good-to-be-true results.”

One must give Hess credit for honestly admitting the failure of so much of what his own kind of school reformers have been exalting for the past quarter century—business school accountability for schools, driven by universal standardized testing, and evaluated by two primary outcomes—standardized test scores and graduation rates. But Hess makes a mistake when he attributes the problem to a few “model” school districts that have disappointed.

Hess’s explanation is inadequate because the system itself, the whole idea of school reform based on high-stakes testing, cannot work. Daniel Koretz, the Harvard specialist on testing, explains why in a recent book, The Testing Charade: Pretending to Make Schools Better.

Koretz defines the problem with high-stakes-test-based school accountability by exploring a primary principle of social science research. Forty years ago, Don Campbell, “one of the founders of the science of program evaluation,” articulated a core principle now known as “Campbell’s Law”: “The more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” (p. 38)

How does Campbell’s Law describe the dilemma Frederick Hess identifies? Koretz quotes Don Campbell himself describing the distortion that will follow when high-stakes consequences are attached to a school district’s capacity to raise its aggregate test scores: “Achievement tests may well be valuable indicators of… achievement under conditions of normal teaching aimed at general competence. But when test scores become the goal of the teaching process, they both lose their value as indicators of educational status and distort the educational process in undesirable ways.” (p. 39)

In The Testing Charade, Koretz provides extensive evidence of the ways high stakes tied to test scores have triggered Campbell’s Law, invalidating the test results themselves and undermining both our education system and the experiences of teachers and students trapped by No Child Left Behind and the Every Student Succeeds Act in a scheme to raise test scores at all costs.

One consequence is score inflation: “All that is required for scores to become inflated is that the sampling used to create a test has to be predictable… For inflation to occur, teachers or students need to capitalize on this predictability, focusing on the specifics of the test at the expense of the larger domain.” (p. 62)  We read about all the ways curriculum designers and teachers are incentivized to focus their classes on the specific elements of any particular academic discipline that have appeared on previous tests.

A second consequence, related to the first, is flat-out test prep. Test prep narrows what students are taught to the material that is tested and drills them in using clues in the test itself to come up with the right answers. Koretz identifies three kinds of bad test prep. Reallocation between subjects has been common: schools emphasize No Child Left Behind’s tested subjects, reading and math, and cut back on social studies, the arts, music, and recess. Reallocation within subjects occurs when schools study past years’ versions of the state tests and ask teachers to focus on particular aspects of a subject. Finally, there is coaching: schools and test-prep companies teach students to respond in a formulaic way to the format of the questions themselves. Koretz explains why all this has implications for educational equity: “Inappropriate test preparation, like score inflation, is more severe in some places than in others. Teachers of high-achieving students have less reason to indulge in bad preparation for high-stakes tests because the majority of their students will score adequately without it—in particular, above the ‘proficient’ cut score that counts for accountability purposes. So one would expect that test preparation would be a more severe problem in schools serving high concentrations of disadvantaged students…. Once again, disadvantaged kids are getting the short end of the stick.” (pp. 116-117)

And a third consequence, demonstrated in every one of Frederick Hess’s examples, is cheating. Koretz examines the biggest cheating scandals, notably Atlanta, Philadelphia, and Washington, D.C. He notes: “Cheating—by teachers and administrators, not by students—is one of the simplest ways to inflate scores, and if you aren’t caught, it’s the most dependable.” Sometimes teachers or administrators erase and change students’ answers; sometimes they provide teachers or students with the test items in advance; other times teachers give students the answers during the test. And finally, sometimes schools “scrub” from the enrollment rolls the students who are likely to fail.

Koretz presents the questions around cheating by educators as morally fraught. After all, test scores are not simply a proxy for the quality of a school or a school district:  “One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools. The causes are complex, but the result is simple: some schools have far lower average scores—and, particularly important in this system, more kids who aren’t ‘proficient’—than others. Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do. This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’ It was a deliberate and prominent part of many of the test-based accountability reforms… Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic. The specific targets were often an automatic consequence of where the Proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.” (pp. 129-130)

In a system that, by its very structure, is guaranteed to trigger Campbell’s Law, Koretz wonders about the moral implications of cheating: “Just who is responsible?  Is it just the people who actually carry out the fraud or require it?  Or are those who create the pressures to cheat also culpable, even if not criminally?” (p. 91)

Like Frederick Hess, Daniel Koretz recognizes that although outcomes-based, test-and-punish school accountability has been hyped and celebrated, ultimately this kind of school policy has not improved schools as promised.  Koretz digs deeper, however, to expose that the system itself—not merely its abuse by particular educators in particular school districts—is deeply flawed.

Koretz concludes: “It is no exaggeration to say that the costs of test-based accountability have been huge. Instruction has been corrupted on a broad scale. Large amounts of instructional time are now siphoned off into test-prep activities that at best waste time and at worst defraud students and their parents.  Cheating has become widespread. The public has been deceived into thinking that achievement has dramatically improved and that achievement gaps have narrowed. Many students are subjected to severe stress… The primary benefit we received in return for all of this was substantial gains in elementary-school math that don’t persist until graduation… On balance, then, the reforms have been a failure.” (pp. 191-192)