How has high stakes testing ruined our schools and how has this strategy, which was at the heart of No Child Left Behind, made it much more difficult to accomplish No Child Left Behind’s stated goal of reducing educational inequality and closing achievement gaps?
Here is how Daniel Koretz begins to answer that question in his 2017 book, The Testing Charade: Pretending to Make Schools Better: In 2002, No Child Left Behind “mandated that all states use the proficient standard as a target and that 100 percent of students reach that level. It imposed a short timeline for this: twelve years. It required that schools report the performance of several disadvantaged groups and it mandated that 100 percent of each of these groups had to reach the proficient standard. It required that almost all students be tested the same way and evaluated against the same performance standards. And it replaced the straight-line approach by uniform statewide targets for percent proficient, called Adequate Yearly Progress (AYP)…. The law mandated an escalating series of sanctions for schools that failed to make AYP for each reporting group.” Later, “Arne Duncan used his control over funding to increase even further the pressure to raise scores. The most important of Duncan’s changes was inducing states to tie the evaluation of individual teachers, rather than just schools, to test scores… The reforms caused much more harm than good. Ironically, in some ways they inflicted the most harm on precisely the disadvantaged students the policies were intended to help.”
Koretz poses the following question and his book sets out to answer it: “But why did the reforms fail so badly?”
I recommend Daniel Koretz’s book all the time as essential reading for anyone trying to figure out how we got to the deplorable morass that is today’s federal and state educational policy. I wish I thought more people were reading this book. Maybe people are intimidated that its author is a Harvard expert on the design and use of standardized tests. Maybe it’s the fact that the book was published by the University of Chicago Press. But I don’t see it in very many bookstores, and when I ask people if they have read it, most people tell me they intend to read it. To reassure myself that it is really worth reading, I set myself the task this past weekend of re-reading the entire book. And I found re-reading it to be extremely worthwhile.
The book divides into three parts: an introductory section of several chapters, six or seven chapters in the middle that dissect the way high stakes testing has undermined education and damaged the education of our nation’s poorest children, and some wrap-up chapters. It is the middle part that is essential. While Koretz has some ideas near the end about where we go from here, his analysis of the damage caused is the crucial part. After all, this section at the heart of the book addresses the conversational dilemma many readers of this blog must face as often as I do. What can you say to the person who doggedly tells you that a particular school is a fine school because its scores are high and another school is a failure because its test scores are so low? This person, often well-intentioned, has lived with test-based school accountability for so long that he cannot imagine there is any other way to consider school quality. And anyway, he says, standardized testing is what we have to evaluate schools, so it’s what we need to use.
Koretz explains a 40-year-old social science rule first articulated by Don Campbell, whom Koretz identifies as “one of the founders of the science of program evaluation.” Here is how Campbell stated what we now call “Campbell’s Law”: “The more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” The rest of the central chapters in Koretz’s book explain precisely how the use of high stakes punishments tied to low test scores has triggered Campbell’s Law. What are the high stakes punishments? First came the school turnarounds prescribed by No Child Left Behind: firing the principal and half the teachers, closing the school, charterizing the school. Later Arne Duncan added the evaluation of teachers by students’ test scores, along with schemes rewarding teachers whose students scored well and firing the teachers whose students posted low scores. Koretz summarizes No Child Left Behind’s test-and-punish strategy: “The reformers’ implicit assumption seemed to be that many teachers knew how to teach more effectively but were being withholding, and therefore confronting them with sanctions and rewards would be enough to get them to deliver.”
Three chapters explore how No Child Left Behind’s test-and-punish strategy has distorted schooling itself and has undermined how teachers teach and how students learn.
- Score Inflation: When the state achievement tests mandated by No Child Left Behind (the ones that would bring negative consequences for schools and teachers) were compared by experts like Koretz himself to an “audit” test such as the National Assessment of Educational Progress (NAEP), which has no high stakes consequences, the researchers discovered that while the scores on the state test rose rapidly, NAEP scores remained flat. Koretz comments: “(I)ncreases in scores are meaningful only if they signal similar increases in mastery of the domain. If they do generalize to the domain, gains should appear on other tests that sample from the same domain.” He continues: “(A)ll that is required for scores to become inflated is that the sampling used to create a test has to be predictable… For inflation to occur, teachers or students need to capitalize on this predictability, focusing on the specifics of the test at the expense of the larger domain.” And there are equity concerns here, because score inflation has occurred more often in schools serving poor students: “Ongoing work by my own group has shown… that it is not just the poverty of individual students that predicts the amount of inflation but also the concentration of poor students in a school… (S)chools with a higher proportion of poor students showed greater average inflation.” Teachers under pressure are finding a way to raise test scores without really teaching the students the material they are supposed to be learning. Some schools have also inflated overall scores by focusing primarily on children right at the pass/fail level and paying less attention to students far behind.
- Cheating: Koretz examines the big cheating scandals, notably Atlanta, Philadelphia, and Washington, DC. He notes: “Cheating—by teachers and administrators, not by students—is one of the simplest ways to inflate scores, and if you aren’t caught, it’s the most dependable.” Sometimes teachers or administrators erase and change students’ answers; sometimes they provide teachers or students with the test items in advance; other times teachers give students the answers during the test. And finally, sometimes schools “scrub” off the enrollment rolls the students who are likely to fail.
- Test Prep: Test prep narrows what is taught to students to the material that is tested. Koretz identifies three kinds of bad test prep. Reallocation between subjects has been common when schools emphasize No Child Left Behind’s tested subjects—reading and math—and cut back on social studies, the arts, music and recess. Reallocation within subjects is when schools study past years’ versions of the state tests and ask teachers to focus on particular aspects of a subject. Finally there is coaching. Schools and test-prep companies teach students to respond in a formulaic way to the format of the questions themselves. Koretz explains why all this has implications for educational equity: “Inappropriate test preparation, like score inflation, is more severe in some places than in others. Teachers of high-achieving students have less reason to indulge in bad preparation for high-stakes tests because the majority of their students will score adequately without it—in particular, above the ‘proficient’ cut score that counts for accountability purposes. So one would expect that test preparation would be a more severe problem in schools serving high concentrations of disadvantaged students…. Once again, disadvantaged kids are getting the short end of the stick.”
Two chapters in this middle section explore the ways No Child Left Behind’s test-and-punish scheme has undermined equitable access to education in the schools in areas of concentrated poverty across our cities. The law that promised to leave no child behind not only encouraged test prep and cheating in the schools whose needs were greatest, but it also set impossibly tough and largely arbitrary test score targets for those schools and an impossibly short timeline for bringing students up to those targets. And then the federal government set out to punish the schools and the teachers unable to meet the targets.
- Making Up Unrealistic Targets: In this chapter, Koretz explains how No Child Left Behind’s standardized cut scores and timelines were set unrealistically and arbitrarily; the consequence was to label schools in poor areas as “failing” and to subject schools in areas of concentrated poverty to a series of punishments. Here is Koretz’s short summary: “Part of the blame for this failure lies with the crude and unrealistic methods used to confront inequity. In a nutshell, the core of the approach has been simply to set an arbitrary performance target (the ‘Proficient’ standard) and declare that all schools must make all students reach it in an equally arbitrary amount of time. No one checked to make sure the targets were practical. The myriad factors that cause some students to do poorly in school—both the weaknesses of many of the schools they attend and the disadvantages some students bring to school—were given remarkably little attention. Somehow teachers would just pull this off… The trust most people have in performance standards is essential, because the entire educational system now revolves around them. The percentage of kids who reach the standard is the key number determining which teachers and schools will be rewarded or punished… But in fact, despite all the care that goes into creating them, these standards are anything but solid. They are arbitrary, and the ‘percent proficient’ is a very slippery number… A primary motivation for setting a Proficient standard is to prod schools to improve, but information about how quickly teachers actually can improve student learning doesn’t play much, if any, of a role in setting performance standards… However, setting the standards themselves is just the beginning. What gives the performance standards real bite is their translation into concrete targets for educators, which depends on more than the rigor of the standard itself… We have to say how quickly performance has to increase—not only overall but for different types of kids and schools. A less obvious but equally important question is how much variation in performance is acceptable.”
- Evaluating Teachers: In 2009, beginning with Race to the Top and later as a condition for states to qualify for waivers from the worst consequences of No Child Left Behind, Arne Duncan’s Department of Education required states to change their laws to tie a percentage of teachers’ formal evaluations to students’ test scores. Myriad problems ensued. First of all, the required tests are in reading and math. What about the other teachers? Koretz describes Florida and Tennessee, which judged teachers in non-tested grades and subjects by the scores of students who were not in their classes, and in one case not in their schools. Other states added tests in music, art, and physical education—subjecting students to added standardized testing—just for the purpose of state teacher evaluations. Koretz explains the problems with Value-Added Modeling to evaluate teachers; many factors affecting students’ scores cannot be traced to any teacher and any teacher’s ratings seem to be unstable over several years.
I cannot imagine exactly how our society can recover from our terrible test-and-punish misadventure and our labeling as “failing” the institutions and teachers who serve our poorest children. What is heartening about The Testing Charade: Pretending to Make Schools Better is the clarity with which Daniel Koretz presents our current dilemma: “We now know what many educators did. Faced with unrealistic targets, some cut corners or simply cheated. And perhaps because the system, in its zeal to address inequities, made the targets most unrealistic for educators serving disadvantaged kids, those kids—ironically—got the worst of it: the most test prep, the most score inflation, and apparently the most cheating. And yet inflated scores allowed policy makers to declare victory, and the public received a steady diet of encouraging but bogus news about rapid improvements in the achievement gap…. On balance… the reforms have been a failure.”
Please read The Testing Charade. We all need to understand and be able to explain how we’ve gone so far astray.