Faith in High Stakes Testing Fades, Even Among the Corporate School Reformers

After a recent twenty-fifth anniversary conference at the Center on Reinventing Public Education at the University of Washington, Bothell, a Gates-funded education-reformer think tank, Chalkbeat’s Matt Barnum summarized presentations by a number of speakers who demonstrated growing skepticism about the high-stakes, standardized testing regime that has dominated American public education for over a quarter of a century.

Because the Center on Reinventing Public Education is known as an advocate for portfolio school reform and corporate accountability, you might expect adherence to the dogma of test-and-punish, but, notes Barnum:  “The pervasiveness of the complaints about testing was striking, given that many education reform advocates have long championed using test scores to measure schools and teachers and then to push them to improve.”

Then at a Massachusetts Institute of Technology School Access and Quality Summit early this month, Paymon Rouhanifard presented a major policy address challenging the use of high stakes testing to rank and rate public schools.  Rouhanifard was until very recently Chris Christie’s appointed, school-reformer superintendent in Camden, New Jersey.  Formerly he was the director in New York City of Joel Klein’s Office of Portfolio Management.  Rouhanifard describes the belief system he brought with him to Camden and describes how his five-year tenure as Camden’s superintendent transformed his thinking: “Our belief was that politics and bureaucracy had inhibited the progress Camden students and families deserved to overcome the steep challenges the city was facing…  We believed it was important for the district to segue out of being a highly political monopoly operator of schools….  This is a story about an evolution of my own thinking during that five-year experience…. What I’m referring to are the math and literacy student achievement data we utilize to drive so many of the critical decisions we make… My realization a few years ago was that I rarely asked questions about what these tests actually told us.  What they didn’t tell us.  And perhaps most importantly, what were the specific behaviors they incentivized, and what were the general trade-offs when we acutely focus on how students do on state tests.”

In 2013, at the beginning of his tenure, Rouhanifard introduced a school report card that rated each school primarily by students’ standardized test scores. Two years ago Rouhanifard eliminated his own school report cards.  He describes his realization: “We are spending an inordinate amount of time on formative and interim assessments and test prep, because those are the behaviors we have incentivized.  We are deprioritizing the sciences, the arts, and civic education…. I… believe the drawbacks currently outweigh the benefits.  That we haven’t been honest about the trade-offs.”

Shael Polakow-Suransky, like Rouhanifard, held a position in Joel Klein’s “reformer” school administration in New York City.  Now the president of Bank Street College of Education, he formerly served as Klein’s deputy schools chancellor. Barnum explains that Polakow-Suransky has become an emphatic critic of the nation’s high-stakes standardized testing regime: “The biggest barrier to student learning and closing the achievement gap is the current system of standardized tests.”

In a piece at The74, the Thomas Fordham Institute’s Robert Pondiscio quotes Polakow-Suransky: “All of us were well-intentioned in pushing this agenda, but the tools we developed were not effective in raising the bar on a wide scale.”

While the Thomas Fordham Institute has endorsed corporate school reform including high-stakes, test-based accountability, Fordham’s Pondiscio now acknowledges that under the Every Student Succeeds Act, U.S. public schools have become mired in an education culture defined by test-based accountability.  Though he seems unclear on the way forward, Pondiscio now advocates for serious reconsideration: “The challenge is not testing vs. not testing.  It’s not accountability vs. none.  Both bring benefits of different kinds, and both are required by a federal law that’s not going to change anytime soon.  The challenge is to develop a policy vision that supports—not thwarts—the classroom practices and long-term student outcomes we seek… The problem is the reductive culture of testing, which has come to shape and define American education, particularly in the kinds of schools attended by our most disadvantaged children.”

There are some who remain faithful to the school reformer dogma. The Center on Reinventing Public Education’s Robin Lake tries to change the subject: “We need a more productive debate about school accountability, not tired arguments over testing.” And Matt Barnum quotes Sandy Kress—still a tried-and-true believer in the No Child Left Behind regime he helped create: “Research shows clearly that accountability made a real difference in this country in narrowing the achievement gap and lifting student achievement.”

Of course, research does not clearly show that Sandy Kress’s kind of No Child Left Behind accountability made a real difference.  Here is Harvard’s Daniel Koretz, in the authoritative book he published a year ago, The Testing Charade: Pretending to Make Schools Better.  It is perhaps this volume by an academic expert on testing that has helped change the minds of some of the corporate school reformers quoted above.  Koretz writes: “It is no exaggeration to say that the costs of test-based accountability have been huge.  Instruction has been corrupted on a broad scale.  Large amounts of instructional time are now siphoned off into test-prep activities that at best waste time and at worst defraud students and their parents.  Cheating has become widespread.  The public has been deceived into thinking that achievement has dramatically improved and that achievement gaps have narrowed.  Many students are subjected to severe stress, not only during testing but also for long periods leading up to it.  Educators have been evaluated in misleading and in some cases utterly absurd ways.  Careers have been disrupted and in some cases ended.  Educators, including prominent administrators, have been indicted and even imprisoned.  The primary benefit we received in return for all of this was substantial gains in elementary-school math that don’t persist until graduation.  This is true despite the many variants of test-based accountability the reformers have tried, and there is nothing on the horizon now that suggests that the net effects will be better in the future. On balance, then, the reforms have been a failure.” (The Testing Charade, pp. 191-192)

Introducing readers to Don Campbell, “one of the founders of the science of program evaluation,” Koretz defines the problems inherent in our society’s quarter century of high-stakes, test-and-punish school accountability by quoting Campbell’s Law:  “The more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”  Campbell directly addresses the problem of high stakes testing to rank and rate schools:  “Achievement tests may well be valuable indicators of … achievement under conditions of normal teaching aimed at general competence. But when test scores become the goal of the teaching process, they both lose their value as indicators of educational status and distort the educational process in undesirable ways.” (The Testing Charade, pp. 38-39)

How has the testing regime operated perversely to undermine the schools serving our society’s most vulnerable children—the ones we were told No Child Left Behind would catch up academically if only we created incentives and punishments to motivate their teachers to work harder?  “One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools.  The causes are complex, but the result is simple: some schools have far lower average scores—and, particularly important in this system, more kids who aren’t ‘proficient’—than others.  Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do.  This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’  It was a deliberate and prominent part of many of the test-based accountability reforms… Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic.  The specific targets were often an automatic consequence of where the proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.”  (The Testing Charade, pp. 129-130)

Besides imposing unreasonable and damaging punishments on the schools and teachers serving our society’s poorest children, Koretz believes our commitment to a regime of punitive testing has distracted our society from developing the commitment to address the real needs of children and schools in places where poverty is concentrated: “We can undoubtedly reduce variations in performance appreciably, if we summoned the political will and committed the resources to do so—which would require a lot more than simply imposing requirements that educators reach arbitrary targets for test scores.” (The Testing Charade, p. 131)

Repeating My Recommendation: Please Read Daniel Koretz’s Book, “The Testing Charade”

How has high stakes testing ruined our schools and how has this strategy, which was at the heart of No Child Left Behind, made it much more difficult to accomplish No Child Left Behind’s stated goal of reducing educational inequality and closing achievement gaps?

Here is how Daniel Koretz begins to answer that question in his 2017 book, The Testing Charade: Pretending to Make Schools Better: In 2002, No Child Left Behind “mandated that all states use the proficient standard as a target and that 100 percent of students reach that level. It imposed a short timeline for this: twelve years. It required that schools report the performance of several disadvantaged groups and it mandated that 100 percent of each of these groups had to reach the proficient standard. It required that almost all students be tested the same way and evaluated against the same performance standards.  And it replaced the straight-line approach by uniform statewide targets for percent proficient, called Adequate Yearly Progress (AYP)…. The law mandated an escalating series of sanctions for schools that failed to make AYP for each reporting group.” Later, “Arne Duncan used his control over funding to increase even further the pressure to raise scores.  The most important of Duncan’s changes was inducing states to tie the evaluation of individual teachers, rather than just schools, to test scores… The reforms caused much more harm than good. Ironically, in some ways they inflicted the most harm on precisely the disadvantaged students the policies were intended to help.”

Koretz poses the following question and his book sets out to answer it: “But why did the reforms fail so badly?”

I recommend Daniel Koretz’s book all the time as essential reading for anyone trying to figure out how we got to the deplorable morass that is today’s federal and state educational policy.  I wish I thought more people were reading this book. Maybe people are intimidated that its author is a Harvard expert on the design and use of standardized tests.  Maybe it’s the fact that the book was published by the University of Chicago Press. But I don’t see it in very many bookstores, and when I ask people if they have read it, most people tell me they intend to read it. To reassure myself that it is really worth reading, I set myself the task this past weekend of re-reading the entire book. And I found re-reading it to be extremely worthwhile.

The book divides into three parts: an introductory section of several chapters, six or seven chapters in the middle that dissect the way high stakes testing has undermined education and damaged the education of our nation’s poorest children, and some wrap-up chapters. It is the middle part that is essential. While Koretz has some ideas near the end about where we go from here, his analysis of the damage caused is the crucial part. After all, this section at the heart of the book addresses the conversational dilemma many readers of this blog must face as often as I do. What can you say to the person who doggedly tells you that a particular school is a fine school because its scores are high and another school is a failure because its test scores are so low? This person, often well-intentioned, has lived with test-based school accountability for so long that he cannot imagine there is any other way to consider school quality. And anyway, he says, standardized testing is what we have to evaluate schools, so it’s what we need to use.

Koretz explains a 40-year-old social science rule first articulated by Don Campbell, whom Koretz identifies as “one of the founders of the science of program evaluation.” Here is how Campbell stated what we now call “Campbell’s Law”: “The more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” The rest of the central chapters in Koretz’s book explain precisely how the use of high stakes punishments tied to low test scores has triggered Campbell’s Law. What are the high stakes punishments?  First came the school turnarounds prescribed by No Child Left Behind: firing the principal and half the teachers, closing the school, charterizing the school.  Later Arne Duncan added the evaluation of teachers by students’ test scores, along with schemes rewarding teachers whose students scored well and firing the teachers whose students posted low scores. Koretz summarizes No Child Left Behind’s test-and-punish strategy: “The reformers’ implicit assumption seemed to be that many teachers knew how to teach more effectively but were being withholding, and therefore confronting them with sanctions and rewards would be enough to get them to deliver.”

Three chapters explore how No Child Left Behind’s test-and-punish strategy has distorted schooling itself and has undermined how teachers teach and how students learn.

  • Score Inflation: When the state achievement tests mandated by No Child Left Behind (the ones that would bring negative consequences for schools and teachers) were compared by experts like Koretz himself to an “audit” test such as the National Assessment of Educational Progress (NAEP), which has no high stakes consequences, the researchers discovered that while the scores on the state tests rose rapidly, NAEP scores remained flat.  Koretz comments: “(I)ncreases in scores are meaningful only if they signal similar increases in mastery of the domain.  If they do generalize to the domain, gains should appear on other tests that sample from the same domain.” He continues: “(A)ll that is required for scores to become inflated is that the sampling used to create a test has to be predictable… For inflation to occur, teachers or students need to capitalize on this predictability, focusing on the specifics of the test at the expense of the larger domain.”  And there are equity concerns here, because score inflation has occurred more often in schools serving poor students: “Ongoing work by my own group has shown… that it is not just the poverty of individual students that predicts the amount of inflation but also the concentration of poor students in a school… (S)chools with a higher proportion of poor students showed greater average inflation.” Teachers under pressure are finding a way to raise test scores without really teaching the students the material they are supposed to be learning.  Some schools have also inflated overall scores by focusing primarily on children right at the pass/fail level and paying less attention to students far behind.
  • Cheating: Koretz examines the big cheating scandals, notably Atlanta, Philadelphia, and Washington, DC.  He notes: “Cheating—by teachers and administrators, not by students—is one of the simplest ways to inflate scores, and if you aren’t caught, it’s the most dependable.” Sometimes teachers or administrators erase and change students’ answers; sometimes they provide teachers or students with the test items in advance; other times teachers give students the answer during the test.  And finally sometimes schools “scrub” off the enrollment rolls the students who are likely to fail.
  • Test Prep: Test prep narrows what is taught to students to the material that is tested.  Koretz identifies three kinds of bad test prep. Reallocation between subjects has been common when schools emphasize No Child Left Behind’s tested subjects—reading and math—and cut back on social studies, the arts, music and recess. Reallocation within subjects is when schools study past years’ versions of the state tests and ask teachers to focus on particular aspects of a subject.  Finally there is coaching. Schools and test-prep companies teach students to respond in a formulaic way to the format of the questions themselves. Koretz explains why all this has implications for educational equity: “Inappropriate test preparation, like score inflation, is more severe in some places than in others. Teachers of high-achieving students have less reason to indulge in bad preparation for high-stakes tests because the majority of their students will score adequately without it—in particular, above the ‘proficient’ cut score that counts for accountability purposes. So one would expect that test preparation would be a more severe problem in schools serving high concentrations of disadvantaged students…. Once again, disadvantaged kids are getting the short end of the stick.”
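
For readers who like to see the arithmetic, here is a toy sketch of the sampling argument in the Score Inflation item above. It is not Koretz's method or data: the 20-skill "domain," the five predictable test items, and the size of the boost and the neglect are all numbers I made up for illustration. The state test samples the same five skills every year, while the audit test (standing in for something like NAEP) samples the whole domain.

```python
# Toy illustration only: every number below is an invented assumption.
DOMAIN = list(range(20))              # hypothetical domain of 20 skills
STATE_TEST_ITEMS = [0, 1, 2, 3, 4]    # the same predictable skills appear every year


def score(mastery, items):
    """Average mastery (0 to 1) over the skills a test samples, as a percentage."""
    return round(100 * sum(mastery[i] for i in items) / len(items), 1)


def teach_to_the_test(mastery, tested, boost=0.4, neglect=0.1):
    """Shift effort toward tested skills at the expense of the rest of the domain."""
    return [
        min(1.0, m + boost) if i in tested else max(0.0, m - neglect)
        for i, m in enumerate(mastery)
    ]


before = [0.5] * len(DOMAIN)                         # students start halfway on every skill
after = teach_to_the_test(before, STATE_TEST_ITEMS)  # a year of narrow test prep

state_gain = score(after, STATE_TEST_ITEMS) - score(before, STATE_TEST_ITEMS)
audit_gain = score(after, DOMAIN) - score(before, DOMAIN)

print("Gain on the predictable state test:", state_gain)
print("Gain on the audit test of the whole domain:", audit_gain)
```

Under these made-up numbers the state test shows a large jump while the audit test barely moves, which is the pattern Koretz describes when gains do not generalize to the domain.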

Two chapters in this middle section explore the ways No Child Left Behind’s test-and-punish scheme has undermined equitable access to education in the schools in areas of concentrated poverty across our cities. The law that promised to leave no child behind not only encouraged test prep and cheating in the schools whose needs were greatest, but it also set impossibly tough and largely arbitrary test score targets for those schools and an impossibly short timeline for bringing students up to those targets.  And then the federal government set out to punish the schools and the teachers unable to meet the targets.

  • Making Up Unrealistic Targets: In this chapter, Koretz explains how No Child Left Behind’s standardized cut scores and timelines were set unrealistically and arbitrarily; the consequence was to label schools in poor areas as “failing” and to subject schools in areas of concentrated poverty to a series of punishments. Here is Koretz’s short summary: “Part of the blame for this failure lies with the crude and unrealistic methods used to confront inequity.  In a nutshell, the core of the approach has been simply to set an arbitrary performance target (the ‘Proficient’ standard) and declare that all schools must make all students reach it in an equally arbitrary amount of time.  No one checked to make sure the targets were practical.  The myriad factors that cause some students to do poorly in school—both the weaknesses of many of the schools they attend and the disadvantages some students bring to school—were given remarkably little attention. Somehow teachers would just pull this off… The trust most people have in performance standards is essential, because the entire educational system now revolves around them. The percentage of kids who reach the standard is the key number determining which teachers and schools will be rewarded or punished… But in fact, despite all the care that goes into creating them, these standards are anything but solid. They are arbitrary, and the ‘percent proficient’ is a very slippery number… A primary motivation for setting a Proficient standard is to prod schools to improve, but information about how quickly teachers actually can improve student learning doesn’t play much, if any, of a role in setting performance standards… However, setting the standards themselves is just the beginning. What gives the performance standards real bite is their translation into concrete targets for educators, which depends on more than the rigor of the standard itself… We have to say how quickly performance has to increase—not only overall but for different types of kids and schools. A less obvious but equally important question is how much variation in performance is acceptable.”
  • Evaluating Teachers: In 2009, beginning with Race to the Top and later as a condition for states to qualify for waivers from the worst consequences of No Child Left Behind, Arne Duncan’s Department of Education required states to change their laws to tie a percentage of teachers’ formal evaluations to students’ test scores. Myriad problems ensued. First of all, the required tests are in reading and math. What about the other teachers? Koretz describes Florida and Tennessee, which judged teachers in non-tested grades and subjects by the scores of students who were not in their classes, and in one case not in their schools.  Other states added tests in music, art, and physical education—subjecting students to added standardized testing—just for the purpose of state teacher evaluations.  Koretz explains the problems with Value-Added Modeling to evaluate teachers; many factors affecting students’ scores cannot be traced to any teacher and any teacher’s ratings seem to be unstable over several years.

I cannot imagine exactly how our society can recover from our terrible test-and-punish misadventure and our labeling as “failing” the institutions and teachers who serve our poorest children.  What is heartening about The Testing Charade: Pretending to Make Schools Better is the clarity with which Daniel Koretz presents our current dilemma: “We now know what many educators did.  Faced with unrealistic targets, some cut corners or simply cheated.  And perhaps because the system, in its zeal to address inequities, made the targets most unrealistic for educators serving disadvantaged kids, those kids—ironically—got the worst of it: the most test prep, the most score inflation, and apparently the most cheating.  And yet inflated scores allowed policy makers to declare victory, and the public received a steady diet of encouraging but bogus news about rapid improvements in the achievement gap…. On balance… the reforms have been a failure.”

Please read The Testing Charade.  We all need to understand and be able to explain how we’ve gone so far astray.

Harvard’s Daniel Koretz Indicts High Stakes Testing in “The Testing Charade”

Daniel Koretz’s new book, The Testing Charade: Pretending to Make Schools Better, is a scathing indictment of our society’s test-and-punish school regime, formalized in the 2002 No Child Left Behind Act and continuing in the most recent version of the federal education law, the Every Student Succeeds Act.  Koretz, the testing specialist, is not so critical of standardized testing itself as he is of the high stakes sanctions that Congress attached to the annual tests in No Child Left Behind—punishments that have driven massive pressure on educators that has ruined our public schools:

“Pressure to raise scores on achievement tests dominates American education today. It shapes what is taught and how it is taught.  It influences the problems students are given in math class (often questions from earlier tests), the materials they are given to read, the essays and other work they are required to produce, and often the manner in which teachers grade this work. It determines which educators are rewarded, punished, and even fired. In many cases it determines which students are promoted or graduate. This is the result of decades of ‘education reforms’ that progressively expanded the amount of externally imposed testing and ratcheted up the pressure to raise scores.” (p. 1)

Daniel Koretz’s biography at the Harvard Graduate School of Education describes him as an expert on educational assessment and testing policy, and the book describes in considerable detail just how high stakes punishments for schools and teachers have corrupted the results of the tests themselves, narrowed the curriculum, and degraded teaching.

But my deepest interest in the book is Koretz’s depiction of how the testing that was supposed to force teachers and schools to better serve poor children, raise their test scores and close achievement gaps has instead truncated opportunity for the very children it was supposed to help. How has test-and-punish narrowed the curriculum to basic reading and math in the poorest schools, and how has it forced teachers to focus on test-prep and coaching instead of enrichment?  How has test-and-punish forced the closing or charterizing of schools in poor neighborhoods? How has evaluating teachers by their students’ test scores resulted in firing principals and teachers in the poorest schools and exacerbated staff turnover?  And what about the children being held back in third grade due to a test score, even when they may be making real progress in reading, and the adolescents denied a high school diploma?

Under current federal law, students and schools are given credit for proficiency only when children reach benchmark proficiency scores. A fourth grader who advances during the school year from a first to a third grade reading level will still fail to achieve the fourth grade cut score. Neither the child nor the teacher will be given credit for the child’s improvement: “One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools. The causes are complex, but the result is simple: some schools have far lower average scores…. Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do. This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’ It was a deliberate and prominent part of many of the test-based accountability reforms…. Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic. The specific targets were often an automatic consequence of where the Proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.” (pp. 129-130)
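
Here is a small sketch of that arithmetic, offered as an illustration rather than as the actual federal formula. The student records, the idea of treating the enrolled grade as the cut score, and the straight-line `required_annual_gain` calculation are all my own simplifying assumptions; only the twelve-year timeline and the 100 percent goal come from the law as Koretz describes it.

```python
from dataclasses import dataclass


@dataclass
class Student:
    grade: int           # grade the child is enrolled in
    fall_level: float    # reading grade level at the start of the year (hypothetical)
    spring_level: float  # reading grade level at the end of the year (hypothetical)


def is_proficient(s: Student) -> bool:
    """Status measure: assumes the cut score equals the enrolled grade level."""
    return s.spring_level >= s.grade


def growth(s: Student) -> float:
    """Growth measure: grade levels gained during the school year."""
    return s.spring_level - s.fall_level


def required_annual_gain(percent_proficient: float, years: int = 12) -> float:
    """Simplified straight-line gain (percentage points per year) to reach 100 percent."""
    return (100.0 - percent_proficient) / years


far_behind = Student(grade=4, fall_level=1.0, spring_level=3.0)
already_there = Student(grade=4, fall_level=4.0, spring_level=4.5)

print(is_proficient(far_behind), growth(far_behind))        # False 2.0
print(is_proficient(already_there), growth(already_there))  # True 0.5

# Two hypothetical schools: one starts at 30 percent proficient, one at 90 percent.
print(round(required_annual_gain(30.0), 1), round(required_annual_gain(90.0), 1))  # 5.8 0.8
```

In this sketch the child who gained two full grade levels counts as a failure, the child who gained half a grade level counts as a success, and the school that starts far behind must post gains roughly seven times larger every year.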

Reformers decided that, if sufficiently pressured to raise test scores, teachers would be able to do so: “(T)hey acted as if… (schools alone could) largely eliminate variations in student achievement, ignoring the impact of factors that have nothing to do with the behavior of educators—for example, the behavior of parents, students’ health and nutrition, and many characteristics of the communities in which students grow up.” (pp. 123-124) Koretz explains at length and in detail the ways that teachers and principals whose jobs are threatened have resorted to raising scores: coaching for the test, drilling on materials likely to be covered, and in some cases where the pressure was greatest, cheating by erasing and correcting scores.

Koretz quotes Linda Darling-Hammond’s characterization of test-and-punish school accountability: “the kick the dog harder model of education reform.” And he explains: “If we are going to make real headway, we are going to have to confront the simple fact that many teachers will need substantial supports if they are going to markedly improve the performance of their students… And the range of services needed is broad. One can’t expect students’ performance in schools to be unaffected by inadequate nutrition, insufficient health care, home environments that have prepared them poorly for school, or violence on the way to school.” (p. 201)  He suggests first that we stop judging all students and schools by benchmark scores. We must “set goals based on students’ growth, not the level of their performance.” (p. 235)

In the Washington Post, Valerie Strauss interviews Koretz about his new book, and she publishes an excerpt.

While I have emphasized the sections in which Koretz shows test-and-punish hurting the schools that serve the poorest and most vulnerable children, Koretz is a testing expert, whose primary interest is how high stakes punishments attached to a regime of universal testing have corrupted the entire operation of public schools: “Reformers may take umbrage and say that they certainly didn’t demand that teachers cheat. They didn’t, although in fact many policy makers actively encouraged bad test prep that produced fraudulent gains. What they did demand was unrelenting and often very large gains that many teachers couldn’t produce through better instruction, and they left them with inadequate supports as they struggled to meet these often unrealistic targets. They gave many educators the choice I wrote about thirty years ago—fail, cut corners, or cheat—and many chose not to fail.” (p. 244)

Koretz joins a growing number of critics who indict test-and-punish school accountability. What is significant about this book is the thorough and relentless critique by a testing expert who carefully and sometimes technically dissects the evidence.