Rick Hess’s Mistake: Failure of Test-and-Punish Is Not Limited to a Few Districts That Have Disappointed

Frederick M. Hess, the director of education policy studies at the American Enterprise Institute, has always been a corporate education reform kind of guy. That is why Hess’s honest analysis this week of the ultimate fraud of a succession of school district miracles—Washington, D.C.’s test score and graduation rate miracle under Michelle Rhee and those who followed her, Alonzo Crim’s Atlanta in the 1980s, Houston’s Texas Miracle under Rod Paige, Arne Duncan’s Chicago, and Beverly Hall’s Atlanta—is so refreshingly candid.

In all of these cases, as Hess points out, there was “a remarkable dearth of attention paid to ensuring that the metrics (were) actually valid and reliable.”  Second, it was “tempting for civic leaders and national advocates to accept happy success stories at face value—especially when they (were) fronted by a charismatic superintendent.” And finally “reformers and reporters (made) things worse with their lust for ‘celebrity superintendents’ and ‘model systems.’ Their fascination nurtur(ed) an echo chamber in which a handful of leaders (got) exalted, often for too-good-to-be-true results.”

One must give Hess credit for honestly admitting the failure of so much of what his own kind of school reformers have been exalting for the past quarter century—business school accountability for schools, driven by universal standardized testing, and evaluated by two primary outcomes—standardized test scores and graduation rates. But Hess makes a mistake when he attributes the problem to a few “model” school districts that have disappointed.

Hess’s explanation is inadequate.  Inadequate because the system itself—the whole idea of school reform based on high stakes testing—cannot work.  Daniel Koretz, the Harvard specialist on testing, tells us why in a recent book: The Testing Charade: Pretending to Make Schools Better.

Koretz defines the problem with high-stakes-test-based school accountability by exploring a primary principle of social science research. Forty years ago, Don Campbell, “one of the founders of the science of program evaluation,” articulated a core principle now known as “Campbell’s Law”: “The more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” (p. 38)

How does Campbell’s Law describe the dilemma Frederick Hess identifies?  Koretz quotes Don Campbell himself describing the distortion that will follow when high stakes consequences are attached to a school district’s capacity to raise its aggregate test scores: “Achievement tests may well be valuable indicators of… achievement under conditions of normal teaching aimed at general competence.  But when test scores become the goal of the teaching process, they both lose their value as indicators of educational status and distort the educational process in undesirable ways.” (p. 39)

In The Testing Charade, Koretz provides extensive evidence about all the ways high stakes tied to test scores have triggered Campbell’s Law—to invalidate the test results themselves and to undermine our education system and the experiences of teachers and students trapped by No Child Left Behind and the Every Student Succeeds Act in a scheme to raise test scores at all costs.

One consequence is score inflation: “All that is required for scores to become inflated is that the sampling used to create a test has to be predictable… For inflation to occur, teachers or students need to capitalize on this predictability, focusing on the specifics of the test at the expense of the larger domain.” (p. 62)  We read about all the ways curriculum designers and teachers are incentivized to focus their classes on the specific elements of any particular academic discipline that have appeared on previous tests.

A second consequence, related to the first, is flat-out test-prep. Test prep narrows what is taught to students to the material that is tested and drills students about using clues in the test itself to come up with the right answers. Koretz identifies three kinds of bad test prep. Reallocation between subjects has been common when schools emphasize No Child Left Behind’s tested subjects—reading and math—and cut back on social studies, the arts, music and recess. Reallocation within subjects is when schools study past years’ versions of the state tests and ask teachers to focus on particular aspects of a subject.  Finally there is coaching. Schools and test-prep companies teach students to respond in a formulaic way to the format of the questions themselves. Koretz explains why all this has implications for educational equity: “Inappropriate test preparation, like score inflation, is more severe in some places than in others. Teachers of high-achieving students have less reason to indulge in bad preparation for high-stakes tests because the majority of their students will score adequately without it—in particular, above the ‘proficient’ cut score that counts for accountability purposes. So one would expect that test preparation would be a more severe problem in schools serving high concentrations of disadvantaged students…. Once again, disadvantaged kids are getting the short end of the stick.” (pp. 116-117)

And a third consequence, demonstrated in every one of Frederick Hess’s examples, is cheating. Koretz examines the biggest cheating scandals, notably Atlanta, Philadelphia, and Washington, DC.  He notes: “Cheating—by teachers and administrators, not by students—is one of the simplest ways to inflate scores, and if you aren’t caught, it’s the most dependable.” Sometimes teachers or administrators erase and change students’ answers; sometimes they provide teachers or students with the test items in advance; other times teachers give students the answers during the test.  And finally, sometimes schools “scrub” off the enrollment rolls the students who are likely to fail.

Koretz presents the questions around cheating by educators as morally fraught. After all, test scores are not simply a proxy for the quality of a school or a school district:  “One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools. The causes are complex, but the result is simple: some schools have far lower average scores—and, particularly important in this system, more kids who aren’t ‘proficient’—than others. Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do. This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’ It was a deliberate and prominent part of many of the test-based accountability reforms… Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic. The specific targets were often an automatic consequence of where the Proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.” (pp. 129-130)

In a system that, by its very structure, is guaranteed to trigger Campbell’s Law, Koretz wonders about the moral implications of cheating: “Just who is responsible?  Is it just the people who actually carry out the fraud or require it?  Or are those who create the pressures to cheat also culpable, even if not criminally?” (p. 91)

Like Frederick Hess, Daniel Koretz recognizes that although outcomes-based, test-and-punish school accountability has been hyped and celebrated, ultimately this kind of school policy has not improved schools as promised.  Koretz digs deeper, however, to expose that the system itself—not merely its abuse by particular educators in particular school districts—is deeply flawed.

Koretz concludes: “It is no exaggeration to say that the costs of test-based accountability have been huge. Instruction has been corrupted on a broad scale. Large amounts of instructional time are now siphoned off into test-prep activities that at best waste time and at worst defraud students and their parents.  Cheating has become widespread. The public has been deceived into thinking that achievement has dramatically improved and that achievement gaps have narrowed. Many students are subjected to severe stress… The primary benefit we received in return for all of this was substantial gains in elementary-school math that don’t persist until graduation… On balance, then, the reforms have been a failure.” (pp. 191-192)

Daniel Koretz: More Detail from “The Testing Charade” on Cheating Scandal in Atlanta

Back in 2015, I watched when part of the trial of the Atlanta school teachers—accused of erasing and correcting their students’ test scores—was televised on C-Span (see here and here). And two weeks ago I read Daniel Koretz’s new book, The Testing Charade, a book about what happens when high stakes punishments are attached to any social indicator. I read Koretz’s book pretty much without emotion or judgment—as an academic exercise to understand his argument against the high stakes that policy makers have used as a threat to drive teachers to work harder and raise test scores faster. I didn’t focus on the sections about the cheating scandals.  After all, I imagined, the scandals have just become a part of history.

Then on Wednesday evening, I watched Lisa Stark’s report for the PBS NewsHour about the 9 Atlanta school teachers and principals who are appealing their criminal convictions to clear their names and avoid stints in prison for participating in what is said to have been a 44-school cheating scandal driven by Superintendent Beverly Hall, who won awards when test scores rose miraculously quickly in Atlanta’s schools. Hall died before her own involvement could be adjudicated.

Daniel Koretz, the Harvard professor whose new book explores the Atlanta cheating scandal (among cheating scandals in Washington, D.C., Pennsylvania and many other places) as among the widespread consequences of our test-and-punish regime of school reform, spoke briefly in Lisa Stark’s report. In his book he attributes the problem to what social scientists call Campbell’s Law. Here is the definition Koretz quotes: “The more any quantitative social indicator is used for social decision making the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” (p. 38)

Koretz explores the issue far more deeply in his new book than he did in Wednesday night’s short clip for the NewsHour. Two years ago I felt that the Atlanta educators’ criminal convictions were unfair. As I watched the PBS report, I recognized that what I had felt two weeks ago while reading Koretz’s book was relief that an expert scholar had confirmed my own sense of injustice in Atlanta. That recognition sent me back again yesterday to Koretz’s book.  Here is some of what he didn’t have time to say in Wednesday’s report for PBS.

“One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools. The causes are complex, but the result is simple: some schools have far lower average scores—and, particularly important in this system, more kids who aren’t ‘proficient’—than others. Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do. This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’ It was a deliberate and prominent part of many of the test-based accountability reforms… Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic. The specific targets were often an automatic consequence of where the Proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.” (pp. 129-130) Koretz continues: “(T)his decision backfired. The result was, in many cases, unrealistic expectations that teachers simply couldn’t meet by any legitimate means.” (p. 134)

In Atlanta, Koretz describes the situation at Parks Middle School, as it was portrayed by Rachel Aviv in a New Yorker profile of the Atlanta cheating scandal.  Koretz explains: “This is the school where Damany Lewis and Christopher Waller worked. Aviv documented the way in which Waller choreographed an increasingly large and well-organized cheating ring… Why did Lewis and others do this? At least in Lewis’s case, it was not because he was comfortable cheating. Quite the contrary…  Then why? In a nutshell, because their only other choice was to fail—not when compared with reasonable goals but when held to Hall’s and NCLB’s entirely arbitrary targets. Parks is located in a terribly depressed neighborhood. Half the homes are vacant. Students call the neighborhood ‘Jack City’ because of all the armed robberies. Very few of the students come from homes with two parents. Aviv reported that some students came to school in filthy clothing and that Lewis told students to drop dirty laundry in the back of his truck so that he could wash clothes for them. Some of the parents were dysfunctional because of drug use. During the years leading up to the cheating scandal, Parks had made real progress. A new principal renovated the school and worked on both refocusing students on academics and building a sense of community. Using funds that Hall’s administration had obtained, the school implemented after-school and tutoring programs. However, this simply wasn’t enough, given how fast scores had to rise to meet Hall’s demands. Lewis told Aviv that he had pushed his students harder than they had ever been pushed and that he was ‘not willing to let the state slap them in the face and say they’re failures.’” (pp. 77-78)

Besides leaving 9 Atlanta teachers and principals with criminal convictions, what has been the ultimate outcome of all this test-and-punish for society as a whole including our children? “It’s no exaggeration to say that the costs of test-based accountability have been huge. Instruction has been corrupted on a broad scale. Large amounts of instructional time are now siphoned off into test-prep activities that at best waste time and at worst defraud students and their parents. Cheating has become widespread. The public has been deceived into thinking that achievement has dramatically improved and that achievement gaps have narrowed. Many students are subjected to severe stress… Educators have been evaluated in misleading and in some cases utterly absurd ways. Careers have been disrupted and in some cases ended. Educators including prominent administrators, have been indicted and even imprisoned. The primary benefit we received in return for all of this was substantial gains in elementary-school math that don’t persist until graduation.” (p. 191)

Koretz concludes: “Reformers may take umbrage and say that they certainly didn’t demand that teachers cheat. They didn’t, although in fact many policy makers actively encouraged bad test prep that produced fraudulent gains. What they did demand was unrelenting and often very large gains that many teachers couldn’t produce through better instruction, and they left them with inadequate supports as they struggled to meet these often unrealistic targets. They gave many educators the choice… fail, cut corners, or cheat—and many chose not to fail.” (p. 244)

Demanding that Educators Accomplish the Impossible Leads to Atlanta Convictions Under RICO

On Wednesday afternoon 11 convictions were handed down in the Atlanta schools test-cheating scandal.  On the one hand, this is a local issue—merely one city’s tragedy with the local response to the federal testing law, No Child Left Behind, and all the law’s demands for something radical to happen right now.  But this one local consequence is emblematic of the way things can go when laws pressure real people to implement policies that have nothing whatever to do with reality.

The problem is much larger than Atlanta. The Guardian reports “documented cheating in at least 40 states, since the APS cheating scandal came to light.” In March of 2011, USA Today uncovered what seems to have been an obvious cheating scandal in Washington, D.C.  Michelle Rhee, the Washington, D.C. superintendent at that time, somehow managed to ensure the allegations were never investigated, despite efforts by that newspaper and an effort of several years by PBS journalist John Merrow to uncover e-mails that would have exposed Rhee’s and the district’s wrongdoing.

As a society we haven’t spoken forcefully enough to stop the process when we’ve been told that educators can, in a year or two, magically turn around the school achievement of all children in a class or a grade level or even a whole school or school district. Atlanta’s school superintendent, Dr. Beverly Hall, promised she could do that and then set out to prove it.

“Turnaround” is the code word for what we have been demanding of public schools for over a decade now.  Turn around a school even if “turnaround” is defined as firing all the teachers or just closing the school.  And by a federal law in 2002 we demanded that all schools raise all students’ test scores to the level of “proficiency” by 2014.  This is, of course, a matter of “just pretend.”  It’s never been done and can’t be done anywhere but Lake Wobegon, the fictional hamlet that uses the dialect dubbed “Minnesota Nice” to proclaim that all its children are above average.  Statistically there are always means and medians and modes; people range in their abilities and each one has special talents and weaknesses.  But school policy in America has been blindly denying reality.

I cannot speak to what should have been the court’s judgment on the eleven former Atlanta school employees convicted of erasing wrong answers and changing what students had filled in on their test answer sheets.  According to the NY Times, five were teachers, one was a principal, and five were the administrators under Superintendent Beverly Hall who launched and implemented what the court said was a criminal conspiracy.  Corey Mitchell for Education Week describes the charges: “Because bonuses and raises were awarded to the educators based on the test scores, prosecutors charged the educators with violating the state’s RICO (Racketeer Influenced and Corrupt Organizations) Act by engaging in a massive criminal conspiracy.  It’s a criminal statute that law enforcement typically uses to prosecute those with ties to organized crime.”

Superintendent Beverly Hall pressured the district’s administrators and teachers with incentives, punishments, and shame.  Hall was charged in the indictment, but she was never tried due to a diagnosis and treatment for breast cancer.  She died last month.  Valerie Strauss in the Washington Post quotes from the indictment: “While Superintendent of APS (Atlanta Public Schools), Beverly Hall set annual performance objectives for APS and the individual schools within it, commonly referred to as ‘targets.’  If a school achieved 70% or more of its targets, all employees of the school received a bonus.  Additionally, if certain system-wide targets were achieved, Beverly Hall herself received a substantial bonus…  APS principals and teachers were frequently told by Beverly Hall and her subordinates that excuses for not meeting targets would not be tolerated.  When principals and teachers could not reach their targets, their performance was criticized, their jobs were threatened and some were terminated.  Over time, the unreasonable pressure to meet annual APS targets led some employees to cheat…. The refusal of Beverly Hall and her top administrators to accept anything other than satisfying targets created an environment where achieving the desired end result was more important than the students’ education.”

Jay Bookman, a columnist for the Atlanta Journal-Constitution, describes Beverly Hall and the school climate she created in which widespread cheating emerged: “Personally, I still have a hard time shaking the memory of one-on-one conversations with Hall in which she obstinately, repeatedly refused to concede that anything had gone awry with the system’s testing system. As I wrote back then, her denials were downright stunning and in hindsight even Nixonian. By sheer force of will, she had created a world in which her distorted version of events was the only one that mattered, and all who lived and worked within that world were forced to abide by its strange rules.”

According to the NY Times,  in Atlanta, “Nearly 180 employees, including 38 principals, were accused of wrongdoing as part of an effort to inflate test scores and misrepresent the achievement of Atlanta’s students and schools.” Many of these people reached plea agreements and were granted probation and community service.  In addition to Dr. Hall, another of those charged died in the years since the scandal was exposed in 2011.  Twelve stood trial.  One teacher was acquitted on Wednesday.

Were the people involved in Atlanta’s cheating scandal just morally weak?  Were they trying to avoid being shamed? Were they desperate to protect their jobs and the income that fed their children? I am sure different people cheated for different reasons, and we can agree that teachers and school leaders shouldn’t cheat.  A larger concern, however, is that a federal education policy based on test-and-punish asks teachers to accomplish the impossible and then shames and punishes and even fires them when they can’t do it.  Bob Schaeffer, writing for the National Center for Fair and Open Testing, explains that we ought to be blaming a system that demands what human beings cannot possibly accomplish: “Across the U.S., strategies that boost scores without improving learning—including outright cheating, narrow teaching to the test and pushing out low-scoring students—are widespread.  These corrupt practices are inevitable consequences of the politically mandated overuse and misuse of high-stakes exams.”

Something has gone haywire in our nation’s education policy.  What happened in Atlanta this week should cause us all to stop and pay attention.

The Evolution of Denial in Atlanta Test-Score Cheating Scandal

Rachel Aviv’s extraordinary New Yorker magazine essay, Wrong Answer, traces the evolution of the Atlanta Public Schools standardized test cheating scandal.  Aviv describes how school administrators, driven by the Adequate Yearly Progress requirements of the No Child Left Behind Act, wielded pressure and shame over several years to recalibrate the moral compass of one middle school’s most dedicated teacher.

A researcher at the American Mathematical Society tells Aviv about Campbell’s Law, “a principle that describes the risks of using a single indicator to measure complex social phenomena: the greater the value placed on a quantitative measure, like test scores, the more likely it is that the people using it and the process it measures will be corrupted.”  The principle sounds abstract, but what happened to the teachers at Parks Middle School was anything but dry.  Little by little, year after year, more and more teachers got involved; administrators condoned the cheating or looked the other way; and everybody celebrated the miraculous scores even though they were utterly improbable.

Although Atlanta’s superintendent, Beverly Hall, offered cash awards to the staff at schools where scores continued to rise, Aviv’s story is not about the power of prizes and money.  Damany Lewis, a middle school math teacher and the protagonist of Aviv’s story, wants desperately to keep his job precisely because he is so dedicated to serving students whose poverty is so severe that he collects their clothes to wash when they have no other options.  He finds himself coaching a host of athletic teams along with the chess club as he devotes his life to trying to help his students surmount the obstacles in their lives.

Our federal testing law, No Child Left Behind (NCLB), dominates the lives of Atlanta’s educators, from teacher to principal to school administrator.  At the time of this story, from 2006 to 2010, the federal government had not yet begun offering waivers from NCLB’s onerous requirement of utopian, ever growing test scores, so schools had to raise their cumulative test scores higher and higher every year in order to make what was called Adequate Yearly Progress.  NCLB is a trap for the middle school where Lewis teaches in two primary ways.  His students arrive at his math classes unable to do the work their elementary school test scores say they are capable of; after all, the elementary schools have already been cheating on test scores well before Lewis’s students arrive in his middle school classes.  And then, once the middle school teachers succumb to temptation and begin erasing and correcting test answers, they are ensnared into continuing the practice into the future.  There is no going back because the school must produce even higher scores the next year and the next.

Aviv describes the teachers’ dilemma as Lewis understands it: “At happy-hour drinks, he and other teachers complained that the legislators who wrote No Child Left Behind must never have been near a school like Parks.  He felt as if he and his colleagues were part of a nationwide ‘biological experiment’ in which the variables—the fact that so many children were hungry and transient and witnessing violence—hadn’t been controlled.”

Aviv sums up by explaining what she was told in an interview by noted educational researcher David Berliner, that NCLB expected teachers to compensate for factors outside their control: “The people who say poverty is no excuse for low performance are now using teacher accountability as an excuse for doing nothing about poverty.”

I urge you to read Aviv’s powerful article, for it is not only about the culture of denial across Atlanta’s public schools during the reign of Superintendent Beverly Hall.  It is also about our broader cultural blindness to poverty and denial of its consequences for children in an increasingly unequal society.