U.S. Public Education Is Driven by High-Stakes Testing. Are the Proficiency Cut-Scores Legitimate?

Back in 2005, I worked with members of the National Council of Churches Committee on Public Education and Literacy to develop a short resource, Ten Moral Concerns in the No Child Left Behind Act. While closing achievement gaps seemed an important goal, to us it seemed wrong that—according to an unrelenting year-by-year Adequate Yearly Progress schedule—the law blindly held teachers and schools accountable for raising all children’s test performance to the test score targets set by every state. Children come to school with such a wide range of preparation, and achievement gaps are present when children arrive in Kindergarten.  At that time, we expressed our concern this way:

“Till now the No Child Left Behind Act has neither acknowledged where children start the school year nor celebrated their individual accomplishments. A school where the mean eighth grade math score for any one subgroup grows from a third to a sixth grade level has been labeled a “in need of improvement” (a label of failure) even though the students have made significant progress. The law has not acknowledged that every child is unique and that Adequate Yearly Progress (AYP) thresholds are merely benchmarks set by human beings. Although the Department of Education now permits states to measure student growth, because the technology for tracking individual learning over time is far more complicated than the law’s authors anticipated, too many children will continue to be labeled failures even though they are making strides, and their schools will continue to be labeled failures unless all sub-groups of children are on track to reach reading and math proficiency by 2014.”

Of course today we know that the No Child Left Behind Act was supposed to motivate teachers to work harder to raise scores. Policymakers hoped that if they set the bar really high, teachers would figure out how to get kids over it. It didn’t work. No Child Left Behind said that all children would be proficient by 2014 or their school would be labeled failing. Finally, as 2014 loomed closer, Arne Duncan, then Secretary of Education, had to give states waivers to avoid what would have happened had the law been enforced: All American public schools would have been declared “failing.”

Despite the failure of No Child Left Behind,  members of the public, the press, and the politicians across the 50 statehouses who implemented the testing requirements of No Child Left Behind continue to accept the validity of high stakes testing. Politicians, the newspaper reporters and editors who report the scores, and the general public trust the supposed experts who set the cut scores.  That is why states still rank and rate public schools by their test scores and legislators pass laws to punish  low-scoring schools and teachers. That is why on Wednesday this blog commented on Ohio’s plan to expand EdChoice vouchers for students in low-scoring schools and add charters in low-scoring school districts. The list of “failing” schools where students will qualify for vouchers will rise next school year in Ohio from 218 to 475. The list of charter school-eligible districts will grow from 38 to 217.

In response to the continuation of test-and-punish, I’ve been quoting Daniel Koretz’s book, The Testing Charade, on the fact that testing cut scores are arbitrary and the resulting punishments unfair: “One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools. The causes are complex, but the result is simple: some schools have far lower average scores—and, particularly important in this system, more kids who aren’t ‘proficient’—than others. Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do…  Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic. The specific targets were often an automatic consequence of where the proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.”  (The Testing Charade, pp. 129-130)
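The arithmetic behind Koretz’s point can be sketched in a few lines of Python. This is only an illustration: the two starting proficiency rates, the twelve-year window, and the function name are assumptions made up for the example, not figures from the book or from any state.

```python
def required_annual_gain(current_pct_proficient, years_remaining, target_pct=100.0):
    """Percentage-point gain needed each year to reach the target on a straight line."""
    if years_remaining <= 0:
        raise ValueError("years_remaining must be positive")
    return (target_pct - current_pct_proficient) / years_remaining


# Hypothetical schools: these starting rates are illustrative, not real data.
schools = {"higher-scoring school": 80.0, "lower-scoring school": 20.0}
YEARS = 12  # assumed window, roughly 2002 to 2014

gains = {name: required_annual_gain(pct, YEARS) for name, pct in schools.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:.2f} percentage points per year")
```

Under these assumed numbers the lower-scoring school must gain four times as much every year as the higher-scoring one, simply because of where it starts and where the deadline falls.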

As a blogger, I am not an expert on how test score targets—the cut scores—are set, but Daniel Koretz devotes an entire chapter of his book, “Making Up Unrealistic Targets,” to this subject.  Here is how he begins:  “If one doesn’t look too closely, reporting what percentage of students are ‘proficient’ seems clear enough. Someone somehow determined what level of achievement we should expect at any given grade—that’s what we will call ‘proficient’—and we’re just counting how many kids have reached that point. This seeming simplicity and clarity is why almost all public discussion of test scores is now cast in terms of the percentage reaching either the proficient standard, or occasionally, another cut score… The trust most people have in performance standards is essential, because the entire educational system now revolves around them. The percentage of kids who reach the standard is the key number determining which teachers and schools will be rewarded or punished.”  (The Testing Charade, p. 120)

After emphasizing that benchmark scores are not scientifically set and are in fact all arbitrary, Koretz examines some of the methods. The “bookmark” method, he explains, “hinges entirely on people’s guesses about how imaginary students would perform on individual test items… (P)anels of judges are given a written definition of what a standard like “proficient” is supposed to mean.”  Koretz quotes from Nebraska’s definition of reading comprehension: “A student scoring at the Meets the Standards level generally utilizes a variety of reading skills and strategies to comprehend and interpret narrative and informational text at grade level.” After enumerating some of the specific skills and strategies listed in Nebraska, Koretz adds a qualification to the way Nebraska describes its methodology: “A short digression: the emphasized word generally is very important. One of the problems in setting standards is that students are inconsistent in their performance.” (The Testing Charade, pp. 121-122) (Emphasis in the original.)

Koretz continues: “There is another, perhaps even more important, reason why performance standards can’t be trusted: there are many different methods one can use, and there is rarely a really persuasive reason to select one over the other. For example, another common approach, the Angoff method… is like the bookmark in requiring panelists to imagine marginally proficient students, but in this approach they are not given the order of difficulty of the items or a response probability. Instead panelists have to guess the percentage of imaginary marginally proficient students who would correctly answer every item in the test. Other methods entail examining and rating actual student work, rather than guessing the performance of imaginary students on individual items.  Yet other methods hinge on predictions of later performance—for example, in college. There are yet others. This wouldn’t matter if these different methods gave you at least roughly similar results, but they often don’t.  The percentage of kids deemed to be ‘proficient’ sometimes varies dramatically from one method to another.  This inconsistency was copiously documented almost thirty years ago, and the news hasn’t gotten any better.” (The Testing Charade, pp. 123-124)
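For readers who want to see how much rides on the panelists’ guesses, here is a rough Python sketch of one common way Angoff-style ratings are combined: each panelist’s per-item guesses are summed, and the sums are averaged into a cut score. The panels, their ratings, the student scores, and the function names are all hypothetical. Koretz does not give this computation, and real standard-setting panels use many more items and judges.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Average, across panelists, of each panelist's summed per-item guesses.

    ratings: one list per panelist; each entry is that panelist's guess of the
    share (0 to 1) of marginally proficient students who would answer the item
    correctly.
    """
    if not ratings:
        raise ValueError("at least one panelist is required")
    n_items = len(ratings[0])
    if any(len(r) != n_items for r in ratings):
        raise ValueError("every panelist must rate every item")
    return mean([sum(r) for r in ratings])


def percent_proficient(scores, cut):
    """Share of students scoring at or above the cut score, as a percentage."""
    return 100.0 * sum(s >= cut for s in scores) / len(scores)


# Hypothetical ratings for a four-item test from two imaginary panels.
panel_a = [[0.5, 0.75, 0.25, 0.5], [0.5, 0.5, 0.25, 0.75]]
panel_b = [[0.75, 1.0, 0.5, 0.75], [0.75, 0.75, 0.5, 1.0]]

# Hypothetical number-correct scores for eight students.
student_scores = [1, 2, 2, 3, 3, 3, 4, 4]

cut_a = angoff_cut_score(panel_a)
cut_b = angoff_cut_score(panel_b)
print(cut_a, percent_proficient(student_scores, cut_a))
print(cut_b, percent_proficient(student_scores, cut_b))
```

The same eight imaginary students are judged very differently depending on which panel’s guesses set the bar, which is the kind of sensitivity Koretz warns about.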

Koretz continues his warning: “However, setting the standards themselves is just the beginning. What gives the performance standards real bite is their translation into concrete targets for educators, which depends on more than the rigor of the standard itself.  We have to say just who has to reach the threshold. We have to say how quickly performance has to increase—not only overall but for different types of kids and schools. A less obvious but equally important question is how much variation in performance is acceptable… A sensible way to set targets would be to look for evidence suggesting how rapidly teachers can raise achievement by legitimate means—that is, by improving instruction, not by using bad test prep, gaming the system, or simply cheating…  However, the targets in our test-based accountability systems have often required unremitting improvements, year after year, many times as large as any large-scale change we have seen.” (The Testing Charade, pp. 125-126)

Koretz concludes: “(I)t is clear that the implicit assumption undergirding the reforms is that we can dramatically reduce the variability of achievement… Unfortunately, all evidence indicates that this optimism is unfounded.  We can undoubtedly reduce variations in performance appreciably if we summoned the political will and committed the resources to do so—which would require a lot more than simply imposing requirements that educators reach arbitrary targets for test scores.” (The Testing Charade, p. 131)

Decades of Academic Research Support Community Schools Strategy in New York City’s Renewal Schools

So-called “corporate” school reform has been defined by setting standards and testing students to see if they have met the standards.  Rewards and punishments follow for the teachers and schools said to have produced these results. The assumption has been that a school is a closed box that can turn around the lives of the enrolled students—all apart from the fact that students spend only six or seven hours of the day at school. Corporate school reformers said they would disrupt the stasis they thought defined bureaucratic public schools by offering rewards and punishments to motivate teachers to work harder and smarter. Many of these so-called education reformers came from the business schools and employed competition as their primary motivator. And the politicians who followed their advice brought us test score targets to be met and a promise quickly to make every child a winner.

We were warned in advance that this wouldn’t work as we planned.  Dr. James Comer at the Yale School Development Program created a multifaceted program to help schools support the most vulnerable children and to engage educators, parents and the community in this process of building trust and strong relationships.  In 1997, in his book Waiting for a Miracle, Comer described the results. While his staff and outside evaluators believed that the Comer schools had made important progress in improving the children’s education, Comer wrote: “Our best approximation suggests that after three years about a third of the schools make significant social and academic improvement, a third show a modest improvement which is often difficult to sustain, and a third show no gain.” (Waiting for a Miracle, p. 72) The Comer program suggested that seven years was a more realistic timeline to look for real school improvement.

One of the most artificial aspects of corporate school reform was the setting of achievement test targets and short timelines as a motivator.  No Child Left Behind established that all American children in public schools would be proficient by 2014 or their schools and teachers would be punished. As we moved closer to 2014, everybody began to realize that making all schools produce high scores wasn’t working.  When it became apparent that almost all American schools would fall behind in raising what was called each student’s Adequate Yearly Progress, Arne Duncan, then Secretary of Education, began issuing No Child Left Behind Waivers to states which would promise to meet his particular school reform priorities in exchange for his willingness not to declare that state’s schools “failing.”

Slowly it began to be admitted that students’ lives outside school affect their test scores, and that schools alone cannot solve the serious challenges resulting from concentrated poverty.  In 2012, Diane Ravitch described achievement gaps as a complex challenge in children’s lives—not merely the result of the quality of a particular school: “Such gaps exist wherever there is inequality, not only in this country, but internationally.  In every country, the students from the most advantaged families have higher test scores on average than students from the least advantaged families.” (Reign of Error, p. 57)

Last year, the Harvard University testing expert Daniel Koretz described the problems of demanding ever-rising test scores from every school on the same prescribed timeline: “One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools. The causes are complex, but the result is simple: some schools have far lower average scores—and, particularly important in this system, more kids who aren’t ‘proficient’—than others. Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do. This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’ It was a deliberate and prominent part of many of the test-based accountability reforms… Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic. The specific targets were often an automatic consequence of where the proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.”  (The Testing Charade, pp. 129-130)

Here we are in 2019, when many educators have realized that something has to be done at school to address the needs of children living in communities where poverty is concentrated. A broad-based movement to make schools a social service and healthcare center for families and to add preschool and after school and summer programs at school has emerged.  These are called Community Schools. Here is how the Children’s Aid Society in New York City defines a Community School: “The foundations for community schools can be conceptualized as a Developmental Triangle that places children at the center, surrounded by families and communities.  Because students’ educational success, health and well-being are the focus of every community school, the legs of the triangle consist of three interconnected support systems: A strong core instructional program… expanded learning opportunities… and a full range of health, mental health and social services designed to promote children’s well-being and remove barriers to learning.” (Building Community Schools: A Guide for Action, p. 1)

This week the Washington Post’s Valerie Strauss published a new piece by the National Education Policy Center’s Kevin Welner and Julia Daniel pleading with the New York City Schools not to give up on NYC’s 2014 expansion of Community Schools. When he made Community Schools the centerpiece of his Renewal Program for the city’s struggling schools, Mayor Bill de Blasio suggested he would improve the schools rather than follow his predecessor Michael Bloomberg’s strategy of shutting down such schools.

But lately de Blasio is being criticized because the school turnarounds have not been quick enough.  In October, Eliza Shapiro, writing for the NY Times, suggested, “New York knew some schools in its $773 million plan were doomed. They kept children in them anyway.” The New York Schools Chancellor, Richard Carranza, responded by affirming de Blasio’s original goal: “Four years ago, Mayor Bill de Blasio made a bold—and correct—investment in 94 of New York City’s most underserved schools.  Rather than giving up on these students and schools, the city invested in them… The Renewal graduation rate has climbed from 52 to 66 percent.  Attendance has increased from 84 percent to 89 percent.  Chronic absenteeism has fallen from 47 to 36 percent.  Suspensions have decreased by 54 percent… While we have not yet decided the future of the Renewal initiative, we will never stop investing in the kinds of programs that have allowed us to improve so many schools that would have closed under prior administrations.”

In their new piece, New York City Offers Some Unpleasant Truths about School Improvement, Kevin Welner and Julia Daniel defend Mayor de Blasio’s plan for Community Schools, although they point out that the Renewal School program underestimated the amount of time it takes to build the kind of trust and relationships James Comer wrote about and to address the challenges poverty poses for children: “The Renewal program—which also supports schools in the city’s larger Community Schools Initiative (CSI)—assists schools by increasing supports, training, and resources for students and teachers. The CSI increases family and community engagement and creates collaborative structures and practices…. These approaches—extended learning time, family and community engagement, collaborative leadership, and integrated student supports—are fundamental to community schools models and informed by decades of research showing that out-of-school factors have an overwhelming influence on student outcomes.  In turning to this evidence-based approach, the mayor should be applauded.”

Welner and Daniel recognize that a three-year timeline isn’t enough: “Fortunately, with the initial (three-year) results now in, we do see encouraging improvements… Yet as is the case with all major reform efforts, there have also been challenges that must be addressed….  For example, these schools have been hampered by high levels of principal turnover.  Further, a quarter of the initial Renewal schools have been closed for not meeting the program’s ambitious goals.”

The National Education Policy Center’s purpose is to bring the peer-reviewed research of the academy to bear on the policy that shapes public schools.  Welner and Daniel starkly assess the impact of child poverty on school achievement and the optimal ways schools can address these challenges:

“Here, we need to step back and confront an unpleasant truth about school improvement.  A large body of research teaches us that the opportunity gaps that drive achievement gaps are mainly attributable to factors outside our schools: concentrated poverty, discrimination, disinvestment, and racially disparate access to a variety of resources and employment opportunities.

“Research finds that school itself has much less of an impact on student achievement than out-of-school factors such as poverty.  While schools are important—and can certainly be crucial in the lives of some students—policymakers repeatedly overestimate their capacity to overcome the deeply detrimental effects of poverty and racism….

“But students in many of these communities are still rocked by housing insecurity, food insecurity, their parents’ employment insecurity, immigration anxieties, neighborhood violence and safety, and other hassles and dangers that can come with being a low-income person of color in today’s United States.

“We need to acknowledge these two realities—seemingly in tension: (1) that education reforms can be very helpful, if they’re the right ones and if we’re patient and committed; but (2) we as a society are deceiving ourselves if we think we’ll transform educational outcomes without addressing economic inequality.”

Finally, Welner and Daniel recommend that in New York City, “De Blasio should remain committed to the Renewal program—a program based on decades of rigorous research and already showing meaningful benefits for underserved students… When we look across the nation and see other leaders chasing silver bullets, or ignoring educational inequity altogether, we should rejoice that New York and its mayor are engaged in the demanding yet essential work of partnering with communities to address basic needs….”

Faith in High Stakes Testing Fades, Even Among the Corporate School Reformers

After a recent twenty-fifth anniversary conference at the Center on Reinventing Public Education at the University of Washington, Bothell, a Gates-funded education-reform think tank, Chalkbeat’s Matt Barnum summarized presentations by a number of speakers who demonstrated growing skepticism about the high-stakes, standardized testing regime that has dominated American public education for over a quarter of a century.

Because the Center on Reinventing Public Education is known as an advocate for portfolio school reform and corporate accountability, you might expect adherence to the dogma of test-and-punish, but, notes Barnum:  “The pervasiveness of the complaints about testing was striking, given that many education reform advocates have long championed using test scores to measure schools and teachers and then to push them to improve.”

Then at a Massachusetts Institute of Technology School Access and Quality Summit early this month, Paymon Rouhanifard presented a major policy address challenging the use of high stakes testing to rank and rate public schools.  Rouhanifard was until very recently Chris Christie’s appointed, school-reformer superintendent in Camden, New Jersey.  Formerly he was the director in New York City of Joel Klein’s Office of Portfolio Management.  Rouhanifard describes the belief system he brought with him to Camden and explains how his five-year tenure as Camden’s superintendent transformed his thinking: “Our belief was that politics and bureaucracy had inhibited the progress Camden students and families deserved to overcome the steep challenges the city was facing…  We believed it was important for the district to segue out of being a highly political monopoly operator of schools….  This is a story about an evolution of my own thinking during that five-year experience…. What I’m referring to are the math and literacy student achievement data we utilize to drive so many of the critical decisions we make… My realization a few years ago was that I rarely asked questions about what these tests actually told us.  What they didn’t tell us.  And perhaps most importantly, what were the specific behaviors they incentivized, and what were the general trade-offs when we acutely focus on how students do on state tests.”

In 2013, at the beginning of his tenure, Rouhanifard introduced a school report card that rated each school primarily by students’ standardized test scores. Two years ago Rouhanifard eliminated his own school report cards.  He describes his realization: “We are spending an inordinate amount of time on formative and interim assessments and test prep, because those are the behaviors we have incentivized.  We are deprioritizing the sciences, the arts, and civic education…. I… believe the drawbacks currently outweigh the benefits.  That we haven’t been honest about the trade-offs.”

Shael Polakow-Suransky, like Rouhanifard, held a position in Joel Klein’s “reformer” school administration in New York City.  Now the president of Bank Street College of Education, he was formerly Klein’s deputy schools chancellor. Barnum explains that Polakow-Suransky has become an emphatic critic of the nation’s high-stakes standardized testing regime: “The biggest barrier to student learning and closing the achievement gap is the current system of standardized tests.”

In a piece at The74, the Thomas Fordham Institute’s Robert Pondiscio quotes Polakow-Suransky: “All of us were well-intentioned in pushing this agenda, but the tools we developed were not effective in raising the bar on a wide scale.”

While the Thomas Fordham Institute has endorsed corporate school reform including high-stakes, test-based accountability, Fordham’s Pondiscio now acknowledges that under the Every Student Succeeds Act, U.S. public schools have become mired in an education culture defined by test-based accountability.  Though he seems unclear on the way forward, Pondiscio now advocates for serious reconsideration: “The challenge is not testing vs. not testing.  It’s not accountability vs. none.  Both bring benefits of different kinds, and both are required by a federal law that’s not going to change anytime soon.  The challenge is to develop a policy vision that supports—not thwarts—the classroom practices and long-term student outcomes we seek… The problem is the reductive culture of testing, which has come to shape and define American education, particularly in the kinds of schools attended by our most disadvantaged children.”

There are some who remain faithful to the school reformer dogma. The Center on Reinventing Public Education’s Robin Lake tries to change the subject: “We need a more productive debate about school accountability, not tired arguments over testing.” And Matt Barnum quotes Sandy Kress—still a tried-and-true believer in the No Child Left Behind regime he helped create: “Research shows clearly that accountability made a real difference in this country in narrowing the achievement gap and lifting student achievement.”

Of course, research does not clearly show that Sandy Kress’s kind of No Child Left Behind accountability made a real difference.  Here is Harvard’s Daniel Koretz, in the authoritative book he published a year ago, The Testing Charade: Pretending to Make Schools Better.  It is perhaps this volume by an academic expert on testing that has helped change the minds of some of the corporate school reformers quoted above.  Koretz writes: “It is no exaggeration to say that the costs of test-based accountability have been huge.  Instruction has been corrupted on a broad scale.  Large amounts of instructional time are now siphoned off into test-prep activities that at best waste time and at worst defraud students and their parents.  Cheating has become widespread.  The public has been deceived into thinking that achievement has dramatically improved and that achievement gaps have narrowed.  Many students are subjected to severe stress, not only during testing but also for long periods leading up to it.  Educators have been evaluated in misleading and in some cases utterly absurd ways.  Careers have been disrupted and in some cases ended.  Educators, including prominent administrators, have been indicted and even imprisoned.  The primary benefit we received in return for all of this was substantial gains in elementary-school math that don’t persist until graduation.  This is true despite the many variants of test-based accountability the reformers have tried, and there is nothing on the horizon now that suggests that the net effects will be better in the future. On balance, then, the reforms have been a failure.” (The Testing Charade, pp. 191-192)

Introducing readers to Don Campbell, “one of the founders of the science of program evaluation,” Koretz defines the problems inherent in our society’s quarter century of high-stakes, test-and-punish school accountability by quoting Campbell’s Law:  “The more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”  Campbell directly addresses the problem of high stakes testing to rank and rate schools:  “Achievement tests may well be valuable indicators of … achievement under conditions of normal teaching aimed at general competence. But when test scores become the goal of the teaching process, they both lose their value as indicators of educational status and distort the educational process in undesirable ways.” (The Testing Charade, pp. 38-39)

How has the testing regime operated perversely to undermine the schools serving our society’s most vulnerable children, the ones we were told would catch up academically under No Child Left Behind if only we created incentives and punishments to motivate their teachers to work harder?  Koretz explains: “One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools.  The causes are complex, but the result is simple: some schools have far lower average scores—and, particularly important in this system, more kids who aren’t ‘proficient’—than others.  Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do.  This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’  It was a deliberate and prominent part of many of the test-based accountability reforms… Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic.  The specific targets were often an automatic consequence of where the proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.”  (The Testing Charade, pp. 129-130)

Besides imposing unreasonable and damaging punishments on the schools and teachers serving our society’s poorest children, Koretz believes our commitment to a regime of punitive testing has distracted our society from developing the commitment to address the real needs of children and schools in places where poverty is concentrated: “We can undoubtedly reduce variations in performance appreciably, if we summoned the political will and committed the resources to do so—which would require a lot more than simply imposing requirements that educators reach arbitrary targets for test scores.” (The Testing Charade, p. 131)

Schools Serving Very Poor Children Need Financial Assistance. Instead Ohio Beats Them Up.

Ohio operates a test-and-punish accountability scheme that ranks and rates schools and school districts, and punishes school districts whose scores are low.  All the while, the state has diminished its effort to support public education and equalize funding.

In mid-September, for example, the state released school report cards awarding schools and school districts letter grades from “A” through “F.”  Like two other districts recently taken over by the state after receiving a series of “F” grades, East Cleveland will be seized by the state and assigned a state-appointed overseer CEO to replace its school superintendent and an appointed commission to replace the local school board.  East Cleveland, an economically and racially segregated inner-ring Cleveland suburban school district, is among Ohio’s very poorest.  Historically the residents in the community have voted high millage relative to their incomes to pay for their public schools despite the closure of local industry and the collapse of the economy.  The school districts in two other impoverished communities, Youngstown and Lorain, were taken over in recent years without a subsequent rise in test scores, the state’s chosen metric. Both received “F” grades again this year. The implementation of state takeover has been insensitive and insulting. Ohio’s Plunderbund reported in March that Krish Mohip, the state overseer CEO in Youngstown, feels he cannot safely move his family to the community where he is in charge of the public schools. He has also been openly interviewing for other jobs. Lorain’s CEO, David Hardy, tried to donate the amount of what would be the property taxes on a Lorain house to the school district when he announced that he does not intend to bring his family to live in Lorain.

EdChoice vouchers are a second high stakes punishment in the school attendance zones of “F”-rated schools. EdChoice gives families the opportunity to opt their children out of “failing” public schools by granting their children a chance to leave at public expense.  Writing for the Heights Observer, Susan Kaeser describes how this works in another Cleveland inner-ring suburban school district: “Access to EdChoice vouchers is tied to Ohio’s deeply flawed education accountability system.  If the aggregate test score data for an individual public school falls short, the school is defined as an EdChoice school.  Anyone residing in the attendance area of that school who could have attended that school is eligible for an EdChoice voucher… Nearly every district that has EdChoice designation serves many high-need students.”

Most students using EdChoice vouchers in the Cleveland Heights-University Heights School District which Kaeser describes are attending religious schools, and in fact real estate companies have been marketing houses in the state-designated neighborhoods as qualifying for EdChoice vouchers. Children can qualify for one of these vouchers as Kindergartners, without ever attending or intending to enroll in the public school that anchors the neighborhood. As Kaeser explains, “Once a student receives a voucher it can be renewed until the student graduates… Voucher use has grown exponentially as more schools were designated EdChoice and as recipients renew their vouchers.  This year, 176 Kindergarten students received first-time vouchers (without previously enrolling in a public school), adding to the total of more than 650 recipients.  The expected loss to the CH-UH district this year from EdChoice is $3.7 million….”  The rapid expansion of this program is fiscally unsustainable.

In a paywalled September 14, 2018, On The Money report, a legislative update from the Hannah News Service, the Ohio Education Policy Institute school finance expert Howard Fleeter tracks the statewide impact of Ohio’s EdChoice vouchers. Over the ten years since the program’s inception, it has grown from 3,100 to 22,153 students.  Fleeter explains: “EdChoice vouchers are worth up to $4,650 for students in grades K-8 and up to $6,000 for students in grades 9-12.”  He continues, explaining that while the money ostensibly comes from the state, EdChoice is “funded through a ‘district deduction’ system… The deduction system means that the voucher student is counted in the district of residence’s Formula ADM (Average Daily Membership) and then the voucher is paid for by deducting the voucher amount from the district’s state aid.  This can often result in a district seeing a deduction for the voucher greater than the state aid that was received for that student, meaning that the district is in effect subsidizing the voucher program.”  In FY 2007, $10,368,839 was spent statewide for EdChoice vouchers; by FY 2017, the amount had climbed to $102,688,259.  Over the decade, a total of $649,158,483 of state and local tax dollars was diverted from public schools to private school tuition through EdChoice vouchers.
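For readers who want to see the arithmetic, here is a rough sketch in Python of the “district deduction” Fleeter describes. The voucher caps and the statewide totals are the figures quoted above. The per-pupil state aid amounts ($2,000 and $7,000) are invented purely for illustration, the function names are my own, and the sketch assumes each voucher is paid at its maximum value even though vouchers are worth “up to” that amount.

```python
# Voucher caps quoted by Fleeter (dollars per student per year).
K8_MAX_VOUCHER = 4650.0
HS_MAX_VOUCHER = 6000.0


def max_voucher(grade: int) -> float:
    """Return the maximum voucher for a grade (0 = Kindergarten, 1-12)."""
    return K8_MAX_VOUCHER if grade <= 8 else HS_MAX_VOUCHER


def district_net_aid(state_aid_for_student: float, grade: int) -> float:
    """State aid the district keeps for one voucher student.

    The student is counted in the district's Formula ADM, so the district
    receives state aid for that student, and then the voucher amount is
    deducted. A negative result means the deduction exceeds the aid, so
    the district is in effect subsidizing the voucher.
    Assumption: the voucher is paid at its maximum value.
    """
    return state_aid_for_student - max_voucher(grade)


# Hypothetical district receiving $2,000 in state aid for a fifth grader.
print(district_net_aid(2000.0, 5))    # -2650.0: the district subsidizes

# Hypothetical district receiving $7,000 in state aid for a tenth grader.
print(district_net_aid(7000.0, 10))   # 1000.0: aid exceeds the deduction

# Growth in the program, using the statewide figures quoted above.
spending_growth = 102_688_259 / 10_368_839
enrollment_growth = 22_153 / 3_100
print(round(spending_growth, 1), round(enrollment_growth, 1))  # 9.9 7.1
```

The first case is the one Fleeter warns about: whenever the state aid a district receives for a student is smaller than the voucher, the difference has to come out of money meant for the students who remain in the public schools.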

All of Ohio’s school districts where students qualify for EdChoice vouchers are districts serving very poor children. And yet, last month in a new report Howard Fleeter explains: “(R)esidential taxpayers in the low wealth districts are paying taxes at nearly the same rate as are their higher wealth counterparts… The Tax Effort measure shows that when ability to pay is taken into account, the low wealth districts are levying taxes at the highest rate relative to their income, while the highest wealth districts are levying taxes at the lowest rate relative to income.”  Fleeter continues: “(T)he lowest wealth… districts have seen their share of total state and local resources fall from 26.4% in FY99 to 23.1% in FY19, while the highest wealth… school districts have seen their share of total state and local resources increase from 22.2% in FY99 to 23.4% in FY19.  Unsurprisingly… a variety of equity measures indicate that equity in state and local school operating revenues improved from FY99 to FY 09, but regressed somewhat from FY09 to FY19.”

When he was interviewed by Jim Siegel for the Columbus Dispatch, Fleeter was less technical and more candid about the state’s school funding formula: “The formula itself is kind of just spraying money in a not-very-targeted way.”

Siegel reminds readers about the impact of the 2008 Great Recession, compounded by state tax cuts promoted by Governor John Kasich and passed by the legislature: “GOP leaders… eliminated the tangible personal property tax, which more than a decade ago generated about $1.1 billion per year for schools.  For a time, state officials reimbursed schools for those losses, but that has largely been phased out… And finally, there are Gov. John Kasich’s funding formula and fiscal priorities, including income-tax cuts that have meant an estimated $3 billion less in available revenue each year… Kasich crafted a new formula designed to drive funding to districts with the least ability to raise their own local funds, but Fleeter and public education officials have argued that it doesn’t quite work properly.”

Through various schemes to privatize education—EdChoice and several other voucher programs along with a large charter school sector—Governor Kasich and the Republican legislature have found another method, in addition to the flawed school funding formula, to divert needed state dollars out of public schools across the state.  State takeovers of struggling school districts and EdChoice vouchers are the clearest examples in state policy of punitive, top down programs that blame and punish local educators in poor communities instead of driving resources and support to communities serving concentrations of children in poverty.

Once again, it is appropriate to quote Harvard’s Daniel Koretz explaining in The Testing Charade just how high stakes, test-based accountability blames and punishes schools that face the overwhelming challenge of student poverty:  “One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools. The causes are complex, but the result is simple: some schools have far lower average scores—and, particularly important in this system, more kids who aren’t ‘proficient’—than others. Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do. This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’ It was a deliberate and prominent part of many of the test-based accountability reforms… Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic. The specific targets were often an automatic consequence of where the Proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.” (pp. 129-130)

Rick Hess’s Mistake: Failure of Test-and-Punish Is Not Limited to a Few Districts That Have Disappointed

Frederick M. Hess, the director of education policy studies at the American Enterprise Institute, has always been a corporate education reform kind of guy. That is why Hess’s honest analysis this week of the ultimate fraud of a succession of school district miracles—Washington, D.C.’s test score and graduation rate miracle under Michelle Rhee and those who followed her, Alonzo Crim’s Atlanta in the 1980s, Houston’s Texas Miracle under Rod Paige, Arne Duncan’s Chicago, and Beverly Hall’s Atlanta—is so refreshingly candid.

In all of these cases, as Hess points out, there was “a remarkable dearth of attention paid to ensuring that the metrics (were) actually valid and reliable.”  Second, it was “tempting for civic leaders and national advocates to accept happy success stories at face value—especially when they (were) fronted by a charismatic superintendent.” And finally “reformers and reporters (made) things worse with their lust for ‘celebrity superintendents’ and ‘model systems.’ Their fascination nurtur(ed) an echo chamber in which a handful of leaders (got) exalted, often for too-good-to-be-true results.”

One must give Hess credit for honestly admitting the failure of so much of what his own kind of school reformers have been exalting for the past quarter century—business school accountability for schools, driven by universal standardized testing, and evaluated by two primary outcomes—standardized test scores and graduation rates. But Hess makes a mistake when he attributes the problem to a few “model” school districts that have disappointed.

Hess’s explanation is inadequate because the system itself—the whole idea of school reform based on high stakes testing—cannot work.  Daniel Koretz, the Harvard specialist on testing, tells us why in a recent book: The Testing Charade: Pretending to Make Schools Better.

Koretz defines the problem with high-stakes-test-based school accountability by exploring a primary principle of social science research. Forty years ago, Don Campbell, “one of the founders of the science of program evaluation,” articulated a core principle now known as “Campbell’s Law”: “The more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” (p. 38)

How does Campbell’s Law describe the dilemma Frederick Hess identifies?  Koretz quotes Don Campbell himself describing the distortion that will follow when high stakes consequences are attached to a school district’s capacity to raise its aggregate test scores: “Achievement tests may well be valuable indicators of… achievement under conditions of normal teaching aimed at general competence.  But when test scores become the goal of the teaching process, they both lose their value as indicators of educational status and distort the educational process in undesirable ways.” (p. 39)

In The Testing Charade, Koretz provides extensive evidence about all the ways high stakes tied to test scores have triggered Campbell’s Law—to invalidate the test results themselves and to undermine our education system and the experiences of teachers and students trapped by No Child Left Behind and the Every Student Succeeds Act in a scheme to raise test scores at all costs.

One consequence is score inflation: “All that is required for scores to become inflated is that the sampling used to create a test has to be predictable… For inflation to occur, teachers or students need to capitalize on this predictability, focusing on the specifics of the test at the expense of the larger domain.” (p. 62)  We read about all the ways curriculum designers and teachers are incentivized to focus their classes on the specific elements of any particular academic discipline that have appeared on previous tests.

A second consequence, related to the first, is flat-out test-prep. Test prep narrows what is taught to students to the material that is tested and drills students about using clues in the test itself to come up with the right answers. Koretz identifies three kinds of bad test prep. Reallocation between subjects has been common when schools emphasize No Child Left Behind’s tested subjects—reading and math—and cut back on social studies, the arts, music and recess. Reallocation within subjects is when schools study past years’ versions of the state tests and ask teachers to focus on particular aspects of a subject.  Finally there is coaching. Schools and test-prep companies teach students to respond in a formulaic way to the format of the questions themselves. Koretz explains why all this has implications for educational equity: “Inappropriate test preparation, like score inflation, is more severe in some places than in others. Teachers of high-achieving students have less reason to indulge in bad preparation for high-stakes tests because the majority of their students will score adequately without it—in particular, above the ‘proficient’ cut score that counts for accountability purposes. So one would expect that test preparation would be a more severe problem in schools serving high concentrations of disadvantaged students…. Once again, disadvantaged kids are getting the short end of the stick.” (pp. 116-117)

And a third consequence, demonstrated in every one of Frederick Hess’s examples, is cheating. Koretz examines the biggest cheating scandals, notably Atlanta, Philadelphia, and Washington, DC.  He notes: “Cheating—by teachers and administrators, not by students—is one of the simplest ways to inflate scores, and if you aren’t caught, it’s the most dependable.” Sometimes teachers or administrators erase and change students’ answers; sometimes they provide teachers or students with the test items in advance; other times teachers give students the answer during the test.  And finally sometimes schools “scrub” off the enrollment rolls the students who are likely to fail.

Koretz presents the questions around cheating by educators as morally fraught. After all, test scores are not simply a proxy for the quality of a school or a school district:  “One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools. The causes are complex, but the result is simple: some schools have far lower average scores—and, particularly important in this system, more kids who aren’t ‘proficient’—than others. Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do. This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’ It was a deliberate and prominent part of many of the test-based accountability reforms… Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic. The specific targets were often an automatic consequence of where the Proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.” (pp. 129-130)

In a system that, by its very structure, is guaranteed to trigger Campbell’s Law, Koretz wonders about the moral implications of cheating: “Just who is responsible?  Is it just the people who actually carry out the fraud or require it?  Or are those who create the pressures to cheat also culpable, even if not criminally?” (p. 91)

Like Frederick Hess, Daniel Koretz recognizes that although outcomes-based, test-and-punish school accountability has been hyped and celebrated, ultimately this kind of school policy has not improved schools as promised.  Koretz digs deeper, however, to expose that the system itself—not merely its abuse by particular educators in particular school districts—is deeply flawed.

Koretz concludes: “It is no exaggeration to say that the costs of test-based accountability have been huge. Instruction has been corrupted on a broad scale. Large amounts of instructional time are now siphoned off into test-prep activities that at best waste time and at worst defraud students and their parents.  Cheating has become widespread. The public has been deceived into thinking that achievement has dramatically improved and that achievement gaps have narrowed. Many students are subjected to severe stress… The primary benefit we received in return for all of this was substantial gains in elementary-school math that don’t persist until graduation… On balance, then, the reforms have been a failure.” (pp. 191-192)

Ohio Releases 2018 School Report Cards, Brands Poorest School Districts with “F”s

Yesterday, Ohio released school district report cards that reflect the test-and-punish theory that if we hold schools accountable for raising students’ test scores and graduation rates, teachers will somehow rise to the occasion and find a way to raise measured achievement to high levels.  Instead, the new state report cards demonstrate just what we already knew they would.  While the 2018 school report cards in Ohio have now become official and will subject the school districts branded with “F”s to punishments like state takeover, the state has been releasing unofficial, trial-balloon school and school district grades for several years now, and every time, the school districts in the state’s wealthiest communities got “A”s while city school districts, and inner-ring suburbs got “D”s and  “F”s.

This year, 28 school districts across Ohio earned “A” ratings. Twenty-three “A”-rated school districts are located in the state’s wealthiest suburban and exurban areas surrounding Cleveland, Cincinnati, Columbus, Dayton and Toledo. Eleven of the A-rated suburban districts are located in greater Cleveland, including five of Cuyahoga County’s privileged suburbs and six exurbs in the surrounding Geauga, Summit, Portage, Lorain and Medina Counties.  Five “A”-rated school districts are located in small towns—four in prosperous farming country in western Ohio.

Fourteen districts across Ohio received “F”s yesterday. These include the majority of the state’s largest cities: Cleveland, Canton, Columbus, Dayton, Toledo, and Youngstown.  Ohio’s other two big-city school districts—Cincinnati and Akron—earned “D” grades. The list of so-called “F” school districts also includes a number of very poor, segregated inner ring suburbs including East Cleveland and Euclid in greater Cleveland and North College Hill in greater Cincinnati. The two Ohio school districts currently under state takeover—Youngstown and Lorain—did not improve this year under state management; both earned “F” grades. Three school districts were waiting to learn whether the state would take them over if they earned a third “F” this year. Warrensville Heights in greater Cleveland and Trotwood-Madison in greater Dayton raised their scores to “D” and avoided the takeover. East Cleveland, among the very poorest and most racially segregated school districts in Ohio, will face state takeover, as its 2018 grade adds a third year to the district’s “F” ratings.

The Plain Dealer’s Patrick O’Donnell has been reporting since 2013 on what many Ohio researchers and educators believe is the correlation of the state’s school and school district grades with aggregate family income in the communities served by particular school districts.

More broadly, academic research, for half a century since the 1966 Coleman Report, has confirmed the correlation of school achievement—measured by standardized achievement tests and graduation rates—with aggregate neighborhood and family economic circumstances.  More recently, the Stanford University sociologist Sean Reardon has shown that our society is resegregating by income, with wealthy families and poor families moving to separate communities and the number of mixed income communities declining. Reardon has also shown that as our society is becoming more residentially segregated by family income, there has been a simultaneous jump in an income-inequality school achievement gap. The achievement gap between the children with income in the top ten percent and the children with income in the bottom ten percent was 30-40 percent wider among children born in 2001 than those born in 1975, and twice as large as the black-white achievement gap.  The geographic distribution of Ohio’s 2018 “A”–“F” school grades demonstrates the growing residential segregation of our state’s metropolitan areas and the kind of economic achievement gap Reardon has identified.

In his important new book, The Testing Charade: Pretending to Make Schools Better, Harvard University’s Daniel Koretz describes the testing regime formalized in the 2002 No Child Left Behind Act: “One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools. The causes are complex, but the result is simple: some schools have far lower average scores—and, particularly important in this system, more kids who aren’t ‘proficient’—than others. Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do. This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’ It was a deliberate and prominent part of many of the test-based accountability reforms… Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic. The specific targets were often an automatic consequence of where the Proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.” (pp. 129-130)

A new report this week from the Alliance to Reclaim Our Schools additionally indicts what remains very unequal school funding.  While it has been repeatedly demonstrated that school districts where poverty is concentrated need extra money to meet their students’ many needs, these school districts across the United States have fewer dollars per pupil once state and local funding is combined: “Districts serving white and more affluent students spend thousands to tens of thousands of dollars more, per pupil, than high poverty school districts and those serving majorities of Black and Brown students. The challenges faced by these schools—larger class size, fewer experienced teachers, the lack of libraries, science equipment, technology and counselors—all reflect a lack of resources.”  The report adds, “The Education Trust found that in 2015, on average, districts with large majorities of students of color provided about $1,800 (13 percent) less per student than districts in the same state serving the fewest students of color.”  Howard Fleeter, an economist and school funding analyst at the Ohio Education Policy Institute, confirmed in a recent report that Ohio’s current school funding formula fails to compensate for vastly unequal local fiscal capacity across Ohio’s school districts.

There are many reasons to be concerned about the broader implications of Ohio’s policy of awarding “A”–“F” grades to the state’s very unequally funded school districts—places which also reflect the geographic distribution of our society’s massive family economic inequality. While the federal Every Student Succeeds Act requires states to evaluate schools and publish the results, and while ESSA says that standardized test scores and graduation rates must be part of the calculation, Congress does not require states to award a single “summative” grade to each school and school district.  Several years ago in greater Cleveland, a local fair housing agency, Heights Community Congress, sponsored a well-attended program on how real estate websites have been redlining particular school districts and the neighborhoods in the attendance zones of particular schools. One such site is Great Schools, which at the time published A-F grades for public schools and now uses numerical ratings. You would think these real estate websites have been violating the Fair Housing Act by steering families away from particular school districts, but they have been, in fact, merely using the information provided by the state of Ohio in the school report cards. The branding of public schools with “A”–“F” grades (or today’s Great Schools’ numerical system) encourages families who can afford it to avoid poor and mixed income school districts and buy homes in homogeneously white and wealthy exurbia.

Instead of branding Ohio’s poorest African American and Hispanic school districts with “F”s and punishing the state’s very poorest school districts with state takeover, the state should significantly increase its financial support for public schools in poor communities and encourage the development of full-service wraparound schools that provide medical and social services for families right at school.  Ohio’s system of branding the state’s poorest schools with “F” grades and imposing sanctions like state takeover undermines support for public education in school districts that desperately need strong community institutions.  The school district report cards also encourage segregation of the state’s metropolitan areas by race and family income.

Repeating My Recommendation: Please Read Daniel Koretz’s Book, “The Testing Charade”

How has high stakes testing ruined our schools and how has this strategy, which was at the heart of No Child Left Behind, made it much more difficult to accomplish No Child Left Behind’s stated goal of reducing educational inequality and closing achievement gaps?

Here is how Daniel Koretz begins to answer that question in his 2017 book, The Testing Charade: Pretending to Make Schools Better: In 2002, No Child Left Behind “mandated that all states use the proficient standard as a target and that 100 percent of students reach that level. It imposed a short timeline for this: twelve years. It required that schools report the performance of several disadvantaged groups and it mandated that 100 percent of each of these groups had to reach the proficient standard. It required that almost all students be tested the same way and evaluated against the same performance standards.  And it replaced the straight-line approach by uniform statewide targets for percent proficient, called Adequate Yearly Progress (AYP)…. The law mandated an escalating series of sanctions for schools that failed to make AYP for each reporting group.” Later, “Arne Duncan used his control over funding to increase even further the pressure to raise scores.  The most important of Duncan’s changes was inducing states to tie the evaluation of individual teachers, rather than just schools, to test scores… The reforms caused much more harm than good. Ironically, in some ways they inflicted the most harm on precisely the disadvantaged students the policies were intended to help.”

Koretz poses the following question and his book sets out to answer it: “But why did the reforms fail so badly?”

I recommend Daniel Koretz’s book all the time as essential reading for anyone trying to figure out how we got to the deplorable morass that is today’s federal and state educational policy.  I wish I thought more people were reading this book. Maybe people are intimidated that its author is a Harvard expert on the design and use of standardized tests.  Maybe it’s the fact that the book was published by the University of Chicago Press. But I don’t see it in very many bookstores, and when I ask people if they have read it, most people tell me they intend to read it. To reassure myself that it is really worth reading, I set myself the task this past weekend of re-reading the entire book. And I found re-reading it to be extremely worthwhile.

The book divides into three parts: an introductory section of several chapters, six or seven chapters in the middle that dissect the way high stakes testing has undermined education and damaged the education of our nation’s poorest children, and some wrap-up chapters. It is the middle part that is essential. While Koretz has some ideas near the end about where we go from here, his analysis of the damage caused is the crucial part. After all, this section at the heart of the book addresses the conversational dilemma many readers of this blog must face as often as I do. What can you say to the person who doggedly tells you that a particular school is a fine school because its scores are high and another school is a failure because its test scores are so low? This person, often well-intentioned, has lived with test-based school accountability for so long that he cannot imagine there is any other way to consider school quality. And anyway, he says, standardized testing is what we have to evaluate schools, so it’s what we need to use.

Koretz explains a 40-year-old social science rule first articulated by Don Campbell, whom Koretz identifies as “one of the founders of the science of program evaluation.” Here is how Campbell stated what we now call “Campbell’s Law”: “The more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” The rest of the central chapters in Koretz’s book explain precisely how the use of high stakes punishments tied to low test scores has triggered Campbell’s Law. What are the high stakes punishments?  First came the school turnarounds prescribed by No Child Left Behind: firing the principal and half the teachers, closing the school, charterizing the school.  Later Arne Duncan added the evaluation of teachers by students’ test scores, along with schemes rewarding teachers whose students scored well and firing the teachers whose students posted low scores. Koretz summarizes No Child Left Behind’s test-and-punish strategy: “The reformers’ implicit assumption seemed to be that many teachers knew how to teach more effectively but were being withholding, and therefore confronting them with sanctions and rewards would be enough to get them to deliver.”

Three chapters explore how No Child Left Behind’s test-and-punish strategy has distorted schooling itself and has undermined how teachers teach and how students learn.

  • Score Inflation: When the state achievement tests mandated by No Child Left Behind—the ones that would bring negative consequences for schools and teachers—were compared by experts like Koretz himself to an “audit” test such as the National Assessment of Educational Progress (NAEP), which has no high stakes consequences, the researchers discovered that while the scores on the state test rose rapidly, NAEP scores remained flat.  Koretz comments: “(I)ncreases in scores are meaningful only if they signal similar increases in mastery of the domain.  If they do generalize to the domain, gains should appear on other tests that sample from the same domain.” He continues: “(A)ll that is required for scores to become inflated is that the sampling used to create a test has to be predictable… For inflation to occur, teachers or students need to capitalize on this predictability, focusing on the specifics of the test at the expense of the larger domain.”  And there are equity concerns here, because score inflation has occurred more often in schools serving poor students: “Ongoing work by my own group has shown… that it is not just the poverty of individual students that predicts the amount of inflation but also the concentration of poor students in a school… (S)chools with a higher proportion of poor students showed greater average inflation.” Teachers under pressure are finding a way to raise test scores without really teaching the students the material they are supposed to be learning.  Some schools have also inflated overall scores by focusing primarily on children right at the pass/fail level and paying less attention to students far behind.
  • Cheating: Koretz examines the big cheating scandals, notably Atlanta, Philadelphia, and Washington, DC.  He notes: “Cheating—by teachers and administrators, not by students—is one of the simplest ways to inflate scores, and if you aren’t caught, it’s the most dependable.” Sometimes teachers or administrators erase and change students’ answers; sometimes they provide teachers or students with the test items in advance; other times teachers give students the answer during the test.  And finally sometimes schools “scrub” off the enrollment rolls the students who are likely to fail.
  • Test Prep: Test prep narrows what is taught to students to the material that is tested.  Koretz identifies three kinds of bad test prep. Reallocation between subjects has been common when schools emphasize No Child Left Behind’s tested subjects—reading and math—and cut back on social studies, the arts, music and recess. Reallocation within subjects is when schools study past years’ versions of the state tests and ask teachers to focus on particular aspects of a subject.  Finally there is coaching. Schools and test-prep companies teach students to respond in a formulaic way to the format of the questions themselves. Koretz explains why all this has implications for educational equity: “Inappropriate test preparation, like score inflation, is more severe in some places than in others. Teachers of high-achieving students have less reason to indulge in bad preparation for high-stakes tests because the majority of their students will score adequately without it—in particular, above the ‘proficient’ cut score that counts for accountability purposes. So one would expect that test preparation would be a more severe problem in schools serving high concentrations of disadvantaged students…. Once again, disadvantaged kids are getting the short end of the stick.”

Two chapters in this middle section explore the ways No Child Left Behind’s test-and-punish scheme has undermined equitable access to education in the schools in areas of concentrated poverty across our cities. The law that promised to leave no child behind not only encouraged test prep and cheating in the schools whose needs were greatest, but it also set impossibly tough and largely arbitrary test score targets for those schools and an impossibly short timeline for bringing students up to those targets.  And then the federal government set out to punish the schools and the teachers unable to meet the targets.

  • Making Up Unrealistic Targets: In this chapter, Koretz explains how No Child Left Behind’s standardized cut scores and timelines were set unrealistically and arbitrarily; the consequence was to label schools in poor areas as “failing” and to subject schools in areas of concentrated poverty to a series of punishments. Here is Koretz’s short summary: “Part of the blame for this failure lies with the crude and unrealistic methods used to confront inequity.  In a nutshell, the core of the approach has been simply to set an arbitrary performance target (the ‘Proficient’ standard) and declare that all schools must make all students reach it in an equally arbitrary amount of time.  No one checked to make sure the targets were practical.  The myriad factors that cause some students to do poorly in school—both the weaknesses of many of the schools they attend and the disadvantages some students bring to school—were given remarkably little attention. Somehow teachers would just pull this off… The trust most people have in performance standards is essential, because the entire educational system now revolves around them. The percentage of kids who reach the standard is the key number determining which teachers and schools will be rewarded or punished… But in fact, despite all the care that goes into creating them, these standards are anything but solid. They are arbitrary, and the ‘percent proficient’ is a very slippery number… A primary motivation for setting a Proficient standard is to prod schools to improve, but information about how quickly teachers actually can improve student learning doesn’t play much, if any, of a role in setting performance standards… However, setting the standards themselves is just the beginning. What gives the performance standards real bite is their translation into concrete targets for educators, which depends on more than the rigor of the standard itself… We have to say how quickly performance has to increase—not only overall but for different types of kids and schools. A less obvious but equally important question is how much variation in performance is acceptable.”
  • Evaluating Teachers: Beginning in 2009 with Race to the Top, and later as a condition for states to qualify for waivers from the worst consequences of No Child Left Behind, Arne Duncan’s Department of Education required states to change their laws to tie a percentage of teachers’ formal evaluations to students’ test scores. Myriad problems ensued. First of all, the required tests are in reading and math. What about the other teachers? Koretz describes Florida and Tennessee, which judged teachers in non-tested grades and subjects by the scores of students who were not in their classes, and in one case not in their schools.  Other states added tests in music, art, and physical education, subjecting students to added standardized testing, just for the purpose of state teacher evaluations.  Koretz explains the problems with using Value-Added Modeling to evaluate teachers: many factors affecting students’ scores cannot be traced to any one teacher, and an individual teacher’s ratings seem to be unstable over several years.

I cannot imagine exactly how our society can recover from our terrible test-and-punish misadventure and our labeling as “failing” the institutions and teachers who serve our poorest children.  What is heartening about The Testing Charade: Pretending to Make Schools Better is the clarity with which Daniel Koretz presents our current dilemma: “We now know what many educators did.  Faced with unrealistic targets, some cut corners or simply cheated.  And perhaps because the system, in its zeal to address inequities, made the targets most unrealistic for educators serving disadvantaged kids, those kids—ironically—got the worst of it: the most test prep, the most score inflation, and apparently the most cheating.  And yet inflated scores allowed policy makers to declare victory, and the public received a steady diet of encouraging but bogus news about rapid improvements in the achievement gap…. On balance… the reforms have been a failure.”

Please read The Testing Charade.  We all need to understand and be able to explain how we’ve gone so far astray.