Faith in High Stakes Testing Fades, Even Among the Corporate School Reformers

After a recent twenty-fifth anniversary conference at the Center on Reinventing Public Education at the University of Washington, Bothell, a Gates-funded education-reformer think tank, Chalkbeat’s Matt Barnum summarized presentations by a number of speakers who demonstrated growing skepticism about the high-stakes, standardized testing regime that has dominated American public education for over a quarter of a century.

Because the Center on Reinventing Public Education is known as an advocate for portfolio school reform and corporate accountability, you might expect adherence to the dogma of test-and-punish, but, notes Barnum:  “The pervasiveness of the complaints about testing was striking, given that many education reform advocates have long championed using test scores to measure schools and teachers and then to push them to improve.”

Then, at a Massachusetts Institute of Technology School Access and Quality Summit early this month, Paymon Rouhanifard presented a major policy address challenging the use of high-stakes testing to rank and rate public schools. Rouhanifard was until very recently Chris Christie’s appointed, school-reformer superintendent in Camden, New Jersey. Formerly he was the director in New York City of Joel Klein’s Office of Portfolio Management. Rouhanifard describes the belief system he brought with him to Camden and how his five-year tenure as Camden’s superintendent transformed his thinking: “Our belief was that politics and bureaucracy had inhibited the progress Camden students and families deserved to overcome the steep challenges the city was facing…  We believed it was important for the district to segue out of being a highly political monopoly operator of schools….  This is a story about an evolution of my own thinking during that five-year experience…. What I’m referring to are the math and literacy student achievement data we utilize to drive so many of the critical decisions we make… My realization a few years ago was that I rarely asked questions about what these tests actually told us.  What they didn’t tell us.  And perhaps most importantly, what were the specific behaviors they incentivized, and what were the general trade-offs when we acutely focus on how students do on state tests.”

In 2013, at the beginning of his tenure, Rouhanifard introduced a school report card that rated each school primarily by students’ standardized test scores. Two years ago Rouhanifard eliminated his own school report cards.  He describes his realization: “We are spending an inordinate amount of time on formative and interim assessments and test prep, because those are the behaviors we have incentivized.  We are deprioritizing the sciences, the arts, and civic education…. I… believe the drawbacks currently outweigh the benefits.  That we haven’t been honest about the trade-offs.”

Shael Polakow-Suransky, like Rouhanifard, held a position in Joel Klein’s “reformer” school administration in New York City. Now the president of Bank Street College of Education, he was formerly Klein’s deputy schools chancellor. Barnum explains that Polakow-Suransky has become an emphatic critic of the nation’s high-stakes standardized testing regime: “The biggest barrier to student learning and closing the achievement gap is the current system of standardized tests.”

In a piece at The74, the  Thomas Fordham Institute’s Robert Pondiscio quotes Polakow-Suransky: “All of us were well-intentioned in pushing this agenda, but the tools we developed were not effective in raising the bar on a wide scale.”

While the Thomas Fordham Institute has endorsed corporate school reform including high-stakes, test-based accountability, Fordham’s Pondiscio now acknowledges that under the Every Student Succeeds Act, U.S. public schools have become mired in an education culture defined by test-based accountability.  Though he seems unclear on the way forward, Pondiscio now advocates for serious reconsideration: “The challenge is not testing vs. not testing.  It’s not accountability vs. none.  Both bring benefits of different kinds, and both are required by a federal law that’s not going to change anytime soon.  The challenge is to develop a policy vision that supports—not thwarts—the classroom practices and long-term student outcomes we seek… The problem is the reductive culture of testing, which has come to shape and define American education, particularly in the kinds of schools attended by our most disadvantaged children.”

There are some who remain faithful to the school reformer dogma. The Center on Reinventing Public Education’s Robin Lake tries to change the subject: “We need a more productive debate about school accountability, not tired arguments over testing.” And Matt Barnum quotes Sandy Kress—still a tried-and-true believer in the No Child Left Behind regime he helped create: “Research shows clearly that accountability made a real difference in this country in narrowing the achievement gap and lifting student achievement.”

Of course, research does not clearly show that Sandy Kress’s kind of No Child Left Behind accountability made a real difference. Here is Harvard’s Daniel Koretz, in the authoritative book he published a year ago, The Testing Charade: Pretending to Make Schools Better. It is perhaps this volume by an academic expert on testing that has helped change the minds of some of the corporate school reformers quoted above. Koretz writes: “It is no exaggeration to say that the costs of test-based accountability have been huge.  Instruction has been corrupted on a broad scale.  Large amounts of instructional time are now siphoned off into test-prep activities that at best waste time and at worst defraud students and their parents.  Cheating has become widespread.  The public has been deceived into thinking that achievement has dramatically improved and that achievement gaps have narrowed.  Many students are subjected to severe stress, not only during testing but also for long periods leading up to it.  Educators have been evaluated in misleading and in some cases utterly absurd ways.  Careers have been disrupted and in some cases ended.  Educators, including prominent administrators, have been indicted and even imprisoned.  The primary benefit we received in return for all of this was substantial gains in elementary-school math that don’t persist until graduation.  This is true despite the many variants of test-based accountability the reformers have tried, and there is nothing on the horizon now that suggests that the net effects will be better in the future. On balance, then, the reforms have been a failure.” (The Testing Charade, pp. 191-192)

Introducing readers to Don Campbell, “one of the founders of the science of program evaluation,” Koretz defines the problems inherent in our society’s quarter century of high-stakes, test-and-punish school accountability by quoting Campbell’s Law: “The more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” Campbell directly addresses the problem of using high-stakes testing to rank and rate schools: “Achievement tests may well be valuable indicators of … achievement under conditions of normal teaching aimed at general competence. But when test scores become the goal of the teaching process, they both lose their value as indicators of educational status and distort the educational process in undesirable ways.” (The Testing Charade, pp. 38-39)

How has the testing regime operated perversely to undermine the schools serving our society’s most vulnerable children, the ones we were told No Child Left Behind would help catch up academically if only we created incentives and punishments to motivate their teachers to work harder? Koretz explains: “One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools.  The causes are complex, but the result is simple: some schools have far lower average scores—and, particularly important in this system, more kids who aren’t ‘proficient’—than others.  Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do.  This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’  It was a deliberate and prominent part of many of the test-based accountability reforms… Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic.  The specific targets were often an automatic consequence of where the proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.”  (The Testing Charade, pp. 129-130)

Besides imposing unreasonable and damaging punishments on the schools and teachers serving our society’s poorest children, Koretz believes our commitment to a regime of punitive testing has distracted our society from developing the commitment to address the real needs of children and schools in places where poverty is concentrated: “We can undoubtedly reduce variations in performance appreciably, if we summoned the political will and committed the resources to do so—which would require a lot more than simply imposing requirements that educators reach arbitrary targets for test scores.” (The Testing Charade, p. 131)


Northwestern University Economist Uses Data to Prove Students’ Test Scores Fail to Measure Quality Teaching

Mike Rose, a UCLA education professor, understands a lot about teaching. In his extensive writing about education, Rose explains good teaching with precision and insight. Rose capped a four-year visit to excellent classrooms across the United States by publishing the story of those teachers in Possible Lives. He has also written widely about what good teachers do and what ought to be considered when teachers are evaluated.

Rose explains: “Teaching done well is complex intellectual work, and this is so in the primary grades as well as Advanced Placement physics.  Teaching begins with knowledge of subject matter, of instructional materials and technologies, of cognitive and social development.  But it’s not just that teachers know things.  Teaching is using knowledge to foster the growth of others. This takes us to the heart of what teaching is…. The teacher sets out to explain what a protein or a metaphor is, or how to balance the terms in an algebraic equation, or the sociological dynamics of prejudice, but to do so needs to be thinking about how to explain these things: what illustrations, what analogies, what alternative explanations when the first one fails.  This instruction is done not only to convey particular knowledge about metaphors or algebraic equations, but also to get students to understand and think about these topics.  This involves hefty cognitive activity, … but the teacher is doing it with a room full of young people—which brings a significant performative dimension to the task.”

Rose continues: “Thus teaching is a deeply social and emotional activity. You have to know your students and be able to read them quickly, and from that reading make decisions to slow down or speed up, stay with a point or return to it later, connect one student’s comment to another’s.  Simultaneously, you are assessing on the fly Susie’s silence, Pedro’s slump, Janelle’s uncharacteristic aggressiveness.  Students are, to varying degrees, also learning from each other, learning all kinds of things, from how to carry oneself to how to multiply mixed numbers. How teachers draw on this dynamic interaction varies depending on their personal style, the way they organize their rooms, and so on—but it is an ever-present part of the work they do.”

Rose further describes characteristics of the classrooms created by excellent teachers and what he has observed about how teachers continue to improve their practice through their careers: “The classrooms were safe. They provided physical safety…. but there was also safety from insult and diminishment… And there was safety to take intellectual risks… Intimately related to safety is respect… Respect also has a cognitive dimension.  As a New York principal put it, ‘It’s not just about being polite—even the curriculum has to be challenging enough that it’s respectful.’  Talking  about safety and respect leads to a consideration of authority… A teacher’s authority came not just with age or with the role, but from multiple sources—knowing the subject, appreciating students’ backgrounds, and providing a safe and respectful space. And even in traditionally run classrooms, authority was distributed. Students contributed to the flow of events, shaped the direction of discussion, became authorities on the work they were doing. These classrooms, then, were places of expectation and responsibility.”

As Rose discusses the characteristics of good teaching, he is not evaluating teachers by standardized test scores in language arts and mathematics. Arne Duncan’s U.S. Department of Education made the evaluation of teachers by their students’ test scores a condition for states’ qualifying for Race to the Top grants and No Child Left Behind waivers. Major research bodies, including the American Statistical Association and the American Educational Research Association, have nonetheless condemned the use of test scores and econometric value-added measures of teacher quality because of their unreliability.

Now Northwestern University economist C. Kirabo Jackson has developed an econometric model demonstrating that students’ test scores, including those based on value-added models, miss the most important characteristics of teachers, the sort of qualitative characteristics Mike Rose so clearly describes. In Education Next, Jackson explains: “I find that, while teachers have notable effects on both test scores and non-cognitive skills, their impact on non-cognitive skills is 10 times more predictive of students’ longer-term success in high school than their impact on test scores. We cannot identify the teachers who matter most by using test-score impacts alone, because many teachers who raise test scores do not improve non-cognitive skills, and vice versa. These results provide hard evidence that measuring teachers’ impact through their students’ test scores captures only a fraction of their overall effect on student success.”

Jackson creates a behavior index to measure whether students act out, skip class or fail to hand in homework.  He concludes: “A student whose 9th-grade behavior index is at the 85th percentile is a sizable 15.8 percentage points more likely to graduate from high school on time than a student with a median behavior index score.  I find a weaker relationship with test scores…”

Jackson then studies specific teachers and whether the same teachers demonstrate facility at raising students’ language arts and math scores, on the one hand, and improving their behavior, on the other.  His data demonstrate that, “(M)any teachers who are excellent at improving one skill are poor at improving the other, but also that knowing a teacher’s impact on one skill provides little information on the teacher’s impact on the other.”  You will need to read Jackson’s new article to learn his methodology for evaluating teachers’ impact on students’ behavior.

Jackson concludes: “These results confirm an idea that many believe to be true but that has not been previously documented—that teacher effects on test scores capture only a fraction of their impact on their students. The fact that teacher impacts on behavior are much stronger predictors of their impact on longer-run outcomes than test-score impacts, and that teacher impacts on test scores and those on behavior are largely unrelated, means that the lion’s share of truly excellent teachers, those who improve long-run outcomes—will not be identified using test-score value added alone… This analysis provides the first hard evidence that such contributions to student progress are both measurable and consequential.”

While I’m delighted to know that Jackson’s research has exposed a tragic flaw in the practice of judging teachers by the standardized test scores of their students, I hope this new research does not stimulate policy makers to start demanding that states use Jackson’s methodology to measure teachers’ impact on student behavior.

Thank goodness, Jackson himself suggests a more qualitative approach: “To fully assess teacher performance, policymakers should consider measures of a broad range of student skills, classroom observations, and responsiveness to feedback alongside effectiveness ratings based on test scores.” In other words, Jackson acknowledges that experts on pedagogy, people like Mike Rose, know what they are talking about when they analyze and describe excellent teaching.

School Ratings Not Only Tell You Little about Schools, They Contribute to Economic Segregation

Jack Schneider, a professor and education historian at the College of the Holy Cross and director of research for the Massachusetts Consortium for Innovative Education Assessment, points out that the school district in Boston, Massachusetts encourages parents to choose from among the public schools across the district.  In a short commentary,  State School Rankings ‘Virtually Worthless,’ Schneider explains that many parents make that choice pretty much based on overall school ratings assigned by the state.

How does Massachusetts calculate its school ratings?  “Each year, the state classifies schools into one of five levels, with the ‘highest performing’ designated Level 1. This practice, though distinct in its details, is in keeping with what is done in the vast majority of states. The theory behind such rankings, whether devised as numerical scores, A-F grades, or narrative labels, is that parents and communities want a clear and simple indicator of school quality. Unfortunately, there are… flaws that make these levels virtually worthless. The first and most obvious problem with state-issued ratings of schools is that they are based primarily on a flawed measure: student standardized test scores.”

Schneider believes such school “grades,” “report cards” and rating systems show parents very little about the quality of schools. Schneider explains all the factors about school quality that test-based ratings omit: “Last fall, MassINC conducted a poll of Boston parents and found that more than two-thirds of them identified as ‘very important’ or ‘extremely important’ all of the following: the quality of the teachers and administrators; school safety and discipline; the school’s academic programming; college and career readiness; class sizes; facility quality; the values promoted by the school; the school’s approach to discipline; and the diversity of the teachers and administrators. These critical dimensions of school quality are mostly ignored in the vast majority of statewide rating systems….”

Also, explains Schneider, “(S)chools are not uniformly good or bad. As most of us know from experience, schools—as structures, organizations, and communities—have different strengths and weaknesses. Schools that are struggling in some ways may be thriving in others. And schools with illustrious reputations often have a lot to work on.”

And finally, Schneider names the reality that school ratings are shaping our society: “Perhaps most importantly, ratings shape the decisions parents make about where to live and where to send their children to school.”  Although Schneider does not explore the details of this important observation,  academic research demonstrates the reasons why school ratings are likely to reinforce growing housing segregation by family income.

Over a half century of sociological research (dating back to the landmark 1966 Coleman report) demonstrates a strong correlation between overall school achievement and aggregate family income. When states rate schools by their aggregate test scores, the schools whose students are wealthy tend to get an A, and the schools serving very poor children too frequently get a D or an F.  Here are academic experts discussing how test scores reflect a community’s aggregate economic level, not school quality.

In 2011, the Stanford University educational sociologist Sean Reardon showed that while in 1970 only 15 percent of families lived in neighborhoods classified as affluent or poor, by 2007, 31 percent of families lived in such neighborhoods. By 2007, fewer families across America lived in mixed-income communities. Reardon also demonstrated that growing residential inequality has been accompanied by a simultaneous jump in the income-based school achievement gap. The achievement gap between children with family income in the top ten percent and children with family income in the bottom ten percent was 30-40 percent wider among children born in 2001 than among those born in 1975, and twice as large as the black-white achievement gap.

Based on Reardon’s research, in a 2016 report from the National Education Policy Center warning against the continued reliance on No Child Left Behind’s strategy of testing children, rating schools by scores, and punishing the schools and teachers unable quickly to raise scores, William Mathis and Tina Trujillo caution policymakers: “We cannot expect to close the achievement gap until we address the social and economic gaps that divide our society… Low test scores are indicators of our social inequities… Otherwise, we would not see our white and affluent children scoring at the highest levels in the world and our children of color scoring equivalent to third-world countries.  We also would not see our urban areas, with the lowest scores and greatest needs, funded well below our highest scoring suburban schools. With two-thirds of the variance in test scores attributable to environmental conditions, the best way of closing the opportunity gap is through providing jobs and livable wages across the board. We must also deal with governmentally determined housing patterns that segregate our children… One of the frequently heard phrases used to justify annual high-stakes disaggregated assessment is that ‘shining a light’ on deficiencies of particular groups will prompt decision-makers to increase funding, expand programs, and ensure high quality. This has not happened. Shining a light does not provide the social and educational learning essentials for our neediest children.”

William Mathis and Kevin Welner, in another 2016 National Education Policy Center report, summarize what was misguided about the school accountability policy imposed by No Child Left Behind and the Every Student Succeeds Act: “As policymakers and the courts abandoned desegregation efforts and wealth moved from cities to the suburbs, most of the nation’s major cities developed communities of concentrated poverty, and policymakers gave the school districts serving those cities the task of overcoming the opportunity gaps created by that poverty. Moreover, districts were asked to do this with greatly inadequate funding. The nation’s highest poverty school districts receive ten percent lower funding per student while districts serving children of color receive 15 percent less. This approach, of relying on under-resourced urban districts to remedy larger societal inequities, has consistently failed.”

How does this relate to test-based school accountability? In The Testing Charade: Pretending to Make Schools Better, published last fall, Harvard University’s Daniel Koretz explains: “One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools. The causes are complex, but the result is simple: some schools have far lower average scores…. Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do. This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’ It was a deliberate and prominent part of many of the test-based accountability reforms…. Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic. The specific targets were often an automatic consequence of where the Proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.” (pp. 129-130) Policymakers decided that, if sufficiently pressured to raise test scores, teachers would be able to do so: “(T)hey acted as if… (schools alone could) largely eliminate variations in student achievement, ignoring the impact of factors that have nothing to do with the behavior of educators—for example, the behavior of parents, students’ health and nutrition, and many characteristics of the communities in which students grow up.” (pp. 123-124)

Since 2002, when No Child Left Behind was enacted, test-and-punish accountability has condemned as “failing” the poorest schools and school districts, whose test scores, according to academic research, are undermined by the economic circumstances of their communities and families. In lockstep, states have bought into holding schools accountable and have exacerbated the problem by ranking schools with numerical ratings or letter grades, again based on standardized test scores. Those ratings encourage wealthier families who can afford it to move to affluent communities that brag about A-rated schools and to abandon the schools in poor communities. For sixteen years, school accountability policies mandated by federal and state governments have been contributing to the economic resegregation of America’s metropolitan areas.

An Urgently Needed New Year’s Resolution for Those Who Care About Public Education

A worthwhile New Year’s resolution would be to honor educators—the people who feel called to help others realize their promise. We live in an era of attacks on the public schools and school teachers, and even on higher education in America’s world-renowned colleges and universities.

A resolution to honor educators would mean we consult educators about the public policies that shape our schools, but in recent years we have listened instead to politicians, philanthropists,  business leaders, and tech titans—Michael Bloomberg, Bill Gates, Arne Duncan, Mark Zuckerberg, and Eva Moskowitz—or Eli Broad, Jeb Bush and Betsy DeVos.

As it happens, John Dewey—a professor of education, perhaps America’s most famous education philosopher, and an education psychologist as well—published a short, readable education creed in 1897. As an exercise for the new year, indulge yourself by comparing Dewey’s pedagogic creed to the ideas and principles that underpin today’s public education policy driven by business, philanthropy, the tech-savvy, and politicians. Imagine how different our schools might be if school teachers who have studied the philosophy and psychology of education were trusted by the education committees in Congress and across the statehouses.

Here are just four of the concepts explored in Dewey’s “Pedagogic Creed.” Dewey’s thinking directly confronts what is happening in our schools today: high-stakes test-and-punish accountability, charter schools dominated by no-excuses compliance, schools with unworkable ratios of students per teacher, and schools oriented to college-and-career prep.

First, Dewey, the psychologist, explains that because all learning comes from within the learner, school must be child- or student-centered.  “I believe that interests are the signs and symptoms of growing power. I believe that they represent dawning capacities.  Accordingly the constant and careful observation of interests is of the utmost importance for the educator. I believe that these interests are to be observed as showing the state of development which the child has reached. I believe that they prophesy the state upon which he is about to enter. I believe that only through the continual and sympathetic observation of childhood’s interests can the adult enter into the child’s life and see what it is ready for, and upon what material it could work most readily and fruitfully.”  Therefore, “The child’s own instincts and powers furnish the material and give the starting point for all education.  Save as the efforts of the educator connect with some activity which the child is carrying on of his own initiative independent of the educator, education becomes reduced to a pressure from without. It may, indeed, give certain external results but cannot truly be called educative.”

Second, Dewey challenges the idea of school as career prep or college prep. “I believe that much of present education fails because it neglects this fundamental principle of the school as a form of community life. It conceives the school as a place where certain information is to be given, where certain lessons are to be learned, or where certain habits are to be formed.  The value of these is conceived as lying largely in the remote future; the child must do these things for the sake of something else he is to do; they are mere preparation.  As a result they do not become a part of the life experience of the child and so are not truly educative.” “But on the other hand, the only possible adjustment which we can give to the child under existing conditions, is that which arises through putting him in complete possession of all his powers.  With the advent of democracy and modern industrial conditions, it is impossible to foretell definitely just what civilization will be twenty years from now. Hence it is impossible to prepare the child for any precise set of conditions. To prepare him for the future life means to give him command of himself; it means so to train him that he will have the full and ready use of all his capacities; that his eye and ear and hand may be tools ready to command, that his judgment may be capable of grasping the conditions under which it has to work, and the executive forces be trained to act economically and efficiently.”

Third, what about the role of the teacher and the student’s peers?  “I believe that the only true education comes through the stimulation of the child’s powers by the demands of the social situations in which he finds himself.  Through these demands he is stimulated to act as a member of a unity, to emerge from his original narrowness of action and feeling and to conceive of himself from the standpoint of the welfare of the group to which he belongs. Through the responses which others make to his own activities he comes to know what these mean in social terms… For instance, through the response which is made to the child’s instinctive babblings the child comes to know what those babblings mean; they are transformed into articulate language and thus the child is introduced into the consolidated wealth of ideas and emotions which are now summed up in language.” “I believe that moral education centres about this conception of the school as a mode of social life, that the best and deepest moral training is precisely that which one gets through having to enter into proper relations with others in a unity of work and thought… I believe that under existing conditions far too much of the stimulus and control proceeds from the teacher, because of the neglect of the idea of the school as a form of social life.  I believe that the teacher’s place and work in the school is to be interpreted from this same basis.  The teacher is not in the school to impose certain ideas or to form certain habits in the child, but is there as a member of the community to select the influences which shall affect the child and to assist him in properly responding to these influences.”

And fourth, all education must be social; it cannot happen merely in front of a computer screen. “I believe that education is a regulation of the process of coming to share in the social consciousness…” “This process begins unconsciously almost at birth, and is continually shaping the individual’s powers, saturating his consciousness, forming his habits, training his ideas, and arousing his feelings and emotions. Through this unconscious education the individual gradually comes to share in the intellectual and moral resources which humanity has succeeded in getting together… The most formal and technical education in the world cannot safely depart from this general process.” “I believe that in the ideal school we have the reconciliation of the individualistic and the institutional ideals. I believe that the community’s duty to education is, therefore, its paramount moral duty… I believe it is the business of every one interested in education to insist upon the school as the primary and most effective instrument of social progress and reform in order that society may be awakened to realize what the school stands for, and aroused to the necessity of endowing the educator with sufficient equipment to perform his task… I believe, finally, that the teacher is engaged, not simply in the training of individuals, but in the formation of the proper social life.”

Yes! Rethinking the Value of Testing and of Graduation Tests, Ohio Joins More Progressive States

At its meeting on Tuesday, the Ohio State Board of Education discussed ways to reduce standardized testing along with the urgent need to amend the state’s current demand that high school students pass an overly tough set of end-of-course exams in order to qualify for high school graduation. The board had already eased the graduation requirement for the class of 2018. Now its members have agreed to ask the legislature to add an alternative path to graduation for students in the classes of 2019 and 2020.

The Plain Dealer’s education reporter Patrick O’Donnell explains: “Statewide requirements that students score well on state tests in order to earn a diploma took effect with the class of 2018, this year’s senior class. But worries about a graduation ‘apocalypse’ or ‘trainwreck’ because of low scores led the board and state legislature to ease the requirements earlier this year, just for the senior class… After debate the last few months, board members now want to extend the same exemptions for the classes of 2019 and 2020… Those include graduating, even if state test scores are poor, by reaching some career training goals, having strong attendance or classroom grades as seniors, doing a senior capstone project or working at a job or on community service.”

On Tuesday, the Ohio State Board of Education also discussed ways to reduce the overall heavy test burden on students and teachers: “The state school board is asking the Ohio legislature to wipe out three items that add a testing burden to teachers and students—the high school English I exam, WorkKeys tests for some career training students, and requirements that some tests be given just to evaluate teachers.  State Superintendent Paolo DeMaria and an advisory panel he appointed recommended these and other changes to the board in June, after statewide outcry over the time spent on standardized testing in schools… Board members voted nearly unanimously for the three reductions Tuesday afternoon… (T)he board and DeMaria agreed that the state needs only the high school English II exam, usually given to sophomores, to meet the federal requirement for an English test in high school. They also agreed strongly with DeMaria’s recommendation to wipe out tests that are given just to measure the effectiveness of teachers.  Districts often give a pre-test at the start of the year, then another at the end of the year, to see how much a teacher taught over the year.”

O’Donnell adds that State Superintendent DeMaria recommends eliminating a number of other tests considered extraneous by his advisory panel.

Ohio’s beginning steps to cut back on the standardized testing that has dominated schools since 2002, when No Child Left Behind became federal law, reflect a broader trend, according to Monty Neill and Lisa Guisbond of the National Center for Fair and Open Testing (FairTest).  FairTest just released a major report, Test Reform Victories Surge in 2017: What’s Behind the Winning Strategies?, which summarizes the effects of broad public opposition to over-testing and some relaxation of federal pressure now that No Child Left Behind has been replaced by the Every Student Succeeds Act: “Widespread opposition to the overuse and misuse of standardized testing is producing a marked shift in attitudes about high-stakes assessments and, increasingly, state and district practices… The drumbeat of concerns includes: the amount of testing; the time it consumes; the outsized consequences for students, teachers and schools attached to test scores; the negative impacts on educational equity for low-income and minority students; and the damage to teaching, learning and children’s futures from the testing fixation.”

FairTest’s report is particularly scathing about the damage for young adults when failure of state-mandated tests denies them a high school diploma: “For tens of thousands of students who don’t drop out but stay in school and complete their other high school graduation requirements, exit exams unjustly confer the status and diminished opportunities of high school dropouts. The National Research Council of the National Academy of Sciences concluded that the graduation tests have done nothing to lift student achievement but have raised the dropout rate. Since 2012, the number of states that had or planned to have standardized high school exit exams has plunged from 25 to 13.”

FairTest adds that “seven states have made their elimination of graduation testing retroactive,” creating the opportunity for students in Georgia, South Carolina, California, Alaska, Arizona, Texas, and Nevada to apply for the diplomas they were previously denied, as long as they successfully completed all other graduation requirements.

Public opinion has been changing as it has become more widely understood that “passing” cut scores on standardized tests are in many ways aspirational, not realistic. Cut scores that determine children’s futures have not been based on some kind of scientifically determined amount of knowledge children must master; instead they have been set by politicians for the purpose of driving teachers to work harder and faster.  High stakes standardized testing has been particularly punitive for students who start school far behind their peers.

Here is Daniel Koretz, the Harvard University professor whose new book, The Testing Charade: Pretending to Make Schools Better, exposes the damage inflicted by high stakes testing: “One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools. The causes are complex, but the result is simple: some schools have far lower average scores…. Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do. This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’ It was a deliberate and prominent part of many of the test-based accountability reforms…. Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic. The specific targets were often an automatic consequence of where the Proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.” (pp. 129-130)

John Merrow and Thomas Toch Debate Michelle Rhee’s Strategy for Running Urban Schools

A debate about school reform has been raging on the pages of The Washington Monthly between Thomas Toch, a defender of what is frequently called “corporate school reform,” and John Merrow, the retired education reporter for the PBS NewsHour.  The subject: Washington, D.C. school reform as launched by Michelle Rhee and further evolved during the tenure of Kaya Henderson and others whom Henderson hired.  This now-old story about the D.C. public schools still matters, because the theories and practices introduced by Michelle Rhee a decade ago in the nation’s capital continue to drive the operation of urban school districts across the United States.

Thomas Toch formerly led the think tank Education Sector and now serves as the director of FutureEd, an education think tank at Georgetown University. The July-August Washington Monthly published Toch’s Hot for Teachers, a paean to what he believes is a decade of public school improvement between 2007 and 2016 in the nation’s capital. Toch is careful to point out that his subject is broader than Michelle Rhee’s tenure, which ended with her resignation in October of 2010. As Toch describes the elevation of test scores across the District, however, and as he celebrates a crackdown on “bad teaching,” improved recruitment and retention of teachers, and broad-scale, data-driven school management, Toch’s rhetoric betrays a pro-corporate-school-reform bias, which must be filtered as one reads his story:

Toch appreciates charter schools: “Some 43 percent of D.C. students were enrolled in charters in 2013, up from less than 15 percent a decade earlier.  Many of these schools, with names like DC Prep, KIPP DC, and Achievement Prep, were earning attention for their innovative strategies and strong results.  Foundations heaped money onto them, and the young talent entering teaching through prestigious pipelines like Teach for America were keen to work in the schools.” He also celebrates Michelle Rhee and Kaya Henderson’s strategy for working with school teachers: “Rhee’s successors at DCPS have redesigned teaching through some of the very policies that teachers’ unions and other Rhee adversaries opposed most strongly: comprehensive teacher evaluations, the abandonment of seniority-based staffing, and performance-based promotions and compensation.”  Before Rhee resigned, “Kaya Henderson, who had been Teach for America’s D.C. director and then managed Rhee’s New Teacher Project work in the city, supervised the project as the new chancellor’s chief of human capital.  She worked with Jason Kamras, a Princeton graduate who had arrived in Washington a decade earlier through Teach for America…. At the beginning of the 2009-10 school year, Henderson and Kamras launched the most comprehensive teacher measurement system ever implemented in public education.  It set citywide teaching standards for the first time ever… Under the new system, every teacher would be observed five times a year—three times by the administrators in their schools and twice by ‘master educators’ from the central office who would provide an independent check on principals’ ratings.”  Toch believes that fear is a useful strategy for making people work harder: “Rhee’s team deepened teachers’ angst…. Henderson ordered that student test scores make up 50 percent of teachers’ ratings if they taught tested subjects and grades. That turned out to be only 15 percent of the school system’s teaching force, but the move stoked anxiety and resentment throughout the city’s teaching ranks.”

Toch’s analysis continues beyond the transition from Chancellor Rhee to Chancellor Henderson. Noting that Henderson learned from Rhee’s mistakes, Toch emphasizes that after Rhee’s exit, Henderson introduced more support for good teaching—career ladders, for example, and collaboration among grade-level teams of teachers.  Toch does betray the top-down reformer’s bias, however: “There’s no doubt that the school reform stars aligned in Washington over the past decade: There was a rare infusion of talent in the central office; stable leadership enabled by mayoral control of city schools; freedom from key collective bargaining obstacles, and substantial funding, first from grants, then from savings from improvements in the city’s special education system.”

John Merrow, the retired PBS NewsHour reporter who has repeatedly investigated Michelle Rhee’s contentious tenure as the D.C. Chancellor, collaborated with Mary Levy to publish, in the September-October Washington Monthly, a rebuttal to Toch’s story.  Merrow has also expanded this story on his personal blog.  Merrow’s response to Toch centers on the Rhee years, because that is the subject Merrow knows best and because Merrow believes Toch’s distorted portrayal of a D.C. school improvement miracle is grounded in a biased understanding of Rhee’s troubled tenure.

Merrow points to gentrification as the source of much of the test score improvement in Washington, D.C.  He documents that achievement gaps by race, ethnicity and income have not closed: “Those results, however, stop looking so good once we disaggregate data about different groups of students.  Despite small overall increases, minority and low-income scores lag far behind the NAEP’s big-city average, and the already huge achievement gaps have actually widened.  From 2007 to 2015, the NAEP reading scores of low income eighth graders increased just 1 point, from 232 to 233, while scores of non-low-income students (called ‘others’ in NAEP-speak) climbed 31 points, from 250-282.  Over that same time period, the percentage of low-income students scoring at the ‘proficient’ level remained an embarrassingly low 8 percent, while proficiency among ‘others’ climbed from 22 to 53 percent.  An analysis of the data by race between 2007 and 2015 is also discouraging: black proficiency increased 3 points, from 8 percent to 11 percent, while Hispanic proficiency actually declined from 18 percent to 17 percent.  In 2007 the white student population was not large enough to be reported, but in 2015, white proficiency was at 75 percent.”

Merrow describes what he calls “central office bloat”: “Many of these highly paid non-teachers spend their days watching over teachers in scheduled and unscheduled classroom observations, generally lasting about thirty minutes…. Why so many of these teacher watchers?  Because those who subscribe to top-down management do not trust teachers.” Merrow bemoans the result: a collapse of morale along with widespread resignations of teachers and school leaders.  Some of the turnover occurs because staff are being moved among schools, which compounds the disruption, and he notes: “Unfortunately, the greatest upheavals are in schools serving large numbers of low-income children, kids who need stability wherever they can find it.”

As he re-posts his Washington Monthly article on his personal blog, Merrow adds several pages of what he has documented over the years in his investigation of a years-long cheating scandal in Washington, D.C., a scandal exposed by USA Today in March of 2011, but, as Merrow has documented repeatedly, never investigated.  He castigates Toch for dismissing, in his July-August article, the extent of the pressure Rhee was placing on school principals and the widespread reach of the cheating.

Here is some of Merrow’s rebuttal: “Contrary to Toch’s assertions, cheating—in the form of suspiciously high rates of erasures of wrong answers and filling in the right ones—occurred in more than half of DCPS schools.  The changes were never thoroughly investigated beyond an initial analysis by the agency that had corrected the exams in the first place, CTB/McGraw-Hill.  Deep erasure analysis was never ordered by Rhee, her then deputy Henderson, or the mayor.  The ‘investigations’ Toch refers to were either controlled by Rhee and, later Henderson or conducted by inept investigators—and sometimes both… Rhee, and subsequently Henderson, tightly controlled the inquiries, limiting the number of schools that could be visited, the number of interviews that could be conducted, and even the questions, that could be asked.”

Merrow poses the essential question: “Why would so many schools be driven to cheat?  In her one-on-one meetings with all her principals, Rhee insisted that they guarantee test score increases and made it clear that failing to ‘make the numbers’ would have consequences.  The adults who subsequently changed answers, coached students during testing, and shared exams before the tests were intent on keeping their jobs, which depended on higher scores… The rookie Chancellor met one-on-one with all her principals and, in these meetings, made them guarantee test score increases. We filmed a number of these sessions, and saw firsthand how Rhee relentlessly negotiated the numbers up, while also making it clear that failing to ‘make the numbers’ would have consequences.”

Merrow dismisses Toch’s piece as corporate-school-reform hot air: “To remain aloft, a hot air balloon must be fed regular bursts of hot air.  Without hot air, the balloon falls to earth.  That seems to be the appropriate analogy for the District of Columbia Public Schools (DCPS) during the ten-year regime (2007-2016) of Chancellors Michelle Rhee and Kaya Henderson.  Their top-down approach to school reform might not have lasted but for the unstinting praise provided by influential supporters from the center left and right—their hot air.  The list includes the editorial page of the Washington Post, (and) former U.S. Secretary of Education Arne Duncan….”

Merrow characterizes Toch’s article this summer as merely another burst of hot air.  He blasts Toch’s argument “that Rhee and Henderson revolutionized the teaching profession in D.C. schools, to the benefit of students.”  And he calls Toch a cheerleader who “obscures a harsh truth: on most relevant measures, Washington’s public schools have either regressed or made minimal progress under their leadership.  Schools in upper-middle-class neighborhoods seem to be thriving, but outcomes for low-income minority students—the great majority of enrollment—are pitifully low.”

Thomas Toch responds to Merrow’s allegations.  His response is printed by The Washington Monthly at the end of Merrow and Mary Levy’s report, Has D.C. Teacher Reform Been Successful?

Harvard’s Daniel Koretz Indicts High Stakes Testing in “The Testing Charade”

Daniel Koretz’s new book, The Testing Charade: Pretending to Make Schools Better, is a scathing indictment of our society’s test-and-punish school regime, formalized in the 2002 No Child Left Behind Act and continuing in the most recent version of the federal education law, the Every Student Succeeds Act.  Koretz, a testing specialist, is not so critical of standardized testing itself as he is of the high stakes sanctions that Congress attached to the annual tests in No Child Left Behind, punishments that have put massive pressure on educators and, in turn, ruined our public schools:

“Pressure to raise scores on achievement tests dominates American education today. It shapes what is taught and how it is taught.  It influences the problems students are given in math class (often questions from earlier tests), the materials they are given to read, the essays and other work they are required to produce, and often the manner in which teachers grade this work. It determines which educators are rewarded, punished, and even fired. In many cases it determines which students are promoted or graduate. This is the result of decades of ‘education reforms’ that progressively expanded the amount of externally imposed testing and ratcheted up the pressure to raise scores.” (p. 1)

Daniel Koretz’s biography at the Harvard Graduate School of Education describes him as an expert on educational assessment and testing policy, and the book describes in considerable detail just how high stakes punishments for schools and teachers have corrupted the results of the tests themselves, narrowed the curriculum, and degraded teaching.

But my deepest interest in the book is Koretz’s depiction of how the testing that was supposed to force teachers and schools to better serve poor children, raise their test scores and close achievement gaps has instead truncated opportunity for the very children it was supposed to help. How has test-and-punish narrowed the curriculum to basic reading and math in the poorest schools, and how has it forced teachers to focus on test-prep and coaching instead of enrichment?  How has test-and-punish forced the closing or charterizing of schools in poor neighborhoods? How has evaluating teachers by their students’ test scores resulted in firing principals and teachers in the poorest schools and exacerbated staff turnover?  And what about the children held back in third grade because of a test score, even when they may be making real progress in reading, and the adolescents denied a high school diploma?

Under current federal law, students and schools are given credit for proficiency only when children reach benchmark proficiency scores. A fourth grader who advances during the school year from a first to a third grade reading level will still fail to achieve the fourth grade cut score. Neither the child nor the teacher will be given credit for the child’s improvement: “One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools. The causes are complex, but the result is simple: some schools have far lower average scores…. Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do. This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’ It was a deliberate and prominent part of many of the test-based accountability reforms…. Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic. The specific targets were often an automatic consequence of where the Proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.” (pp. 129-130)
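To make the arithmetic behind this argument concrete, here is a minimal sketch in Python. It is only an illustration: the school names, the baseline proficiency rates, the 100 percent target, the ten-year deadline, and the reading levels are all hypothetical numbers chosen for the example, not figures from Koretz or from any state's accountability system.

```python
def required_annual_gain(baseline_pct, target_pct, years):
    """Percentage points a school must gain each year to reach the target on time."""
    if years <= 0:
        raise ValueError("years must be positive")
    return max(target_pct - baseline_pct, 0.0) / years


def meets_cut_score(level, cut_score):
    """Status measure: credit is given only if the student reaches the cut score."""
    return level >= cut_score


def growth(start_level, end_level):
    """Growth measure: how far the student advanced during the year."""
    return end_level - start_level


# Hypothetical baselines (percent of students proficient); same target and deadline.
baselines = {"higher-scoring school": 80.0, "lower-scoring school": 20.0}
TARGET_PCT = 100.0
YEARS = 10

gains = {
    name: required_annual_gain(pct, TARGET_PCT, YEARS)
    for name, pct in baselines.items()
}
for name, gain in gains.items():
    print(f"{name}: must gain {gain:.1f} points per year")

# The fourth grader from the paragraph above: first to third grade reading level.
start, end, cut = 1, 3, 4
print("Meets fourth grade cut score:", meets_cut_score(end, cut))
print("Grade levels gained this year:", growth(start, end))
```

With these assumed numbers, the lower-scoring school has to gain four times as much each year as the higher-scoring one, simply because of where it started. The fourth grader gains two grade levels and still counts as a failure under a status measure, which is the gap a growth-based goal is meant to close.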

Reformers decided that, if sufficiently pressured to raise test scores, teachers would be able to do so: “(T)hey acted as if… (schools alone could) largely eliminate variations in student achievement, ignoring the impact of factors that have nothing to do with the behavior of educators—for example, the behavior of parents, students’ health and nutrition, and many characteristics of the communities in which students grow up.” (pp. 123-124) Koretz explains at length and in detail the ways that teachers and principals whose jobs are threatened have resorted to raising scores: coaching for the test, drilling on materials likely to be covered, and in some cases where the pressure was greatest, cheating by erasing and correcting students’ answers.

Koretz quotes Linda Darling-Hammond’s characterization of test-and-punish school accountability: “the kick the dog harder model of education reform.” And he explains: “If we are going to make real headway, we are going to have to confront the simple fact that many teachers will need substantial supports if they are going to markedly improve the performance of their students… And the range of services needed is broad. One can’t expect students’ performance in schools to be unaffected by inadequate nutrition, insufficient health care, home environments that have prepared them poorly for school, or violence on the way to school.” (p. 201)  He suggests first that we stop judging all students and schools by benchmark scores. We must “set goals based on students’ growth, not the level of their performance.” (p. 235)

In the Washington Post, Valerie Strauss interviews Koretz about his new book, and she publishes an excerpt.

While I have emphasized the sections in which Koretz shows test-and-punish hurting the schools that serve the poorest and most vulnerable children, Koretz is a testing expert, whose primary interest is how high stakes punishments attached to a regime of universal testing have corrupted the entire operation of public schools: “Reformers may take umbrage and say that they certainly didn’t demand that teachers cheat. They didn’t, although in fact many policy makers actively encouraged bad test prep that produced fraudulent gains. What they did demand was unrelenting and often very large gains that many teachers couldn’t produce through better instruction, and they left them with inadequate supports as they struggled to meet these often unrealistic targets. They gave many educators the choice I wrote about thirty years ago—fail, cut corners, or cheat—and many chose not to fail.” (p. 244)

Koretz joins a growing number of critics who indict test-and-punish school accountability. What is significant about this book is the thorough and relentless critique by a testing expert who carefully and sometimes technically dissects the evidence.