We’ll Have to Reduce Test-and-Punish. Talking about Social Emotional Learning Isn’t Enough

Silly me!  I didn’t realize until a couple of weeks ago that SEL is a thing.  SEL is a new term in educational circles: Social Emotional Learning.  I heard Linda Darling-Hammond—Stanford University emeritus professor, CEO of the Learning Policy Institute, and chair of an Aspen Institute National Commission on Social, Emotional, and Academic Development—present the work of the commission, and then I started reading more about Social Emotional Learning (SEL).

It would appear that many of the educational academics promoting SEL are doing so as an effort to shift our schools’ focus away from the incessant drilling on basic language arts and math that has been driven by the high-stakes testing embedded in the 2002 No Child Left Behind Act (NCLB).  NCLB and Race to the Top, which compounded NCLB’s punitive grasp on our public schools, have created fear-driven pressure to raise scores at any cost. The stakes are high: Schools have been closed or charterized, teachers fired or their salaries cut, and school districts trapped in state takeover.  And worse—in terms of the social and emotional health of children—students whose reading scores are too low at the end of third grade have been retained in grade for an extra remedial year.

The Learning Policy Institute has been intent on helping state education departments take advantage of the way the 2015 Every Student Succeeds Act (ESSA) tweaks accountability.  ESSA eliminates direct federal punishments for low test scores by turning accountability over to states, but it says states must have their own plans to hold public schools accountable.  Beyond the required reporting of test scores and graduation rates, states can now add new factors, as long as the new factors are research-based. For example, the Learning Policy Institute has been explaining how research backs up the establishment of wraparound Community Schools.  Its publications have shown states how to demonstrate through research that Community Schools are worthy of inclusion in states’ dashboards of factors by which schools can be judged and held accountable.

Now, it would appear that Darling-Hammond’s support of Social Emotional Learning, through her leadership on the Aspen SEL Commission, is an attempt to help states position SEL as a factor in their Every Student Succeeds dashboards by which schools can be held accountable.  In Education Week a year ago after Aspen released coverage of its new SEL Commission, Evie Blad reported: “The new federal education law requires schools to report new factors, like chronic absenteeism rates, in their public report cards, and it requires states to broaden how they measure school success.  No state decided to include direct measures of social-emotional learning in its accountability system.  Most cited cautions from researchers who’ve said existing measures are not sophisticated enough to be used for high-stakes purposes.  But mindfulness of students’ emotions, relationships, and development can help schools show improvement in other areas covered by the law, like attendance and achievement, commissioners said.”  The Aspen Commission, we should assume, hopes its new report will beef up the research base on SEL.

I suppose it’s worth establishing a research base to support education of the whole child if in some way measuring SEL will help states be more humane in evaluating what is being accomplished at school.  However, it is also essential to remember that the Every Student Succeeds Act makes two other factors primary in the states’ ESSA accountability reports: standardized test scores and high school graduation rates.  I wonder if inserting Social Emotional Learning right on top of test-and-punish doesn’t merely represent a contradiction in strategies. And figuring out metrics by which a state can judge how a district is doing at SEL and then holding schools accountable for SEL in the state’s accountability system seems bizarre.

Some of the puzzling language in the Aspen Institute Commission’s report is about showing states and school districts how to measure SEL so that it will count for school accountability: “Develop and use measures to track progress across school and out-of-school settings, with a focus on continuous improvement rather than rewards and sanctions.”  So far the advice seems pretty positive compared to what we’re doing now, which is focusing on rewards and sanctions. But the report later vaguely suggests some kind of measurable outcomes: “Use a broader range of assessments and other demonstrations of learning that capture the full gamut of young people’s knowledge and skills… Use data to identify and address gaps in students’ access to the full range of learning opportunities in and out of school.”

Recently in his personal blog, the writer and education professor at UCLA, Mike Rose, raised concerns about Social Emotional Learning: “(D)o we need all these studies to demonstrate what any good teacher knows: that the nature and quality of the relationship between teachers and students matter?… More broadly I worry that as we pay needed attention to the full scope of a child’s being, we will inadvertently reinforce the false dichotomy between thought and emotion.”

Rose harks back to a piece he wrote in 2013 in which he worried that, “Under No Child Left Behind and Race to the Top, cognition in education policy has increasingly come to mean the skills measured by standardized tests of reading and mathematics.  And as economists have gotten more involved in education, they’ve needed quantitative measures of cognitive ability and academic achievement for their analytical models….”  Rose worries about dividing education into a “cognitive/non-cognitive binary.”  “The problem is exacerbated by the aforementioned way economists carve up and define mental activity.  If cognition is represented by scores on ability or achievement tests, then anything not captured in those scores—like the desired qualities of character—is, de facto, non-cognitive.  We’re now left with a pinched notion of cognition and a reductive dichotomy to boot.”

For Rose, social and emotional work must be an essential part of every teacher’s daily practice—and something children learn in their experience of schooling. In an excellent 2014 article published by The American Scholar, Rose describes the characteristics of the best classrooms he visited on a journey across the United States to research his fine book, Possible Lives: “For all the variation… the classrooms shared certain qualities… The classrooms were safe. They provided physical safety, which in some neighborhoods is a real consideration.  But there was also safety from insult and diminishment… And there was safety to take intellectual risks… Intimately related to safety is respect, a word I heard frequently during my travels.  It meant many things: politeness, fair treatment, and beyond individual civility, a respect for the language and culture of the local population… Respect also has a cognitive dimension.  As a New York principal put it, ‘It’s not just about being polite—even the curriculum has to be challenging enough that it’s respectful.’  Talking about safety and respect leads to a consideration of authority… A teacher’s authority came not just with age or with the role, but from multiple sources—knowing the subject, appreciating students’ backgrounds, and providing a safe and respectful space.  And even in traditionally run classrooms, authority was distributed.  Students contributed to the flow of events, shaped the direction of discussion, became authorities on the work they were doing.  These classrooms, then, were places of expectation and responsibility… (O)verall the students I talked to, from primary-grade children to graduating seniors, had the sense that their teachers had their best interests at heart and their classrooms were good places to be.”

The people who are trying to make Social Emotional Learning part of states’ Every Student Succeeds accountability dashboards undoubtedly have good intentions. They are trying, once again, to make normal child development and attention to the needs of the whole child primary goals in America’s public school classrooms.  Unfortunately, however, because standardized test scores and high school graduation rates—both highly measurable data sets—remain at the very center of ESSA’s federal demand for school accountability, Social Emotional Learning will always be on the side.

To improve the social and emotional climate in our schools today, we’ll need to go after what is really the problem—what Harvard’s Daniel Koretz calls “the testing charade.”

Striking Schoolteachers Have Changed the National Conversation about Our Public Schools

The editor of Current Affairs, Nathan Robinson, offers a profound critique of President Barack Obama and Education Secretary Arne Duncan’s signature education policy, Race to the Top.  Race to the Top epitomized neoliberalism—“meritocratic, technocratic, and capitalistic, meaning that it (1) sees competition as good and winning competitions as proof of desert, (2) defers to policy experts over the actual people affected by policies, (3) views productivity and success within the marketplace as a measure of the good.”

Robinson reminds us that Race to the Top “gave $4.3 billion in funding to U.S. schools through a novel mechanism: Instead of giving out the aid based on how much a state’s schools needed it, the Department of Education awarded it through a competition.  Applications ‘were graded on a 500-point scale according to the rigor of the reforms proposed and their compatibility with four administration priorities: developing common standards and assessments; improving teacher training, evaluation, and retention policies; creating better data systems; and adopting preferred school turnaround strategies.’… The Obama administration also wanted states to adopt policies favorable to charter schools. Education secretary Arne Duncan said explicitly that, ‘States that do not have public charter laws or put artificial caps on the growth of charter schools will jeopardize their applications under the Race to the Top Fund.’”

Robinson condemns the Obama-Duncan strategy: “There is something deeply objectionable about nearly every part of Race to the Top.  First, the very idea of having states scramble to compete for federal funds means that children are given additional support based on how good their state legislatures are at pleasing the president, rather than how much those children need support.  Michigan got no Race to the Top money, and Detroit’s schools didn’t see a penny of this $4.3 billion, because it didn’t win the ‘race.’  This ‘fight to the death’ approach… was novel, since ‘historically, most federal funds have been distributed through categorical grant programs that allocate money to districts on the basis of need-based formulas.’ Here, though, one can see how Obama’s neoliberal politics differed in its approach from the New Deal liberalism of old: Once upon a time, liberals talking about how to fix schools would talk about making sure all teachers had the resources they needed to give students a quality education.  Now, they were importing the competitive capitalist model into government… There is a mistrust of teachers: The premise here is that unless teachers have the right incentives, they will perform badly. There is an underlying acceptance here of the free market principle that government services do not perform well because they lack the kind of economic rewards and punishments that exist in the private sector.  So we should introduce competitive marketplaces in schools (i.e. charterize the system) and do constant assessment of teacher job performance to weed out the Bad Teachers.  Race to the Top literature talks about ‘turning around failing schools,’ not ‘fixing inequality in schools’….”

Although lots of people have been complaining about Race to the Top and Duncan’s strategy for years, Robinson’s jeremiad strikes a different chord this year after months of walkouts and strikes by desperate school teachers. Last week, the New York Times education reporter Dana Goldstein described what she believes is a major turning point in the way people are thinking about public education.  She has been writing about schools for 13 years, beginning in the era—the precursor to Arne Duncan’s Race to the Top—of No Child Left Behind, the law signed in 2002 that brought us high-stakes test-and-punish. But she observes today: “So much has changed in education, as the focus shifts from calling out and overhauling bad teachers and schools to listening more carefully to what educators say about their working conditions and how students are affected by them… The emphasis now is on what education experts call ‘inputs’—classroom funding, teacher pay, and students’ access to social workers and guidance counselors—and less on ‘outputs,’ like test scores or graduation rates.”

In their strikes this year, schoolteachers have forced policymakers to stop obsessing about punishing low-scoring, “loser” schools and begin reckoning with society’s responsibility to pay for the kind of schools our children need.

One striking example of the shift in emphasis that Goldstein describes is the story of Deborah Gist.  For Politico and the Hechinger Report, Amadou Diallo profiles the transformation of Gist, formerly the Rhode Island Commissioner of Education, who made her name by firing the entire staff of the high school in Central Falls, one of Rhode Island’s poorest communities. Gist also won the state a $75 million Race to the Top award by promising to comply with Arne Duncan’s neoliberal priorities. Unpopular in Rhode Island and especially unpopular with unionized school teachers, Gist returned to her hometown, Tulsa, Oklahoma, as the superintendent of schools. Gist’s priorities began to change when she faced an acute shortage of teachers in a state where salaries are fourth from the bottom among all the states. Diallo reports that 300 of the district’s 2,000 teachers are working under emergency certificates because salaries are too low to attract qualified staff.  Last spring, when schoolteachers walked out, Gist herself joined unionized teachers to walk 110 miles from Tulsa to Oklahoma City to demand better funding for the state’s schools.  She is also dipping into the school district’s emergency reserves to pay basic expenses.  While Gist refuses to acknowledge that she has entirely left her Rhode Island priorities behind, she describes the lessons that have taught her to become an ally instead of an enemy of her district’s teachers: “I knew coming into Tulsa that Oklahoma spent less than half per student of what Rhode Island did… What I didn’t anticipate was the continued cuts we’d be receiving.  I didn’t fully realize what that would mean in terms of the lack of adults in our schools… and the pressure that creates.”

The most extraordinary evidence that the teachers’ strikes are forcing a rethinking of education policy, however, came last week in Los Angeles.  The settlement of the recent strike by 30,000 Los Angeles teachers brought concessions including a modest raise, smaller classes and the guarantee of more support staff like counselors, librarians and school nurses. But the teachers demanded something more: They insisted on a vote by Los Angeles’s charter school-friendly board of education on a resolution requesting that the state legislature place an 8-10 month moratorium on new charter schools while a study is conducted on the impact of charter schools on the public school district.

The Los Angeles school board did take such a vote last week, and the Washington Post’s Valerie Strauss describes the outcome: “The school board voted Tuesday to ratify the strike-ending deal between the Los Angeles Unified School District and United Teachers of Los Angeles.  The new contract provides teachers with 6 percent pay increases, more resources for schools and small reductions in class size. The strike ended with other agreements too, including what many saw as a surprising promise by the school district to support a state moratorium of up to 10 months on charter schools while the state studies their effects.  The Los Angeles Board of Education has six members, at least half of whom were elected with the help of financial support from the charter lobby. The district superintendent, Austin Beutner, is a former investment banker who is a charter backer.”

For Salon, Jeff Bryant explains the details of this development: “(T)he concessions teachers won that will likely have the most impact outside of L.A. are related to charter schools.  The teachers forced the district leader to present to the school board a resolution calling on the state to cap the number of charter schools, and the teachers made the district give their union increased oversight of charter co-locations—a practice that allows charter operations to take possession of a portion of an existing public school campus.  Los Angeles Unified has 277 charter schools, the largest number of charter schools of any school district in the nation. The schools serve nearly 119,000 students, nearly one in five students.  The vast majority of charters are staffed by non-union teachers.  So the quick take from some is the teachers’ union made curbs on charter schools part of their demands because these schools are a threat to the union’s power. But when you talk to teachers, that’s not what they say. They tell you they want to curb charter school growth, not because it threatens their union, but because charters threaten the very survival of public schools…. (T)eachers I spoke with described competition from surrounding charter schools as an existential threat to their schools and an undermining influence on the public system.”

Bryant describes the growing realization across Los Angeles, backed up by recent academic research, that “While public school districts can’t build new schools unless increases in enrollment or an influx of school-aged children demand them, charter schools can make the case based on subjective arguments having nothing to do with numbers, and when local school boards deny charter applicants, charter operators can appeal to the county or state board that, more often than not, overrules the local board…”  Bryant quotes researcher—and former charter school supporter—Julian Vasquez Heilig: “Charters contribute to the funding problems because we’re paying for two school systems… There’s an incredible amount of waste and inefficiency.”

Linda Darling-Hammond Disappoints in Cleveland City Club Address

Linda Darling-Hammond is a national figure in the field of education policy.  She is the President and CEO of the Learning Policy Institute at Stanford University, where she is an emeritus professor of education, and she headed up President Obama’s transition team for education. She is the author of several books including The Flat World and Education, in which she declares: “One wonders what we might accomplish as a nation if we could finally set aside what appears to be our de facto commitment to inequality so profoundly at odds with our rhetoric of equity, and put the millions of dollars spent continually arguing and litigating into building a high quality education system for all children.” (p. 164)

Last Friday, Darling-Hammond delivered the weekly address at the Cleveland City Club.  I was disappointed.

Darling-Hammond declared that “we have left No Child Left Behind (NCLB) behind” and implied that its 2015 replacement, the federal Every Student Succeeds Act, has erased the punitive philosophy of its NCLB predecessor.  Darling-Hammond then devoted most of her prepared remarks to Ohio’s adoption of one of her own research priorities—social-emotional learning—into the state’s new five-year strategic plan for education.  Darling-Hammond chaired the Aspen Institute’s National Commission on Social, Emotional, and Academic Development, which on January 15, 2019 published its final report, From a Nation at Risk to a Nation at Hope.

Of course one cannot blame an academic for focusing a major policy address on her own particular research interest. But I was disappointed nonetheless, because Darling-Hammond’s remarks so completely neglected what I and many others believe are alarming realities today in Ohio public school policy. More broadly she also failed to acknowledge catastrophic school funding shortages brought to national attention by striking school teachers for almost a year now from West Virginia to Oklahoma to Arizona and in the past two weeks in Los Angeles, funding shortages caused by tax cuts and tax freezes and exacerbated when scarce tax dollars are redirected to privatized charter schools and voucher programs. Only after she had finished her prepared remarks and in answer to a question about Ohio’s punitive state school district takeovers, did she briefly comment on the enormous and controversial policies many in the audience hoped she would address.

Although Darling-Hammond told us she believes the kind of punitive high-stakes school accountability prescribed by No Child Left Behind is fading, state-imposed sanctions based on aggregate standardized test scores remain the drivers of Ohio public school policy. Here are some of our greatest challenges:

  • Under a Jeb Bush-style Third Grade Guarantee, Ohio still retains third graders for another year of third grade when their reading test scores are too low. This is despite years of academic research demonstrating that retaining children in a grade for an additional year smashes their self-esteem and increases the chance they will later drop out of school without graduating.  This policy runs counter to anything resembling social-emotional learning.
  • Even though the federal government has ended the Arne Duncan requirement that states use students’ standardized test scores to evaluate teachers, in Ohio, students’ standardized test scores continue to be used for the formal evaluations of their teachers.  The state has reduced the percentage of weight students’ test scores play in teachers’ formal evaluations, but students’ test scores continue to play a role.
  • Aggregate student test scores remain the basis of the state’s branding and ranking of our public schools and school districts with letter grades—A-F,  with attendant punishments for the schools and school districts that get Fs.
  • When a public school is branded with an F, the students in that so-called “failing” school qualify for an Ed Choice Voucher to be used for private school tuition. And the way Ohio schools are funded ensures that in most cases, local levy money in addition to state basic aid follows that child.
  • Ohio permits charter school sponsors to site privately managed charter schools in so-called “failing” school districts. The number of these privatized schools is expected to rise next year when a safe-harbor period (that followed the introduction of a new Common Core test) ends.  Earlier this month, the Plain Dealer reported: “Next school year, that list of ineffective schools (where students will qualify for Ed Choice Vouchers) balloons to more than 475… The growth of charter-eligible districts grew even more, from 38 statewide to 217 for next school year. Once restricted to only urban and the most-struggling districts in Ohio, charter schools can now open in more than a third of the districts in the state.”
  • If a school district is rated “F” for three consecutive years, a law pushed through in the middle of the night by former Governor John Kasich and his allies subjects the district to state takeover. The school board is replaced with an appointed Academic Distress Commission which replaces the superintendent with an appointed CEO.  East Cleveland this year will join Youngstown and Lorain, now three years into their state takeovers—without academic improvement in either case.
  • All this punitive policy sits on top of what many Ohioans and their representatives in both political parties agree has become an increasingly inequitable school funding distribution formula. Last August, after he completed a new study of the state’s funding formula, Columbus school finance expert Howard Fleeter described Ohio’s current method of funding schools to the Columbus Dispatch: “The formula itself is kind of just spraying money in a not-very-targeted way.”

Forty-two minutes into the video of last Friday’s City Club address by Darling-Hammond, when a member of the Ohio State Board of Education, Meryl Johnson, asked the speaker to comment on Ohio’s state takeovers of so-called “failing” school districts, Darling-Hammond briefly addressed the tragedy of the kind of punitive systems that now dominate Ohio’s public school policy: “We have been criminalizing poverty in a lot of different ways, and that is one of them… There’s about a .9 correlation between the level of poverty and test scores.  So, if the only thing you measure is the absolute test score, then you’re always going to have the high poverty communities at the bottom and then they can be taken over.” But rather than address Ohio’s situation directly, Darling-Hammond continued by describing value-added ratings of schools, which she implied could instead be used to measure what the particular school contributes to learning, and then she described the educational practices in other countries she has studied.

In the context of the new report of the National Commission on Social, Emotional, and Academic Development, which she chaired, Darling-Hammond’s focus last Friday was social-emotional learning. The Commission’s new report emphasizes the need to broaden “the definition of student success to prioritize the whole child.”  The report recommends that our society: “Develop and use measures to track progress across school and out-of-school settings, with a focus on continuous improvement rather than on rewards and sanctions.”

I wish Darling-Hammond had more pointedly applied the Commission’s findings to Ohio, where, while people applaud the goal, there have been serious questions about whether Ohio’s addition of social-emotional learning in the state’s new five-year strategic plan is workable in our underfunded and terribly punitive, high stakes testing environment. Some of the factors that affect a school’s capacity to support the social and emotional needs of students are small classes that ensure students are known and respected, enough counselors and school psychologists, the presence of the arts and enrichments, and the presence of play in the school lives of very young children. Ohio’s meager school funding and emphasis on high-stakes testing threaten all of these.

In these times we need to be especially attentive to the social and emotional needs of America’s students as the federal Department of Education steps away from policies designed to protect students’ safety and emotional well-being. Remember that at the end of December, Education Secretary Betsy DeVos rescinded urgently important Obama-era civil rights guidance designed to reduce out-of-school suspension and expulsion, reduce racial disparities in suspension and expulsion, and increase in-school programs promoting restorative discipline.  Ohio’s new strategic plan to prioritize social-emotional learning in public schools is an important first nudge—pushing our state away from No Child Left Behind’s test-and-punish. But there remains a long, long list of urgently needed policy changes. I wish Linda Darling-Hammond had been more supportive of our struggle in her address last Friday.

U.S. Public Education Is Driven by High-Stakes Testing. Are the Proficiency Cut-Scores Legitimate?

Back in 2005, I worked with members of the National Council of Churches Committee on Public Education and Literacy to develop a short resource, Ten Moral Concerns in the No Child Left Behind Act. While closing achievement gaps seemed an important goal, to us it seemed wrong that—according to an unrelenting year-by-year Adequate Yearly Progress schedule—the law blindly held teachers and schools accountable for raising all children’s test performance to the test score targets set by every state. Children come to school with such a wide range of preparation, and achievement gaps are present when children arrive in Kindergarten.  At that time, we expressed our concern this way:

“Till now the No Child Left Behind Act has neither acknowledged where children start the school year nor celebrated their individual accomplishments. A school where the mean eighth grade math score for any one subgroup grows from a third to a sixth grade level has been labeled ‘in need of improvement’ (a label of failure) even though the students have made significant progress. The law has not acknowledged that every child is unique and that Adequate Yearly Progress (AYP) thresholds are merely benchmarks set by human beings. Although the Department of Education now permits states to measure student growth, because the technology for tracking individual learning over time is far more complicated than the law’s authors anticipated, too many children will continue to be labeled failures even though they are making strides, and their schools will continue to be labeled failures unless all sub-groups of children are on track to reach reading and math proficiency by 2014.”

Of course today we know that the No Child Left Behind Act was supposed to motivate teachers to work harder to raise scores. Policymakers hoped that if they set the bar really high, teachers would figure out how to get kids over it.  It didn’t work.  No Child Left Behind said that all children would be proficient by 2014 or their school would be labeled failing. Finally, as 2014 loomed closer, Arne Duncan had to give states waivers to avoid what would have happened if the law had been enforced: All American public schools would have been declared “failing.”

Despite the failure of No Child Left Behind, members of the public, the press, and the politicians across the 50 statehouses who implemented the testing requirements of No Child Left Behind continue to accept the validity of high-stakes testing. Politicians, the newspaper reporters and editors who report the scores, and the general public trust the supposed experts who set the cut scores.  That is why states still rank and rate public schools by their test scores and legislators pass laws to punish low-scoring schools and teachers. That is why on Wednesday this blog commented on Ohio’s plan to expand EdChoice vouchers for students in low-scoring schools and add charters in low-scoring school districts. The list of “failing” schools where students will qualify for vouchers will rise next school year in Ohio from 218 to 475. The list of charter school-eligible districts will grow from 38 to 217.

In response to the continuation of test-and-punish, I’ve been quoting Daniel Koretz’s book, The Testing Charade, about the fact that testing cut scores are arbitrary and the punishments unfair:  “One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools. The causes are complex, but the result is simple: some schools have far lower average scores—and, particularly important in this system, more kids who aren’t ‘proficient’—than others. Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do…  Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic. The specific targets were often an automatic consequence of where the proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.”  (The Testing Charade, pp. 129-130)
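
For readers who like to see the arithmetic, here is a rough sketch of Koretz's point that a fixed deadline automatically hands low-scoring schools steeper targets. The two schools, their starting percentages, the ten-year window, and the straight-line schedule are all hypothetical numbers I made up for illustration. They are not taken from Koretz's book or from any state's actual accountability formula.

```python
def required_annual_gain(percent_proficient_now, years_remaining, target=100.0):
    """Percentage points a school must gain each year to reach the target
    on a simple straight-line schedule (an assumed, simplified model)."""
    if years_remaining <= 0:
        raise ValueError("years_remaining must be positive")
    return max(target - percent_proficient_now, 0.0) / years_remaining


# Hypothetical schools, invented for illustration only:
# name -> percent of students currently scoring "proficient".
schools = {"School A": 80.0, "School B": 30.0}
years = 10  # assumed number of years until the deadline

gains = {name: required_annual_gain(pct, years) for name, pct in schools.items()}

for name, gain in gains.items():
    print(f"{name}: must gain {gain:.1f} percentage points per year")
```

Under these made-up numbers, the school that starts at 30 percent proficient has to improve three and a half times as fast as the school that starts at 80 percent. Nothing about either school's teaching produces that gap. It follows entirely from where the cut score and the deadline were placed.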

As a blogger, I am not an expert on how test score targets—the cut scores—are set, but Daniel Koretz devotes an entire chapter of his book, “Making Up Unrealistic Targets,” to this subject.  Here is how he begins:  “If one doesn’t look too closely, reporting what percentage of students are ‘proficient’ seems clear enough. Someone somehow determined what level of achievement we should expect at any given grade—that’s what we will call ‘proficient’—and we’re just counting how many kids have reached that point. This seeming simplicity and clarity is why almost all public discussion of test scores is now cast in terms of the percentage reaching either the proficient standard, or occasionally, another cut score… The trust most people have in performance standards is essential, because the entire educational system now revolves around them. The percentage of kids who reach the standard is the key number determining which teachers and schools will be rewarded or punished.”  (The Testing Charade, p. 120)

After emphasizing that benchmark scores are not scientifically set and are in fact all arbitrary, Koretz examines some of the methods. The “bookmark” method, he explains, “hinges entirely on people’s guesses about how imaginary students would perform on individual test items… (P)anels of judges are given a written definition of what a standard like ‘proficient’ is supposed to mean.”  Koretz quotes from Nebraska’s definition of reading comprehension: “A student scoring at the Meets the Standards level generally utilizes a variety of reading skills and strategies to comprehend and interpret narrative and informational text at grade level.” After enumerating some of the specific skills and strategies listed in Nebraska, Koretz adds a qualification to the way Nebraska describes its methodology: “A short digression: the emphasized word generally is very important. One of the problems in setting standards is that students are inconsistent in their performance.” (The Testing Charade, pp. 121-122) (Emphasis in the original.)

Koretz continues: “There is another, perhaps even more important, reason why performance standards can’t be trusted: there are many different methods one can use, and there is rarely a really persuasive reason to select one over the other. For example, another common approach, the Angoff method… is like the bookmark in requiring panelists to imagine marginally proficient students, but in this approach they are not given the order of difficulty of the items or a response probability. Instead panelists have to guess the percentage of imaginary marginally proficient students who would correctly answer every item in the test. Other methods entail examining and rating actual student work, rather than guessing the performance of imaginary students on individual items.  Yet other methods hinge on predictions of later performance—for example, in college. There are yet others. This wouldn’t matter if these different methods gave you at least roughly similar results, but they often don’t.  The percentage of kids deemed to be ‘proficient’ sometimes varies dramatically from one method to another.  This inconsistency was copiously documented almost thirty years ago, and the news hasn’t gotten any better.” (The Testing Charade, pp. 123-124)

Koretz continues his warning: “However, setting the standards themselves is just the beginning. What gives the performance standards real bite is their translation into concrete targets for educators, which depends on more than the rigor of the standard itself.  We have to say just who has to reach the threshold. We have to say how quickly performance has to increase—not only overall but for different types of kids and schools. A less obvious but equally important question is how much variation in performance is acceptable… A sensible way to set targets would be to look for evidence suggesting how rapidly teachers can raise achievement by legitimate means—that is, by improving instruction, not by using bad test prep, gaming the system, or simply cheating…  However, the targets in our test-based accountability systems have often required unremitting improvements, year after year, many times as large as any large-scale change we have seen.” (The Testing Charade, pp. 125-126)

Koretz concludes: “(I)t is clear that the implicit assumption undergirding the reforms is that we can dramatically reduce the variability of achievement… Unfortunately, all evidence indicates that this optimism is unfounded.  We can undoubtedly reduce variations in performance appreciably if we summoned the political will and committed the resources to do so—which would require a lot more than simply imposing requirements that educators reach arbitrary targets for test scores.” (The Testing Charade, p. 131)

Faith in High Stakes Testing Fades, Even Among the Corporate School Reformers

After a recent twenty-fifth anniversary conference at the Center on Reinventing Public Education at the University of Washington, Bothell (a Gates-funded education-reformer think tank), Chalkbeat’s Matt Barnum summarized presentations by a number of speakers who demonstrate growing skepticism about the high-stakes, standardized testing regime that has dominated American public education for over a quarter of a century.

Because the Center on Reinventing Public Education is known as an advocate for portfolio school reform and corporate accountability, you might expect adherence to the dogma of test-and-punish, but, notes Barnum:  “The pervasiveness of the complaints about testing was striking, given that many education reform advocates have long championed using test scores to measure schools and teachers and then to push them to improve.”

Then at a Massachusetts Institute of Technology School Access and Quality Summit early this month, Paymon Rouhanifard presented a major policy address challenging the use of high stakes testing to rank and rate public schools.  Rouhanifard was until very recently Chris Christie’s appointed, school-reformer superintendent in Camden, New Jersey.  Before that, he directed Joel Klein’s Office of Portfolio Management in New York City.  Rouhanifard describes the belief system he brought with him to Camden and describes how his five-year tenure as Camden’s superintendent transformed his thinking: “Our belief was that politics and bureaucracy had inhibited the progress Camden students and families deserved to overcome the steep challenges the city was facing…  We believed it was important for the district to segue out of being a highly political monopoly operator of schools….  This is a story about an evolution of my own thinking during that five-year experience…. What I’m referring to are the math and literacy student achievement data we utilize to drive so many of the critical decisions we make… My realization a few years ago was that I rarely asked questions about what these tests actually told us.  What they didn’t tell us.  And perhaps most importantly, what were the specific behaviors they incentivized, and what were the general trade-offs when we acutely focus on how students do on state tests.”

In 2013, at the beginning of his tenure, Rouhanifard introduced a school report card that rated each school primarily by students’ standardized test scores. Two years ago Rouhanifard eliminated his own school report cards.  He describes his realization: “We are spending an inordinate amount of time on formative and interim assessments and test prep, because those are the behaviors we have incentivized.  We are deprioritizing the sciences, the arts, and civic education…. I… believe the drawbacks currently outweigh the benefits.  That we haven’t been honest about the trade-offs.”

Shael Polakow-Suransky, like Rouhanifard, held a position in Joel Klein’s “reformer” school administration in New York City.  Now the president of Bank Street College of Education, he was formerly Klein’s deputy schools chancellor. Barnum explains that Polakow-Suransky has become an emphatic critic of the nation’s high-stakes standardized testing regime: “The biggest barrier to student learning and closing the achievement gap is the current system of standardized tests.”

In a piece at The74, the Thomas Fordham Institute’s Robert Pondiscio quotes Polakow-Suransky: “All of us were well-intentioned in pushing this agenda, but the tools we developed were not effective in raising the bar on a wide scale.”

While the Thomas Fordham Institute has endorsed corporate school reform including high-stakes, test-based accountability, Fordham’s Pondiscio now acknowledges that under the Every Student Succeeds Act, U.S. public schools have become mired in an education culture defined by test-based accountability.  Though he seems unclear on the way forward, Pondiscio now advocates for serious reconsideration: “The challenge is not testing vs. not testing.  It’s not accountability vs. none.  Both bring benefits of different kinds, and both are required by a federal law that’s not going to change anytime soon.  The challenge is to develop a policy vision that supports—not thwarts—the classroom practices and long-term student outcomes we seek… The problem is the reductive culture of testing, which has come to shape and define American education, particularly in the kinds of schools attended by our most disadvantaged children.”

There are some who remain faithful to the school reformer dogma. The Center on Reinventing Public Education’s Robin Lake tries to change the subject: “We need a more productive debate about school accountability, not tired arguments over testing.” And Matt Barnum quotes Sandy Kress—still a tried-and-true believer in the No Child Left Behind regime he helped create: “Research shows clearly that accountability made a real difference in this country in narrowing the achievement gap and lifting student achievement.”

Of course, research does not clearly show that Sandy Kress’s kind of No Child Left Behind accountability made a real difference.  Here is Harvard’s Daniel Koretz, in the authoritative book he published a year ago, The Testing Charade: Pretending to Make Schools Better.  It is perhaps this volume by an academic expert on testing that has helped change the minds of some of the corporate school reformers quoted above.  Koretz writes: “It is no exaggeration to say that the costs of test-based accountability have been huge.  Instruction has been corrupted on a broad scale.  Large amounts of instructional time are now siphoned off into test-prep activities that at best waste time and at worst defraud students and their parents.  Cheating has become widespread.  The public has been deceived into thinking that achievement has dramatically improved and that achievement gaps have narrowed.  Many students are subjected to severe stress, not only during testing but also for long periods leading up to it.  Educators have been evaluated in misleading and in some cases utterly absurd ways.  Careers have been disrupted and in some cases ended.  Educators, including prominent administrators, have been indicted and even imprisoned.  The primary benefit we received in return for all of this was substantial gains in elementary-school math that don’t persist until graduation.  This is true despite the many variants of test-based accountability the reformers have tried, and there is nothing on the horizon now that suggests that the net effects will be better in the future. On balance, then, the reforms have been a failure.” (The Testing Charade, pp. 191-192)

Introducing readers to Don Campbell, “one of the founders of the science of program evaluation,” Koretz defines the problems inherent in our society’s quarter century of high-stakes, test-and-punish school accountability by quoting Campbell’s Law:  “The more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”  Campbell directly addresses the problem of high stakes testing to rank and rate schools:  “Achievement tests may well be valuable indicators of … achievement under conditions of normal teaching aimed at general competence. But when test scores become the goal of the teaching process, they both lose their value as indicators of educational status and distort the educational process in undesirable ways.” (The Testing Charade, pp. 38-39)

How has the testing regime operated perversely to undermine the schools serving our society’s most vulnerable children—the ones we were told No Child Left Behind would catch up academically if only we created incentives and punishments to motivate their teachers to work harder?  “One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools.  The causes are complex, but the result is simple: some schools have far lower average scores—and, particularly important in this system, more kids who aren’t ‘proficient’—than others.  Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do.  This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’  It was a deliberate and prominent part of many of the test-based accountability reforms… Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic.  The specific targets were often an automatic consequence of where the proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.”  (The Testing Charade, pp. 129-130)

Besides imposing unreasonable and damaging punishments on the schools and teachers serving our society’s poorest children, Koretz believes our commitment to a regime of punitive testing has distracted our society from developing the commitment to address the real needs of children and schools in places where poverty is concentrated: “We can undoubtedly reduce variations in performance appreciably, if we summoned the political will and committed the resources to do so—which would require a lot more than simply imposing requirements that educators reach arbitrary targets for test scores.” (The Testing Charade, p. 131)

Northwestern University Economist Uses Data to Prove Students’ Test Scores Fail to Measure Quality Teaching

Mike Rose, a UCLA education professor, understands a lot about teaching.  In his extensive writing about education, Rose explains good teaching with precision and insight.  Rose culminated a four-year visit to excellent classrooms across the United States with the publication of the story of those teachers in Possible Lives. He has also written widely about what good teachers do and what ought to be considered when teachers are evaluated.

Rose explains: “Teaching done well is complex intellectual work, and this is so in the primary grades as well as Advanced Placement physics.  Teaching begins with knowledge of subject matter, of instructional materials and technologies, of cognitive and social development.  But it’s not just that teachers know things.  Teaching is using knowledge to foster the growth of others. This takes us to the heart of what teaching is…. The teacher sets out to explain what a protein or a metaphor is, or how to balance the terms in an algebraic equation, or the sociological dynamics of prejudice, but to do so needs to be thinking about how to explain these things: what illustrations, what analogies, what alternative explanations when the first one fails.  This instruction is done not only to convey particular knowledge about metaphors or algebraic equations, but also to get students to understand and think about these topics.  This involves hefty cognitive activity, … but the teacher is doing it with a room full of young people—which brings a significant performative dimension to the task.”

Rose continues: “Thus teaching is a deeply social and emotional activity. You have to know your students and be able to read them quickly, and from that reading make decisions to slow down or speed up, stay with a point or return to it later, connect one student’s comment to another’s.  Simultaneously, you are assessing on the fly Susie’s silence, Pedro’s slump, Janelle’s uncharacteristic aggressiveness.  Students are, to varying degrees, also learning from each other, learning all kinds of things, from how to carry oneself to how to multiply mixed numbers. How teachers draw on this dynamic interaction varies depending on their personal style, the way they organize their rooms, and so on—but it is an ever-present part of the work they do.”

Rose further describes characteristics of the classrooms created by excellent teachers and what he has observed about how teachers continue to improve their practice through their careers: “The classrooms were safe. They provided physical safety…. but there was also safety from insult and diminishment… And there was safety to take intellectual risks… Intimately related to safety is respect… Respect also has a cognitive dimension.  As a New York principal put it, ‘It’s not just about being polite—even the curriculum has to be challenging enough that it’s respectful.’  Talking about safety and respect leads to a consideration of authority… A teacher’s authority came not just with age or with the role, but from multiple sources—knowing the subject, appreciating students’ backgrounds, and providing a safe and respectful space. And even in traditionally run classrooms, authority was distributed. Students contributed to the flow of events, shaped the direction of discussion, became authorities on the work they were doing. These classrooms, then, were places of expectation and responsibility.”

As Rose discusses the characteristics of good teaching, he is not evaluating teachers by standardized test scores in language arts and mathematics. Arne Duncan’s U.S. Department of Education made the evaluation of teachers by their students’ test scores a condition for states’ qualifying for Race to the Top grants and No Child Left Behind waivers.  Major research bodies—the American Statistical Association and the American Educational Research Association—have, however, condemned the use of test scores and econometric value-added measures of teacher quality due to their unreliability.

Now Northwestern University economist C. Kirabo Jackson has developed an econometric model demonstrating that students’ test scores, including those based on value-added models, miss the most important characteristics of teachers—the sort of qualitative characteristics Mike Rose so clearly describes.  In Education Next, Jackson explains: “I find that, while teachers have notable effects on both test scores and non-cognitive skills, their impact on non-cognitive skills is 10 times more predictive of students’ longer-term success in high school than their impact on test scores. We cannot identify the teachers who matter most by using test-score impacts alone, because many teachers who raise test scores do not improve non-cognitive skills, and vice versa. These results provide hard evidence that measuring teachers’ impact through their students’ test scores captures only a fraction of their overall effect on student success.”

Jackson creates a behavior index to measure whether students act out, skip class or fail to hand in homework.  He concludes: “A student whose 9th-grade behavior index is at the 85th percentile is a sizable 15.8 percentage points more likely to graduate from high school on time than a student with a median behavior index score.  I find a weaker relationship with test scores…”

Jackson then studies specific teachers and whether the same teachers demonstrate facility at raising students’ language arts and math scores, on the one hand, and improving their behavior, on the other.  His data demonstrate that, “(M)any teachers who are excellent at improving one skill are poor at improving the other, but also that knowing a teacher’s impact on one skill provides little information on the teacher’s impact on the other.”  You will need to read Jackson’s new article to learn his methodology for evaluating teachers’ impact on students’ behavior.

Jackson concludes: “These results confirm an idea that many believe to be true but that has not been previously documented—that teacher effects on test scores capture only a fraction of their impact on their students. The fact that teacher impacts on behavior are much stronger predictors of their impact on longer-run outcomes than test-score impacts, and that teacher impacts on test scores and those on behavior are largely unrelated, means that the lion’s share of truly excellent teachers, those who improve long-run outcomes—will not be identified using test-score value added alone… This analysis provides the first hard evidence that such contributions to student progress are both measurable and consequential.”

While I’m delighted to know that Jackson’s research has exposed a tragic flaw in the practice of judging teachers by the standardized test scores of their students, I hope this new research does not stimulate policy makers to start demanding that states use Jackson’s methodology to measure teachers’ impact on student behavior.

Thank goodness, Jackson himself suggests a more qualitative approach: “To fully assess teacher performance, policymakers should consider measures of a broad range of student skills, classroom observations, and responsiveness to feedback alongside effective ratings based on test scores.”  In other words Jackson acknowledges that experts on pedagogy—people like Mike Rose—know what they are talking about when they analyze and describe excellent teaching.

School Ratings Not Only Tell You Little about Schools, They Contribute to Economic Segregation

Jack Schneider, a professor and education historian at the College of the Holy Cross and director of research for the Massachusetts Consortium for Innovative Education Assessment, points out that the school district in Boston, Massachusetts encourages parents to choose from among the public schools across the district.  In a short commentary, State School Rankings ‘Virtually Worthless,’ Schneider explains that many parents make that choice pretty much based on overall school ratings assigned by the state.

How does Massachusetts calculate its school ratings?  “Each year, the state classifies schools into one of five levels, with the ‘highest performing’ designated Level 1. This practice, though distinct in its details, is in keeping with what is done in the vast majority of states. The theory behind such rankings, whether devised as numerical scores, A-F grades, or narrative labels, is that parents and communities want a clear and simple indicator of school quality. Unfortunately, there are… flaws that make these levels virtually worthless. The first and most obvious problem with state-issued ratings of schools is that they are based primarily on a flawed measure: student standardized test scores.”

Schneider believes such school “grades,” “report cards” and rating systems show parents very little about the quality of schools. Schneider explains all the factors about school quality that test-based ratings omit: “Last fall, MassINC conducted a poll of Boston parents and found that more than two-thirds of them identified as ‘very important’ or ‘extremely important’ all of the following: the quality of the teachers and administrators; school safety and discipline; the school’s academic programming; college and career readiness; class sizes; facility quality; the values promoted by the school; the school’s approach to discipline; and the diversity of the teachers and administrators. These critical dimensions of school quality are mostly ignored in the vast majority of statewide rating systems….”

Also, explains Schneider, “(S)chools are not uniformly good or bad. As most of us know from experience, schools—as structures, organizations, and communities—have different strengths and weaknesses. Schools that are struggling in some ways may be thriving in others. And schools with illustrious reputations often have a lot to work on.”

And finally, Schneider names the reality that school ratings are shaping our society: “Perhaps most importantly, ratings shape the decisions parents make about where to live and where to send their children to school.”  Although Schneider does not explore the details of this important observation, academic research demonstrates the reasons why school ratings are likely to reinforce growing housing segregation by family income.

Over a half century of sociological research (dating back to the landmark 1966 Coleman report) demonstrates a strong correlation between overall school achievement and aggregate family income. When states rate schools by their aggregate test scores, the schools whose students are wealthy tend to get an A, and the schools serving very poor children too frequently get a D or an F.  Here are academic experts discussing how test scores reflect a community’s aggregate economic level, not school quality.

In 2011, the Stanford University educational sociologist Sean Reardon showed that while in 1970 only 15 percent of families lived in neighborhoods classified as affluent or poor, by 2007, 31 percent of families lived in such neighborhoods. By 2007, fewer families across America lived in mixed income communities. Reardon also demonstrated that growing residential inequality has been accompanied by a simultaneous jump in the income-based school achievement gap. The achievement gap between children from families with income in the top ten percent and children from families with income in the bottom ten percent was 30-40 percent wider among children born in 2001 than among those born in 1975, and twice as large as the black-white achievement gap.

Based on Reardon’s research, in a 2016 report from the National Education Policy Center warning against the continued reliance on No Child Left Behind’s strategy of testing children, rating schools by scores, and punishing the schools and teachers unable quickly to raise scores, William Mathis and Tina Trujillo caution policymakers: “We cannot expect to close the achievement gap until we address the social and economic gaps that divide our society… Low test scores are indicators of our social inequities… Otherwise, we would not see our white and affluent children scoring at the highest levels in the world and our children of color scoring equivalent to third-world countries.  We also would not see our urban areas, with the lowest scores and greatest needs, funded well below our highest scoring suburban schools. With two-thirds of the variance in test scores attributable to environmental conditions, the best way of closing the opportunity gap is through providing jobs and livable wages across the board. We must also deal with governmentally determined housing patterns that segregate our children… One of the frequently heard phrases used to justify annual high-stakes disaggregated assessment is that ‘shining a light’ on deficiencies of particular groups will prompt decision-makers to increase funding, expand programs, and ensure high quality. This has not happened. Shining a light does not provide the social and educational learning essentials for our neediest children.”

William Mathis and Kevin Welner, in another 2016 National Education Policy Center report, summarize what was misguided about school accountability policy imposed by No Child Left Behind and the Every Student Succeeds Act: “As policymakers and the courts abandoned desegregation efforts and wealth moved from cities to the suburbs, most of the nation’s major cities developed communities of concentrated poverty, and policymakers gave the school districts serving those cities the task of overcoming the opportunity gaps created by that poverty. Moreover, districts were asked to do this with greatly inadequate funding. The nation’s highest poverty school districts receive ten percent lower funding per student while districts serving children of color receive 15 percent less. This approach, of relying on under-resourced urban districts to remedy larger societal inequities, has consistently failed.”

How does this relate to test-based school accountability?  Last fall, in The Testing Charade: Pretending to Make Schools Better, Harvard University’s Daniel Koretz explained: “One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools. The causes are complex, but the result is simple: some schools have far lower average scores…. Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do. This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’ It was a deliberate and prominent part of many of the test-based accountability reforms…. Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic. The specific targets were often an automatic consequence of where the Proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.” (pp. 129-130) Policymakers decided that, if sufficiently pressured to raise test scores, teachers would be able to do so: “(T)hey acted as if… (schools alone could) largely eliminate variations in student achievement, ignoring the impact of factors that have nothing to do with the behavior of educators—for example, the behavior of parents, students’ health and nutrition, and many characteristics of the communities in which students grow up.” (pp. 123-124)

Test-and-punish accountability since 2002, when No Child Left Behind was enacted, has condemned as “failing” the poorest schools and school districts whose test scores, according to academic research, are undermined by the economic circumstances of their communities and families. In lock-step, states have bought into holding schools accountable and exacerbated the problem by ranking schools with numerical rankings or letter grades—again based on standardized test scores—that encourage wealthier families who can afford it to move to affluent communities that brag about A-rated schools and to abandon the schools in poor communities. For sixteen years, school accountability policies mandated by federal and state governments have been contributing to the economic resegregation of America’s metropolitan areas.