Education Expert Demonstrates Why Gov. Youngkin’s Attack on Virginia’s Public Schools Is Wrong

I suspect that Glenn Youngkin, the governor of Virginia, knows very little, really, about public education.  He was an investment banker before he became a politician, and his children attend the elite, private Georgetown Prep. But Youngkin knows how to build political capital by frightening parents and the general public about so-called failures in the state’s public schools. He campaigned last year by promoting the racist idea that parents need more control over their kids’ schools to prevent the children’s being frightened or upset by the injustices that have scarred American history. And now, he has begun using test score data to try to paint the state’s public schools as failing.

The problem is that this time, as he tries to use the state’s scores on the “nation’s report card,” the National Assessment of Educational Progress (NAEP), to prove there is something drastically wrong with Virginia’s public schools, he and his so-called experts who just castigated the state’s schools in a new report seem to have misread the meaning of the test scores they denigrate. Youngkin’s claim is that too few Virginia students achieve the “proficient” cut score on the NAEP.

For the Washington Post, Hannah Natanson and Laura Vozzella report: “The Virginia Department of Education painted a grim picture of student achievement in the state in a report released Thursday, asserting that children are performing poorly on national assessments in reading and math and falling behind peers in other states.  The 34-page report on students’ academic performance, requested as part of Gov. Glenn Youngkin’s first executive order, says these trends are especially pronounced among Black, Hispanic and low-income students. The report further critiques what it calls school districts’ lack of transparency regarding declining student performance—and it laments parents’ ‘eroding’ confidence in the state’s public schools.”  The Youngkin administration’s new report contends that Virginia has been expecting too little of its public school students: while Virginia’s state test, the Standards of Learning or SOL, shows the state’s students are doing well, Virginia’s NAEP scores show the state’s students are not really “proficient.”

But Youngkin’s report ignores years of discussion about what the “proficient” achievement level on the National Assessment of Educational Progress really means.  In her 2013 book, Reign of Error, Diane Ravitch, who once served on the NAEP’s Governing Board, took the trouble to explain: “All definitions of education standards are subjective…  People who set standards use their own judgment to decide the passing mark on a test. None of this is science.” Ravitch explains further precisely how the NAEP Governing Board has always defined the difference between the “proficient” standard and the “basic” standard: “‘Proficient’ represents solid achievement. The National Assessment Governing Board (NAGB)… defines it as ‘solid academic performance for each grade assessed. This is a very high level of academic performance. Students reaching this level have demonstrated competency over challenging subject matter, including subject matter knowledge, application of such knowledge to real-world situations, and analytical skills appropriate to the subject matter.’… ‘Basic,’ as defined by NAGB, is ‘partial mastery of prerequisite knowledge and skills that are fundamental for proficient work at each grade.’” Ravitch concludes that according to the NAEP standard: “a student who is ‘proficient’ earns a solid A and not less than a strong B+” while “the student who scores ‘basic’ is probably a B or C student.” (Reign of Error, p. 47)

Daniel Koretz, a Harvard University expert on the construction of standardized tests and their uses for high-stakes school accountability, devotes an entire chapter of his 2017 book, The Testing Charade, to the topic, “Making Up Unrealistic Targets.”  Koretz describes exactly how Glenn Youngkin appears to be manipulating the meaning of NAEP cut scores as an argument for blaming the schools and pressuring educators to prep students to improve test scores at any cost: “In a nutshell, the core of the approach has been simply to set an arbitrary performance target (the ‘Proficient’ standard) and declare that all schools must make all students reach it in an equally arbitrary amount of time…. (A)lmost all public discussion of test scores is now cast in terms of the percentage reaching either the proficient standard, or occasionally, another cut score… This trust in performance standards, however is misplaced… (I)n fact, despite all the care that goes into creating them, these standards are anything but solid. They are arbitrary, and the ‘percent proficient’ is a very slippery number.” (The Testing Charade, pp. 119-121)
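
Koretz’s point that the “percent proficient” is a slippery number can be illustrated with a minimal Python sketch. The ten scale scores and the three cut scores below are invented for illustration; they are not NAEP data, and the function name is hypothetical.

```python
def percent_at_or_above(scores, cut_score):
    """Return the percentage of students scoring at or above the cut score."""
    if not scores:
        raise ValueError("scores must not be empty")
    return 100.0 * sum(s >= cut_score for s in scores) / len(scores)


# Hypothetical scale scores for ten students (made-up numbers, not NAEP data).
scores = [210, 218, 225, 231, 236, 240, 244, 249, 255, 262]

# Three hypothetical places a standard-setting panel might put the cut score.
for cut in (230, 238, 250):
    share = percent_at_or_above(scores, cut)
    print(f"cut score {cut}: {share:.0f}% proficient")
```

The same ten students, with exactly the same scores, come out 70 percent, 50 percent, or 20 percent “proficient” depending only on where the line is drawn.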

Natanson and Vozzella report that Virginia’s educators immediately pushed back against Youngkin’s new report: “The superintendent of Alexandria City Public Schools, Gregory C. Hutchings Jr., said the report inspired him to navigate to the NAEP website, where he discovered that Virginia students had consistently scored above the national average. ‘So, I’m not really understanding the whole premise of this report…. (which) was around us performing so much lower than everyone else.’”

Fortunately, last Friday, right after Youngkin’s report was released, the Washington Post’s Valerie Strauss published a column by James Harvey, the recently retired executive director of the National Superintendents Roundtable.  Harvey scathingly criticizes the National Assessment Governing Board (NAGB) for its confusing definition of “proficient.”  As with much federal policy after the Reagan administration’s 1983 report, A Nation at Risk, which blamed the public schools for widespread mediocrity and became the basis for standards-based school reform, the NAGB set its proficiency targets to drive higher expectations. Harvey writes: “Proficient doesn’t mean proficient. Oddly, NAEP’s definition of proficiency has little or nothing to do with proficiency as most people understand the term. NAEP experts think of NAEP’s standard as ‘aspirational.’ In 2001, two experts associated with NAGB made it clear that: ‘The proficient achievement level does not refer to ‘at grade’ performance. Nor is performance at the Proficient level synonymous with ‘proficiency’ in the subject. That is, students who may be considered proficient in a subject, given the common usage of the term, might not satisfy the requirements for performance at the NAEP achievement level.’”

Harvey summarizes the decades-long controversy about National Assessment of Educational Progress cut scores: “What is striking in reviewing the history of NAEP is how easily its policy board has shrugged off criticisms about the standards-setting process. The critics constitute a roll call of the statistical establishment’s heavyweights…  (T)he likes of the National Academy of Education, the Government Accounting Office, the National Academy of Sciences, and the Brookings Institution have issued scorching complaints that the benchmark-setting processes were ‘fundamentally flawed,’ ‘indefensible,’ and ‘of doubtful validity,’ while producing ‘results that are not believable.'”

Harvey continues: “How unbelievable? Fully half the 17-year-olds maligned as being just basic by NAEP obtained four-year college degrees. About one-third of Advanced Placement Calculus students, the creme de la creme of American high school students, failed to meet the NAEP proficiency benchmark. While only one-third of American fourth-graders are said to be proficient in reading by NAEP, international assessments of fourth-grade reading judged American students to rank as high as No. 2 in the world. For the most part, such pointed criticism from assessment experts has been greeted with silence from NAEP’s policy board.”

In her introduction to Harvey’s piece, Valerie Strauss explains: “Youngkin isn’t the first politician to misinterpret NAEP scores and then use that bad interpretation to bash public schools.” Please do read Strauss’s introduction and James Harvey’s fine column to better understand how high-stakes standardized testing has been used politically to drive a kind of school reform that manipulates big data but has little relevance to expanding educational opportunity.

We Must Renew Efforts to End High-Stakes “Test and Punish” in U.S. Public Schools

As an opponent of federally mandated high-stakes standardized tests in the public schools, I have been worrying that, after educators were unsuccessful last year in pressing Education Secretary Miguel Cardona to stop the testing for the 2020-2021 school year during the pandemic, many opponents of test-based accountability have pretty much stopped pushing back on the testing.

In a column last week for Education Week, Rick Hess worries that supporters of high-stakes testing are also struggling.  Rick Hess is the “public school accountability hawk” scholar-in-residence at the American Enterprise Institute. He writes: “During the pandemic, I’ve talked to a lot of educational leaders and advocates who believe in the importance of testing and school accountability—but feel like they’re swimming upstream in their efforts to maintain support for these issues. I’ve been struck at how tough many of them have found it to navigate the shifting political currents.”

If advocates on both sides of the school accountability debate are worried that COVID has drawn the public’s attention away from the effects of standardized testing on public education, it seems like a good time to renew advocacy for eliminating annual testing as the driving force in our public schools.

Hess’s subject in his recent column is the federally required administration of standardized achievement tests every year for all students in grades 3-8 and once in high school. The policy was put in place in 2002 by No Child Left Behind and continued in 2015 when Congress passed the Every Student Succeeds Act. For two decades, proponents like Hess have described testing’s goal as holding schools accountable by imposing sanctions on the schools unable quickly to raise the aggregate test scores of their student populations.

Hess acknowledges more problems with standardized testing than I would have expected: “I suspect the current struggles are healthy—they’re a reminder of how much the momentum and machinery of the Clinton-Bush-Obama era allowed testing advocates to coast. Backed by federal mandates, huge foundation dollars, and media allies, they talked in sweeping assertions about the importance of testing and accountability. They’d insist that testing was the key to leaving no child behind… That reading and math tests revealed achievement gaps and that this was crucial to closing them. That the right standards would provide a foundation for the right tests, permitting complex teacher and school evaluation systems to drive system improvement… (T)esting has real shortcomings. State tests aren’t designed to improve instruction. The results don’t come back for months, and parents don’t get any actionable feedback from them.”

Despite his complaints about big problems with test-based school accountability, however, Hess continues to believe that advocates must strengthen and improve their advocacy for continuing annual high-stakes testing: “Testing and accountability advocates can no longer count on being carried forward by powerful political patrons or deep-pocketed foundations. And, after multiple years of pandemic waivers, they can no longer count on Washington ordering states to hold the line. This should serve as a call to think anew about how to make the case for testing… It’s an opportunity to revisit how to ensure testing really is serving the needs of students, parents, and educators—and learn how to explain that in a distrustful era.”

The problem with Hess’s argument is that he fails to show that high-stakes testing accomplishes any positive purpose, and he neglects to identify much of the damage thrust upon our schools and our society by “test and punish” school accountability.

Making the strongest case against annual standardized testing is Daniel Koretz, the Harvard University expert on the construction of standardized tests and their uses at school. Koretz’s book, The Testing Charade: Pretending to Make Schools Better, written for a wide audience, is the most important book examining how high-stakes testing has wrecked our public schools. Koretz cites something called Campbell’s Law to explain what No Child Left Behind brought us twenty years ago: “The more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor… Achievement tests may well be valuable indicators of… achievement under conditions of normal teaching aimed at general competence. But when test scores become the goal of the teaching process, they both lose their value as indicators of educational status and distort the educational process in undesirable ways.” (The Testing Charade, pp. 38-39)

Koretz explains what happened to teaching and learning when policymakers attached high stakes to achievement tests that had been designed simply to measure what students are learning. The new purpose was accountability: creating consequences for the schools and the teachers in schools where scores failed quickly to rise. There are a number of ways high-stakes testing narrows the curriculum: “(T)he tested samples of content and skills are not fully representative, either of the goals of schooling broadly or of student achievement more narrowly… (H)igh-stakes testing creates strong incentives to focus on the tested sample rather than the domain it is intended to represent.” (The Testing Charade, pp. 16-19)

Federally mandated high-stakes testing in U.S. public schools focuses only on math and reading: “The often unspoken premise of the reformers was that somehow… other subjects, such as history, civics, art, and music, aspects of math and reading that are hard to measure with standardized tests, and ‘softer’ things such as engaging instruction, love of learning, and ability to work in groups—would somehow take care of itself. It didn’t, and that shouldn’t have surprised anyone.  The second reason for the failure is that the system is very high-pressure… Narrowness and high pressure are a very potent combination… A third critical failure of the reforms is that they left almost no room for human judgment. Teachers are not trusted to evaluate students or each other, principals are not trusted to evaluate teachers, and the judgment of professionals from outside the school has only a limited role. What the reformers trust is ‘objective’ standardized measures. This was not accidental.” (The Testing Charade, pp. 32-33)

Koretz explains how schools and school districts discovered ways to inflate their scores through test prep and drilling on the material that predictably appears on the tests year after year. But test prep hasn’t been the only consequence. Sometimes schools held struggling middle school students back a grade to prevent their being tested on the high school test. Sometimes teachers were caught providing students with the answers on the tests and in some places teachers were found to have erased and changed students’ answers on the tests. One instance of outright cheating happened in Washington, D.C. under Michelle Rhee, and in Atlanta, the superintendent and many educators were indicted.

Koretz explains that the high-stakes testing regime was particularly punitive for the schools serving the poorest children: “One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools. The causes are complex, but the result is simple: some schools have far lower average scores…. Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do. This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’ It was a deliberate and prominent part of many of the test-based accountability reforms…. Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic. The specific targets were often an automatic consequence of where the Proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.” (The Testing Charade, pp. 129-130)
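
The arithmetic behind this point can be sketched in a few lines of Python. The two schools and their starting percentages are invented for illustration, and the straight-line target rule is a simplifying assumption, not the formula any particular state used. The 12-year window assumes the 2002 law and its 2014 deadline.

```python
def required_annual_gain(percent_proficient_now, years_to_deadline, target=100.0):
    """Percentage points a school must gain each year to reach the target on time."""
    if years_to_deadline <= 0:
        raise ValueError("years_to_deadline must be positive")
    return max(target - percent_proficient_now, 0.0) / years_to_deadline


# Hypothetical schools (made-up starting points, not data from the book).
schools = {"High-poverty school": 28.0, "Affluent school": 82.0}
YEARS = 12  # assumed window: 2002 enactment to the 2014 deadline

for name, start in schools.items():
    gain = required_annual_gain(start, YEARS)
    print(f"{name}: starts at {start:.0f}% proficient, needs {gain:.1f} points per year")
```

Under these assumptions the high-poverty school must gain 6.0 percentage points every year for twelve years, four times the 1.5 points asked of the affluent school, even though both face the same nominal standard.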

What about the effects of high-stakes testing in society beyond the classroom?  No Child Left Behind imposed federal punishments by requiring that staffs at low-scoring schools be reconstituted by firing principals and half the staff, or by requiring that schools be charterized, privatized, or shut down.  Education Secretary Arne Duncan used Race to the Top to force states to tie teachers’ evaluations to students’ test scores. In 2015, Congress replaced No Child Left Behind with the Every Student Succeeds Act and stopped imposing federally established harsh sanctions, but ESSA continues, in 2022, to require that every year all the states must submit plans embodying sanctions to hold the lowest-scoring five percent of public schools accountable.

Here are some of the broader effects of ESSA. Today the federal government continues to require states to rank and rate schools based primarily on standardized test scores. The ranking and rating of schools brands low-scoring school districts, usually the districts serving concentrations of poor children, as “failing” and drives middle class flight to wealthier exurbs, thereby accelerating racial and economic segregation. Some states continue to take over low-scoring schools and school districts and turn these districts over to appointed overseers or commissions.  School districts continue to shut down low-scoring schools. Many states locate charter schools and grant voucher eligibility in low-scoring school districts. And even though researchers have demonstrated that students’ test scores are an unreliable and invalid way to evaluate teachers, and despite the fact that the federal government no longer requires states to use test scores for teacher evaluation, many states haven’t taken the trouble to repeal policies that evaluate teachers by their students’ scores. Many states continue to hold students back in third grade if their reading scores are low, and some states base high school graduation on the state test even for students who have successfully completed all of their required courses.

Rick Hess calls on proponents of high-stakes standardized testing “to think anew about how to make the case for testing.”  I call on opponents of standardized testing to present the reams of academic research documenting the damage wrought by federally mandated, test-based school accountability and to intensify pressure for the elimination of high-stakes testing in U.S. public schools.

New Research Yet Again Proves the Folly of Judging Teachers by Their Students’ Test Scores

The Obama Administration’s public education policy, administered by Secretary of Education Arne Duncan, was deeply flawed by its dependence on technocracy. In the 1990s, Congress had been wooed by researchers who had developed the capacity to produce giant, computer-generated data sets. What fell out of style in school evaluations were personal classroom observations by administrators who were more likely to notice the human connections that teachers and children depended on for building trusting relationships to foster learning.

Technocratic policy became law in 2002, when President George W. Bush signed the omnibus No Child Left Behind Act. Technocratic policy reached its apogee in 2009 as Arne Duncan’s Race to the Top grant program became a centerpiece of the federal stimulus bill passed by Congress to ameliorate the 2008 Great Recession.

In an important 2014 article, the late Mike Rose, a professor of education, challenged the dominant technocratic ideology.  He believed that excellent teaching cannot be measured by the number of correct answers any teacher’s students mark on a standardized test. Rose reports: The “classrooms (of excellent teachers) were safe. They provided physical safety…. but there was also safety from insult and diminishment…. Intimately related to safety is respect…. Talking about safety and respect leads to a consideration of authority…. A teacher’s authority came not just with age or with the role, but from multiple sources—knowing the subject, appreciating students’ backgrounds, and providing a safe and respectful space. And even in traditionally run classrooms, authority was distributed…. These classrooms, then, were places of expectation and responsibility…. Overall the students I talked to, from primary-grade children to graduating seniors, had the sense that their teachers had their best interests at heart and their classrooms were good places to be.”

In her 2013 book, Reign of Error, Diane Ravitch reviews the technocratic strategy of Arne Duncan’s Race to the Top. To qualify for a federal grant under this program, states had to promise to evaluate public school teachers by the standardized test scores of their students: “Unfortunately, President Obama’s Race to the Top adopted the same test-based accountability as No Child Left Behind. The two programs differed in one important respect: where NCLB held schools accountable for low scores, Race to the Top held both schools and teachers accountable. States were encouraged to create data systems to link the test scores of individual students to individual teachers. If the students’ scores went up, the teacher was an ‘effective’ teacher; if the students’ scores did not go up, the teacher was an ‘ineffective’ teacher. If schools persistently had low scores, the school was a ‘failing’ school, and its staff should be punished.” (Reign of Error, p. 99).

Ravitch reminds readers of a core principle: “The cardinal rule of psychometrics is this: a test should be used only for the purpose for which it is designed. The tests are designed to measure student performance in comparison to a norm; they are not designed to measure teacher quality or teacher ‘performance.'” (Reign of Error, p. 111)

This week, Education Week’s Madeline Will covers major new longitudinal research documenting what we already knew: that holding teachers accountable for raising their students’ test scores neither improved teaching nor promoted students’ learning:

“Nationally, teacher evaluation reforms over the past decade had no impact on student test scores or educational attainment. ‘There was a tremendous amount of time and billions of dollars invested in putting these systems into place and they didn’t have the positive effects reformers were hoping for,’ said Joshua Bleiberg, an author of the study and a postdoctoral research associate at the Annenberg Institute for School Reform at Brown University… A team of researchers from Brown and Michigan State Universities and the Universities of Connecticut and North Carolina at Chapel Hill analyzed the timing of states’ adoption of the reforms alongside district-level student achievement data from 2009 to 2018 on standardized math and English/language arts test scores. They also analyzed the impact of the reforms on longer-term student outcomes including high school graduation and college enrollment. The researchers controlled for the adoption of other teacher accountability measures and reform efforts taking place around the same time, and found that their results remained unchanged. They found no evidence that, on average, the reforms had even a small positive effect on student achievement or educational attainment.”

Arne Duncan is no longer the U.S. Secretary of Education. And in 2015, Congress replaced the No Child Left Behind Act with a different federal education law, the Every Student Succeeds Act (ESSA), in which Congress permitted states more latitude in how they evaluate schoolteachers. So why is this new 2021 research so urgently important?  Madeline Will reports, “Evaluation reform has already changed course. States overhauled their teacher-evaluation systems quickly, and many reversed course within just a few years.”  Will adds, however, that in 2019,  34 states were still requiring “student-growth data in teacher evaluations.”

In 2019, for the Phi Delta Kappan, Kevin Close, Audrey Amrein-Beardsley, and Clarin Collins surveyed teacher evaluation systems across the states.  Many states still evaluate teachers according to how much each teacher adds to a student’s learning as measured by test scores, a statistic called the Value-Added Measure (VAM).  Practices across the states are slowly evolving: “While the legacy of VAMs as the ‘objective’ student growth measure remains in place to some degree, the definition of student growth in policy and practice is also changing. Before ESSA, student growth in terms of policy was synonymous with students’ year-to-year changes in performance on large-scale standardized tests (i.e., VAMs). Now, more states are using student learning objectives (SLOs) as alternative or sole ways to measure growth in student learning or teachers’ impact on growth. SLOs are defined as objectives set by teachers, sometimes in conjunction with teachers’ supervisors and/or students, to measure students’ growth. While SLOs can include one or more traditional assessments (e.g., statewide standardized tests), they can also include nontraditional assessments (e.g., district benchmarks, school-based assessments, teacher and classroom-based measures) to assess growth. Indeed, 55% (28 of 51) of states now report using or encouraging SLOs as part of their teacher evaluation systems, to some degree instead of VAMs.”

The Every Student Succeeds Act eased federal pressure on states to evaluate teachers by their students’ scores, but five years since its passage, remnants of these policies linger in the laws of many states.  Once bad policy based on technocratic ideology has become embedded in state law, it may not be so easy to change course.

In a profound book, The Testing Charade: Pretending to Make Schools Better, the Harvard University psychometrician Daniel Koretz explains succinctly why students’ test scores cannot possibly separate “successful” from “failing” schools and why students’ test scores are an inaccurate and unfair standard for evaluating teachers:

“One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools. The causes are complex, but the result is simple: some schools have far lower average scores…. Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do. This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’ It was a deliberate and prominent part of many of the test-based accountability reforms…. Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic. The specific targets were often an automatic consequence of where the Proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.” (The Testing Charade, pp. 129-130)

Closing Achievement Gaps Will Require Closing Opportunity Gaps Outside of School

Last week this blog highlighted Advocates for Children of New York’s new report documenting that more than 10 percent of the over one million students in the New York City Public Schools—101,000 students—are homeless. These students are living in shelters, doubled up with friends or relatives, or living in cars and parks. What are the academic challenges for these homeless children and other children living in families with minimum wage employment, unemployment, unstable housing, food insecurity and inadequate medical care?

Although federal law continues to require that states measure the quality of schools and school districts with standardized tests, all sorts of research documents that students’ standardized test scores are indicators of their life circumstances and not a good measure of the quality of their public schools. Students concentrated in poor cities or scattered in impoverished and remote rural areas are more likely to struggle academically no matter the quality of their public school.

Here are just two examples of this research.

In 2017, Katherine Michelmore of Syracuse University and Susan Dynarski of the University of Michigan studied data from Michigan to identify the role of economic disadvantage in achievement gaps as measured by test scores: “We use administrative data from Michigan to develop a… detailed measure of economic disadvantage… Children who spend all of their school years eligible for subsidized meals have the lowest scores, whereas those who are never eligible have the highest. In eighth grade, the score gap between these two groups is nearly a standard deviation.” “Sixty percent of Michigan’s eighth graders were eligible for subsidized lunch at least once during their time in public schools. But just a quarter of these children (14% of all eighth graders) were economically disadvantaged in every year between kindergarten and eighth grade… Ninety percent of the test score gap we observe in eighth grade between the persistently disadvantaged and the never disadvantaged is present by third grade.”

In How Schools Really Matter: Why Our Assumption about Schools and Inequality Is Mostly Wrong, Douglas Downey, a professor of sociology at The Ohio State University, describes academic research showing that evaluating public schools based on standardized test scores is unfair to educators and misleading to the public: “It turns out that gaps in skills between advantaged and disadvantaged children are largely formed prior to kindergarten entry and then do not grow appreciably when children are in school.” (How Schools Really Matter, p. 9) “Much of the ‘action’ of inequality therefore occurs very early in life… In addition to the fact that achievement gaps are primarily formed in early childhood, there is another reason to believe that schools are not as responsible for inequality as many think. It turns out that when children are in school during the nine-month academic year, achievement gaps are rather stable. Indeed, sometimes we even observe that socioeconomic gaps grow more slowly during school periods than during summers.” (How Schools Really Matter, p. 28)

In the context of this research, Downey examines the six indicators the Ohio Department of Education uses to evaluate public schools when it releases annual report cards on school performance. Although the state has ceased branding public schools with “A-F” letter grades, Downey explains that the state of Ohio continues to ignore the role of outside-of-school variables in students’ lives when it blames educators and schools for low aggregate test scores:

“The report card for schools is constructed from six indicators and not a single one of them gauges performance independent of the children’s nonschool environments. First is achievement, which is based on the percentage of students who pass state tests… By far, the biggest determinant of whether a school produces high or low test scores is the income level of the students’ families it serves… Second is the extent to which a district closes achievement gaps among subgroups. But performance on this indicator can also be influenced by factors out of the school’s control… Third, schools are gauged by the degree to which the school improved at-risk K-3 readers… Of course, it is much easier to make progress on this indicator if serving children who go home each evening to reinforce the school goals. Fourth, schools are evaluated on their progress, an indicator based on how much growth students exhibit on math and reading tests. This kind of indicator is better than most at isolating how schools matter, but again, growth is easier in schools where students enjoy home environments that also promote learning… Fifth, the graduation rate constitutes a component of the district’s (rating)… but this is only a measure of school quality if the likelihood of a child’s on-time graduation has nothing to do with the stress they experience at home, the access they have to health care, or the quality of their neighborhood.  Finally districts are evaluated on whether their students are prepared for success. This indicator gauges the percentage of students at a school viewed as ready to succeed after high school… and is determined by how well the students performed on the ACT or SAT and whether they earned a 3 or higher on at least one AP exam… These report cards ‘are designed to give parents, communities, educators, and policymakers information about the performance of districts and schools,’ but what they really do is mix important factors outside of school with what is going on inside the schools in unknown ways.” (How Schools Really Matter, pp. 115-116)

What these reports and many others demonstrate is that we cannot expect that no child will be left behind merely because Congress passes a law declaring that schools can make every American child post proficient test scores by 2014. No Child Left Behind’s (and now the Every Student Succeeds Act’s) policies—which have branded schools unable quickly to raise aggregate test scores as “failing schools”—have unfairly targeted school districts located in poor communities. In 2017, the Harvard University testing expert Daniel Koretz published The Testing Charade: Pretending to Make Schools Better, in which he shows that ameliorating opportunity gaps in the lives of children is not something schools can accomplish by themselves.

Koretz explains: “One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools. The causes are complex, but the result is simple: some schools have far lower average scores—and, particularly important in this system, more kids who aren’t ‘proficient’—than others. Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do. This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’ It was a deliberate and prominent part of many of the test-based accountability reforms… Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic. The specific targets were often an automatic consequence of where the Proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.” (pp. 129-130) Koretz continues: “(T)his decision backfired. The result was, in many cases, unrealistic expectations that teachers simply couldn’t meet by any legitimate means.” (p. 134)

New Education Secretary, Dr. Miguel Cardona, Should Not Require Annual Standardized Testing in This COVID-19 School Year

Last weekend, the NY Times editorialized to demand that President-elect Joe Biden’s new Secretary of Education promptly “clear the wreckage” from Betsy DeVos’s Department of Education. The newspaper is correct to criticize Betsy DeVos’s abandonment of the department’s mission of protecting the civil rights of America’s public school students. And the editorial writers deserve praise for condemning DeVos’s dogged support for for-profit colleges and trade schools at the expense of indebted student borrowers.

But pretty quickly the Times editorial board steps into the old trap of endorsing federally mandated high stakes standardized testing and the collection of big data at the expense of the children and teachers who are struggling to make it through this school year, shunted back and forth from online schooling to in-person school and then back online as the COVID-19 numbers rise and fall. The editorial board has slipped into the No Child Left Behind mindset that values data over the lived experience of students and teachers:

“Mr. Cardona would need to pay close attention to how districts plan to deal with learning loss that many children will suffer while the schools are closed. Fall testing data analyzed by the nonprofit research organization NWEA suggests that setbacks have been less severe than were feared with students showing continued academic progress in reading and only modest setbacks in math. However, given a shortage of testing data for Black, Hispanic and poor children, it could well be that these groups have fared worse in the pandemic than their white or more affluent peers. The country needs specific information on how these subgroups are doing so that it can allocate educational resources strategically.”

That is, of course, what No Child Left Behind and its massive state-by-state testing regime were supposed to be about, except that nobody ever “allocated educational resources strategically” once we had all the big data. President-elect Joe Biden has explained that across the United States: “There’s an estimated $23 billion annual funding gap between white and non-white school districts today, and gaps persist between high- and low-income districts as well.” Despite wide agreement that twenty years of data-driven school accountability failed to drive investment into the poorest schools, the narrative that testing data will guide resources remains deeply embedded in the conventional wisdom.

It will be up to our new Secretary of Education Miguel Cardona to decide whether to cancel this spring’s federally mandated standardized tests in language arts and math for a second year. Betsy DeVos, to her credit, let the states and the nation’s public schools off the hook last year due to the chaos of the pandemic.

Last week the Washington Post’s Valerie Strauss summarized the past two decades of mandated standardized testing and the choice which now faces Education Secretary Cardona: “The annual spring testing regime—complete with sometimes extensive test preparation in class and even testing ‘pep rallies’—has become a flash point in the two-decade-old school reform movement that has centered on using standardized tests to hold schools and teachers accountable.  First, under the 2002 No Child Left Behind law and now under its successor, the 2015 Every Student Succeeds Act, public schools are required to give most students tests each year in math and English language arts and to use the results in accountability formulas.  Districts evaluate teachers and states evaluate schools and districts—at least in part—on test scores.”

Strauss continues: “Supporters say that (the tests) are important to determine whether students are making progress and that two straight years of having no data from these tests would stunt student academic progress because teachers would not have critical information on how well their students are doing. Critics say that the results have no value to teachers because the scores come after the school year has ended and that they are not allowed to see test questions or know which ones their students get wrong. There are also concerns that some tests used for accountability purposes are not well-aligned to what students learn in school—and that the results only show what is already known: students from poor families do worse than students from families with more resources.”

Criticizing the NY Times editorial, Diane Ravitch elaborates as she suggests that Dr. Cardona should cancel the mandated state tests for a second year: “The results will be useless. The teachers are usually not allowed to see the questions, never allowed to discuss them, and never allowed to learn how individual students performed on specific questions. The results will be reported 4-6 months after students take the test. The students will have a new teacher. The students will get a score, but no one will get any information about what students do or don’t know… Anyone who thinks that it is necessary or fair to give standardized tests this spring is out of touch with the realities of schooling. More important than test scores right now is the health and safety of students, teachers, and staff.”

Writing for Education Week last month, Lorrie A. Shepard, a professor of research and evaluation methodology at the University of Colorado School of Education, cautions that Testing Students This Spring Would Be a Mistake. Like many experts, Shepard worries about the use of standardized tests for high stakes accountability: “Even under normal circumstances, high-stakes testing has negative consequences. State assessment programs co-opt valuable instructional time, both for week-long test administration and for test preparation. Accountability pressures often distort curriculum, emphasizing test-like worksheets and focusing only on tested subjects. Recent studies of data-driven decision making warn us that test-score interpretations can lead to deficit narratives—blaming children and their families—instead of prompting instructional improvements… Most significantly, teachers report that they and their students experience high degrees of anxiety, even shame, when test scores are publicly reported… Clearly it would be unfair to hold schools and teachers accountable for outcomes when students’ learning opportunities have varied because of computer and internet access, home learning circumstances, and absences related to sickness or family disruption. Testing this year is counterproductive because it potentially demoralizes students and teachers without addressing the grave problems exacerbated by the pandemic.”

In The Testing Charade: Pretending to Make Schools Better, a profound and thorough exploration of the past two decades of the use of students’ standardized test scores to evaluate their schools and their teachers, Harvard University testing expert Daniel Koretz concisely explains why the federal use of widespread standardized testing to drive teachers’ evaluations, school closures, the firing of school principals, state takeovers of schools, and the turnover of public schools to private operators has not only left us with a succession of dangerous policies, but also undermined the validity of the tests themselves as states manipulated their scoring to avoid sanctions.  Further, the attachment of high stakes undermined the education process in the schools where children were farthest behind—schools where teachers were forced to teach to the test or fall back on deadly drilling.

Koretz cites social scientist Don Campbell’s well-known theory describing the universal human response when high stakes are attached to any quantitative social indicator: “The more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor… Achievement tests may well be valuable indicators of… achievement under conditions of normal teaching aimed at general competence. But when test scores become the goal of the teaching process, they both lose their value as indicators of educational status and distort the educational process in undesirable ways.” (The Testing Charade, pp. 38-39)

Nobody Should Be Wasting Time Worrying About When to Administer Standardized Tests

Parents, children, teachers, principals, and school superintendents are living through a time of unknowns. COVID-19 is raging across the states with many public schools operating only online. Some public schools, which have been able to open in person or on hybrid schedules, have subsequently been forced to close already reopened buildings or specific classrooms as COVID-19 cases arise and everybody quarantines.

In the midst of a chaotic situation with no good and stable solutions for many public schools, suddenly last week everybody started worrying about what to do about this year’s standardized tests. The Washington Post’s Perry Stein reports that outgoing Secretary of Education Betsy DeVos postponed the winter administration of the National Assessment of Educational Progress, the one test administered across all the states, the test that tracks school achievement over the decades and is not distorted by high stakes consequences.

Representative Bobby Scott (D-VA) and Senator Patty Murray (D-WA), the Democratic leaders of the House and Senate education committees, agreed to delay the NAEP, but said the nation needs some kind of measure of learning loss during the pandemic.  They released a statement declaring that annual state tests mandated under the Every Student Succeeds Act must surely be administered: “Existing achievement gaps are widening for our most vulnerable students, including students from families with low incomes, students with disabilities, English learners, and students of color. In order for our nation to recover and rebuild from the pandemic, we must first understand the magnitude of learning loss that has impacted students across the country. That cannot happen without assessment data.”

While I frequently agree with Representative Scott and Senator Murray, I think worrying about standardized testing right now ought to be a low priority, and I think the state-by-state achievement tests mandated by the Every Student Succeeds Act are the wrong kind of test.  Neither do I believe that the mandated, annual state achievement tests are necessary to help teachers grasp their students’ learning needs during and following the widespread school closures and disruptions in the current school year.  Our schoolteachers are well-trained professionals who are prepared to develop their students’ reading comprehension skills, to track problems with computational skills and mathematical conceptualization, and to help support their students emotionally after a period of disruption. The emphasis right now and when children return to classrooms must be supporting teachers facing the complex challenge of serving children who have been out of the classroom for too long. Standardized test scores very often don’t even arrive at schools for months after the tests are administered; they play little role in supporting teachers’ capacity to discern their students’ learning gains or losses.

If we are looking for complex data about the impact of the pandemic on public schools across communities and across states, at some point it will be realistic for the National Center for Education Statistics again to administer the National Assessment of Educational Progress, which is designed as a national audit test to determine learning trends over time.  When it is practical to administer NAEP, certainly that test should happen.

The annual standardized tests, mandated first by No Child Left Behind and, since 2015, by the Every Student Succeeds Act, are designed for an entirely different purpose.  And ironically the purpose and use of these tests for holding schools accountable distorts the results as schools struggle to raise scores at any cost in order to avoid the high stakes punishments that Congress attached to these tests or forced the states to attach. What are these high stakes? States still have to submit to the U.S. Department of Education plans for how to turn around their lowest performing schools according to these tests.  Some states still evaluate teachers according to their students’ scores. States rate and rank particular schools and school districts according to their aggregate test scores. Many states publish these rankings, which encourages real estate redlining as well as racial and economic segregation across metropolitan areas. Some states place voucher programs or charter schools in school districts where scores are low. Some states take over low scoring schools and school districts and turn them over to appointed commissions that supplant locally elected school boards.  Some school districts have claimed to use school closure as a so-called turnaround plan.

In a profound 2017 book, The Testing Charade: Pretending to Make Schools Better, Daniel Koretz, a Harvard University expert on standardized testing, documents research exposing flaws in the entire strategy of No Child Left Behind, which combined standardized testing with high stakes punishments for schools unable quickly to raise students’ test scores. Koretz explains social scientist Don Campbell’s well-known theory describing the universal human response when high stakes are tied to a quantitative social indicator.  In this case, the social indicator is whether or not educators and particular schools can produce higher aggregate student test scores year after year:

“The more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor… Achievement tests may well be valuable indicators of… achievement under conditions of normal teaching aimed at general competence. But when test scores become the goal of the teaching process, they both lose their value as indicators of educational status and distort the educational process in undesirable ways.” (The Testing Charade, pp. 38-39)

Koretz shows that imposing high stakes punishments on schools and educators unable quickly to raise students’ scores inevitably produces reallocation of instruction to what is being tested, causes states eventually to lower standards, and causes some schools quietly to exclude from testing the students likely to fail. Under No Child Left Behind, the high stakes even led to abject cheating—as happened in Atlanta under Superintendent Beverly Hall.

What all this means is that the state achievement tests mandated by No Child Left Behind and the Every Student Succeeds Act—whether administered to students this year or put off until after vaccines are widely available and students return to their classrooms—are not an appropriate tool for measuring the long term impact of the pandemic on students’ lives and learning.

Ideological advocacy for holding public schools accountable drove the passage and implementation of the original No Child Left Behind Act. The idea was that educators can be motivated to work harder through fear if their schools are threatened with punishments.  The idea of attaching high stakes consequences for low test scores remains with us today. Last week Chester E. Finn, Jr., formerly of the Thomas B. Fordham Institute and now affiliated with the Hoover Institution, published a widely read column in the Washington Post.  Twenty years ago, Finn strongly promoted No Child Left Behind’s test-and-punish strategy, and clearly he continues to believe in using high stakes testing as a threat. Here is a paragraph from his recent column that Finn could easily have cut, pasted, and slightly updated from something he wrote back in 2001:

“The results from those state assessments are the main source of information about school performance and about pupil learning in the core subjects of the K-12 curriculum. The results also indicate whether America’s appalling — and persistent — achievement gaps are getting any narrower. These student statewide test results are the foundation of a school-performance measurement structure that the United States has been painstakingly constructing in the decades since being declared “A Nation at Risk” in 1983. The information from the tests is used at every level of the system. It enables parents to see how their children are faring on an “external” metric, beyond the grades conferred by their teachers, and it helps principals assess how their schools are doing. The results also equip superintendents to gauge what must be done to boost district-wide achievement, and they furnish state officials with the information needed to guide their assistance and interventions.”

Today, nearly two decades after the states were mandated to administer annual standardized tests and after No Child Left Behind imposed sanctions on the schools with the lowest scores, we know that the whole scheme failed to support children’s school achievement and failed to close achievement gaps. Some schools were charterized as a punishment; other schools were shut down; principals and teachers were fired.  And scores on the national audit test, the National Assessment of Educational Progress (the NAEP), have fallen in some cases and in other cases remained flat.

I believe it is unnecessary—in the midst of a raging pandemic and a Presidential transition—to worry about when the federal government will mandate widespread standardized testing.  The bigger question is whether and how the federal government will manage a plan to get the pandemic under control and provide enough support to help states and school districts get all children and adolescents back in school.

I agree with Diane Ravitch, who explains: “Resumption of standardized testing is completely ridiculous in the midst of a pandemic. The validity of the tests has always been an issue; their validity in the midst of a national crisis will be zero. They will show, even more starkly, that students who are in economically secure families have higher test scores than those who do not. They will show that children in poverty and children with disabilities have suffered disproportionately due to lack of schooling.  We already know that.  Why put pressure on students and teachers to demonstrate what we already know?  At this point, we don’t even know whether all students will have the advantage of in-person instruction by March.  If anything, we need a thorough review of the value, validity, and reliability of annual standardized testing, a practice that is unknown in any high-performing nation in the world.  We are choking on the rotten fumes of No Child Left Behind, Race to the Top, and the Every Student Succeeds Act.”

If High Stakes Standardized Testing Fades, Lots of Awful Punishments for Students, Teachers, and Schools Would Disappear

In yesterday’s Washington Post, Valerie Strauss published a very hopeful column: It Looks Like the Beginning of the End of America’s Obsession with Student Standardized Tests.  I hope she is right.  Her column covers current efforts to stop the requirement for college entrance exams and the wave of testing in primary and secondary public schools that was enshrined in the 2002 No Child Left Behind Act. This post will be limited to examining the implications of the mandated standardized testing that, for two decades, has dominated America’s K-12 public schools.

Strauss begins: “America has been obsessed with student standardized tests for nearly 20 years.  Now it looks like the country is at the beginning of the end of our high-stakes testing mania—both for K-12 ‘accountability’ purposes and in college admissions.  When President George W. Bush signed the K-12 No Child Left Behind Act in 2002, the country began an experiment based on the belief that we could test our way to educational success and end the achievement gap.  His successor, Barack Obama, ratcheted up the stakes of test scores under that same philosophy. It didn’t work, which came as no surprise to teachers and other critics. They had long pointed to extensive research showing standardized test scores are most strongly correlated to a student’s life circumstances.”

Strauss explains what’s different this year: “Now, we are seeing the collapse of the two-decade-old bipartisan consensus among major policymakers that testing was the key lever for holding students, schools and teachers ‘accountable.’ And it is no coincidence that it is happening against the backdrop of the coronavirus pandemic that has forced educational institutions to revamp how they operate.  States are learning that they can live without them, having been given permission by the Department of Education to not give them this past spring… Former vice president Joe Biden, who is the presumptive Democratic presidential nominee and ahead of Trump in many polls, has tried to distance himself from the pro-testing policies of the Obama administration. He was not a cheerleader of testing during Obama’s two terms and has said recently he is opposed to high-stakes testing.  That’s not a promise that he will work to reduce it, but it is a promising suggestion.”

Strauss publishes six principles from FairTest, the National Center for Fair and Open Testing, principles designed to guide state policy by reducing reliance on high-stakes testing:

  1. “Limit state standardized test requirements to no more than the minimum required by ESSA (the Every Student Succeeds Act that replaced No Child Left Behind) once each in reading and math in grades 3-8, plus once in high school, as well as one science test each in elementary, middle, and high school…
  2. “Seek federal waiver of testing requirements, at least for the 2020-2021 school year but preferably longer…
  3. “Terminate high-stakes consequences that rely on test scores for students (grade promotion tests, exit exams, course/program placement), teachers (bonuses, job ratings) and schools/districts (simplistic grading systems).
  4. “Protect young children by banning mass standardized testing before grade 3…
  5. “Enforce testing transparency and enhance public oversight…
  6. “Develop and implement performance-based assessment systems that enhance academic quality and equity by focusing on improvements in student work done over time.”

One of the most misunderstood issues about our current wave of testing is the impact of attaching high-stakes punishments to test scores. Test-and-punish was the central strategy of the No Child Left Behind Act.  It was assumed that, under the threat of sanctions, teachers would raise their expectations for their students and quickly raise test scores in even the public schools with low aggregate scores. You will remember that when the law passed in 2002, Congress gave America’s public schools a dozen years, until 2014, by which time all American children were supposed to achieve proficiency.  Except it didn’t work.  We now know that Congress’s assumptions underneath No Child Left Behind failed to recognize many factors inside and outside of schools that affect standardized test scores.

In a profound book, The Testing Charade: Pretending to Make Schools Better, Daniel Koretz, a Harvard University expert on the design and uses of standardized testing, explores serious problems that arise when high stakes are attached to testing.  First, there is social science research evidence that attaching high stakes punishments to teachers and public schools when scores don’t rise in fact distorts the test results and, at the same time, undermines in several ways the entire educational experience for both students and teachers: “The more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor… Achievement tests may well be valuable indicators of… achievement under conditions of normal teaching aimed at general competence. But when test scores become the goal of the teaching process, they both lose their value as indicators of educational status and distort the education process in undesirable ways.” (The Testing Charade, pp. 38-39)  In chapter after chapter, Koretz demonstrates that the consequences have been particularly devastating in schools where child poverty is concentrated; testing has narrowed the curriculum to the tested subjects, forced teachers to coach students and teach to the test, and even resulted in cheating by educators to make a school’s or school district’s scores look better.

Second, Koretz demonstrates that, because children in some schools start farther behind and face far greater obstacles, No Child Left Behind’s uniform timeline for the testing and the law’s application of high-stakes punishments embodies a bias against public schools in the poorest communities and their teachers: “One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools. The causes are complex, but the result is simple: some schools have far lower average scores—and, particularly important in this system, more kids who aren’t ‘proficient’—than others. Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do. This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’ It was a deliberate and prominent part of many of the test-based accountability reforms… Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic. The specific targets were often an automatic consequence of where the Proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.” (pp. 129-130)

I believe FairTest’s third principle is designed to undo the greatest damage wrought by two decades of high stakes testing: “Terminate high-stakes consequences that rely on test scores for students (grade promotion tests, exit exams, course/program placement), teachers (bonuses, job ratings) and schools/districts (simplistic grading systems).”  As states have undertaken to follow the dictates of No Child Left Behind, they have attached punishments for the schools and school districts where scores have failed to rise or where they have risen too slowly.  States have branded those schools and school districts as failures, and continued in several significant ways to punish the nation’s most vulnerable schools instead of providing support.  Across the United States, public schools in the poorest communities continue to receive less funding than the schools in America’s wealthiest and most exclusive suburbs.

Here are the high stakes punishments—always based primarily on students’ aggregate scores on standardized tests—that states persist in imposing on the schools and school districts where scores are low:

The Third Grade Guarantee:  Students who do not meet the standardized test cut score for “proficient” in reading are in many states held back for another year in third grade.  This happens even though research shows that students are developmentally ready to begin reading at very different ages and that forcing children to read in Kindergarten (as the Third Grade Guarantee has encouraged many schools to do) may cause students to struggle and to dislike reading.  Holding children back has also been shown eventually to increase the chance that a student will drop out before graduating from high school.

High School Exit Exams and Graduation Tests:  By denying high school diplomas to students who don’t pass a graduation exit exam, many states continue to punish high school students even if these students have passed all the required classes.

Teacher Evaluations:  Some states continue, according to what they promised Arne Duncan, to evaluate teachers by their students’ aggregate standardized test scores.  When the Every Student Succeeds Act replaced No Child Left Behind, that federal agreement states had made to qualify for Race to the Top grants and No Child Left Behind waivers was dropped. Tying teachers’ evaluations to their students’ standardized test scores remains in states’ policies as a remnant of another era.

State Report Cards:  FairTest mentions “simplistic grading systems” for school districts. I believe these grading systems may be the most damaging negative consequence of high stakes testing because all sorts of other serious punishments cascade from the state report card grades.  States were required by No Child Left Behind to rate school districts and individual schools primarily by aggregate standardized test scores. Many states created school district report cards that award school districts and particular schools letter grades: “A” through “F.”  One of the most damaging consequences is that real estate sales websites like Zillow and Great Schools have adopted these state-awarded grades to brand specific communities as desirable places to live and to brand others as undesirable. Because aggregate standardized test scores correlate most highly with family income, the state report card grades—based largely on each school district’s aggregate student test scores—have created educational redlining that is driving racial and economic segregation across America’s metropolitan areas.

School Closures:  One of the original “turnaround” models under No Child Left Behind was school closure.  Some school districts have found ways to shutter or phase out low scoring schools.  In June of 2013, Chicago closed 50 schools, with over 80 percent in African American neighborhoods.  Research from the University of Chicago’s Consortium on School Research showed that students didn’t do better on the whole in receiving schools.  A University of Chicago sociologist, Eve Ewing, published a profound book, Ghosts in the Schoolyard: Racism and School Closings on Chicago’s South Side, about widespread grieving across one African American neighborhood when public schools which had served for years as community anchors were shut down.

Targeting Particular School Districts for Privatization:  Some states use aggregate standardized test scores to identify so-called “failing school districts” and then to enable children in those districts to qualify for private school tuition vouchers. Some states locate charter schools primarily in the school districts where aggregate standardized test scores are lower. Instead of investing more financial support for smaller classes and more staff in the public schools in those school districts, some states take the voucher dollars or the per-pupil state funding for each charter school student right out of the local school district budget.

State School District Takeovers:  State takeovers are the ultimate damaging consequence of the punishments imposed by state legislatures on their poorest and lowest scoring school districts.  Over the years many states have seized low scoring schools or school districts, imposed autocratic, state appointed CEOs to manage the schools or turned over the schools to a “state achievement authority.” Gradually, after the long failure of such state seizures of schools and whole school districts, the schools are being returned to locally elected school boards, but the damage to local schools and the disruption of communities is a long, sad story.

If you are searching for good books to explore while you are at home due to the pandemic, check out Daniel Koretz’s The Testing Charade and Eve Ewing’s Ghosts in the Schoolyard.

In Open Letter, Network for Public Education Asks Joe Biden to Be a Champion for Public Schools

Yesterday, Diane Ravitch, President of the Network for Public Education (NPE), and Carol Burris, NPE’s Executive Director, published an open letter pressing Joe Biden, as a candidate for President, to provide strong leadership for justice in public education: “Our public schools and their students desperately need a champion.  We hope you will be that champion.  For two decades our schools and their teachers have been micromanaged by misguided federal mandates that require states to judge students, teachers, and schools by standardized test scores, as though a test score could ever be the true measure of a child, a teacher or a school.”

Ravitch and Burris remind Biden of his promise on December 14, 2019, when seven candidates for the Democratic nomination for President appeared at the Public Education Forum 2020.  The meeting, sponsored by the Alliance for Educational Justice; the American Federation of State, County and Municipal Employees; the American Federation of Teachers; the Center for Popular Democracy Action; the Journey for Justice Alliance; the NAACP; the National Education Association; the Network for Public Education Action; the Schott Foundation for Public Education-Opportunity to Learn Action Fund; the Service Employees International Union; and Voto Latino, was one of the most inspiring events I have attended.  It followed a series of Presidential debates all fall in which not one of the candidates had been asked to speak to the complex and fraught political implications of two decades of test-and-punish school reform.  The sponsors had brought more than 1500 teachers, organized parents, and public school students on a winter day to a convention center overlooking the Allegheny River. I don’t think I have ever been part of a crowd that was so wonderfully diverse. I found myself sitting next to a woman who has been serving for 30 years in a public school on the Navajo Nation as a special education teacher.

Now that he will be this year’s Democratic nominee for President, Ravitch and Burris admonish Biden to remember his promise on that winter day in Pittsburgh: “NPE Board member Denisha Jones asked you whether you would commit to ending standardized testing in public schools. You did not hesitate when you said, ‘Yes. You are preaching to the choir… Teaching to a test underestimates and discounts the things that are most important for students to know.’ You explained that what is most important is building a child’s confidence and you referred to evaluating teachers by test scores as a ‘big mistake.’”

In their letter, Ravitch and Burris ask Biden to commit to three principles:

First:   End mandated high-stakes, standardized tests.  “Former supporters of President Obama’s Race to the Top program will whisper in your ear to persuade you to double down on failed policies. They will try to convince you that testing is a ‘civil right.’  It is not.  In fact, standardized testing has its roots in eugenics—it was used for years as a means by which to shut out immigrants, students of color, and students who live in poverty in order to reserve privilege for affluent students, who more typically excel on standardized tests.”

Second:   Fully fund schools to support the work of teachers and their students. “(W)e fully support your plan to triple Title I funding while giving educators voice in how that money should be best spent.” “All children deserve a well-resourced public school filled with high-quality educational experiences. All children deserve experienced and well-prepared teachers. All children deserve schools that have counselors, social workers, librarians, and nurses.  All children deserve a full curriculum, with science labs and arts programs… Research consistently demonstrates that increases in funding make a difference in the educational outcomes of children… We are pleased that you support Community Schools as a pathway for school improvement.”

Third:   End the federal subsidy for the expansion of charter schools.  “We are glad that you endorse district public school improvement instead of embracing the expansion of what has become a competing alternative system whose growth has drained funding from public schools. Banning for-profit charter schools is not enough. There are only a handful of for-profit charters, and they exist only in Arizona. There are, however, many for-profit charter management companies as well as nonprofit charter management companies whose CEOs enjoy exorbitant salaries, far exceeding the salaries of district school superintendents. These charter chains hide their lavish spending on travel, marketing, advertising, rental payments to related companies, and administrative salaries from community, state and federal taxpayers even as they claim to be public schools.”

In their letter to Joe Biden, Ravitch and Burris target their charter school critique to the federal government’s role since 1994 in promoting the expansion of charter schools. Because charter schools have always been authorized in state law and much of the oversight of these publicly funded but privately managed schools falls to the states, Ravitch and Burris emphasize the problems in the federal Charter Schools Program, a program the Network for Public Education had researched and condemned (see here and here) for the egregious failure by the U.S. Department of Education to oversee the states’ administration of the federal grant money, to prevent fraud in the Charter Management Companies, and to ensure any level of quality of the charter schools receiving federal grants. NPE has demonstrated that over 37 percent of the schools receiving funding either never opened or are now closed.

In their new letter, Ravitch and Burris ask Biden to end the federal Charter Schools Program: “Although the policies of the states regarding charter schools are beyond your control, the Federal Charter Schools Program is not. A once modest program intended to spark innovative community-led charter schools is now a program that sends hundreds of millions of dollars each year to corporate charter school chains… It is time to eliminate the federal Charter Schools Program….”

Are NPE’s three priorities—end high-stakes testing, fully fund public education, and end the federal Charter Schools Program—the right priorities?  I think so. 

I would expand a bit, however, on the first goal, ending high-stakes testing. There are two parts of the test-and-punish education policy that has dominated our schools for decades: There is the problem of ubiquitous testing and additionally there are the extremely damaging high-stakes punishments.

High-stakes testing has come to dominate the school year and to narrow and drive the curriculum. Harvard University testing expert Daniel Koretz explains why attaching high stakes to the testing invalidates the tests themselves and at the same time undermines the education process. Koretz cites social scientist Don Campbell’s well-known theory describing the universal human response when high stakes are attached to any quantitative social indicator: “The more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor… Achievement tests may well be valuable indicators of… achievement under conditions of normal teaching aimed at general competence. But when test scores become the goal of the teaching process, they both lose their value as indicators of educational status and distort the educational process in undesirable ways.” (The Testing Charade, pp. 38-39)  The attachment of high stakes has most severely undermined the education process in the schools where children are farthest behind: schools where teachers have been forced to try to catch children up with their more privileged peers by teaching to the test and falling back on deadly drilling.

I would also urge Biden to address the second problem with test-and-punish. The goal of No Child Left Behind was to threaten punishments as a way to motivate educators into finding a way quickly to raise test scores. The punishments embedded in No Child Left Behind and Race to the Top used aggregate student test scores as the basis for the following sanctions for low scoring schools: threaten low scoring schools with closure; threaten to fire school principals; threaten schools with state takeover; threaten low scoring schools with takeover by Charter Management Organizations; and threaten teachers by evaluating them by their students’ aggregate test scores.  No Child Left Behind failed across the United States when scores did not rise during the period of its operation over more than a decade.  But the failure to raise scores did not directly affect public schools in affluent communities where economic privilege is known to raise scores.

In poor communities, however, the threatened punishments went into effect.  In Chicago, school closure was one result; 50 neighborhood schools were shut down in June of 2013, causing widespread community grieving for the loss of essential neighborhood anchors. The number of charter schools across American cities has grown enormously, and their growth is financially devastating the public school districts where they are located. For example, in a stunning report, political economist Gordon Lafer demonstrates that charter schools are draining $57.3 million every year out of the Oakland (California) Unified School District.  In Ohio, the sanctions have led to school privatization, to the closure of public schools in the poorest neighborhoods, and to the damaging state report cards that brand schools and school districts in poor communities with “F” ratings. These school report cards have also been the basis for the astronomical expansion of private school tuition vouchers at the expense of local school districts. (See here and here.)

It is, of course, true that under No Child Left Behind and Race to the Top, states were required to adopt these sanctions into their own state laws. These are no longer federally mandated policies which can be eliminated by Congressional action or federal fiat. But strong federal leadership will be necessary to challenge what has become the conventional wisdom in too many state legislatures about the necessity of high-stakes testing.

I add these last paragraphs merely to probe more deeply some of the politically fraught complexities of two decades of the kind of test-and-punish, business-driven school accountability launched by the federal No Child Left Behind Act and Race to the Top.  If Vice President Joe Biden were to launch the agenda Ravitch and Burris urge him to adopt, he would reset the direction of federal policy, and the new direction would move us much closer to eliminating the dangerous punitive policies that still operate across many of our states.

Educational Redlining: GreatSchools Ratings Drive Housing Segregation

Back in 2015, Heights Community Congress (HCC) in Cleveland Heights, Ohio raised serious concerns (here and here) about the impact of online GreatSchools ratings of public schools. The GreatSchools ratings were, in 2015, being used in online real estate advertising by listing services like Zillow.  The practice continues.

HCC, founded in 1972, is Greater Cleveland, Ohio’s oldest fair housing enforcement organization. For over four decades HCC has been conducting audits of the real estate industry to expose and discourage racial steering and disparate treatment of African American and white home seekers. During 2015 and 2016, the fair housing committee of HCC held community meetings to demonstrate that such ads and ratings of public schools are steering home buyers to whiter and wealthier communities and redlining racially and economically diverse and majority black and Hispanic communities.

Last month, Chalkbeat published an in-depth examination of similar concerns on a national scale: “Arguably the most visible and influential school rating system in America comes from the nonprofit GreatSchools, whose 1-10 ratings appear in home listings on national real estate websites Zillow, Realtor.com, and Redfin.  Forty-three million people visited GreatSchools’ site in 2018…. Zillow and its affiliated sites count more than 150 million unique visitors per month.”

Chalkbeat reports that GreatSchools has calculated its ratings for schools using the annual standardized test scores mandated by the 2002 No Child Left Behind Act (NCLB), a requirement maintained in the Every Student Succeeds Act, which replaced NCLB in 2015.  Because the ratings were criticized for relying too much on one standardized test score, in 2017, GreatSchools revised its algorithm for rating schools by including a factor to reflect the rate of growth in each school’s student test scores over time.

But Chalkbeat reports that the overall bias still condemns schools in the poorest communities: “When the organization overhauled its ratings in 2017, it included a host of new metrics. A GreatSchools representative said at the time that the new ratings would ‘more accurately reflect what’s going on in a school besides just its demographics.’  It was a striking acknowledgement of the flaws in the prior system… Two years into this new system, Chalkbeat took a closer look.  We examined the ratings of elementary and middle schools in Chicago, Denver, Detroit, Indianapolis, Nashville, New York City, Phoenix, and San Francisco, combined with several of each city’s suburbs.  The results are striking. On average, the more black and Hispanic students a school enrolled, and the more low-income students it served, the lower its rating. The average 1-10 GreatSchools rating for schools with the most low-income and most black and Hispanic students is 4 to 6 points lower than the average score for schools with the fewest black and Hispanic students and fewest low-income students. In most places, only a tiny fraction of schools with the most low-income and most black and Hispanic students score a 7 or better, the number that earns an ‘above average’ label from GreatSchools.”

In December, the National Education Policy Center (NEPC) summarized a Newsday report from Long Island: “The newspaper found that realtors repeatedly steered White buyers away from school districts enrolling higher percentages of minority residents, typically using veiled language. For example, they told white buyers that one community was an area to avoid ‘school district-wise’ or ‘based on statistics.’” Housing values also increased more rapidly in school districts with high GreatSchools ratings.

NEPC explains that Amy Stuart Wells, a professor of sociology and education at Teachers College, Columbia University, followed up on the Newsday report.  Wells and her colleagues discovered that a one percent increase in Black/Hispanic enrollment corresponded with a 0.3 percent decrease in home values. In other words, a home worth $415,000 at the time of the study in 2010 would cost $50,000 more in a 30 percent Hispanic/Black district as compared to a 70 percent Hispanic/Black district.”  Wells and her colleagues examined and compared the schools themselves: “There didn’t seem to be a huge difference at all in the curriculum and the quality of the teachers… So they (real estate agents) do play an important role in steering people away from certain districts that are becoming more racially, ethnically diverse and less White, in particular.”
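For readers who want to check the home-value figures above, here is a minimal arithmetic sketch. It assumes, as a simplification of my own and not something the study states, that the 0.3 percent figure applies linearly to each percentage point of enrollment; the function and variable names are hypothetical.

```python
# Assumption: each percentage point of Black/Hispanic enrollment corresponds
# to a 0.3 percent change in home value, applied linearly (a simplification).
PERCENT_CHANGE_PER_POINT = 0.3
HOME_VALUE = 415_000  # dollars, the 2010 example value cited above


def estimated_price_gap(home_value, enrollment_low, enrollment_high):
    """Estimate the dollar gap between two districts under the linear assumption."""
    points = enrollment_high - enrollment_low
    return home_value * PERCENT_CHANGE_PER_POINT * points / 100


# Compare a 30 percent district with a 70 percent district.
gap = estimated_price_gap(HOME_VALUE, 30, 70)
print(f"Estimated gap: ${gap:,.0f}")
```

Under that assumption, the 40-point difference works out to about 12 percent of $415,000, or roughly $49,800, which is consistent with the $50,000 figure cited.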

For over half a century, research has confirmed that standardized test scores are a poor measure of the quality of a public school. Instead, aggregate standardized test scores are highly correlated with family and neighborhood income. Children educated in pockets of privilege regularly post high scores, while children in schools where poverty is concentrated post the lowest scores. Here are three examples of this research, two by academic experts and the third a recent correlation study by the Cleveland Plain Dealer.

For a decade now, Stanford University’s Sean Reardon has been studying the correlation of achievement gaps measured by standardized tests with economic and racial segregation. He has documented that standardized tests measure all of the inside- and outside-of-school factors in a child’s life. Children who live in pockets of wealth bring their privilege with them when they take standardized tests.  In a massive new study published last fall, Is Separate Still Unequal, Reardon explains: “The association of racial segregation with achievement gaps is completely accounted for by racial differences in school poverty.” “We examine racial test score gaps because they reflect racial differences in access to educational opportunities. By ‘educational opportunities,’ we mean all experiences in a child’s life, from birth onward, that provide opportunities for her to learn, including experiences in children’s homes, child care settings, neighborhoods, peer groups, and their schools. This implies that test score gaps may result from unequal opportunities either in or out of school; they are not necessarily the result of differences in school quality, resources, or experience. Moreover, in saying that test score gaps reflect differences in opportunities, we also mean that they are not the result of innate group differences in cognitive skills or other genetic endowments… Differences in average scores should be understood as reflecting opportunity gaps….”

Harvard University’s testing expert, Daniel Koretz, emphasizes that while children living in concentrated poverty take longer to catch up to their more privileged peers, our testing regime fails to consider the needs of children who start school farther behind: “One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools. The causes are complex, but the result is simple: some schools have far lower average scores…. Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do. This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’ It was a deliberate and prominent part of many of the test-based accountability reforms…. Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic. The specific targets were often an automatic consequence of where the Proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.” (The Testing Charade: Pretending to Make Schools Better, pp. 129-130)

And finally, the Cleveland Plain Dealer’s data wonk, Rich Exner, created a series of bar graphs when the Ohio state school district report cards were released last September. Exner demonstrates the correlation of the letter grades awarded to school districts by the state’s school rating system (letter grades based primarily on aggregate students’ standardized test scores) with the family income of the children in each school district.  School districts earning “A” ratings boasted median family income of $95,432, while the school districts rated “F” serve families whose median family income is $32,658.  The state of Ohio itself in its annual school report cards seems to be joining GreatSchools and Zillow to steer families to the affluent, white exurbs surrounding our cities. These are the districts which regularly earn “A” grades on the state report card and the highest ratings from GreatSchools and Zillow.

It is alarming to see our society stepping back so completely from concerns about steering, disparate treatment, and redlining in the real estate market. These are the very issues the 1968 Fair Housing Act was intended to address.  The National Education Policy Center declares: “Realtors and real estate websites alike share assessments that downgrade schools that serve higher percentages of low-income and minority students, while also serving to maintain segregated housing patterns by steering Whites away from districts that serve students of color.”

Tim Slekar on the Exodus of Schoolteachers from Their Chosen Profession

Tim Slekar is the Dean of the College of Education at Edgewood College in Madison, Wisconsin.  Early in September, Slekar was interviewed on Wisconsin Public Radio, an interview recommended to me by a public school teacher who said it is the best statement she has heard of the truth about public education today.

You can listen to Slekar explain what is described in many places as a growing shortage of public school teachers.  Slekar believes we are not merely experiencing a shortage of teachers;  what is happening instead is an exodus of public school teachers from their chosen profession. If it were a classic labor shortage, explains Slekar, pay would be raised, conditions would be made better, and enrollment in teacher training programs would grow.  All of this would attract more people to teaching, according to how a labor market is supposed to work.  But, argues Slekar, fewer and fewer people now want to be schoolteachers.  He explains that in his office, he has listened as parents of his college students beg their children to choose another profession instead.

Slekar believes that teachers are being driven out of the profession by the impossibility of working under the conditions imposed by test based school accountability, a strategy designed to be punitive. The goal was to make teachers work harder and smarter for fear their schools would receive a low rating. Test based accountability was a bipartisan strategy designed in the 1990s and cast into law in 2002 in the federal No Child Left Behind Act, which required schools to test students annually in grades 3-8 and once in high school. Schools were then judged by their aggregate test scores, and the lowest scoring schools were punished.

Slekar also has a blog, Busted Pencils, where he has covered this subject extensively.  In a post last April, Slekar declares: “Accountability—loved by Democrats and Republicans—has almost become a religious movement. In fact the idea of even questioning the usefulness of test based accountability can cause enraged panic in accountability zealots. ‘How will we know what children are falling behind?’ ‘How will we close the achievement gap if we don’t measure it?’ ‘How will we fire bad teachers without the data?’ ‘How will we know what schools to close?’… Time for the hard truth.  Test based accountability has done one thing well. Over the past 35 years, we have beyond any doubt, measured and confirmed the achievement gap. That’s it. Nothing else.”

He continues: “However, test based accountability has destroyed the profession of teaching and caused a mass demoralization and ‘X’odus from public school classrooms. Oh, and let’s not forget about the thousands of hours of lost instruction time in the sciences, social studies, arts, music, and anything else that doesn’t conform to basic literacy and numeracy skills.”

There is a book which clearly examines all the problems with test based school accountability, a book written by Daniel Koretz, a Harvard University expert on the construction and use of standardized tests.  The Testing Charade: Pretending to Make Schools Better was published in 2017 by the University of Chicago Press, which is currently offering it on sale at the considerably reduced price of $11.00.

Daniel Koretz demonstrates how standardized testing in schools is corrupted—and how education itself is corrupted—when standardized tests become the basis of high-stakes accountability. The problem epitomizes the operation of Campbell’s Law: “The more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor… Achievement tests may well be valuable indicators of… achievement under conditions of normal teaching aimed at general competence. But when test scores become the goal of the teaching process, they both lose their value as indicators of educational status and distort the education process in undesirable ways.” (The Testing Charade, pp. 38-39)

Koretz demonstrates the many ways that testing undermines education—how scores can be inflated by various kinds of direct test preparation: cutting back on the important subject matter that isn’t tested; spending time within a particular subject on the material known to be emphasized by a particular test; and even in some cases cheating: “The entire logic of our reforms depends on rewarding the schools that do better and punishing those that don’t. However, because in most contexts we can’t separate score inflation from legitimate improvements, we are sometimes rewarding people who game the system more effectively, and we are punishing educators who do good work but appear to be doing relatively less well because they aren’t taking as many shortcuts. On top of that, we are holding out as examples to be emulated programs that look good only because of bogus score gains and overlooking programs that really are good because the teachers using them are doing less to game the system. In other words, the system can propagate bad practice.” (The Testing Charade, p. 64) (emphasis in the original)

Finally there is the problem, confirmed in a recent study by Stanford University’s Sean Reardon, that standardized test scores reflect primarily a school’s or a school district’s aggregate family income.  The tests do not accurately measure the quality of the school. In a series of very simple bar graphs, the Cleveland Plain Dealer’s data wonk, Rich Exner, also demonstrates the striking correlation of Ohio’s school district grades on the state’s school report card with family income and parents’ level of education.

Daniel Koretz explains the correlation: “One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools. The causes are complex, but the result is simple: some schools have far lower average scores…. Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do. This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’ It was a deliberate and prominent part of many of the test-based accountability reforms…. Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic. The specific targets were often an automatic consequence of where the Proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.” (The Testing Charade, pp. 129-130)

In the recent Wisconsin Public Radio interview, Tim Slekar emphasizes that in the United States, over a trillion dollars has been spent on standardized tests and the data systems that process the results.  As a professional educator, he recommends the money be spent instead to surround children with the best children’s literature because reading is at the heart of education. He would also spend part of the money on wraparound programs to ensure that the poorest children are well fed, are healthy, and have care and enrichment in after school programs.

In a recent legislative hearing of the Ohio Senate Education Committee, one state senator twice posed the following question: “How much time should we give those who drove the bus into the ditch to get it out?” This legislator’s attack on teachers epitomizes Tim Slekar’s diagnosis of the cause of an exodus of schoolteachers from their profession.

We now know that No Child Left Behind and Race to the Top—and all the state-by-state test based accountability these federal policies spun off—did not improve the education of our nation’s poorest children, who are still being left behind.

I wonder how long it will be before we stop allowing our elected leaders to get away with shifting the blame onto teachers while they—the policymakers—fail to invest the resources and power of government in equitable school funding and in programs to support the needs of our society’s poorest children.