Chalkbeat’s Matt Barnum reports this week that 9 of the 43 states that adopted the use of students’ standardized test scores to evaluate teachers have stopped using those scores for teacher evaluation. This is an important development because a large body of research has shown that students’ scores are an unreliable measure of a teacher’s quality. But too many states are still evaluating their teachers with unreliable algorithms based on students’ test scores.
Barnum reminds us about the history of using students’ standardized test scores to evaluate teachers: “The push to remake teacher evaluations was jump-started by the Obama administration’s Race to the Top competition, which offered a chance at federal dollars to states that enacted favored policies—including linking teacher evaluation to student test scores… Philanthropies—most notably the Bill and Melinda Gates Foundation—provided support for a constellation of groups pushing these ideas.”
Evaluating teachers by their students’ standardized test scores also became a condition for states to qualify for a No Child Left Behind waiver. After it became apparent that No Child Left Behind was going to declare a majority of schools “failures” because they would not be able to meet the law’s rigid schedule, the federal government offered in 2011 to relax some of the law’s most punitive consequences by granting states waivers. But to qualify for a waiver, states had to promise to enact some of Arne Duncan’s pet policies, including the use of students’ standardized test scores to evaluate teachers. Education Week explained: “In exchange, states had to agree to set standards aimed at preparing students for higher education and the workforce. Waiver states could either choose the Common Core State Standards, or get their higher education institutions to certify that their standards are rigorous enough. They also must put in place assessments aligned to those standards. And they have to institute teacher-evaluation systems that take into account student progress on state standardized tests, as well as single out 15 percent of schools for turnaround efforts or more targeted interventions.” (Emphasis is mine.)
Barnum explains the impact of these federal requirements: “Between 2009 and 2013, the number of states requiring test scores to be used in teacher evaluations spiked from 15 to 41, including Washington, DC.”
But in 2015, Congress replaced No Child Left Behind with a new federal education law, the Every Student Succeeds Act (ESSA). And the new law was partly shaped by a protest against Arne Duncan’s misguided teacher evaluation scheme. Barnum explains: “The backlash culminated with the 2015 passage of the Every Student Succeeds Act, which explicitly bars future secretaries of education from doing what Obama’s Education Secretary Arne Duncan did—trying to influence how teachers are evaluated.”
At the time, the Washington Post’s Lyndsey Layton described how the new ESSA would specifically stop the U.S. Secretary of Education from intervening in the formulation of state laws by limiting “the legal authority of the education secretary, who would be legally barred from influencing state decisions about academic benchmarks, such as the Common Core State Standards, teacher evaluations and other education policies.”
Barnum outlines many of the problems with the schemes states set up to comply with Arne Duncan’s requirement that states judge teachers by students’ scores in order to qualify for a Race to the Top grant or an NCLB waiver: “States that complied with federal urging to overhaul their evaluation systems struggled with exactly how to measure teachers’ performance. Classroom observations were usually the biggest factor, with tests playing a key role. But since many teachers do not have a standardized test corresponding to their grade and subject, some districts created new tests or had teachers create their own, raising concerns about overtesting. In other instances, teachers were evaluated in part by student performance in subjects they didn’t teach—the situation for half of New York City teachers in 2016. In many states, the new evaluations debuted just as new academic standards and tests were being implemented, frustrating teachers and their unions who felt they were being held accountable for unfamiliar material without adequate training.”
It became popular to base each teacher’s evaluation on statistical algorithms called Value Added Measures (VAMs) of student learning rather than merely on the aggregate benchmark scores of the teacher’s students. However, the American Statistical Association in 2014 and the American Educational Research Association in 2015 issued statements cautioning that calculations trying to measure each teacher’s discrete contribution to her students’ learning were statistically flawed. The American Statistical Association warned: “Research on VAMs has been fairly consistent that aspects of educational effectiveness that are measurable and within teacher control represent a small part of the total variation in student test scores or growth; most estimates in the literature attribute between 1% and 14% of the total variability to teachers… The majority of the variation in test scores is attributable to factors outside of the teacher’s control such as student and family background, poverty, curriculum, and unmeasured influences. The VAM scores themselves have large standard errors, even when calculated using several years of data. These large standard errors make rankings unstable, even under the best scenarios for modeling.”
The problem is that many states continue to use students’ standardized test scores to evaluate teachers. Education Week’s Madeline Will explains: “Now 34 states require student-growth measures in teacher evaluations… Ten states and the District of Columbia dropped the requirement, while two states (Alabama and Texas) added a student-growth requirement during the same time period. Among the states that do still require an objective measure of student growth, eight do not currently require that the state standardized test be the source of the data. Instead, districts can use measures like their own assessments, student portfolios, and student learning objectives to determine teachers’ contribution to student growth….”
The 2015 replacement for No Child Left Behind—the Every Student Succeeds Act—ended the federal policy pushing states to judge teachers by their students’ standardized test scores. It is reprehensible that so many states are still holding on to this kind of discredited teacher evaluation scheme.