Earlier this month, Christopher Tienken, a professor of education leadership, management, and policy at Seton Hall University, published a short commentary on research he and colleagues have been conducting about whether standardized tests ought to be used for high-stakes policy decisions.
He writes, “Every year, policymakers across the U.S. make life-changing decisions based on the results of standardized tests. These high-stakes decisions include, but are not limited to, student promotion to the next grade level, student eligibility to participate in advanced coursework, eligibility to graduate high school and teacher tenure. In 40 states, teachers are evaluated in part based on the results from student standardized tests, as are school administrators in almost 30 states. However, research shows that the outcomes of standardized tests don’t reflect the quality of instruction, as they’re intended to… The results show that it’s possible to predict the percentages of students who will score proficient or above on some standardized tests. We can do this just by looking at some of the important characteristics of the community, rather than factors related to the schools themselves, like student-teacher ratios or teacher quality. This raises the possibility that there are serious flaws built into education accountability systems and the decisions about educators and students made within those systems.” Tienken and his colleagues have been investigating issues associated with the aggregate poverty (or wealth) in the communities where schools are located.
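To make concrete what Tienken and his colleagues are claiming, here is a minimal sketch, with entirely hypothetical numbers, of the kind of demographic regression they describe: predicting the percentage of students scoring proficient from community characteristics alone, with no school-level inputs at all. The data and variables below are invented for illustration, not drawn from their study.

```python
import numpy as np

# Hypothetical community data: [median income ($1000s), % of adults with a BA]
# None of these figures come from Tienken's research; they are invented.
X = np.array([
    [35, 18], [42, 22], [55, 30], [68, 38],
    [80, 45], [95, 55], [110, 62], [125, 70],
], dtype=float)
# Hypothetical % of students scoring proficient in each community
y = np.array([38, 44, 53, 61, 68, 77, 83, 90], dtype=float)

# Ordinary least squares with an intercept column
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(income, pct_ba):
    """Predict a proficiency rate from community traits alone."""
    return coef[0] + coef[1] * income + coef[2] * pct_ba

# How much of the variation community traits explain in this toy data
r2 = 1 - np.sum((A @ coef - y) ** 2) / np.sum((y - y.mean()) ** 2)
```

The point of the sketch is not the particular coefficients but the structure of the argument: if a model like this, fed nothing about teachers or schools, explains most of the variation in proficiency rates, then proficiency rates are a poor instrument for judging teachers or schools.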
Questions about the reliability and validity of standardized testing, reports Rachel Cohen for The American Prospect, are finally contributing to growing doubts about the use of what is known as Value-Added Modeling (VAM) to evaluate teachers. VAM, writes Cohen, is “a controversial statistical method aimed at isolating each teacher’s effectiveness based on… (her) students’ standardized test scores.” VAM models are supposed to measure the “value added” by each particular teacher.
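For readers unfamiliar with how a value-added model works mechanically, here is a toy sketch of the basic idea, again with simulated data: predict each student's current score from prior achievement, then treat the average of each teacher's residuals as that teacher's "value added." Real VAM systems are far more elaborate; the variable names and the simple one-predictor regression here are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_students, n_teachers = 200, 10
teacher = rng.integers(0, n_teachers, n_students)   # which teacher each student has
prior = rng.normal(50, 10, n_students)              # prior-year test scores
true_effect = rng.normal(0, 2, n_teachers)          # hypothetical teacher effects
noise = rng.normal(0, 8, n_students)                # everything the model cannot see
current = 0.8 * prior + 12 + true_effect[teacher] + noise

# Step 1: expected current score given prior achievement (simple OLS)
A = np.column_stack([np.ones(n_students), prior])
coef, *_ = np.linalg.lstsq(A, current, rcond=None)
residual = current - A @ coef

# Step 2: a teacher's "value added" = the mean residual of her students
vam = np.array([residual[teacher == t].mean() for t in range(n_teachers)])
```

Note that in this toy setup the per-student noise is four times larger than the spread of true teacher effects, which is exactly the condition the statisticians quoted below warn about: the residual averages pick up a great deal that has nothing to do with the teacher.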
Concerns about VAM are not new. In the spring of 2014, the American Statistical Association warned: “(V)ariation among teachers accounts for a small part of the variation in scores. The majority of the variation in test scores is attributable to factors outside of the teacher’s control such as student and family background, poverty, curriculum, and unmeasured influences. The VAM scores themselves have large standard errors, even when calculated using several years of data. These large standard errors make rankings unstable, even under the best scenarios for modeling.”
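The ASA's point about large standard errors making rankings unstable can be illustrated with a small simulation, under the assumed condition that the measurement error around each teacher's estimate is larger than the true spread among teachers. The numbers below are chosen for illustration, not calibrated to any real evaluation system.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
true_effect = rng.normal(0, 1, n)       # teachers' "real" effects (spread = 1)
se = 2.0                                # assumed standard error, larger than the spread
year1 = true_effect + rng.normal(0, se, n)   # noisy estimate, year one
year2 = true_effect + rng.normal(0, se, n)   # noisy estimate, year two

# Rank every teacher in each year, then correlate the two rankings
rank1 = np.argsort(np.argsort(year1))
rank2 = np.argsort(np.argsort(year2))
r = np.corrcoef(rank1, rank2)[0, 1]
```

With these assumptions the same teachers, with unchanged "true" effectiveness, land in substantially different rank positions from one year to the next, which is what the ASA means by rankings being "unstable, even under the best scenarios for modeling."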
A year later in June of 2015, the American Educational Research Association added, “Because of the adverse consequences of faulty evaluations for educators and the students they serve, use of VAM in any evaluation system must meet a very high technical bar… (T)he validity of inferences from VAM scores depends on the ability to isolate the contributions of teachers and leaders to student learning from the contributions of other factors not under their control. This is very difficult, not only because of data limitations but also because of the highly nonrandom sorting of students and teachers into schools and classes within schools. The resulting bias will not be distributed evenly among schools, given wide variation in critical factors like student mobility…. Therefore, due caution should be exercised in the interpretations of VAM scores, since we generally do not know how to properly adjust for the impact of these other factors.”
Nevertheless, the use of VAM to evaluate teachers has become widespread—driven by federal policy. Cohen traces the history, beginning with the influence of recommendations by the New Teacher Project and the National Council on Teacher Quality, both organizations that oppose teachers unions and have sought to spread alarm about the quality of American teachers. “In 2009, an education reform group known as The New Teacher Project (TNTP) issued an influential report finding widespread ‘institutional indifference to variations in teacher performance.’… TNTP recommended an overhaul of teacher evaluations, urging districts to develop systems that rate teachers ‘based on their effectiveness in promoting students’ achievement’—which meant evaluating them by their students’ scores on standardized tests. The report heavily influenced the Obama administration’s $4 billion Race to the Top program, which rewarded states that created new evaluation systems based on student test scores and value-added modeling. (The administration also used No Child Left Behind waivers to incentivize similar policies.)” Basically, Arne Duncan’s U.S. Department of Education drove states to adopt VAM for evaluating teachers as a condition for qualifying for a waiver from NCLB’s ill-conceived punishments.
Cohen adds that, “By 2015, the anti-testing backlash had gained steam across the country, in part because the federal government had pushed for test scores to be used to evaluate teachers across all grades and subjects. States had begun to require assessments in such traditionally untested areas as art and early elementary. Parents, teachers unions, and conservatives rallied together for a rollback of federal testing mandates. With the enactment of the Every Student Succeeds Act in late 2015, they succeeded.” But although the federal government eliminated its requirement that states evaluate teachers with standardized test scores, many states have kept on using VAM to rate teachers.
Teachers and their unions have protested the unfairness and inaccuracy of VAM evaluation systems. Cohen summarizes lawsuits filed in a number of states by teachers and their unions, but these cases have been very hard for teachers to win. For example, a federal judge in Florida explained why courts have often found that while VAM may be unfair, it is not illegal: “In 2013, the National Education Association and its Florida affiliate filed a federal lawsuit challenging a state law that required at least half of a teacher’s evaluation to be based on VAM. In practice, this meant that teachers in non-tested grades and subjects were graded based on the test scores of students they didn’t teach… Together, the seven public school teacher plaintiffs in Cook v. Chartrand argued that Florida’s law violated their equal protection and due process rights. But in 2014, a federal judge ruled against them, concluding that while the rating system seemed clearly unfair, it was nonetheless still legal. ‘Needless to say, this Court would be hard-pressed to find anyone who would find this evaluation system fair to (teachers in non-tested subjects), let alone be willing to submit to a similar evaluation system… This case, however, is not about the fairness of the evaluation system. The standard of review is not whether the evaluation policies are good or bad, wise or unwise; but whether the evaluation policies are rational within the meaning of the law.'” The decision was upheld on appeal.
Cohen reports, however, that teachers are encouraged by a judge’s recent decision in a Houston case: “The lawsuit centered on the system’s use of value-added modeling (VAM)…. United States Magistrate Judge Stephen Smith concluded that the metric’s impenetrability could render it unconstitutional. If, he wrote, teachers have ‘no meaningful way to ensure’ that their value-added ratings are accurate, they are ‘subject to mistaken deprivation of constitutionally protected property interests in their jobs.’ More specifically, he continued, if the school district denies its teachers access to the computer algorithms and data that form the basis of each teacher’s VAM score, it ‘flunks the minimum procedural due process standard of providing the reason for termination in sufficient detail to enable (the teacher) to show any error that may exist.'”
Beyond the courts, reports Cohen, even prominent education “reformers” have begun to question the reliability of VAM-based teacher evaluations. Jay Greene, the director of the Department of Education Reform, a far-right, Walton-funded think tank at the University of Arkansas, has begun raising questions. Cohen describes her interview with Greene: “In an interview with the Prospect, Greene… said that test-based accountability advocates tend to imagine either that existing accountability systems are already designed according to best practices, or that states will eventually adopt best practices. ‘But there’s no sign that this will happen…'”
Despite widespread evidence of its flaws, VAM has unfairly damaged the reputation of the teaching profession and seriously undermined morale. We have begun to see a shift in attitudes, but it will take considerable and persistent advocacy to eliminate VAM entirely from state policy.