New Research Yet Again Proves the Folly of Judging Teachers by Their Students’ Test Scores

The Obama Administration’s public education policy, administered by Secretary of Education Arne Duncan, was deeply flawed by its dependence on technocracy. In the 1990s, Congress had been wooed by researchers who had developed the capacity to produce giant, computer-generated data sets. What fell out of style in school evaluations were personal classroom observations by administrators who were more likely to notice the human connections that teachers and children depended on for building trusting relationships to foster learning.

Technocratic policy became law in 2002, when President George W. Bush signed the omnibus No Child Left Behind Act. Technocratic policy reached its apogee in 2009 as Arne Duncan’s Race to the Top grant program became a centerpiece of the federal stimulus bill passed by Congress to ameliorate the 2008 Great Recession.

In an important 2014 article, the late Mike Rose, a professor of education, challenged the dominant technocratic ideology.  He believed that excellent teaching cannot be measured by the number of correct answers any teacher’s students mark on a standardized test. Rose reports: The “classrooms (of excellent teachers) were safe. They provided physical safety…. but there was also safety from insult and diminishment…. Intimately related to safety is respect…. Talking about safety and respect leads to a consideration of authority…. A teacher’s authority came not just with age or with the role, but from multiple sources—knowing the subject, appreciating students’ backgrounds, and providing a safe and respectful space. And even in traditionally run classrooms, authority was distributed…. These classrooms, then, were places of expectation and responsibility…. Overall the students I talked to, from primary-grade children to graduating seniors, had the sense that their teachers had their best interests at heart and their classrooms were good places to be.”

In her 2013 book, Reign of Error, Diane Ravitch reviews the technocratic strategy of Arne Duncan’s Race to the Top. To qualify for a federal grant under this program, states had to promise to evaluate public school teachers by the standardized test scores of their students: “Unfortunately, President Obama’s Race to the Top adopted the same test-based accountability as No Child Left Behind. The two programs differed in one important respect: where NCLB held schools accountable for low scores, Race to the Top held both schools and teachers accountable. States were encouraged to create data systems to link the test scores of individual students to individual teachers. If the students’ scores went up, the teacher was an ‘effective’ teacher; if the students’ scores did not go up, the teacher was an ‘ineffective’ teacher. If schools persistently had low scores, the school was a ‘failing’ school, and its staff should be punished.” (Reign of Error, p. 99).

Ravitch reminds readers of a core principle: “The cardinal rule of psychometrics is this: a test should be used only for the purpose for which it is designed. The tests are designed to measure student performance in comparison to a norm; they are not designed to measure teacher quality or teacher ‘performance.’” (Reign of Error, p. 111)

This week, Education Week’s Madeline Will covers major new longitudinal research documenting what we already knew: that holding teachers accountable for raising their students’ test scores neither improved teaching nor promoted students’ learning:

“Nationally, teacher evaluation reforms over the past decade had no impact on student test scores or educational attainment. ‘There was a tremendous amount of time and billions of dollars invested in putting these systems into place and they didn’t have the positive effects reformers were hoping for,’ said Joshua Bleiberg, an author of the study and a postdoctoral research associate at the Annenberg Institute for School Reform at Brown University… A team of researchers from Brown and Michigan State Universities and the Universities of Connecticut and North Carolina at Chapel Hill analyzed the timing of states’ adoption of the reforms alongside district-level student achievement data from 2009 to 2018 on standardized math and English/language arts test scores. They also analyzed the impact of the reforms on longer-term student outcomes including high school graduation and college enrollment. The researchers controlled for the adoption of other teacher accountability measures and reform efforts taking place around the same time, and found that their results remained unchanged. They found no evidence that, on average, the reforms had even a small positive effect on student achievement or educational attainment.”

Arne Duncan is no longer the U.S. Secretary of Education. And in 2015, Congress replaced the No Child Left Behind Act with a different federal education law, the Every Student Succeeds Act (ESSA), in which Congress permitted states more latitude in how they evaluate schoolteachers. So why is this new 2021 research so urgently important? Madeline Will reports, “Evaluation reform has already changed course. States overhauled their teacher-evaluation systems quickly, and many reversed course within just a few years.” Will adds, however, that in 2019, 34 states were still requiring “student-growth data in teacher evaluations.”

In 2019, for the Phi Delta Kappan, Kevin Close, Audrey Amrein-Beardsley, and Clarin Collins surveyed teacher evaluation systems across the states.  Many states still evaluate teachers according to how much each teacher adds to a student’s learning as measured by test scores, a statistic called the Value-Added Measure (VAM).  Practices across the states are slowly evolving: “While the legacy of VAMs as the ‘objective’ student growth measure remains in place to some degree, the definition of student growth in policy and practice is also changing. Before ESSA, student growth in terms of policy was synonymous with students’ year-to-year changes in performance on large-scale standardized tests (i.e., VAMs). Now, more states are using student learning objectives (SLOs) as alternative or sole ways to measure growth in student learning or teachers’ impact on growth. SLOs are defined as objectives set by teachers, sometimes in conjunction with teachers’ supervisors and/or students, to measure students’ growth. While SLOs can include one or more traditional assessments (e.g., statewide standardized tests), they can also include nontraditional assessments (e.g., district benchmarks, school-based assessments, teacher and classroom-based measures) to assess growth. Indeed, 55% (28 of 51) of states now report using or encouraging SLOs as part of their teacher evaluation systems, to some degree instead of VAMs.”

The Every Student Succeeds Act eased federal pressure on states to evaluate teachers by their students’ scores, but five years after its passage, remnants of these policies linger in the laws of many states. Once bad policy based on technocratic ideology has become embedded in state law, it may not be so easy to change course.

In a profound book, The Testing Charade: Pretending to Make Schools Better, the Harvard University psychometrician Daniel Koretz explains succinctly why students’ test scores cannot possibly separate “successful” from “failing” schools and why students’ test scores are an inaccurate and unfair standard for evaluating teachers:

“One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools. The causes are complex, but the result is simple: some schools have far lower average scores…. Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do. This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’ It was a deliberate and prominent part of many of the test-based accountability reforms…. Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic. The specific targets were often an automatic consequence of where the Proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.” (The Testing Charade, pp. 129-130)

2 thoughts on “New Research Yet Again Proves the Folly of Judging Teachers by Their Students’ Test Scores”

  1. Thank you, Jan, for this information. My daughter and I have often discussed how the tests she is required to give to her “littles”, her kindergarten students, do not give her the information she needs to help each child grow and learn and measure their progress. In addition to the mistakes the Bush and Obama administrations made, including appointing Arne Duncan as Secretary of Education, the standardized test makers, like Pearson, are making millions to supply the data to the technocracy. Regards, Gail Larson

  2. Honestly, I was shocked when I first discovered that teachers in the US are evaluated based on students’ standardized test scores (VAMs), because I thought America must have a more advanced education system than the one where I come from, China. There, the exam-oriented education system predominates. The utmost purpose of K-12 education is to prepare students to take the National College Entrance Exam (NCEE). Over ten million Grade 12 students take the NCEE each year in June, and their score is the sole criterion for university admission. Under this broad exam-oriented education system umbrella, it is rational that everything (teacher quality, school accountability and regional K-12 education quality) is evaluated by students’ test scores alone. I feel sorry for my peers (teachers) because of the current teacher evaluation situation they must face, and I hope it may change one day.

    Interestingly, this is the topic I chose for one of my course assignments (Translating Research for Educational Change) in my Educational Policy study. I want to share literature review findings and my policy proposal to back up your argument.
    Research evidence:
    1. The tests on which VAM estimates are based are inherently flawed for measuring student achievement in and of themselves. (Amrein-Beardsley & Holloway, 2019)
    2. 86% to 99% of effects on student achievement (standardized test scores) are caused by factors beyond the teacher. (Amrein-Beardsley & Holloway, 2019)
    3. Post-ESSA, states have decreased the use of VAMs within teacher evaluation systems while offering more alternative measurements of teacher effectiveness and allowing districts to develop and implement varied teacher evaluation systems. (Close et al., 2020)
    4. Tests offer very narrow measures of what students have achieved, and they do not often effectively assess students’ depth of knowledge and understanding and their ability to think critically, analytically, or creatively; solve contextual problems; or even accomplish authentic, performance-based tasks (E. L. Baker et al., 2010; Corcoran, 2010; Harris, 2011; Toch & Rothman, 2008). So test scores alone can never serve as a measure of a teacher’s performance.

    Therefore, I propose that we stop using VAMs or test scores as measures in any teacher evaluation that holds teachers accountable for reward or punitive purposes.
    · When designing teacher evaluations, more formative, diverse, and multiple measures should be integrated, including teacher observation systems, student surveys, principals’ appraisals, and student growth.
    · Two aspects must be considered while designing teacher evaluations: acknowledging and adjusting the dynamics of teachers’ power, and supporting teachers’ psychological needs as learners, in order to fulfill both accountability/goal accomplishment (summative) and professional growth/improvement (formative) (Ford & Hewitt, 2020).

    Amrein-Beardsley, A., & Holloway, J. (2019). Value-Added Models for Teacher Evaluation and Accountability: Commonsense Assumptions. Educational Policy, 33(3), 516–542.
    Close, K., Amrein-Beardsley, A., & Collins, C. (2020). Putting teacher evaluation systems on the map: An overview of states’ teacher evaluation systems post–Every Student Succeeds Act. Education Policy Analysis Archives, 28(58).
    Ford, T. G., & Hewitt, K. (2020). Better integrating summative and formative goals in the design of next generation teacher evaluation systems. Education Policy Analysis Archives, 28(63).
