Teacher Performance Evaluations: How Fair Are These Assessments?
The buzz in education right now is to evaluate teachers based on their students' scores. Whatever value this has is certainly undone in New York City, where Joel Klein, the attorney masquerading as chancellor, allows principals to put 50 children in a classroom, place all the most difficult children under one teacher, and give teachers assignments in subjects and/or grades they have never taught and are not licensed to teach. Teacher sabotage is easy if you are a principal who is permitted to target teachers at the expense of students.
In NYC, the Performance Management document gives principals the tools to get rid of anyone.
September 1, 2010
When Does Holding Teachers Accountable Go Too Far?
By DAVID LEONHARDT, NY Times magazine
The start of the school year brings another one of those nagging, often unquenchable worries of parenthood: How good will my child’s teachers be? Teachers tend to have word-of-mouth reputations, of course. But it is hard to know how well those reputations match up with a teacher’s actual abilities. Schools generally do not allow parents to see any part of a teacher’s past evaluations, for instance. And there is nothing resembling a rigorous, Consumer Reports-like analysis of schools, let alone of individual teachers. For the most part, parents just have to hope for the best.
That, however, may be starting to change. A few months ago, a team of reporters at The Los Angeles Times and an education economist set out to create precisely such a consumer guide to education in Los Angeles. The reporters requested and received seven years of students’ English and math elementary-school test scores from the school district. The economist then used a statistical technique called value-added analysis to see how much progress students had made, from one year to the next, under different third- through fifth-grade teachers. The variation was striking. Under some of the roughly 6,000 teachers, students made great strides year after year. Under others, often at the same school, students did not. The newspaper named a few teachers — both stars and laggards — and announced that it would release the approximate rankings for all teachers, along with their names.
The articles have caused an electric reaction. The president of the Los Angeles teachers union called for a boycott of the newspaper. But the union has also suggested it is willing to discuss whether such scores can become part of teachers’ official evaluations. Meanwhile, more than 1,700 teachers have privately reviewed their scores online, and hundreds have left comments that will accompany them.
It is not difficult to see how such attempts at measurement and accountability may be a part of the future of education. Presumably, other groups will try to repeat the exercise elsewhere. And several states, in their efforts to secure financing from the Obama administration’s Race to the Top program, have committed to using value-added analysis in teacher evaluation. The Washington, D.C., schools chancellor, Michelle Rhee, fired more than 100 teachers this summer based on evaluations from principals and other educators and, when available, value-added scores.
In many respects, this movement is overdue. Given the stakes, why should districts be allowed to pretend that nearly all their teachers are similarly successful? (The same question, by the way, applies to hospitals and doctors.) The argument for measurement is not just about firing the least effective sliver of teachers. It is also about helping decent and good teachers to become better. As Arne Duncan, the secretary of education, has pointed out, the Los Angeles school district has had the test-score data for years but didn’t use it to help teachers improve. When the Times reporters asked one teacher about his weak scores, he replied, “Obviously what I need to do is to look at what I’m doing and take some steps to make sure something changes.”
Yet for all of the potential benefits of this new accountability, the full story is still not a simple one. You could tell as much by the ambivalent reaction to the Los Angeles imbroglio from education researchers and reform advocates. These are the people who have spent years urging schools to do better. Even so, many reformers were torn about the release of the data. Above all, they worried that because teachers could be sorted and ranked, the data offered the promise of clear and open accountability and would become gospel, even though it did not paint a complete picture.
Value-added data is not gospel. Among the limitations, scores can bounce around from year to year for any one teacher, notes Ross Wiener of the Aspen Institute, who is generally a fan of the value-added approach. So a single year of scores — which some states may use for evaluation — can be misleading. In addition, students are not randomly assigned to teachers; indeed, principals may deliberately assign slow learners to certain teachers, unfairly lowering their scores. As for the tests themselves, most do not even try to measure the social skills that are crucial to early learning.
The value-added data probably can identify the best and worst teachers, researchers say, but it may not be very reliable at distinguishing among teachers in the middle of the pack. Joel Klein, New York’s reformist superintendent, told me that he considered the Los Angeles data powerful stuff. He also said, “I wouldn’t try to make big distinctions between the 47th and 55th percentiles.” Yet what parent would not be tempted to?
One way to think about the Los Angeles case is as an understandable overreaction to an unacceptable status quo. For years, school administrators and union leaders have defeated almost any attempt at teacher measurement, partly by pointing to the limitations. Lately, though, the politics of education have changed. Parents know how much teachers matter and know that, just as with musicians or athletes or carpenters or money managers, some teachers are a lot better than others.
Test scores — that is, measuring students’ knowledge and skills — are surely part of the solution, even if the public ranking of teachers is not. Rob Manwaring of the research group Education Sector has suggested that districts release a breakdown of teachers’ value-added scores at every school, without tying the individual scores to teachers’ names. This would avoid humiliating teachers while still giving a principal an incentive to employ good ones. Improving standardized tests and making peer reports part of teacher evaluation, as many states are planning, would help, too.
But there is also another, less technocratic step that is part of building better schools: we will have to acknowledge that no system is perfect. If principals and teachers are allowed to grade themselves, as they long have been, our schools are guaranteed to betray many students. If schools instead try to measure the work of teachers, some will inevitably be misjudged. “On whose behalf do you want to make the mistake — the kids or the teachers?” asks Kati Haycock, president of the Education Trust. “We’ve always erred on behalf of the adults before.”
You may want to keep that in mind if you ever get a chance to look at a list of teachers and their value-added scores. Some teachers, no doubt, are being done a disservice. Then again, so were a whole lot of students.
David Leonhardt is an economics columnist for The Times and a staff writer for the magazine.
August 31, 2010
Formula to Grade Teachers’ Skill Gains Acceptance, and Critics
By SAM DILLON, NYTIMES
How good is one teacher compared with another?
A growing number of school districts have adopted a system called value-added modeling to answer that question, provoking battles from Washington to Los Angeles — with some saying it is an effective method for increasing teacher accountability, and others arguing that it can give an inaccurate picture of teachers’ work.
The system calculates the value teachers add to their students’ achievement, based on changes in test scores from year to year and how the students perform compared with others in their grade.
People who analyze the data, making a few statistical assumptions, can produce a list ranking teachers from best to worst.
Use of value-added modeling is exploding nationwide. Hundreds of school systems, including those in Chicago, New York and Washington, are already using it to measure the performance of schools or teachers. Many more are expected to join them, partly because the Obama administration has prodded states and districts to develop more effective teacher-evaluation systems than traditional classroom observation by administrators.
Though the value-added method is often used to help educators improve their classroom teaching, it has also been a factor in deciding who receives bonuses, how much they are and even who gets fired.
Michelle A. Rhee, the schools chancellor in Washington, fired about 25 teachers this summer after they rated poorly in evaluations based in part on a value-added analysis of scores.
And 6,000 elementary school teachers in Los Angeles have found themselves under scrutiny this summer after The Los Angeles Times published a series of articles about their performance, including a searchable database on its Web site that rates them from least effective to most effective. The teachers’ union has protested, urging a boycott of the paper.
Education Secretary Arne Duncan weighed in to support the newspaper’s work, calling it an exercise in healthy transparency. In a speech last week, though, he qualified that support, noting that he had never released to news media similar information on teachers when he was the Chicago schools superintendent.
“There are real issues and competing priorities and values that we must work through together — balancing transparency, privacy, fairness and respect for teachers,” Mr. Duncan said. On The Los Angeles Times’s publication of the teacher data, he added, “I don’t advocate that approach for other districts.”
A report released this month by several education researchers warned that the value-added methodology can be unreliable.
“If these teachers were measured in a different year, or a different model were used, the rankings might bounce around quite a bit,” said Edward Haertel, a Stanford professor who was a co-author of the report. “People are going to treat these scores as if they were reflections on the effectiveness of the teachers without any appreciation of how unstable they are.”
Other experts disagree.
William L. Sanders, a senior research manager for a North Carolina company, SAS, that does value-added estimates for districts in North Carolina, Tennessee and other states, said that “if you use rigorous, robust methods and surround them with safeguards, you can reliably distinguish highly effective teachers from average teachers and from ineffective teachers.”
Dr. Sanders helped develop value-added methods to evaluate teachers in Tennessee in the 1990s. Their use spread after the 2002 No Child Left Behind law required states to test in third to eighth grades every year, giving school districts mountains of test data that are the raw material for value-added analysis.
In value-added modeling, researchers use students’ scores on state tests administered at the end of third grade, for instance, to predict how they are likely to score on state tests at the end of fourth grade.
A student whose third-grade scores were higher than 60 percent of peers statewide is predicted to score higher than 60 percent of fourth graders a year later.
If, when actually taking the state tests at the end of fourth grade, the student scores higher than 70 percent of fourth graders, the leap in achievement represents the value the fourth-grade teacher added.
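The prediction-and-gain arithmetic described above can be sketched in a few lines of code. This is a deliberate oversimplification for illustration only: real value-added models are regression-based and control for many factors, and the classroom figures below are invented.

```python
# Toy version of the value-added idea: assume each student is simply
# predicted to hold last year's statewide percentile, and credit the
# teacher with the average amount by which students beat (or miss)
# that prediction.

def value_added(students):
    """students: list of (grade3_percentile, grade4_percentile) pairs."""
    gains = [g4 - g3 for g3, g4 in students]
    return sum(gains) / len(gains)

# A student who moves from the 60th to the 70th percentile contributes
# a gain of 10 points toward the teacher's score.
classroom = [(60, 70), (55, 62), (70, 78)]
print(round(value_added(classroom), 1))  # prints 8.3
```

In practice, districts predict each student's score from prior test history (and sometimes demographics) with statistical models rather than a simple carry-forward of percentiles, which is part of why different models produce different rankings.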
Even critics acknowledge that the method can be more accurate for rating schools than the system now required by federal law, which compares test scores of succeeding classes, for instance this year’s fifth graders with last year’s fifth graders.
But when the method is used to evaluate individual teachers, many factors can lead to inaccuracies. Different people crunching the numbers can get different results, said Douglas N. Harris, an education professor at the University of Wisconsin, Madison. For example, two analysts might rank teachers in a district differently if one analyst took into account certain student characteristics, like which students were eligible for free lunch, and the other did not.
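Professor Harris's point about analysts reaching different rankings can be made concrete with a toy example. Every number below, and the crude flat-bonus adjustment for free-lunch eligibility, is invented purely for illustration; real analyses use regression controls, not a fixed credit.

```python
# Two hypothetical classrooms: each student is recorded as
# (test-score gain, eligible for free lunch).
teacher_a = [(4, True), (5, True), (6, True)]    # all free-lunch students
teacher_b = [(6, False), (7, False), (5, True)]  # mostly not

def raw_score(students):
    """Analyst 1: average gain, no adjustment for student characteristics."""
    return sum(g for g, _ in students) / len(students)

def adjusted_score(students, credit=3):
    """Analyst 2: credit free-lunch students' gains with 3 extra points,
    reflecting lower expected growth (a made-up adjustment)."""
    return sum(g + (credit if fl else 0) for g, fl in students) / len(students)

print(raw_score(teacher_a), raw_score(teacher_b))            # prints 5.0 6.0
print(adjusted_score(teacher_a), adjusted_score(teacher_b))  # prints 8.0 7.0
# Analyst 1 ranks teacher B above teacher A; Analyst 2 reverses the order.
```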
Millions of students change classes or schools each year, so teachers can be evaluated on the performance of students they have taught only briefly, after students’ records were linked to them in the fall.
In many schools, students receive instruction from multiple teachers, or from after-school tutors, making it difficult to attribute learning gains to a specific instructor. Another problem is known as the ceiling effect. Advanced students can score so highly one year that standardized state tests are not sensitive enough to measure their learning gains a year later.
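The ceiling effect is easy to see with a toy calculation. Assume, hypothetically, a test whose scale tops out at 100: students whose true achievement is above the ceiling cannot register the gains they actually made.

```python
TEST_MAX = 100  # hypothetical top of the test's scale

def observed_gain(true_year1, true_year2):
    """Gain as the test records it, with both scores capped at the ceiling."""
    return min(true_year2, TEST_MAX) - min(true_year1, TEST_MAX)

print(observed_gain(50, 70))    # prints 20: the full gain is visible
print(observed_gain(95, 110))   # prints 5: ten points of growth are lost
print(observed_gain(105, 120))  # prints 0: no measurable gain at all
```

A teacher whose students arrive near the top of the scale, like Ms. Krieger's physics students, is systematically shortchanged under such a cap.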
In Houston, a district that uses value-added methods to allocate teacher bonuses, Darilyn Krieger said she had seen the ceiling effect as a physics teacher at Carnegie Vanguard High School.
“My kids come in at a very high level of competence,” Ms. Krieger said.
After she teaches them for a year, most score highly on a state science test but show little gain, so her bonus is often small compared with those of other teachers, she said.
The Houston Chronicle reports teacher bonuses each year in a database, and readers view the size of the bonus as an indicator of teacher effectiveness, Ms. Krieger said.
“I have students in class ask me why I didn’t earn a higher bonus,” Ms. Krieger said. “I say: ‘Because the system decided I wasn’t doing a good enough job. But the system is flawed.’ ”
This year, the federal Department of Education’s own research arm warned in a study that value-added estimates “are subject to a considerable degree of random error.”
And last October, the Board on Testing and Assessments of the National Academies, a panel of 13 researchers led by Dr. Haertel, wrote to Mr. Duncan warning of “significant concerns” that the Race to the Top grant competition was placing “too much emphasis on measures of growth in student achievement that have not yet been adequately studied for the purposes of evaluating teachers and principals.”
“Value-added methodologies should be used only after careful consideration of their appropriateness for the data that are available, and if used, should be subjected to rigorous evaluation,” the panel wrote. “At present, the best use of VAM techniques is in closely studied pilot projects.”
Despite those warnings, the Department of Education made states with laws prohibiting linkages between student data and teachers ineligible to compete in Race to the Top, and it designed its scoring system to reward states that use value-added calculations in teacher evaluations.
“I’m uncomfortable with how fast a number of states are moving to develop teacher-evaluation systems that will make important decisions about teachers based on value-added results,” said Robert L. Linn, a testing expert who is an emeritus professor at the University of Colorado, Boulder.
“They haven’t taken caution into account as much as they need to,” Professor Linn said.
Finding the Link: Teacher Evaluation and Professional Development
June 30, 2010 9:00 AM - 11:30 AM (Resources and Conservation Center)
Contact Name: Sharon Cannon
Watch the full video of this event.
The race to reform teacher evaluation has begun. With plenty of evidence that current systems of teacher evaluation are seriously flawed, and unprecedented government and private funding to improve these systems, states and districts are in the midst of major overhauls. At the same time, there is near consensus that professional development systems are also in dire need of improvement.
The link between evaluation (how well are you doing?) and professional development (how can you improve?) is key to successful performance management systems in nearly every other industry. Yet, in education there is still far too little attention to how these two core elements of teacher performance inform one another, and how, in policy and practice, they can be systematically aligned.
On June 30, 2010, Education Sector hosted a live panel discussion on emerging strategies to link teacher evaluation and professional development and the implications of Race to the Top funding and the reauthorization of the Elementary and Secondary Education Act on the long-term improvement of these core elements.
Scott Thompson, IMPACT, the new teacher evaluation system for the Washington, D.C., public schools
Brad Jupp, senior program adviser for teacher quality initiatives, U.S. Department of Education
Jen Mulhern, The New Teacher Project, who worked with New Haven on its new evaluation system
Elena Silva, senior policy analyst, Education Sector (as moderator)
Four bloggers, all teachers, provided reactions to the panel and asked the first questions. They will post their reflections on their own blogs and on The Quick and the Ed.
The four teachers and their blogs are:
Wookie Kim teaches English at the secondary level in the District of Columbia Public Schools. He is also a first-year Teach for America D.C. region corps member. He reflects on his experience as a D.C. educator on his blog, ABCDE.
Dina Strasser teaches seventh-grade English in upstate New York. The former Fulbright scholar has been an educator for 11 years, spending eight of those years teaching English as a Second Language at all levels of education. She blogs at The Line.
Tom White has taught third grade in suburban Seattle for 26 years. He is a National Board Certified Teacher and blogs at Stories from School, which is sponsored by The Center for Strengthening the Teaching Profession.
Ann-Bailey Lipsett teaches at an extremely diverse public elementary school in suburban Washington. She blogs at Organized Chaos.
The Joyce Foundation provided funding for this project. We thank them for their support but acknowledge that the views presented during this event are those of the panelists alone and do not necessarily represent the opinions of the foundation.
Rush to Judgment: Teacher Evaluation in Public Education
January 29, 2008
The troubled state of teacher evaluation is a glaring and largely neglected problem in public education, an enterprise that spends $400 billion annually on salaries and benefits.
Because teacher evaluation addresses what is at the heart of the educational enterprise, the quality of teaching in the nation's classrooms, it has the potential to be a powerful lever of teacher and school improvement. But that potential is being squandered throughout public education today.
A host of factors—a lack of accountability for school performance, staffing practices that strip school systems of incentives to take teacher evaluation seriously, union ambivalence, and public education’s practice of using teacher credentials as a proxy for teacher quality—have resulted in teacher evaluation systems throughout public education that are superficial, capricious, and often don’t even directly address the quality of instruction, much less measure students’ learning.
In this Education Sector report, Co-founder and Co-director Thomas Toch and Robert Rothman of the Annenberg Institute for School Reform examine the causes and consequences of the crisis in teacher evaluation, as well as its implications for the current national debate about performance pay for teachers. And the report examines a number of national, state, and local evaluation systems that point to a way out of the evaluation morass.
Download "Rush to Judgment: Teacher Evaluation in Public Education."
Also read an excerpt of this report in a recent issue of the Annenberg Institute's Voices in Urban Education (Summer 2008). The edition focuses on teacher quality issues in education.
This research was funded in part by KnowledgeWorks Foundation and the William T. Grant Foundation. We thank the foundations for their support but acknowledge that the findings and conclusions presented in this report are those of the authors alone and do not represent the opinions of the foundations.