NAEP Results
President Barack Obama and U.S. Education Secretary Arne Duncan say the results of the Nation's Report Card are a cause for both optimism and concern. REUTERS

Of the different school reform measures advanced by the Obama administration, perhaps the most contentious is the push to develop rigorous methods for evaluating teachers.

The administration and like-minded education reformers have focused on accountability by using test-based measures of student progress to reward or punish schools, and there is a similar push to formulate ways to gauge whether teachers are helping students improve. If made mandatory, those evaluations could determine whether teachers get bonuses or lose their jobs, so the stakes are high.

"I think teacher evaluation is certainly a prominent part of plans to improve schools," said Joan Herman, director of the University of California, Los Angeles' National Center for Research on Evaluations, Standards and Student Testing. "I think it will continue to be a very contentious component in the sense that there are legitimate disagreements about how best to incorporate evidence of student learning into teacher evaluation."

Until recently, the prevailing methods for evaluating teachers carried virtually no consequences and in many districts entailed principals labeling teachers as satisfactory or unsatisfactory. Under that system, the vast majority of teachers -- 99 percent in some school districts -- were deemed satisfactory.

Duncan: Nix 'No Child Left Behind,' Sub Teacher Evaluations Tied to Student Progress

The Obama administration, led by its hard-charging education secretary Arne Duncan, would like to change that. President Barack Obama recently offered waivers to exempt states from a provision of the No Child Left Behind education law, widely seen as unattainable, mandating that all students demonstrate proficiency in math and reading by 2014. One of the preconditions for obtaining the waivers is that states institute teacher evaluations tied to student progress, a directive that was not included in a No Child Left Behind overhaul bill subsequently released by Sen. Tom Harkin, D-Iowa, and Sen. Mike Enzi, R-Wyo.

The administration already prodded states with a program, Race to the Top, under which states competed for federal education dollars by pursuing a range of reform measures that included crafting innovative new teacher evaluation systems. Rhode Island, Tennessee and Delaware have begun implementing systems for the 2011-2012 school year, and a report by the National Council on Teacher Quality found that 23 states now require evaluations based on some measure of student growth.

"There was some skepticism at first that states were just doing what the federal government wanted to be competitive for Race to the Top, but it's been fascinating that many states have made changes in 2011 without federal incentives," said Sandi Jacobs, vice president of the National Council on Teacher Quality. The report noted that Idaho, Indiana, Michigan, Minnesota and Nevada have passed teacher evaluation legislation in 2011 despite the absence of available Race to the Top funding.

New Teacher Evaluations: Applied Too Soon?

But some researchers and observers caution that in the zeal to come up with new ways to assess teachers, districts are rushing new systems into place without proving that they accurately judge teachers. The Obama administration's waiver plan could accelerate that, said Michael Petrilli, executive vice president of the Thomas B. Fordham Institute.

"It's fairly loosely designed, but they're saying to the states if you want these waivers you need to do something with teacher evaluations," said Petrilli. "The concern is that states are doing this just to have flexibility on other matters but won't do it well. If it's something being done just to placate the feds I worry that the product at the end of the day isn't going to be good."

That is a common refrain among education researchers, some of whom believe there is not yet any reliable way to consistently distinguish an effective teacher from an ineffective one. One of the most popular methods, commonly known as value-added modeling, compares student test scores at the start of the year to scores at the end of the year. In principle, it offers a data-based approach that adjusts for disparities between students, such as family income, and illustrates how well a teacher has elevated achievement. But a study found that of teachers who registered in the top 20 percent one year, fewer than a third were in the top 20 percent the next year, and another third tumbled into the bottom 40 percent.

"The correlation between major league baseball players' batting averages from one year to the next is about the same as the correlation between value-added from one year to the next," said Russ Whitehurst, director of the Brookings Institution's Brown Center for Education Policy. "I'm not so much concerned about the fuzziness of the signal. I'm worried about rushing into place a particular way of doing this that's not informed by experience and research as we go forward."

Another Concern: Can Evaluations Measure All Knowledge/Skills Teachers Impart?

There are also aspects of teaching that researchers say cannot be captured by tests. Assessments mandated under No Child Left Behind are only in math and reading, which means that they would not apply to a majority of teachers and could, for example, credit an English teacher with writing skills imparted by a history teacher. Other factors that could be incorporated into evaluations, like classroom observation, are costly to implement and difficult to codify.

"If the data are going to be used for hiring and firing decisions or merit pay increases it's important that they are highly standardized, and that results are comparable from one grade to the next, from one school to the next," Herman said.

Because of the uncertainty surrounding various proposals for teacher evaluation, some researchers caution against overly prescriptive policies dictating what an evaluation system would look like. Whitehurst advocated a localized approach that would allow for variation.

"I think the secret here is going to be having several measures that at least have face validity, you look at them and say 'oh yeah, that's not an unreasonable way to go about it' and look for convergence," Whitehurst said. "These systems are going to have lots of mistakes -- any system will generate errors of judgment -- but even remarkably unreliable systems can beat the odds as opposed to a system in which everyone is deemed above average and everyone gets tenure and everyone is a lifelong employee."

You can contact the reporter at j.white@ibtimes.com