What Do You Think?
The New York City Department of Education Tries to Salvage Its Failures by Creating an "Empowerment Zone"
Leonie Haimson, a lawyer and the coordinator of Class Size Matters in New York City, provides an impressive analysis of the newest policy decision to come out of the New York City Department of Education: the "Empowerment Zone" and its "autonomous schools."
WHAT IS THE EMPOWERMENT ZONE?
The principals of schools entering the empowerment zone next year will receive about $150,000 in extra funding, with another $100,000 that they will have flexibility in spending, and will be able to design their own curriculum, programs, and budgets. (For a list of more than 300 schools in the zone, see http://www.insideschools.org/nv/NV_empowerment_schools_331.php )
In exchange, their school’s performance will be judged according to yearly “progress reports”, with grades from A to F, as well as annual “quality reviews” administered by groups of “experienced educators.” Depending on their grades and review, schools will face serious sanctions or receive monetary rewards. They will also be required to perform interim testing of all their students, from three to five times a year.
WHAT’S POTENTIALLY GOOD
The grade each school receives will depend largely on its students’ before-and-after test scores. Judging a school on the basis of the change in its students’ test scores is called a value-added approach, which in theory should be a better way of measuring a school’s effectiveness than simply looking at how well students test at any one point in time. For example, if a school enrolls a larger than average number of low-performing kids, a “progress report” based on a value-added system should be able to recognize whether the school is achieving results by moving these students up from a level one (essentially failing) to a level two, or from a low level two to a high level two.
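To make the distinction concrete, here is a toy calculation (all numbers and school profiles are invented, and this is only an illustration of the value-added idea, not the DOE’s actual formula). A “status” measure looks only at where students end up; a value-added measure looks at how far they moved:

```python
# Toy comparison of a "status" measure vs. a value-added measure.
# Scores are on the state's 1-4 performance-level scale; all numbers invented.

def status_score(after):
    """Average ending score: favors schools that enroll high scorers."""
    return sum(after) / len(after)

def value_added(before, after):
    """Average gain per student: credits schools for moving students up."""
    return sum(a - b for b, a in zip(before, after)) / len(before)

# School A enrolls mostly level-one (essentially failing) students
# and moves them up close to level two.
a_before = [1.0, 1.2, 1.1, 1.3]
a_after  = [2.0, 2.1, 1.9, 2.2]

# School B enrolls high scorers whose scores stay flat.
b_before = [3.5, 3.6, 3.4, 3.7]
b_after  = [3.5, 3.6, 3.4, 3.7]

print(f"status: A={status_score(a_after):.2f}  B={status_score(b_after):.2f}")
print(f"value-added: A={value_added(a_before, a_after):.2f}  "
      f"B={value_added(b_before, b_after):.2f}")
```

On a status measure School B looks far better (3.55 vs. 2.05); on a value-added measure School A does (a gain of 0.90 vs. 0.00), which is exactly the kind of progress the grading system is supposed to reward.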
Another value to this proposal, in theory, is the enhanced ability of principals to budget according to the real needs and priorities of their schools – especially given Tweed’s flawed priorities up to now. For example, why should all principals have to hire literacy and math coaches, if acquiring more classroom teachers to reduce class size might yield better results? Why should teachers have to attend DOE-sponsored professional development sessions, rather than those that principals believe might be more effective? Indeed, I have spoken to two principals who are entering the empowerment zone next year and are cautiously optimistic. One is looking forward to the extra funds he will finally have to provide air conditioning for his school, the other is encouraged because of the additional flexibility he’ll be given to devise his own summer programs. Neither one appears to believe that the interim testing required will be particularly helpful, however.
So what’s wrong? The problems with this initiative lie primarily in three areas: statistics, sanctions, and solutions.
According to experts in testing, the state exams used to compute the value-added scores that will determine a school’s grade are not designed or “scaled” to make accurate cross-grade comparisons. The process of scaling is a complicated one. The state is taking so long this year to scale its new exams for the 3rd, 5th, and 7th grades that accurate results won’t be ready until late summer or even fall, making it impossible for the city to use them to determine which kids will be held back. Nevertheless, DOE is still planning on retaining up to 10% of these kids, based on preliminary and possibly invalid results. (For more on this, see http://www.nydailynews.com/front/story/427421p-360424c.html and http://www.nytimes.com/2006/06/17/nyregion/17schools.html)
In fact, so many fifth graders (essentially the same students who did so well on last year’s fourth grade tests) performed poorly on the new state tests this year that the results have called into question how well these exams are designed for this sort of cross-grade comparison, or even for judging whether students are doing better than those in the same grade the previous year. More than once, flawed scaling by the state has caused students who should have passed their Regents exams in Physics and/or Math to fail; as a result, these exams have had to be rescaled on an emergency basis to ensure that a reasonable number of students could pass. In contrast, last year’s fourth grade tests were thought to be unusually easy, which explains why NYC, like many other districts across the state, saw huge jumps in its fourth grade scores. If the state tests are not designed to make cross-grade comparisons of the sort the administration is planning, the result will be unreliable school grades, with serious consequences, as we shall see.
Even if the state tests were designed for this sort of cross-grade comparison, only about half the variation in any school-based value-added measure actually reflects school-based learning. As Tom Kane and Doug Staiger, two experts on testing who have looked into this issue closely, conclude, about 50% of the annual difference between schools in their value-added test scores is either random, outside the control of schools, or due to statistical error:
“…by focusing on mean gains in test scores for students in a given year or changes in mean test score levels from one year to the next, many test-based accountability systems are relying upon unreliable measures. …Moreover, those differences that do exist are often nonpersistent—either because of sampling variation or other causes. …For the median-size school, roughly half of the variation between schools in gain scores (or value-added) for any given grade is also nonpersistent.” (See http://www.dartmouth.edu/~dstaiger/Papers/KaneStaiger_brookings2002.pdf )
Yet in NYC, officials are apparently proposing to base each school’s grade – including whether it receives an A or an F – primarily on the basis of one year’s worth of value-added test scores, a system which will be wrong about 50% of the time.
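The scale of that unreliability is easy to illustrate with a toy simulation (all parameters are invented; this is a sketch of the statistical point only, not the DOE’s actual grading formula). Suppose every school were equally effective, and the only differences in measured gains came from one year’s worth of noise. Grading the bottom 15% would then fail a largely different set of schools each year:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

N_SCHOOLS = 300   # roughly the number of schools in the empowerment zone
TRUE_GAIN = 0.5   # by construction, every school is equally effective
NOISE_SD  = 0.3   # one year of sampling/measurement noise (invented scale)

def one_year_of_gains():
    """Measured value-added for each school: true gain plus random noise."""
    return [TRUE_GAIN + random.gauss(0, NOISE_SD) for _ in range(N_SCHOOLS)]

def bottom_15_percent(gains):
    """Indices of schools graded 'D' or 'F': the lowest 15% of measured gains."""
    cutoff = sorted(gains)[int(0.15 * len(gains))]
    return {i for i, g in enumerate(gains) if g <= cutoff}

year1 = bottom_15_percent(one_year_of_gains())
year2 = bottom_15_percent(one_year_of_gains())

# Because the between-school differences here are pure noise, the overlap
# between the two years' "failing" lists is small, even though no school
# changed at all.
print(f"{len(year1)} schools graded D/F each year; "
      f"{len(year1 & year2)} were on the list both years")
```

In a run of this sketch, only a handful of the roughly 45 “failing” schools repeat from one year to the next, which is the Kane-Staiger nonpersistence point in miniature: a one-year grade largely measures noise.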
· Another problem is that even when value-added test scores are used, schools serving mostly high-income and white students typically receive higher scores than schools serving students from disadvantaged backgrounds. Considerable adjustment of the scores is necessary to eliminate this bias, according to Helen Ladd of Duke, who has studied the value-added systems in use in North and South Carolina. If appropriate adjustments are not made, this grading system will only further penalize NYC schools with large numbers of poor and minority students. Of course, the more these measures are adjusted, the less intelligible and objective they may appear.
· All elementary and middle schools in the empowerment zone will be required to test their students at least five times a year (three times a year in the case of high schools). These interim assessments, which have not yet been designed or even field tested, are problematic in a variety of ways:
The tests are supposed to be used for diagnostic purposes only, to give teachers a better guide as to what particular needs individual students have. In the words of Jim Liebman, the chief accountability officer, “Assessments provide real time information that can be used immediately to respond to each student's needs.” In the power point presentation that Liebman gives, there is a chart for each student, with a separate indicator in different colors in about twenty different categories for each subject, like “paraphrasing” or “context clues,” to show improvement or decline in each of the categories over time.
Yet according to many experts, given the difficulty of measuring short term improvements in learning, there is little chance that these tests will yield usable information in any of the particular categories. This is one of the reasons so many experts in testing oppose the use of standardized exams for interim purposes.
The results of the earlier system of interim assessments designed by the Princeton Review were largely useless, as was widely acknowledged, in large part because the tests were not aligned with the curriculum or instruction in schools. Given that there will be even less uniformity of curriculum and instruction among schools in the empowerment zone, the new interim assessments will likely provide even less useful information. Liebman says that schools will be able to devise or “customize” their own assessments, or provide portfolio assessments, after proving to him that their system is as “rigorous” as the one provided by DOE. But how many schools will have the time and/or the capacity to design or customize their own assessments, other than those which already use portfolios for this purpose? Instead, most will likely use the DOE-provided tests, with disappointing results.
In any event, whether or not the interim assessments provide useful information, one thing is clear: all this additional mandated testing will be extremely time-consuming and will take even more resources from instruction. In the early grades, they intend to use a version of ECLAS, which currently takes several weeks per class for each teacher to administer; five rounds of ECLAS will take up much of the year, leaving most of the class essentially without supervision while each student is assessed. Liebman says that teachers will be given Palm Pilots to score the ECLAS, but it will still require a massive effort simply to give the assessment five times a year to the 20-25 students in a class, let alone to enter all the data and begin to analyze it.
It is likely that schools adopting the new accountability system will devote even more time to test prep, and less to untested subjects like history, science, and art, with even more of the joy and discovery taken out of the learning (and teaching) experience. As it is, many schools have relegated art to their after-school programs; and any school that has entered the empowerment zone will now be able to use funding meant for the arts for any purpose at all. (For more on this, see today’s NY Times at http://www.nytimes.com/2006/06/26/arts/26blue.html?_r=1&pagewanted=print)
What’s wrong with supplying teachers with more information, however unreliable, about their students?
Given the excessive class sizes and scarce time in most schools, no substantive changes are being proposed that would improve teachers’ ability to devote time to individual students in order to address whatever weaknesses they might find. In fact, in most schools there is too much time devoted to testing and too little to instruction; and quite often teachers have too many students, especially in middle and high school, to be able to consistently return homework with comments.
Teaching and learning are extremely labor-intensive endeavors. No one knows this better than parents, who help our children with homework on a daily basis and try to fill in all the gaps in learning that teachers too often do not have the opportunity to address. Even if we know all our children’s academic weaknesses, it still takes much time and practice, and sometimes going over their work repeatedly, before we see any improvement. Yet in this administration’s view, capturing all this data will somehow automatically lead to better instruction, without altering one iota the basic conditions in schools and classrooms that make it so difficult to meet the needs of each student.
In fact, the empowerment/accountability proposal seems to be based on the presupposition that no systemic changes to our schools are necessary or even desirable. In response to the question, “What evidence is there that this system helps the lowest performing schools to improve?” Liebman wrote: “Clear goals for academic improvement and measures of whether those goals are being met are widely associated with achievement gains in the research.”
· The entire grading system will ignore the fact that not all schools have an equal opportunity to succeed. We know that public schools in NYC have widely varying resources, levels of overcrowding, and class sizes, with some principals able to cap enrollment and class sizes at much lower levels than others. This is especially true of charter schools, many of which will be included in the empowerment zone, as well as some of the new smaller schools. How can schools be fairly compared without controlling for the size of classes, which is a significant determinant of academic achievement? Liebman’s response was that principals’ enhanced ability to “schedule” different classes in the empowerment zone will allow them to control the size of their classes, which is difficult to imagine, given that many of the larger schools are already so overcrowded that they are on double and triple shifts.
I recently reviewed data from the NY State Education Department showing that several of the large, low-performing schools in the Bronx have class sizes 40-90% larger than those of the small schools that share their buildings. The ability of most charter schools, as well as many of the new small schools, to provide much smaller classes rests not so much on creative “scheduling” as on the fact that they are given more classroom space per student, are allowed to cap enrollment at a much lower level, and are able to supplement their meager budgets with private funds.
· So what if the comparative success of these schools depends on smaller classes? Won’t the new accountability system at least measure this disparity and thus help others argue for the same advantages? To the contrary, the officials at Tweed are so averse to the notion that certain inputs like class size might matter that they are excluding all such data from the progress reports, and most probably, the quality reviews as well.
If inputs are not systematically measured or even described, how will principals know what reforms to try in their own schools from the example of other, potentially more successful schools? Liebman’s initial response was that principals could visit other schools if they wanted to see what works. His subsequent answer was the following: “School principals, informed by all members of their school communities including parents, have the best information about the mix of inputs needed to make their students improve. Our progress reports and quality reviews are designed to be sure that principals and school communities are in fact focused on school improvement; if they are not, they will score poorly.”
So simply being “focused” on school improvements will in itself lead to school improvement, absent any information on inputs or anything else. The implication is that the main problem with NYC schools is that principals are insufficiently “focused” on this goal.
As Helen Ladd and Randall Walsh have written, without controlling for school resources, “the measure of school effectiveness that emerges from this approach should at best be used with caution as the basis for rewards and sanctions…Only if all schools had adequate resources that fully accounted for the mix of students they serve would it be fair and appropriate to use this measure of school effectiveness.”
When many parents at a District 2 meeting asked about where class size might fit into the system of accountability, Liebman said that to make any systematic attempt to reduce or even measure class size would be “extremely dangerous.”
Why? In his view, every school needs to do something different to improve. In fact, he added that anyone who thought certain systemic changes were necessary would be “crazy.” (In this, he seems to be ignoring not only the judgment of the Court of Appeals in the CFE case, but also the position Klein himself took just a few months ago, when DOE officials were so sure certain changes were necessary at all schools that they prescribed not just curriculum, but how bulletin boards should be displayed and how children should be arranged on rugs, and claimed great success as a result.)
Even the flexibility given principals in the empowerment zone will only go so far. For those rules that the Mayor and the Chancellor believe are really important, no autonomy will be allowed. As Dennis Walcott testified at a recent City Council hearing on the cell phone ban, there are certain “non-negotiable citywide standards” that apply to the really important stuff, like holding back children based on their test scores and not allowing them to bring cell phones to school.
UNFAIR AND POTENTIALLY DESTRUCTIVE SANCTIONS
So even if little or no consideration is given to making the kinds of improvements that most parents and teachers believe are necessary, including reducing class size, why are these accountability provisions dangerous? Not only will a school’s grade be determined by its value-added test scores, which as we have seen are essentially unreliable in any one year, but that grade will depend on how these scores compare to the average changes system-wide, with about 15% of all schools set to receive a “D” or “F” in any one year.
Thus every school will be competing not against a fixed goal of improvement, but against all other schools, in a zero-sum game. This provision is far more radical than any other accountability proposal I’ve seen imposed in any state or district. Moreover, if a school receives an “F” in just the first year, the principal could be removed or the school could face closure. Here is an excerpt from the “Empowerment Zone” reference guide: “Year 1: Consequences for Schools with a Grade of F and a Quality Score of Ø (undeveloped) in Year 1: These schools may potentially undergo a leadership change or close.” (p. 23) All schools that receive a “D” for two years could face similar fates.
Here we see the dangers of such a radical system writ large: more instability and more churning for lower-performing schools and students. No matter that the interim assessments are supposed to be used only for diagnostic purposes; what principal facing the loss of his or her job might not try to force out students who are scoring poorly on them? We already have a huge illegal discharge problem in many of our high schools, which the Department of Education has done little to address. This system will likely add to the problem by giving principals additional incentives to rid their schools of as many low-scoring students as possible before the exams at the end of the year.
Several researchers have shown that many districts across the country have responded to these sorts of excessive testing regimes by “gaming” the system. Techniques used to inflate test scores include cheating, classifying more low-performing students as disabled so that they are excluded from testing, holding back more of them, or even suspending them for longer periods so that they will be away from school on testing days.
As one researcher put it, “These results have significant implications for the design and implementation of school accountability systems. … the likelihood that schools will find other mechanisms through which they can inflate their observed test performance for the purposes of accountability suggests that all aggregate test scores should be taken with a grain of salt, and not viewed as perfect indicators of school productivity.”
And where will parents go for help if they believe their child is not receiving adequate services, is unfairly being held back, and/or is being ejected from school? Schools in the empowerment zone will no longer be under the jurisdiction of districts, regional superintendents, or even the Deputy Chancellor for Instruction; indeed, the new person in that position, Andres Alonso, was unable to answer any questions related to these issues at a recent CPAC meeting, saying that he has no knowledge of or authority over any of the schools in the empowerment zone.
Three years ago, the earlier wave of reform at DOE led not to a smaller bureaucracy, as promised, but to cutbacks for our most vulnerable students. Instead of $200 million being taken from administration (as the administration now promises once again), the headcount at Tweed grew, the percentage spent on instruction shrank, and spending on special education was cut by $400 million in one year; as a result, many special education students lost access to mandated services and/or the ability to get evaluations in a timely fashion. These sorts of consequences may recur, especially since the school report cards, as currently envisioned, do not seem to require data on how many kids are being discharged and/or transferred to other settings.
What should have been done in the name of accountability:
Any new accountability system should have focused on improving transparency in spending, accurate reporting, and abiding by the law, which, as we have seen, is a chronic problem for DOE. In his recent testimony, Deputy Mayor Walcott said that the DOE will not comply with any law passed by the City Council that ensures the right of students to carry cell phones. DOE is also in flagrant violation of the law regarding the state class size program, according to the State Comptroller, but still refuses to adopt any of the Comptroller’s recommendations to improve compliance.
Any grading system and/or new series of assessments should have been carefully researched, vetted by independent experts, and field tested on a limited number of schools -- without applying any serious consequences. This would have been the more responsible path: to know beforehand whether the scoring system is accurate enough to be used in this fashion, and to find out what some of the unanticipated effects of the new system might be. Instead, as usual, the DOE is rushing ahead to judge more than 300 schools across the city using an essentially untested system, applying potentially severe sanctions and rewards without any validation of the results.
A new system of accountability should also have focused on enlarging the variety of ways outcomes are measured beyond test scores, and on requiring more reliable reporting on graduation rates, discharge rates, testing exclusion rates, and all the other sorts of data that, as DOE currently reports them, are notoriously flawed. Just this week, Education Week released a new study showing NYC’s four-year graduation rate to be the third lowest among big-city districts in the nation, at only 39% -- fully 15 percentage points lower than DOE has claimed. (See http://www.nydailynews.com/news/local/story/428494p-361351c.html and, for a chart, http://www.usatoday.com/news/education/2006-06-20-dropout-rates_x.htm#grad) This new figure is even lower than other recent estimates that put NYC’s graduation rate at about 43% -- and far below the city’s officially reported graduation rate of 54%, which the Mayor bragged about when running for re-election and the city is still defending as accurate.
Tweed should have involved parents, advocates, teachers, and independent experts on testing and assessment early on, before the system was formulated – to help work out some of its weaknesses. Liebman has repeatedly claimed that parents have been involved in devising the accountability system, as well as the quality reviews and the parent surveys whose results will be included in these reviews. Yet he has been unable to provide a single example of how the proposal has changed as a result of a parent’s comment or concern.
There should have been more attention given to the research on the unanticipated results of such stringent testing systems, or indeed more attention to high-quality research of any kind. When asked to back up the claim that the school districts that imposed such regimes were those that made the most improvements, Liebman has said that schools in Montgomery County, MD and Aldine, TX have seen great gains as a result. Yet in Montgomery, class sizes in low-performing schools have been capped at 15 students in kindergarten and at 17 students per class in grades 1-2. These reforms have contributed to some of the highest levels of proficiency in the nation, particularly among poor, minority, and immigrant students.
As to what research might support the proposal, Liebman has repeatedly cited a report that Klein has also referred to, called “Why Some Schools With Latino Children Beat the Odds...and Others Don’t.” Here is an excerpt from this report:
“New to the field, Mary Jo Waits, Rebecca Gau, Heather Campbell, Ellen Jacobs, and Tom Rex started looking for answers and found a lot of argument about what it takes for high performance. The laundry list was long – more parental involvement, more funding, better teachers, higher pay, lower class size, and on and on. Most of these educational bromides seemed to assume that more money is the key to higher educational attainment – and it may be true that more resources can help. But after Waits and Lattie Coor, Chairman and CEO, Center for the Future of Arizona, happened to read the business book Good to Great – a book that concluded that business success wasn’t due to innovative programs, higher executive compensation, and other management bromides – they wondered whether Good to Great’s method might provide a way to answer the question of how to improve Latino educational attainment in Arizona.
Guess what? In an amazing coincidence, after surveying principals of the higher performing schools in Arizona, they found that each appeared to follow the very same lessons laid out in “Good to Great,” a book written by Jim Collins for corporate executives. These principles are summarized as follows: “Disciplined Thought,” “Disciplined People,” and “Disciplined Action.” One of the elements of “Disciplined Thought” happens to be ongoing assessment. To read more, go to http://www.asu.edu/copp/morrison/LatinEd.pdf
All this, sadly, is reminiscent of three years ago, the last time Joel Klein came up with a revolutionary new reform strategy. Then, too, he touted a study to justify his approach, which at the time involved complete control over curriculum, professional development, and instruction, all of which he has now renounced.
That report was called “Beyond Islands of Excellence: What Districts can do to improve instruction and achievement,” (at http://www.learningfirst.org/publications/districts/)
Like the “Beat the Odds” study, this report is another of the poorly controlled, “find whatever you like” studies that are distressingly common in the field of education. It focused on districts that had supposedly made great gains as a result of “a systemwide approach to improving instruction – one that articulated curricular content and provided instructional supports,” including Providence, Minneapolis, Chula Vista, and yes, Aldine, Texas. That Aldine was cited only three years ago to justify a proposal almost exactly opposite to the one it is now being used to justify is evidence of just how amorphous the evidence provided by these studies can be.
· Indeed, by appearing to discount the possibility that systemic problems hinder our schools’ ability to succeed, and thus that systemic solutions are needed, the entire thrust of this proposal shifts the burden off Tweed and onto principals, teachers, and the students themselves.
In a system starved for resources, space, and smaller classes, in a city with a surplus of $5.5 billion and a state with a surplus almost as large, one might ask where the accountability is for those who are responsible for this sad state of affairs. Despite the fact that the city’s coffers are bursting at the seams, there is not a penny more for instruction in this year’s city budget, and the Mayor has said that he would refuse to contribute even one extra dollar in order to obtain more funding from the state.
I asked Jim Liebman if there will be any accountability system proposed for those running Tweed, and he responded this way: “System is in development.”
So where does this lead us? The experts on testing whom I consulted have concluded that this new accountability system could only have been proposed by a bunch of “non-educators,” and will likely lead to a further “corruption of instruction.” It should be no surprise that the initiative was announced at just about the same time as the resignations of both Carmen Farina, the Deputy Chancellor for Instruction, and Lori Mei, the head of testing for DOE. One can only hope that the proposal is abandoned or significantly modified before it causes too much damage to our kids.
City HS graduation rates get 'F' in national study
BY ERIN EINHORN
DAILY NEWS STAFF WRITER
With fewer than four out of every 10 kids earning high school diplomas, the city has one of the worst urban graduation rates in the country, according to a national study released yesterday.
Education Week, which published the study, determined that only Detroit, with a 21.7% graduation rate, and Baltimore, with 38.5%, had a smaller percentage of kids earning diplomas in 2003 - the most recent national data available.
But city officials blasted the study - saying it used inaccurate estimates that failed to account for students who don't graduate from city schools within four years because they move, transfer to another school or spend time with families abroad.
The Education Week study, which placed the city graduation rate at 38.9%, used a complicated formula. In part, it compared the average number of kids in each high school grade with the average number of kids in the following grade.
By contrast, the city Education Department tracks students as individuals, counting exactly how many of them earn diplomas and how many drop out, said Lori Mei, the city's top testing official.
According to city data from 2003, slightly more than 50% of high school students earned a diploma in four years.
Mei stressed that the city still needs to improve.
"We need to do a better job so more of our students leave high school with a diploma," she said. "We expect to see improvement going forward."
When New York State crunched the numbers for the first time this year, it put the city's graduation rate in 2004 at 43.5%. Other national studies have had similar findings.
Yet, many states and districts define "graduation" differently. That's why researchers at the Editorial Projects in Education Research Center say they conducted the Education Week study.
"We wanted to sort out the apples from the oranges," said Christopher Swanson, the center director.
Nationwide, the study found an average graduation rate of about 70% - with much lower rates, about 50%, among minorities. Girls also are more likely to graduate than boys.
Mayor Bloomberg and Chancellor Klein Move Successful School Reforms Forward with Major Expansion of Empowerment Schools – Increasing Autonomy and Accountability Systemwide
As Pledged In Mayor’s State of the City Address, More Schools Are Being Granted Greater Autonomy In Exchange for Greater Accountability.
1 Out of 5 City Schools Is Invited to Become an Empowerment School