We Need a Common Yardstick for Cities

The most recent results from the National Assessment of Educational Progress (NAEP) are out and generating a lot of discussion, including some cautions about how to interpret the results. I know the benefits and limits of the NAEP all too well.

In our recent report, Measuring Up: Educational Improvement and Opportunity in 50 Cities, we set out to assess the overall health of public schools, regardless of whether they were district- or charter-run, in a sample of 50 cities. Using publicly available federal and state data, we developed nine indicators of school improvement and academic opportunity—things like overall proficiency gains and the share of schools that were “beating the odds”—and used them to benchmark the 50 cities against each other.

Despite some bright spots, the results were sobering. And as one of the data analysts on the project, I found more than just the results sobering.

Throughout the project, I was struck by how hard it was to find a trustworthy yardstick for comparing schools across cities. Direct comparisons weren't possible because, as others have pointed out before, state expectations for reading and math proficiency vary widely.

The following chart illustrates the problem. It compares 8th grade reading proficiency rates in the 27 states represented in our 50-city report (orange bars) to rates based on a common yardstick, the 2013 NAEP (green bars).

Clearly, the state results don't line up with the NAEP. Even more problematic, the size of the gap varies widely from state to state: some gaps are huge (Georgia), others small (Wisconsin), which makes direct comparisons across cities impossible.

Early in our project, we thought we had an answer to this problem. Researchers at the Educational Testing Service and the Institute of Education Sciences had mapped state standards onto the NAEP scale using an approach called equipercentile linking. Their method incorporates NAEP's complex sample design (sampling weights) and the proportion of students meeting each state's achievement standard, and it produces an adjusted score that allows comparisons across states. We thought we could use a similar mapping procedure and adjust our city-level scores with a NAEP-based discount derived from each state's reported proficiency rates.
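
To make the idea concrete, here is a minimal sketch of the kind of mapping we had in mind. It is a simplified stand-in for the ETS/IES procedure, not their actual method: the function names are ours, the weighted-quantile interpolation is a shortcut, and the discount factor is simply the ratio of a state's NAEP proficiency rate to its self-reported rate.

```python
import numpy as np

def naep_equivalent_cut(naep_scores, naep_weights, state_pct_proficient):
    """Equipercentile-style mapping of a state's proficiency cut onto the NAEP scale.

    Finds the NAEP score that the same share of the state's (weighted) NAEP
    sample reaches as reached the state's own proficiency standard.
    """
    scores = np.asarray(naep_scores, dtype=float)
    weights = np.asarray(naep_weights, dtype=float)
    order = np.argsort(scores)
    scores, weights = scores[order], weights[order]
    cdf = np.cumsum(weights) / weights.sum()        # weighted empirical CDF
    return float(np.interp(1.0 - state_pct_proficient, cdf, scores))

def naep_discount(state_pct_proficient_on_naep, state_pct_proficient_on_state_test):
    """Ratio used to 'discount' rates reported against the state's own standard."""
    return state_pct_proficient_on_naep / state_pct_proficient_on_state_test

# Hypothetical use: discount a city's state-reported rate toward the NAEP yardstick.
# adjusted_city_rate = naep_discount(0.36, 0.60) * city_state_reported_rate
```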

When we applied this state-discount procedure to the cities, the results were disappointing. For the subset of our cities covered by the NAEP-TUDA (NAEP's Trial Urban District Assessment), our NAEP-discounted ranking didn't line up with the TUDA ranking. Nor did it line up with rankings based on scale scores for the subset of cities where those were available.
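
The comparison itself is straightforward: a rank correlation such as Spearman's rho (our choice here, not the only reasonable one) quantifies how badly two orderings disagree. The city names and ranks below are purely illustrative.

```python
from scipy.stats import spearmanr

# Purely illustrative ranks for four hypothetical TUDA cities.
naep_discounted_rank = {"City A": 1, "City B": 2, "City C": 3, "City D": 4}
naep_tuda_rank       = {"City A": 3, "City B": 1, "City C": 4, "City D": 2}

cities = sorted(naep_discounted_rank)
rho, _ = spearmanr([naep_discounted_rank[c] for c in cities],
                   [naep_tuda_rank[c] for c in cities])
print(f"Spearman rho = {rho:.2f}")   # a low rho means the two yardsticks disagree
```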

Our NAEP-adjusted city rankings would have worked if each city's score distribution were identical to its state's (NAEP uses sampling weights to produce a representative sample of schools within each state). But in almost every state, city-level proficiency rates and score distributions differ substantially from the state's.
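
A toy example, with made-up numbers, shows the failure mode. Suppose a state reports 60 percent proficient while NAEP puts it at 36 percent, so the statewide discount factor is 0.6. A city whose scores cluster just above the state cut but well below the NAEP-equivalent cut will look far better after discounting than it would on the NAEP itself.

```python
import numpy as np

rng = np.random.default_rng(0)
state_cut, naep_equiv_cut = 230.0, 250.0   # hypothetical cut scores
discount = 0.36 / 0.60                     # hypothetical statewide NAEP/state ratio

# Two hypothetical cities with the same average score but different spreads.
city_tight  = rng.normal(loc=235, scale=5,  size=10_000)   # clustered just above the state cut
city_spread = rng.normal(loc=235, scale=30, size=10_000)   # wide spread, closer to the state's shape

for name, scores in [("tight", city_tight), ("spread", city_spread)]:
    state_rate = (scores >= state_cut).mean()
    naep_style_rate = (scores >= naep_equiv_cut).mean()
    print(f"{name}: state rate {state_rate:.2f}, "
          f"discounted {discount * state_rate:.2f}, "
          f"share above NAEP-equivalent cut {naep_style_rate:.2f}")
```

The statewide factor treats both cities the same, but only the city whose distribution resembles the state's ends up anywhere near its actual NAEP-style rate.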

So, in the end, we relied on relative measures of performance, like the share of low-income students in a city who were enrolled in its top-scoring elementary and middle schools.
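
Measures like that are simple to compute from school-level data. The sketch below assumes a table with hypothetical column names (city, proficiency, low_income_enrollment) and defines "top-scoring" as the top quartile of schools within each city; that cutoff is an illustrative choice, not the report's exact definition.

```python
import pandas as pd

def share_low_income_in_top_schools(schools: pd.DataFrame,
                                    top_quantile: float = 0.75) -> pd.Series:
    """Per city: share of low-income students enrolled in its top-scoring schools.

    Assumes columns 'city', 'proficiency', and 'low_income_enrollment'
    (a hypothetical schema, for illustration only).
    """
    # Each city's proficiency cutoff for "top-scoring" schools.
    cutoffs = schools.groupby("city")["proficiency"].transform(
        lambda s: s.quantile(top_quantile))
    in_top = schools["proficiency"] >= cutoffs
    top = schools.loc[in_top].groupby("city")["low_income_enrollment"].sum()
    total = schools.groupby("city")["low_income_enrollment"].sum()
    return top / total
```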

Data can be an important tool for assessing the health of a school system. But we need data that are consistent beyond the state level so that we don't end up comparing apples to oranges.

We hope the data in our report will serve as a catalyst for city leaders to look at where they might be falling short and to identify other cities they might learn from. But the report is also a reminder that we need better cross-city yardsticks if we want leaders to understand the challenges their cities face and to find ideas, lessons, inspiration, and cautions from other cities' experience with big-city school improvement.
