NRC Rankings: method in the madness

We are now just 12 hours from the release of the National Research Council Data Based Assessment of Graduate Programs.
The tension is just overwhelming...

One interesting thing about the 2010 NRC rankings is the methodology, a final version of which seems to have been settled upon.

As you know, Bob, the primary purpose of the new methodology is to make sure Princeton wins and Harvard is suitably humbled, er, to provide a robust and objective ranking of US graduate programs, for the ages, one that is not a subjective, grossly lagging metric.

The complaints about the methodology have already started to bubble up, and there will be many more.
That the process was flawed in detail is undoubtable: some of the numbers just can't be right, they don't pass a sanity check, and some smell of simple transcription errors. But there are over 100,000 pieces of data on about 5,000 programs at a couple of hundred universities.
At a glance, the numbers are annoyingly robust in aggregate.

So, what have they wrought?

There are a couple of key things to know:

1) There are two overall rankings: the R-ranking and the S-ranking.
These are reported separately, not as a (weighted) sum.

a) The R-ranking is, roughly speaking, the old-style "reputation ranking", whereby senior faculty are asked to rate other departments.

b) The S-ranking is a synthetic "objective ranking".
It is, roughly speaking, generated by a Monte Carlo simulation of what rank people would give departments, based on the same people's stated ranking of the relative importance of objective metrics of program performance, and the reported quantitative values of those metrics (a toy sketch of the idea follows after this list).

2) The NRC reports both rankings and a confidence interval for each ranking.
Specifically, the 90% confidence interval. (I still think the original 50% interval would be more informative; better still to have reported both.)
This, in some cases, gives you a confidence interval you can drive a truck through, and in other cases gives you nice, tight, well-defined rankings.
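To make the mechanics concrete, here is a minimal toy sketch of the S-ranking idea in Python: resample plausible weight vectors many times, rank programs by a weighted sum of their metrics in each draw, and read off the 5th-95th percentile of each program's rank as a 90% interval. All names and numbers below are invented placeholders; the actual NRC procedure is considerably more elaborate.

```python
# Toy sketch (not the NRC's actual code): draw many plausible weight vectors,
# score each program by a weighted sum of standardized metrics, and summarize
# each program's rank by its 5th-95th percentile, i.e. a 90% interval.
import numpy as np

rng = np.random.default_rng(42)

programs = ["Program A", "Program B", "Program C", "Program D"]
# Hypothetical standardized metric values (rows: programs, columns: metrics).
metrics = rng.normal(size=(len(programs), 5))
# Hypothetical survey-derived mean weights and their spread across respondents.
mean_w = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
sd_w = np.full(5, 0.05)

n_draws = 10_000
ranks = np.empty((n_draws, len(programs)), dtype=int)
for i in range(n_draws):
    w = rng.normal(mean_w, sd_w)        # one plausible respondent weighting
    score = metrics @ w                 # weighted-sum score per program
    order = np.argsort(-score)          # rank 1 = highest score
    r = np.empty(len(programs), dtype=int)
    r[order] = np.arange(1, len(programs) + 1)
    ranks[i] = r

for j, name in enumerate(programs):
    lo, hi = np.percentile(ranks[:, j], [5, 95])
    print(f"{name}: 90% rank interval [{int(lo)}, {int(hi)}]")
```

The width of those intervals is exactly the drive-a-truck-through effect: programs whose scores stay close under many different weightings end up with wide, overlapping rank ranges.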

They also report the correlation coefficient between the stated R-rankings and the S-rankings... it is, as far as I can tell, mostly positive.
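If you want to check that agreement yourself once the tables are out, a rank correlation is the natural first pass. A sketch with made-up rank lists, assuming scipy is available:

```python
# Compare a reputational ordering with a survey-weighted ordering via
# Spearman rank correlation. The rank lists below are invented examples.
from scipy.stats import spearmanr

r_rank = [1, 2, 3, 4, 5, 6]   # hypothetical R-ranking for six programs
s_rank = [2, 1, 3, 6, 4, 5]   # hypothetical S-ranking for the same programs

rho, pval = spearmanr(r_rank, s_rank)
print(f"Spearman rho = {rho:.2f} (p = {pval:.3f})")
```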

To the extent one can trust these things, the R-ranking is a lagging indicator and is directly comparable with the 1995 rankings.

The S-rankings are better, er, more current, though centred on performance data from the 2005 period, and they also provide a lot of data on the weights of the individual metrics and how they correlate with mean rankings.

Programs can, and will, generate their own statistics on their performance. The data are all there.
A lot of such numbers will fly about in the next few days, starting with the simple ones, like the centred rankings from the confidence intervals and the unweighted means.
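The "centred" ranking, for instance, is just the midpoint of the reported interval; a trivial sketch, with invented intervals:

```python
# Centred rank = midpoint of the reported 90% interval (numbers invented).
intervals = {  # program: (5th percentile rank, 95th percentile rank)
    "Program A": (1, 4),
    "Program B": (2, 9),
    "Program C": (3, 21),
}

centred = {name: (lo + hi) / 2 for name, (lo, hi) in intervals.items()}
for name, mid in sorted(centred.items(), key=lambda kv: kv[1]):
    print(f"{name}: centred rank {mid:.1f}")
```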

Data mining the metrics will take some time.

There are about 20 metrics, like publications, citations, funding, student funding, composition of faculty, composition of students, time to PhD, graduation rate, etc.

The metrics are clumped into three categories: academics, student issues, and diversity.
Each metric has a weight and a correlation coefficient.
Each field has different weightings for different metrics, but generally only a few metrics contribute significantly to the rankings - most are statistically insignificant.
If you like to think in principal component analysis terms, then typically, near as I can tell at a glance, the rankings are driven by the three most significant components - possibly depending on the field being ranked...
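A quick way to see that kind of structure for yourself, once the metric tables are available, is to run a PCA on the programs-by-metrics matrix and look at the explained variance. A sketch on fabricated data with a built-in three-factor structure (real NRC data would replace X):

```python
# Illustrative PCA via SVD on a fake programs-by-metrics matrix constructed to
# have three dominant latent factors plus noise; not NRC data.
import numpy as np

rng = np.random.default_rng(0)
n_programs, n_metrics = 100, 20
factors = rng.normal(size=(n_programs, 3)) @ rng.normal(size=(3, n_metrics))
X = factors + 0.3 * rng.normal(size=(n_programs, n_metrics))

Xc = X - X.mean(axis=0)                  # centre each metric
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)          # variance fraction per component
print("Variance fraction of first 5 components:", np.round(explained[:5], 3))
```

If only a handful of components carry most of the variance, then only a handful of (combinations of) metrics are really moving the rankings, which is the claim above.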

There will be some gloating, some defensive critiques, howls of complaint and quiet pats on the back.

Prospective graduate students, and postdocs and faculty, will rethink their priorities, programs will reconsider their strategies, and administrators will ponder weighty decisions.
There will be surprises, good and bad, especially in the obscure metrics that will take time to be mined, even with intense crowdsourcing by thousands of faculty.

Times are very hard in academia; there are persistent rumours of serious program cuts - not trims, amputations - and the NRC report will be used to judge programs.
Some programs can argue for changes since the data were collected, or promise near-future changes; other programs will be mercilessly and messily cut.
Decisions will have to be made on how to reward success, or build on strength, and on how to strengthen weaknesses, expand into new fields, and abandon old ones.

These rankings are important, for all the flaws in the process, and all our awareness of the subjectivity of some of the quantitative metrics.

In the end, there are single numbers and hard rankings, and that is what resonates with our psyches:
That Is Better Than This!

Interesting times.


I'd take your tension building efforts more seriously if I didn't know you've already seen the results....

By Anon dean (not verified) on 27 Sep 2010 #permalink

Ok, so Agatha Christie I ain't...
But, if you've been holding your breath for 4 years, a certain sense of anticipatory anti-tension is to be allowed for.

Hey, maybe that is what Dark Energy is!
It is the collective negative tension built up since 1995 by administrators in US academia - it is tearing the universe apart!

You're mistaken, I believe, about the R-Ranking. It is an attempt to assign weights to the "objective" measures to get them to mimic the results of a secret survey of some untold number of alleged experts in the field. NRC specifically denies that it is a reputational ranking, and I think they're right. I will discuss this later today (after the embargo). It may be that the results in the sciences will be more worthwhile because of the incorporation of the citation measure (not used in the humanities), but if all the humanities results are as odd as the philosophy results, then the NRC wasted a lot of money for nothing.

well, there was a survey and it was not very secret

it asked, I believe, both for rankings, in the reputation style, and for weights to the quantitative metrics

the original intent, as I understand it, was to publish a single ranking that was a weighted sum of the R and S rankings, but this was squelched

the R-ranking is not a pure reputation ranking; they did something to complete the sample - they weighted the R-rankings to make them internally consistent - something like: if you ranked Uni of X high but failed to mention Uni of Y, which is objectively near identical to X, then the ranking of Y was weighted to the X ranking

I'm sure they'll explain at the presser

the correlation between R and S rankings should show to what extent the reputational rankings match the revealed preferences and be a good measure of ranking lag

The survey for which the weights in the R-Ranking were calculated was secret in the sense that (1) they are not publishing the results of the survey, and (2) they are not, or have not at this point, revealed who completed these surveys or what information they were provided in being asked to complete the survey.

By Brian Leiter (not verified) on 28 Sep 2010 #permalink

Er, the surveys are all on the NRC website.
You can see exactly what information was asked, and who the pool of people questioned was.
Individuals are of course not identified; that'd never pass the IRB, nor would anyone answer.