The Secret Order of the ArXiv

The astro/physics blogosphere is all atwitter about papers and the Nature embargo policy (see Julianne, "If a paper is submitted to Nature, does it still make a sound?"; the cat herder, "Hear a paper, see a paper, speak no paper"; and he of less than certain principles, "Unhealthy obsessions of academia". He of uncertain principles loses the catchy title contest :) )

In this discussion, the uncertain principal brings up an interesting effect for arXiv postings:

There's an obsession in science with the order of publication that I don't think is really healthy, and I think it's only gotten worse. At the Science21 meeting last fall, Paul Ginsparg talked about how there's a huge spike in arxiv submissions just after 4pm, because the daily update email puts papers in the order in which they were submitted, starting at 4pm. He said they can see scripts hitting the server to check the time, and then dumping papers in just as soon as the clock has ticked over. Apparently, the position of a paper in that email has a fairly significant effect on the number of views and citations that paper receives in the future.

Now I myself have been known to try to exploit this effect, but what I don't understand is why, given that the arXiv crew knows about it, they don't fix it. I know it would probably be a bit of a hassle to rewrite the code, but really it shouldn't be impossible to make the order of papers appearing in a day's listing random.

Actually, come to think of it, it should be rather easy to fix this. Instead of ordering by date, one can just order by some hidden function of, say, the title of the paper, the time submitted, and the author list. Of course that would just mean that we could spend some time cracking the arXiv's hiding function :)
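A minimal sketch of such a hidden ordering function, in Python. Everything here is an assumption made for illustration: the secret key, the field names, and the sample papers are made up, and nothing below reflects how the arXiv actually builds its listings. The idea is to sort by a keyed hash (HMAC-SHA256) of the title, submission time, and author list, so the order is repeatable for the server but hard to predict for anyone who doesn't hold the key.

```python
import hashlib
import hmac

# Hypothetical server-side secret; an assumption, not anything the arXiv uses.
SECRET_KEY = b"known-only-to-the-listing-server"


def hidden_sort_key(paper, secret=SECRET_KEY):
    """Return a keyed hash of the paper's title, submission time, and authors."""
    # Join the fields with a separator that is unlikely to appear in the data.
    message = "\x1f".join(
        [paper["title"], paper["submitted"], ",".join(paper["authors"])]
    ).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def order_listing(papers, secret=SECRET_KEY):
    """Order a day's papers by the hidden key instead of by submission time."""
    return sorted(papers, key=lambda p: hidden_sort_key(p, secret))


# Made-up papers, purely for illustration.
papers = [
    {"title": "Paper A", "submitted": "2009-06-09T20:00:01Z", "authors": ["Alice"]},
    {"title": "Paper B", "submitted": "2009-06-09T20:00:02Z", "authors": ["Bob"]},
    {"title": "Paper C", "submitted": "2009-06-09T23:15:00Z", "authors": ["Carol", "Dave"]},
]

for paper in order_listing(papers):
    print(paper["title"])
```

With a key like this, being first in the door at 4pm buys nothing, since the listing position no longer depends on arrival order. Authors could still fiddle with a title to change its hash, but without the key they can't tell which tweak would move them up.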

On a related note, I just submitted a new version of arXiview to Apple (which means it will appear in a few days' time) which has some new features, including... ordering the search and posting results by submitted time/date.

A simple salted hash of the title should provide an unbreakable ordering function that is deterministic but unpredictable.

By Michael Leuchtenburg on 10 Jun 2009

Wouldn't it be simpler just to strip the date and time from the listings and then vary the exact time at which the email information is collated?