Dividing up the pie

Another thing I meant to call out in the context of the Jupiter-goes-boom event was the nod to data gathered by people who are connected to the formal research enterprise only tangentially, if at all.

This event was first noted by someone not an astronomer by profession, and the article notes that this is hardly the first time astronomers have been scooped. My husband, who is an extremely amateur skygazer and likes to hang out on online astronomy bulletin boards, says that his impression is that astronomers mingle with enthusiasts fairly freely, all things considered, and both sides appear to benefit.

Astronomy isn't the only field where this happens, of course. The Center for History and New Media projects I mentioned in my previous post are essentially crowdsourced news-gathering turned into history. When I was a graduate student in linguistics back in the day, I had occasion to look at Mayan, which amateurs have been instrumental in deciphering. Birdwatchers no more skilled than I are of material help to ornithologists in providing localized bird counts and similar observations. I am also seeing some renewed excitement about "crowdsourcing" various scientific tasks that can't be done by computers but are too laborious and time-consuming to assign to researchers.

So my question about all this is… who's looking after their data? Do data have to come from an accredited scientist affiliated with an institution before they are worth preserving?

Sometimes these questions have answers. Sometimes, not so much.

This points to a larger question, an elephant-in-the-room question. Whose responsibility is all this data gathering and preservation, anyway? "Individual researchers" is an inadequate cop-out; let's just get that on the table right now. Without sustainable support, data die when grants run out or researchers retire.

This leaves a few possibilities: funders (notably government), disciplines, and institutions. None of them is unproblematic; in fact, I would go so far as to say that none of them can solve this problem unaided.

Relying on funders assumes that funders will take a long-term perspective on sustainability. Funders can be fickle about this, even government funders; witness the troubled trajectories of the ERIC education database in the US and the Arts and Humanities Data Service in the UK. Worse, outside government, vanishingly few funders have the resources and infrastructure to tackle this problem directly; the most they can do is throw money at it in the form of grants, which is not a sustainable funding model by any means.

The line between disciplines and institutions is often a fuzzy one, honestly. The arXiv is the paradigmatic disciplinary preprint repository, but it is sustained by the Cornell University Libraries. Things were not always thus, but such a handoff isn't exactly unusual.

However. When you ask a researcher about her "discipline," she'll probably start talking about her favorite scholarly society. Where are the scholarly societies in all this ferment about data? Gosh, wish I knew. We'll just pass by the American Chemical Society in silence, shall we? They're an outlier and we should all be glad of that… but where's everybody else? Looking for services that members need? Materials that keep members coming back to the society? Why aren't scholarly societies in the data business? I wonder.

Institutions. Institutions have a built-in challenge dealing with data: they have to deal with it over a wide swathe of disciplines. I can't emphasize enough how hard that is! Different formats, different metadata standards (where there are any at all), different ontologies, different patterns of thought, different workflows… there's just no end to the differences.

In these early days, I see a few different institutional approaches to this problem. One is "follow the money." If you've got million-dollar grants, you'll get red-carpet treatment. No grants? No service. When this model is accused of inequity, it throws its hands up and says "since when was life fair?" Another approach is what I call "help the First Son." In the Pesach parable, the first son is the one who approaches his father asking detailed and intelligent questions about Pesach observance, and receives detailed and intelligent answers.

I don't know about you, but I don't know many First Sons among researchers. A few, yes, but not many. A lot of the researchers I know are Third Sons. "What is this?" they say. And a lot are Fourth Sons, who do not even know how to ask. A First-Son approach leaves our Third and Fourth Sons with no answers.

So what we're left with, when we ask who's responsible for data, is a big muddle. Some disciplines have this pretty much sorted. For them, institutional support may be redundant. Other disciplines are under the funder gun; it's still unclear what the institutional role will be there. Many researchers fall into neither group; either their institution helps them or they get no help.

My worry is that as the pie is currently divided, a lot of researchers aren't getting any.
