If you're not reading the comments here, you're missing the best part of the blog. Case in point, this comment from the incomparable Chris Rusbridge, which I reproduce as a post so that those who are missing the best part of the blog don't miss it:
Several things I wanted to respond to. You say you are "not at all sure we need to prove ab initio that keeping data is a good thing". Well, yes, I kind of agree... but I'm also quite sure that keeping all data is not a good thing. So keeping some, but not all data is good. Which data? Ah, that's a question for much, much more debate (one could postulate some classes of data but specifying a good set of data appraisal criteria is still a really tough challenge).
I also agree that there is no "killer-app magic bullet that will take an unholy mess of undescribed, undifferentiated digital stuff and miraculously organize it", and further that "Data curation requires skill, time, process change (a tall order all by itself), and resources". But two things occur to me here.
To the first order, dealing with the mess and providing the skills and changing processes is not the library's job, or any other "central" organisation's job. Dealing with data is the researcher's job. The way forward is to make it increasingly clear that data messes equal bad, un-reproducible research. Good data management is essential for good research. Period. The only way out of this that I can see (other than bribery and scandal as motivators, both of which we might be getting) is to include better data management training in the preparations for new researchers, ie PhD and Post-Doc courses. And that's a truly long-haul approach. Once we have some better managed, better curated data, then some central or shared group (eg library, data centre, whatever) has a reasonable chance of ingesting it and making it available. But rubbish data should be rejected. Always.
However, the second thing is that managing data in a research context is hard, and as far as I can see the tools (and standards) are not very good. There are some, but they tend not to be portable, and to be limited to a subset of disciplines. Even making sure your research group backs up its data is hard, when they use 3 different operating systems on 3 continents with 3 different sets of institutional requirements. Getting some "killer apps" to make that hard-grind technical stuff that bit easier (or even feasible, in some cases) would sure help to make the culture change work.
No-one could have forced academia to adopt the web if we had stuck with lynx, or whatever the character browser was. It took a smart set of standards AND a good piece of technology (Mosaic etc) to allow academics (and eventually others) to see how it could make their lives easier.
Mind you, I don't agree with it all. Some parts of this puzzle are the library's job, notably persistence of digital materials past the expiry date of grants, labs, and entire departments. I also believe that if you have to embed smart people in a lab to ensure that data is managed successfully—and while that can be debated, I do believe it; if researchers could do this on their own, I think they would have already—why shouldn't those smart people be librarians?
But agree or no, I did think everyone should read it.
Back in the dark ages of the late 1970s (when I started my career in libraries), there was something called a 'data librarian', whose role involved keeping track of mainframe-based datasets used in social sciences research. I worked with one of the early data librarians in the early 1980s, but from memory those jobs disappeared at about the same time mainframes did. I think the assumption was that researchers would learn to look after their own data, but as we now know, that didn't happen.
Well thank you Dorothea for the compliment! I do agree with you re the persistence issue being a library-type job... for the non-rubbish data, of course. And of course the smart embedded people could be librarians, although I suspect they would need to be extremely tech-savvy librarians. They could also be metadata-savvy IT folk too. I think the latter represents the most common class; we hear quite a lot about bio-informaticians, and I've even met a few, who were clearly crucial to the success of associated operations (think the Edinburgh Mouse Atlas for example, see also various comments in posts on Jennifer Rohn's blog). I think my ideal support package might be metadata-savvy IT person during the project, and tech-savvy data librarian after it!
@Brenda, a few data librarians do still exist; at least Edinburgh, Southampton and Oxford have them, see the JISC-funded Datashare project for example. The current JISC-funded Managing Research Data programme should stimulate more, which makes it doubly a shame that the current funding crisis has put the second round of that programme on hold for now. Let's hope they realise how important it is!
Dorothea, my new year's resolution is to start using my newsreader again and I'm happy to add the Book of Trogool to it and see if I can keep up with you. I'm coming to this post a bit late, but just to add to Chris' comments about data librarians.
Brenda, there are many more in North America, i.e. they still exist! Many of them have evolved the job of academic data librarian over time from "keeping track of mainframe-based datasets used in social sciences research" to various forms of academic support for the use of data in learning, teaching and research.
There are over 300 members in IASSIST - the International Association for Social Science Information Services and Technology. Their conference in June 2010 at Cornell takes the theme of Social Data and Social Networking. http://ciser.cornell.edu/IASSIST/
Interestingly, Alma Swan reinvented the concept of data librarian to be more of a data curator in her 2008 report for JISC, The Skills, Role and Career Structure of Data Scientists: An Assessment of Current Practice and Future Needs, http://www.jisc.ac.uk/publications/publications/dataskillscareersfinalr…