bookoftrogool | ScienceBlogs

IRs, "data," and incentive

dsalo | September 10, 2009

Many of my readers will already have seen the Nature special issue on data, data curation, and data sharing. If you haven't, go now and read; it's impossible to overestimate the importance of this issue turning up in such a widely-read venue. I read the opening of "Data sharing: Empty archives" with a certain amount of bemusement, as one who has been running institutional repositories in libraries for four years. I think Bryn Nelson has confusingly conflated different notions of "data" in his discussion of the University of Rochester's IR. By the definition Nelson appears to be thinking about…

Classification

dsalo | September 9, 2009

Now that we've looked at how back-of-book indexes endeavor to organize and present the information found in a book, we can consider organizing books themselves. It's quite astonishing, how many people go to libraries and bookstores who never seem to stop to think about how books end up on particular shelves in particular areas. There is no magic Book Placement Fairy! Let's consider the problems we're trying to solve for a moment. A library has a lot of books, on which ordinary inventory-control processes must operate. So librarians as well as patrons must be able to locate the specific book…

When is text in a PDF not text?

dsalo | September 9, 2009

I see this confusion so often it seems worth addressing. If you scan a page of text, what you have is a picture. A computer sees it not as letters, numbers, and punctuation—but as pixels, bits of light and shade and color, just like the pixels in your favorite family photo on Flickr. You can't search for, extract, highlight, or cut-and-paste such "text." It doesn't matter whether you embed the picture in a PDF; you still can't search it. Ceci n'est pas une texte! Compare this to creating a PDF from a word-processing or page-layout document. The computer already thinks of the text in these…
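To make the distinction concrete, here is a toy sketch in Python. It is only an illustration under loud assumptions: the names `scanned_page`, `typed_page`, and `contains_text` are hypothetical, and real PDF software is far more involved than this. The point is simply that a search has something to match against in character data and nothing to match against in a grid of pixels.

```python
# A "scanned" letter T: just a grid of pixels (1 = dark, 0 = light).
# Nothing in this structure says "this is the letter T".
scanned_page = [
    [1, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
]

# The same letter as character data, the way a word processor stores it.
typed_page = "T"


def contains_text(page, needle):
    """Return True only if the page holds character data that contains needle.

    Hypothetical helper for illustration: a pixel grid has no characters
    to compare against, so a search over it can never succeed.
    """
    return isinstance(page, str) and needle in page


if __name__ == "__main__":
    print("typed page searchable:", contains_text(typed_page, "T"))
    print("scanned page searchable:", contains_text(scanned_page, "T"))
```

A human looking at the pixel grid sees a T at once; the search function does not, because nothing has told the computer which pixels add up to which letter.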

Tidbits, 7 September 2009

Free Thought

dsalo | September 7, 2009

Happy Labor Day, US readers. Time to clean out the "toblog" tag on del.icio.us again: Everyone else has already linked to this Wall Street Journal article on data curation, so who am I to go against the tide? My chief takeaway is the trenchant observation that judging the value of data is not straightforward. One scientist's noise is another's signal, and everything is grist for the history-of-science mill. My friend from ebook days Gene Golovchinsky is learning by experience some hard truths about migration versus emulation. Welcome to the fold, Gene! Let's all play with supercomputers!…

Welcome AL Direct readers!

dsalo | September 3, 2009

I found out from a few different sources (thanks, all!) that my post about back-of-book indexes made it into American Libraries Direct yesterday. Welcome to any and all new readers! I hope you stick around. I'm going to tackle classification next…

Migration versus emulation

Free Thought

dsalo | September 2, 2009

Just a quickie post today— In answer to my post about intertwingularity, commenter Andy Arenson suggested that the way to rescue an Excel spreadsheet whose functions or other behaviors depended on a particular version of Excel was to keep that specific version of Excel runnable indefinitely. This is called "emulation," and it assuredly has its place in the digital-preservation pantheon. Some digital cultural artifacts are practically all behavior—games, for instance—and just hanging onto the source code honestly doesn't do very much good. The artifact is what happens when that code is run,…

The problem of "expert location"

dsalo | September 1, 2009

A common problem adduced in e-research (not just e-research, but it does come up quite a bit here) is expertise location, both local and global. You need a statistician. Or (ahem) a metadata or digital-preservation expert. Or a researcher in an allied area. Or a researcher in a completely different area. Or a copyright expert (you poor thing). Very possibly the person you want works right down the hall, or in the building next door, or in the library, or somewhere on campus. But how on earth do you know? You could call around to the offices or departments most likely to contain the expertise…

The dangers of intertwingularity

dsalo | August 31, 2009

When I was but a young digital preservationist, I was presented with an archival problem I couldn't solve. This should not sound unusual. It happens a lot, for all sorts of reasons. If I can keep a few people from falling into traps that make digital preservationists throw up their hands in despair, I'm happy. Anyway, the problem was a website with some interactions coded in Javascript. If those interactions didn't work, the site made significantly less sense. (It could have been worse; even without the Javascript, the materials on the site were still reachable.) The Javascript had been coded…

If not now, when?

dsalo | August 27, 2009

I said a while ago that we don't know who's going to do data curation yet. I absolutely believe that. I probably should have added, though, that we can have a pretty good idea who's not going to do it: anybody who isn't right this very minute planning to do it. Make no mistake, there's money (from funders and institutions) and hard-won relevance to be had in this line of work. Quite a few people and organizations are eyeing it: IT, libraries, scholarly societies, journals, entrepreneurs. If you want to get into the scrum, if you want a piece of the pie, better get your plan on now. This is no…

Last push for Louisville Free Public Library

dsalo | August 27, 2009

Steve Lawson and the LSW are three-fifths of the way to the goal of $5000 for the flood-ravaged Louisville Free Public Library by September 1. The last two-fifths are the hard part. If you can help, please do. Comment here or send me email (dorothea.salo at gmail) to let me know you've donated, and I'll do a random-number drawing for a PLoS travel mug and a size-large, never-worn PLoS One t-shirt. Thanks.

The humble index

dsalo | August 25, 2009

I'd like to start our tour of book and library information-management techniques with a glance at the humble back-of-book index. I started the USDA's excellent indexing course back in the day, and while it became clear fairly quickly that I do not have the chops to be a good indexer and so I never finished the course, I surely learned to respect those who do have indexing chops. It's not an easy job. Go find a book with an index and flip through it. Seriously, go ahead. I'll wait. Just bask in the lovely indentedness and order of it all. Now answer me a question: Should Google be calling that…

Tidbits, 24 August 2009

dsalo | August 24, 2009

Hello, Monday. My tidbits folder overfloweth. Want to text-mine JSTOR? Looks like you can. Garret McMahon talks about FriendFeed, scholarly communication, and embedded librarianship. Part of the reason I'm here is that I believe, with Garret, that we librarians can't kvetch about gettin' no respect if we don't put ourselves out there in the general research scrum. It's as true locally as globally. Jason Hoyt tells scientists to step up and own their part in the dysfunctionality of scientific communication. Three cheers from this librarian! Indulge me in further Cliff Lynch adulation: check…

A little Friday metablogging

dsalo | August 21, 2009

Well, I've been here for about a month now, and I've quite enjoyed myself! (And I finally did send in my contract, Erin. Really. I did.) Thanks to all who have commented. (Well, except a spammer or two, but I got rid of them posthaste.) You're a civil, engaged, and smart bunch, and I appreciate you very much—especially when you keep me honest. Please, if you will, introduce yourselves and tell me (and Trogool's other commenters) a bit about yourselves in the comments to this post. Thanks!

Let Them Eat Disk

dsalo | August 20, 2009

Many people, first confronted with the idea of data curation, think it's a storage problem. A commonly-expressed notion is "give them enough disk and they'll be fine." Terabyte drives are cheap. Put one on the desk of every researcher, network it, and the problem evaporates, right? Right? Let me just ask a few questions about this approach. What happens when a drive on somebody's desk fails? What do we do about the astronomers, physicists, and climatologists, who can eat a whole terabyte before breakfast and hardly notice? What do we do about the social scientists, medical researchers, and…

The classical librarian

dsalo | August 19, 2009

Five years ago (really? goodness, it hardly seems possible) I gave a preconference session at the Extreme Markup Languages conference (which is now Balisage) entitled "Classification, Cataloguing, and Categorization Systems: Past, Present, and Future." I have learned to write better talk titles since then. However. The talk was actually a runthrough of library standards and practices for an audience of markup wonks. Like any field, librarianship has its share of jargon and history that legitimately seems impenetrable to outsiders. I'm going to try to reprise some of that talk here in blog…

Please don't do this! A word about keywords

dsalo | August 18, 2009

I see a lot of metadata out there in the wild woolly world of repositories. Seriously, a lot. Thesis metadata, article metadata, learning-object metadata, image metadata, metadata about research data, lots of metadata. And a lot of it is horrible. I'm sorry, it just is—and amateur metadata is, on the whole, worse than most. I clean up the metadata I have cleaning rights to as best I am able, but I am one person and the metadata ocean is frighteningly huge even in my tiny corner of the metadata universe. So here's a bit of advice that would save me a lot of frustration and effort, and is…

The accidental informaticist

Free Thought

dsalo | August 17, 2009

The publisher Information Today runs a good and useful book series for librarians who find themselves with job duties they weren't expecting and don't feel prepared for. There's The Accidental Systems Librarian and The Accidental Library Marketer (that one's new) and a whole raft of other accidents. I suspect "The Accidental Informaticist" would find an audience, and not just among librarians. The long and short of it is, we just don't know who is going to do a lot of the e-research gruntwork at this point. Campus IT at major research institutions is seizing on the fun grid-computing work,…

Tidbits, 14 August 2009

dsalo | August 14, 2009

I am furloughed today and going out of town, so here, have an early tidbits post. I won't be at the iPRES 2009 conference, but I do recommend looking over the program; it gives a pretty good overview of what digital preservationists think about and study, and what keeps them awake at night. (Midwesterners: the International Digital Curation Conference is coming to Chicago in 2010. I'll be there!) The strength of weak ties: why Twitter matters to scholarly communication. Spot on, and true of FriendFeed as well. This is why, privacy concerns aside, the Facebook acquisition of FriendFeed is a…

Not turning up our noses

dsalo | August 12, 2009

I gave a talk for PALINET some little while ago about institutional repositories. The audience had been primed by the fantastic Peter Murray to think about looking after digital content as the "fourth great wave" of library work. (I wish that talk was online. It was absolutely brilliant.) But not everyone was entirely onboard with that. I recall distinctly one distinguished-looking white-haired gentleman raising his hand. "We in libraries," he said (paraphrase mine), "have historically been purveyors of quality information. Authoritative information. On what basis should we jeopardize that…

Community and archival

Social Sciences

dsalo | August 11, 2009

FriendFeed, now due to be absorbed into the Borg (er, the Facebook empire), allowed me to lurk on the fringes of the scientific community Cameron Neylon mentions in his post on the takeover. Insert all the usual clichés here: it was enormously valuable, I learned a lot, and I wouldn't have missed it for the world. My humanities training wouldn't normally gain me entrée into such a circle, and neither would my professional identity. Insofar as I have professional ambitions in scientific data management, every bit of acculturation I can come by is priceless. That community wasn't the only one I…


