Promoting a comment: "Open and shared format"

Richard Wallis has taken my ribbing in good part, which I appreciate; his response is here and will reward your perusal.

He also left a comment here, part of which I will make bold to reproduce:

As to RDF underpinning the Linked Data Web - it is only as necessary as HTML was to the growth of the Web itself. Documents were being posted on the Internet in all sorts of formats well before Tim Berners-Lee introduced us to the open and shared HTML format which facilitated the exponential growth of the Web. Some of the above comments are very reminiscent of the "why do I need to use HTML" discussions from the mid 1990's.

It is an open and shared format, such as RDF, that will power the exponential growth of the Linked Data web, but the conversations around it are still at the equivalent of 1995 stage.

If I read this right, Richard is not actually saying that the web is all HTML and therefore HTML is Good and All Web Things Must Be HTML. That's good, because that would be a silly thing to say. The web I use has plenty of CSS and JavaScript and XML and JSON and JPEGs and PNGs and Flash (gah) and PDF (double gah) and other stuff on it.

What Richard is saying (again, as I read it) is more subtle: widespread growth of the data web requires an open standard to cut through the Babel of competing and closed formats the same way that HTML cut through the Babel of document formats, because without that, interoperability is too much effort and so no one realizes the benefits.

Richard is welcome to check my understanding; I may have this completely wrong. Nonetheless, I don't believe a word of it, and I especially don't believe it if RDF is the HTML analogue (which, let's be clear, Richard very carefully did not say). Here's why I don't.

First, HTML was hardly the only part of the web stack necessary to its explosion. TCP/IP, anyone? Moreover, HTML by itself is obviously insufficient as the driver of that explosion, or we'd all still be on Gopher (remember Gopher?). Formatted strings of words are not all we monkeys interact with. Neither are assertions, about documents or anything else. (The whole thing about "not all data are assertions" seems to escape some of the die-hardiest RDF devotees. I keep telling them to express Hamlet in RDF and then we can talk.)

Second, I don't know that we need to rely on a single data format for interoperability. It's not impossible, but remains to be proven. The data web that I personally think is more likely closely resembles today's mashup and microformats cultures: lots of formats with suitable documentation (one hopes) and APIs, available for use by whoever's willing to suss out how the various datasets work and write code to glue them together. It's a rough-and-ready sort of interoperability, arguably an inefficient one, but eppur si muove, as Galileo did not say of the web.
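To make that "rough-and-ready" glue concrete, here is a minimal sketch in Python, assuming two hypothetical datasets: one published as JSON, one as CSV, sharing an ISBN-style identifier under different field names. The datasets, field names, and the glue() function are all invented for illustration; no real API is implied.

```python
import csv
import io
import json

# Hypothetical dataset A: a JSON API response listing books.
BOOKS_JSON = """
[{"isbn": "0001", "title": "Hamlet"},
 {"isbn": "0002", "title": "Macbeth"}]
"""

# Hypothetical dataset B: a CSV export of library holdings, with its own column names.
HOLDINGS_CSV = """ISBN,Library,Copies
0001,Main,3
0002,Branch,1
0001,Branch,2
"""


def glue(books_json: str, holdings_csv: str) -> dict[str, int]:
    """Join the two datasets on ISBN and total the copies held per title."""
    titles = {book["isbn"]: book["title"] for book in json.loads(books_json)}
    totals: dict[str, int] = {}
    for row in csv.DictReader(io.StringIO(holdings_csv)):
        # The glue code has to "know" that CSV column ISBN matches JSON key isbn.
        title = titles.get(row["ISBN"])
        if title is not None:
            totals[title] = totals.get(title, 0) + int(row["Copies"])
    return totals


if __name__ == "__main__":
    print(glue(BOOKS_JSON, HOLDINGS_CSV))
```

No shared format is involved: the knowledge that the two identifier fields mean the same thing lives in the glue code itself, which is exactly the inefficient-but-working kind of interoperability described above.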

Third, I'm not entirely convinced we need to rely on interoperability and its network effects as our incentive toward data-sharing. Tim BL certainly did; there wasn't much technical precedent for what he was up to. But we have the web already, a cogent argument if ever there was one. We also have governments, grant agencies, and businesses wanting to multiply return on investment in data. RDF seems downright small-potatoes by comparison, as incentives go.

Finally, the HTML:RDF analogy falls down in one area that I think is utterly crucial: ease of adoption. I can teach enough HTML (and CSS) to be going on with in a couple of hours; I've done it. I still touch RDF only with great fear and loathing and a constant sensation that I must be doing it wrong, and I'll teach it only when I absolutely must and with a great many "I don't pretend to understand this" disclaimers. You can't frighten me with XML namespaces, XPath, XSLT, or regexes, but RDF scares me stiff. This is not an open standard that's going to rule the world. Not today, not tomorrow, and in my opinion not ever.

There's another danger lurking in the one-format-to-rule-them-all argument, a danger I hinted at above: what happens to data that for whatever reason aren't expressible in the format of choice? Second-class citizens? Invisible? I hope not.

Anyway, I say again: if the data web depends on RDF, the data web is a pipe dream and we should look for something else to do. I'd much rather believe the "if" clause counterfactual.


Strong stuff. And I agree completely. I especially like: "The data web that I personally think is more likely closely resembles today's mashup and microformats cultures: lots of formats with suitable documentation (one hopes) and APIs, available for use by whoever's willing to suss out how the various datasets work and write code to glue them together."

I *do* think RDF has a place at the table, but not as the one-format-to-rule-them-all. It may also be the case that bits of RDF will sneak in through the back door (look at Facebook's drop-dead easy Open Graph Protocol work -- it's just HTML!). Reuse-friendly versus not-reuse-friendly is a continuum, not an all (RDF) or nothing (not-RDF) proposition.
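The "it's just HTML" point is easy to illustrate. Open Graph Protocol metadata is ordinary HTML meta tags whose property attributes (an RDFa-style convention) carry og:-prefixed names such as og:title and og:type. The sketch below, in Python using only the standard library's html.parser, pulls those name/value pairs out of a page; the page and its example.org URLs are hypothetical.

```python
from html.parser import HTMLParser

# A hypothetical page marked up with Open Graph Protocol tags.
PAGE = """<html prefix="og: https://ogp.me/ns#">
<head>
  <title>Example Dataset</title>
  <meta property="og:title" content="The Rock" />
  <meta property="og:type" content="video.movie" />
  <meta property="og:url" content="https://example.org/movies/the-rock" />
  <meta property="og:image" content="https://example.org/images/rock.jpg" />
  <meta name="description" content="Not an Open Graph tag" />
</head>
<body>...</body>
</html>"""


class OpenGraphParser(HTMLParser):
    """Collect og:* properties from <meta property=... content=...> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.properties: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also arrive here via handle_startendtag.
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop, content = attributes.get("property"), attributes.get("content")
        if prop and prop.startswith("og:") and content is not None:
            self.properties[prop] = content


def open_graph(html: str) -> dict[str, str]:
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


if __name__ == "__main__":
    for name, value in open_graph(PAGE).items():
        print(name, "=", value)
```

A page author only has to add a few meta tags; a consumer needs nothing more than an HTML parser to read them.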

My real concern is that we are operating on two separate tracks -- the Linked Data side forges ahead with a specific vision, and the rest of the world blissfully ignores it. I would love to see the Linked Data movement take a more realistic approach and accept the fact that the viewpoint you express here is *widely* held (if not in specifics, in basic conclusions). Of all the work happening right now, efforts like the Open Graph Protocol are the most exciting. I also happen to think that JSON offers an absolutely superb format for data that is drop-dead simple to create, share and reuse (e.g., check out a Picasa Web album in Google's JSONC format in a nice JSON viewer: http://bit.ly/brZdSU -- as-simple-as-it-gets "linked data").

"It's a rough-and-ready sort of interoperability, arguably an inefficient one, but eppur si muove, as Galileo did not say of the web."

Though I like to think he would have if he'd been around. It's worth recalling that the web itself was not infrequently seen in its early days as taking a too rough-and-ready approach to interoperability. There were other networked hypertext systems out there, after all, ones that in many ways were quite graceful from a purely formal standpoint.

The web? No way to follow links *back* as well as forward! Why, there's no guarantee the links will even *work*! And don't get started on HTML-- it may *claim* to be SGML-compliant, but most web pages just string tags together, and don't pass formal validation at all!

Yet somehow the Web managed to take off stratospherically where the other hypertext systems didn't. It was enough to have something that basically worked for the common use cases, and had a modicum of structure on which additional services could be built. (You can use Google, referrer logs, or Technorati to find out who's linking to you, for instance; that capability didn't have to be baked into the Web architecture itself.)

Similarly, I think that if linked data's going to really take off, people will have to accept, and find better ways to cope with, the inevitable messiness that occurs when people put data online. Yes, that means that sometimes people will incorrectly refer to objects instead of documents, or vice versa, or (to take a library example) works instead of expressions, or any of the many pet peeves one sees recurring in mailing list discussions. You either deal with that, or you resign yourself to engaging with a niche instead of the world.

Hmmm. Not sure I agree with the analysis. The web was 3 things: HTTP for transfer, URLs for links, and HTML for rich documents that could contain links and transferred easily over HTTP. The 3 parts worked together. Gopher was pretty much just a simple protocol; gopher documents were pretty much endpoints (heck, many of them were .doc files, pre-Word-for-Windows, about as flat as you can get), whereas HTML documents are fundamentally rich and linked.

I write this as someone who was pretty confident, back in 1993 or thereabouts, that the web would fail compared to gopher. The reason was, the web needed all those existing documents to be re-coded into HTML, whereas gopher just let you serve them up. So I clearly got that wrong; people (slowly at first, but with ever-gathering pace) saw enough of the advantages to do that recoding, and by late 1994 I was giving courses to librarians on HTML.

But what does that say wrt RDF and the Semantic Web or Linked Data? Almost nothing, except that RDF (and a superstructure of vocabularies) does seem to be a simple, reductionist way of expressing enough kinds of data constructs and connections that enough people see value in it, and so it is gathering pace. Who would have believed two years ago the quantities of Linked Data we have available now?

I don't think most researchers should have to think in RDF terms, any more than most researchers have to think in HTML terms. More of the former than the latter, perhaps, as there is ten years' less maturity in Linked Data, so the tools are... well, crap. But if your data are in structured form right now (eg in a database), then making them available as RDF is something your favourite geek can probably do in much less than a weekend.
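As a rough illustration of the "less than a weekend" claim, here is a minimal sketch in Python that turns rows of a relational table into RDF in the N-Triples serialization. It assumes a hypothetical datasets table, a hypothetical base URI under example.org, and the Dublin Core Terms vocabulary for predicates; a real publication would need more care in choosing vocabularies and URIs.

```python
import sqlite3

# Hypothetical base URI for the dataset; substitute your own namespace.
BASE = "http://example.org/dataset/"
# Dublin Core Terms namespace, a real and widely used vocabulary.
DCT = "http://purl.org/dc/terms/"


def literal(value: object) -> str:
    """Render a value as an N-Triples plain literal, escaping special characters."""
    text = str(value)
    text = text.replace("\\", "\\\\").replace('"', '\\"')
    text = text.replace("\n", "\\n").replace("\r", "\\r")
    return f'"{text}"'


def rows_to_ntriples(conn: sqlite3.Connection) -> list[str]:
    """Turn each row of the hypothetical 'datasets' table into N-Triples lines."""
    triples = []
    for row_id, title, created in conn.execute(
        "SELECT id, title, created FROM datasets ORDER BY id"
    ):
        subject = f"<{BASE}{row_id}>"  # one URI per database row
        triples.append(f"{subject} <{DCT}title> {literal(title)} .")
        triples.append(f"{subject} <{DCT}created> {literal(created)} .")
    return triples


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE datasets (id INTEGER PRIMARY KEY, title TEXT, created TEXT)")
    conn.executemany(
        "INSERT INTO datasets VALUES (?, ?, ?)",
        [(1, "Survey of river sediments", "2010-07-09"),
         (2, 'The "Hamlet" corpus', "2010-06-01")],
    )
    print("\n".join(rows_to_ntriples(conn)))
```

The mechanical part really is small. Deciding which predicates to use, and whether a row stands for a document or the thing it describes, is where the effort tends to go.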

It does seem to have something of the momentum of the late 1990s web!

By Chris Rusbridge on 09 Jul 2010

Not sure about that last bit, Chris. An awful lot of the get-it-into-RDF efforts I've seen end up spiraling down the ontology rathole. It can be and usually is unbelievably non-obvious how best to represent something in RDF.

I think the web, even the early web, was more than those three things. ;)

"First, HTML was hardly the only part of the web stack necessary to its explosion. TCP/IP, anyone? "

A nitpick, but HTTP would be a better example. TCP/IP is to "the internet" as HTTP is to "the web". The twin standards of HTTP (transport) and HTML (content) are what created the web, and were indeed successful at doing it.

Richard's analogy of HTML->web as RDF->linked-web is a good one for his argument; it is thought-provoking and somewhat persuasive. But I still tend to fall on your side of things.

I guess I lack faith that RDF _will_ catch on in the ways semantic web enthusiasts hope (predict?). The analogy with HTML can be returned too -- what factors led to the actual success of HTML? Its technical superiority for solving certain problems is probably NOT it. (If it even has such superiority, it's really a pretty inelegant hacky standard from some perspectives.) More likely, what drove the ultimate success of HTML/HTTP was the incredible simplicity of creating simple web pages that _worked_ (even if they were not actually 'legal', as many of them were not!), without having to know what you were doing.

What does "working" mean in the context of 'the web'? Providing content that can be easily accessed by other clueless users, and easily linked to by other web authors. That created, as it caught on, the, well, "web" of content that we know and love/hate.

What does "working" mean for "linked data"? (This question does not necessarily have a simple or universally agreed-upon answer, and it's difficult for us all to talk about the same thing until we know what each of us means by it, which we don't really.)

How likely is RDF to catch on in order to achieve that? How possible is it for non-RDF to achieve that kind of "working"? How difficult is it for the individual to use RDF to achieve that "working"? If that "working" _requires_ some fairly challenging work to achieve... what does that say about the likelihood of the "working" goal occurring? Is there a way to approach "working" with less challenging means (than RDF?), means where simple things are incredibly simple and complexity of implementation rises proportionally to complexity of goals?

I do wonder if there is any "there" there, when it comes to RDF. I see it shoe-horned into some very ungrounded situations and I worry that it is one of those geek-fashioned abstractions that fills a much-needed gap.

Actually, most of my experience with RDF is confirmation of how little we understand what language is for us and our willingness to project meaning where it doesn't live.

I would not be surprised if RDF has power as a data structure in very well-circumscribed domains where the using community can maintain a coherent conception of the application. But as a hammer looking for nails, I find RDF worrisome. And, amid the misappropriation of "ontology", I wonder if we will ever figure out what the "semantic" bit is (although it is, I suppose, an ontological commitment of sorts to grant being to whatever semantics is with regard to RDF).

Hey, I've been too long without a Dorothea fix, and I am overjoyed (whatever that means).