Chemistry: on the internet or in cyberspace?

I'm at a workshop on eChemistry today, and we were asked to prepare position statements. I'm not going to blog the conference - it's a private thing - but figured I would post my position statement here.

We were asked to answer some questions. I chose to answer this one: "How do you assess the potential of new web-based communication models in Chemistry, i.e. their benefits or liabilities, their transformational power, and their chance of success?"

Full text is after the jump.

A good place to start is the transformation of scholarly communication from "using the internet" to "existing in cyberspace." I take this distinction from Larry Lessig, who was my personal introduction to this world:

>>>>
"EVERYONE WHO IS READING THIS BOOK HAS USED THE INTERNET. SOME HAVE BEEN in "cyberspace." The Internet is that medium through which your e-mail is delivered and web pages get published. It's what you use to order books on Amazon or to check the times for local movies at Fandango. Google is on the Internet, as are Microsoft "help pages."

But "cyberspace" is something more. Though built on top of the Internet, cyberspace is a richer experience. Cyberspace is something you get pulled "into," perhaps by the intimacy of instant message chat or the intricacy of "massively multiple online games" ("MMOGs" for short, or if the game is a role-playing game, then "MMORPGs"). Some in cyberspace believe they're in a community; some confuse their lives with their cyberspace existence. Of course, no sharp line divides cyberspace from the Internet. But there is an important difference in experience between the two. Those who see the Internet simply as a kind of Yellow-Pages-on-steroids won't recognize what citizens of cyberspace speak of. For them, "cyberspace" is simply obscure. "

(Lessig, Code and Other Laws of Cyberspace, Version 2.0)
>>>>

What we've mostly been doing in scholarly communication is using the internet. We've been making digital versions of papers - PDFs - and using the network to post them. You can use the network to order them, rent them, read them. But they're not in cyberspace in Lessig's sense - they're not interactive in technical terms, social terms, or legal terms. Thanks to DRM and the move from sale terms to lease terms, they are actually less free than they used to be. Open Access (OA) is in many ways a reaction to this irony, as well as a response to the two pressing problems of rising serials pricing and filter failure for scientific information.

The transformational power of getting into cyberspace for scholarly publishing is huge. If we can start to leverage both a) the power of the crowd and b) the power of technological enhancement more efficiently - i.e., without the high transaction costs, permission barriers, and information exclusion - then the mathematical odds of someone, somewhere making a breakthrough discovery go up.

That could mean innovations in scholarly communication itself - perhaps a new way to index information, like the one Google represented in the late 1990s. Imagine if Brin and Page had been forced to negotiate access to web pages before they could start hacking. "But we would have let Google index our content" is a poor excuse: if that had been the attitude on the Web, we'd all still be using Yahoo's taxonomies, because everyone would have done the deal with the existing dominant force, blocking the emergence of innovative, entrepreneurial search. That's where we are with scholarly search and indexing now.
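
To make that concrete: the heart of what Brin and Page were allowed to build, just by reading freely available pages, is an inverted index. Here is a toy sketch in Python - the papers and abstracts are invented for illustration, and this is the idea only, not any real search engine's design:

```python
from collections import defaultdict

# A few hypothetical paper abstracts, standing in for a crawlable corpus.
abstracts = {
    "paper-1": "synthesis of caffeine analogues via methylation",
    "paper-2": "QSAR models for caffeine receptor binding",
    "paper-3": "crystal structure of a novel xanthine derivative",
}

# Build the inverted index: each token maps to the set of papers containing it.
index = defaultdict(set)
for paper_id, text in abstracts.items():
    for token in text.lower().split():
        index[token].add(paper_id)

def search(term: str) -> set:
    """Return the set of papers whose abstract contains the term."""
    return index.get(term.lower(), set())

print(search("caffeine"))  # {'paper-1', 'paper-2'} (order may vary)
```

None of this machinery requires a negotiated deal with each publisher; it only requires permission to read.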

It could also be innovations in the science itself. There are a lot of smart people in this world who don't have access to integrated information - who can't afford access to the literature, or to the costly indexing and integration services that surround it, and who have hypotheses they can't test rapidly against the published information space. A world in which the data and the literature are more densely integrated is a world where model-building gets a lot easier. What we are doing here is reducing the time and cost at which Kuhnian revolution cycles operate - dumb ideas get exposed faster, and good ideas get validated faster. This is about the only way to accelerate those revolutions that does not rely on magical thinking: if we make the things we know more useful in the evaluation of hypotheses and models, we simply increase the mathematical odds of discovery. That is the transformational potential: treating the literature and data online as elements in a vast periodic table of knowledge, a common reference point against which we can test how things fit together.

This potential does not mean the end of publishers or of peer review. In my view, it makes both even more important, though both will of necessity be forced to evolve new methods to deal with the new world. In science cyberspace, guarantees of provenance and persistence will be essential - who said what, and when, and how can we ensure the durability of a citation? Was a link crowdsourced or peer reviewed? New business models emerge from this too - in particular the sale of peer-reviewed links ahead of publication, so that publishers can sell a set of links that carry extra "trust" while releasing the underlying text for crowdsourced, free linking after publication.
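
To make the idea of a trusted link concrete, here is a minimal sketch of what a link carrying its own provenance might look like. The field names and values are my own invention, not any existing standard:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# A hypothetical record for a scholarly link that carries its own provenance.
@dataclass(frozen=True)
class AssertedLink:
    source: str          # e.g. identifier of the citing object
    target: str          # e.g. identifier of the cited object
    asserted_by: str     # person or service that created the link
    asserted_at: datetime
    review_status: str   # "peer-reviewed" or "crowdsourced"

link = AssertedLink(
    source="doi:10.1000/example.123",   # 10.1000 is an example DOI prefix
    target="doi:10.1000/example.456",
    asserted_by="reviewer-42",
    asserted_at=datetime.now(timezone.utc),
    review_status="peer-reviewed",
)
```

A publisher's "extra trust" product is then simply a guarantee about who asserted the link and how it was reviewed - exactly the kind of thing a brand can stand behind.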

But as we move into a more OA world (even if not a totally OA world), and as semantics come of age in publishing prose and data and the hybrid "object-relation" world that much more accurately reflects the reality of any given experimental research output, we have a lot of choices to make. None of the outcomes are foreordained, because none of them are natural - they're all creatures of code that humans write, and that code can be changed to make things more open or less open. Think back to the idealistic early days of the network, when many wrote of the "innate openness" of cyberspace. The current battles over net neutrality reflect a more realistic world: we made the network open, and we can also make it closed.
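
As a sense of what "semantics" means at the level of code, the sketch below uses the Python rdflib library to state a few machine-readable facts about a paper and a compound. The namespace and identifiers are made up for illustration; real work would reuse community ontologies rather than invent terms like these:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

# A made-up vocabulary, for illustration only.
CHEM = Namespace("http://example.org/chem#")

g = Graph()
paper = URIRef("http://example.org/papers/42")
compound = URIRef("http://example.org/compounds/caffeine")

# Each triple is one machine-readable assertion about the research output.
g.add((paper, RDF.type, CHEM.Article))
g.add((paper, DCTERMS.title, Literal("A hypothetical paper on caffeine")))
g.add((paper, CHEM.discusses, compound))
g.add((compound, CHEM.inchi, Literal("InChI=1S/...")))  # elided

print(g.serialize(format="turtle"))
```

Whether a graph like this ends up openly queryable or locked inside a walled garden is, again, a choice written into code.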

So the signal question is: should the models be walled gardens or the open Web? Walled gardens bring more short-term benefit, but also more long-term liabilities. AOL in 1991 was a much richer user experience than the Web in 1991, but the very openness and chaos of the Web contained the seeds of its explosion, just as the enclosure of AOL ensured it would never capture the benefits of its users' innovations. There is compelling research into the nature of "generative" systems - systems that *by their design* produce unexpected outputs, where the design ethos is to give users power rather than to give users what the designer thinks they want. Generative systems are capable of unbelievable explosive power. Thus, compared to AOL 1991, WWW 1991 brings more transformational power over time and thus more likelihood of "success" - if we define success as a generative system akin to the WWW. But the definition of success is as important here as anything.

It's also essential to note, as Tony Hey of Microsoft has repeatedly pointed out to me, that generativity is not the exclusive province of open source. The PC is a generative platform: anyone could write and run code at the C:\ prompt. That simple decision is one of the primary reasons Apple computers failed to keep up with PCs for so long. Apple chose control and a beautiful user interface; the PC chose generativity, and dominates the market today. We'll have a chance to watch the same battle play out in phones between Android and the iPhone, though the iPhone is at least now partially generative.

Generative systems are also vulnerable to abuse. Spam is the example we all probably know most personally. The openness of a system seems to correlate quite well with its abuse: openness creates generative power, which creates value and draws users, and those users in turn draw spammers, liars, phishers, Nigerian bank fraud, and more. The pain of this in email is one thing. In scholarly communication it could be truly dreadful - how would you track down and fix the damage done by a virus writer who systematically corrupted the numbers in tables of clinical trial data, or drug structures, or QSAR data, in an integrated open web?
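
Part of the answer is the provenance machinery described above: if registered versions of datasets carry cryptographic checksums, silent tampering becomes detectable. Here is a minimal sketch using Python's standard hashlib - the registry and the data are imaginary:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Compute a SHA-256 digest of raw bytes, as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Imagine a trusted registry published this digest at peer-review time.
published_table = b"compound,ic50_nM\ncaffeine,42\n"
registered_digest = sha256_hex(published_table)

# Later, a copy fetched from the open web can be verified before use.
fetched_table = b"compound,ic50_nM\ncaffeine,4200\n"  # maliciously altered
if sha256_hex(fetched_table) != registered_digest:
    print("WARNING: data does not match the registered version")
```

Checksums don't prevent tampering, but they make it visible - and verifiable integrity is exactly the kind of service a trusted publisher could provide.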

Also, any such system that requires major internal investment in infrastructure (i.e. bespoke design, as opposed to WWW-based systems) will likely concentrate power in the corporate entities that have the funds to invest in R&D. Very few small society publishers or small independent journals will survive independently in a world where semantic enhancement of publication takes much more than a few clicks to achieve; most will instead find shelter inside a rapidly shrinking number of corporate homes. This is a major potential consequence of combining Open Access with the Semantic Web (OA + SW), and it must be mitigated through good strategy and investment in open semantic web infrastructure if it is to be avoided.

I come back again to the importance of publishing and peer review. Provenance and persistence, citation and verifiability - these are the services that will be essential to figuring out which pieces of content can be trusted. But these services do not rely on control of copyright or content. They are quality control services, and they belong far more to the province of trademark and brand than to copyright. Trademarks and quality certification offer a way to create new business models and ensure trust on the generative web without restricting the very generative powers that we need to accelerate innovation and discovery cycles.

There will always be a need for the trust that publishers create through peer review. That trust might come from lots of angles, including the traditional publishers. But we cannot continue to stymie the power of the network to help us make discoveries and advances in science. This is one of our only non-miraculous avenues to improving the way science works. We need science too much, and we need it to work better, now.
