A big topic of conversation at Scifoo seems to be the future of scientific communication. I have renounced using the term Open Access; it has been applied to so many different aspects of scientific publishing that it is utterly worthless. It's a buzzword. It's cool. But what does it mean? And instead of talking about the practicalities, much of the conversation is ethereal.
We need to define the real issues. What scientists want. What scientists need. How does science publishing impact the lives of scientists, both as producers of scientific data and as consumers? After a session on Open Access and Web 2.0, organized by Bora, Andrew Walkingshaw and I were talking to Eric Lander about all the anxieties of a young scientist who is subjected to the changes within scientific publishing. We need to clearly define the issues.
Tomorrow Andrew and I will present these issues in a session. We will then write up a position paper summarizing all the components that must be taken into account before we head into the brave new world of "free scientific information".
I've cut and pasted the main points that we would like to get across (see below the fold). Some online feedback would be much appreciated.
1) Open communication
This means the free and rapid dissemination of ideas and results. Ideally this would include feedback from readers and the authors.
We all want this in principle. It benefits researchers (1st, 2nd and 3rd parties) and the public. It democratizes access to science publishing. It allows for data mining.
2) Credit
Scientists are subjected to the publish or perish system. This affects the funding of our research and our career prospects. Gaining credit for key ideas and findings is of paramount importance. It is thus the largest source of anxiety for young scientists. One of the emerging issues is whether scientists should get credit for data presented within a prepublication forum, such as a blog or in an openly disclosed lab notebook (such as that of Jean-Claude Bradley). Who and what takes precedence?
3) Peer review I - objective assessment of scientific data
We need to have a system to assess whether the data from a publication is self-consistent. One interesting aspect is the degree of "correctness". For example, to publish in PLoS One a manuscript must contain a properly formatted complete story, but to publish in Nature Precedings a manuscript must meet only minimal requirements (i.e. that it is not pseudoscience).
4) Peer review II - subjective assessment of scientific data
In other words, what is the value and significance of a body of work? This is an important issue for those who assess a scientist's contribution to his or her field. It impacts how a scientist is funded and his/her career. Subjective assessment also helps consumers find the most relevant and most important work. The problem is: who should be the judge? Should it be the journal editors, the scientific establishment, citations, trackbacks or a user voting system?
5) Practicality of open scientific communication
One problem for the scientific publishing industry is how this resource will be funded. If journals have simply objective filters (e.g. PLoS One) then publication is cheap. If we use the journals to filter our work (e.g. PLoS Biology) then the extra filtering gets expensive (about $2,500 per article in the case of PLoS Biology). One important question is whether the cost of publication affects where any individual can publish.
So what is the solution? Will it vary between fields? One could imagine that perhaps we all develop some standards for prepublication data and that publication would occur exclusively in repositories such as PLoS One or Nature Precedings. In that case the major journals such as Nature and Science would act as a guide to what is relevant. But is this feasible?
I'll see what the folks at Scifoo say tomorrow.
Great points! You should vlog about these as it's easier to communicate such long explanations. Don't you plan to create a videocast about these for your readers?
The Open Communication issue needs more coverage. I agree with you that rapid, free access is important. But what about the ability to reproduce published material (with proper credit given)?
I see point 1 divided into three sub-points:
1. Rapid publication of results.
2. Free access to those results.
3. Ability to reproduce the published material.
This is something all publishers, regardless of how stringently they screen for subjective criteria, can strive for.
Great points. You absolutely should not vlog about these, as who the hell wants to watch someone read a list when they could simply read it for themselves?
(Sorry, multimedia makes me cranky. If you do podcast or vlog or whatever, please have transcripts for luddites like me!)
In re: journals and subjective assessment, I think that the current system of using journal "reputation" as a proxy for the value of a paper is badly broken. If all publications were freely available, download and citation metrics would become far richer and more flexible methods of assessing subjective value. In this vein, Pedro Beltrao has some interesting ideas about the future of journals.
I think that a lot of these questions will get sorted out as people actually do science more openly. The Open Notebook Science approach that my group is taking is not the only option. This does not have to be an all or nothing question. There is little risk (and in my opinion a lot to be gained) in experimenting with a small project and just seeing how it evolves.
Nice points. I would add that developing digital data standards is absolutely essential. Data from different authors need to be comparable, searchable, etc.
I think today redundancy is more important than establishing one standard. Just make sure that your data is searchable on the uber-database: Google.
If you are trying to get scientists to adopt a common standard, to the exclusion of other systems, before doing anything, you will likely wait a really long time. In the meantime, just duplicate the data in the competing emerging systems and watch what happens.
JCB,
What is holding many back is that they want to be sure that they can duplicate the data without being punished for it. Two possible problems are 1) not receiving precedence if someone else publishes their stuff and 2) not being able to publish in traditional media once they've published in Web 2.0 media.
Beyond searchability, I guess I was trying to get at the need to standardize biological data to make it useful. Maybe not such a big deal in cell biology, for example, but in so-called "systems biology" unprecedented amounts of data are being generated and published. Although the papers can easily be found using Google Scholar or PubMed, the associated data (e.g. microarray, ChIP-chip) are presented as unusable heaps of information, never to be used by anyone but the original authors. Publishers haven't come together to develop a standard for scientists to share this kind of data. Maybe there's an opening here for an open-source community to develop standards, start sharing data and kick-start open publishing in the process?
Anthony,
Don't misunderstand what I am saying. I think developing standards is very important and ultimately necessary. And I agree that an open bottom-up approach is more likely to be successful than depending on publishers.
However, I have seen the scenario where people wait for a standard to emerge before participating. Don't use the lack of a widely accepted standard as a reason for not sharing data. There are plenty of other reasons why you might not want to share, but this is not a strong argument. The worst-case scenario is that you'll have to reformat your data when a standard emerges.
And your point about the field dependence is quite valid. There are different obstacles to overcome in sharing data from various disciplines.
Apalazzo,
1) not receiving precedence if someone else publishes their stuff and
If someone actually tries to plagiarize text or take your data, pretending that they did the experiments in their lab, you would be in a better position with the data on a public wiki with a third party time stamp, compared with a private research proposal reviewed only by your closest competitors.
2) not being able to publish in traditional media once they've published in Web 2.0 media.
Yes, there is a concern there, but let's break it down. If you make data public, you could probably still write the article with unpublished TEXT summarizing the work in a new way, then happily give away your copyright to that text to your favorite publisher. Maybe Science or Nature won't take it, but many others will, I am sure.
Anthony,
There is a push for standardization in microarrays. The biggest is at NCBI:
http://www.ncbi.nlm.nih.gov/sites/entrez?db=geo
But there is much more data (qualitatively and quantitatively) in small biology publications - this includes papers in Cell Biology, Immunology, Microbiology, Biochemistry ... There are already data miners that are trying to take advantage of this wealth of data. For example:
http://string.embl.de/ (or see my entry on this topic: http://scienceblogs.com/transcript/2007/04/string_search_tool_for_the_r… )
But since data mining small biology papers is more informative and technically challenging, people should focus their work there. Items such as microarrays are being standardized.
JCB,
Point well taken - lack of standards is no excuse not to share data in open forums, but seamless integration of developing databases (such as those pointed out by apalazzo) into web-based journals (something traditional publishers cannot offer) might provide incentive to those who would otherwise lack the motivation/foresight to jump into the open publishing arena.