There is general consensus in the publishing community that online documents have too long been like yesterday’s paper—flat, lifeless, inactive. For many years, we have been trying to move away from ‘paper under glass’ and reconceptualise scholarly output using the technologies available now.
Elsevier have invested significant time in their Article of the Future project, many have experimented with semantic publishing, and services such as Utopia Docs have attempted to breathe new life into PDFs. Now the innovative open access journal, eLife, has released a new tool in the hope of making articles easier to read online for researchers, authors and editors alike: eLife Lens.
Research and discovery in the life sciences is a pretty complicated business. The complexity of the modern scientific process seems to be a reflection of the intricacies of life and the processes associated with disease and its treatment. Furthermore, as technologies become more advanced, so too does the problem of managing the ever expanding quantity of data being generated.
Currently, pharmaceutical companies expend significant and duplicated efforts aligning and integrating their internal information with public data sources. This process is largely incompatible with large-scale computational approaches, and the vast majority of drug discovery sources find it difficult to communicate with each other.
BioMed Central have launched a free Cases Database that allows clinicians, researchers, teachers and patients to explore peer-reviewed medical case reports from multiple journal publishers (including BMJ Group, BioMed Central and Springer). The database is freely accessible and contains 11,627 cases from 100 different journals.
It is hoped that bringing together so many case reports will encourage the identification of trends and patterns, thereby providing researchers with hypotheses for further systematic research.
The database offers structured search and filtering by condition, symptom, intervention, pathogen, patient demographic and many other data fields, allowing fast identification of relevant case reports to support clinical practice and research.
At last week’s STM Innovations Seminar, thought leaders from a range of disciplines converged to discuss the latest developments in publishing.
The opening keynote speaker was Richard Padley, MD of Semantico, who announced to a surprised audience that the Semantic Web was in fact dead. Next up was Herbert Van de Sompel, who unveiled his work on recreating the web-based scholarly record as it was at a certain point in time: a plug-in called Memento. This allows the user to see resources as they existed in the past (including citations that point to archived copies of papers, if available). Anita de Waard, Disruptive Technologies Director for Elsevier, shared a number of recent projects that aim to accelerate the revolution in executable research. Of particular interest was the Claim-Evidence Network in Medicine, which will aggregate data to automatically update clinical decision support (CDS) systems using linked data.
At its most basic level, Utopia Documents is a PDF reading tool that allows articles to be augmented with interactive content, and helps the reader explore data associated with a particular paper. It’s a desktop application for reading and exploring papers, and functions in many respects like a normal PDF reader. Its real potential becomes clear when configured with appropriate domain-specific ontologies and plugins. Once these are in place, the software transforms PDF versions of articles from static facsimiles of their printed counterparts into dynamic gateways to additional knowledge, linking both explicit and implicit information embedded in the articles to online resources, as well as providing access to auxiliary data and interactive visualisation and analysis tools. For a thorough demonstration of the software, take a look at the video below:
Why concentrate on PDFs and not HTML?
Given the huge investment in XML/HTML versions of articles, are we taking a step backwards by semantically tagging our PDFs? Professor Teresa K. Attwood, who led the bioinformatics component of the EPSRC/DTI-funded UTOPIA(d) project, argues that:
“Utopia Documents was developed in response to the realization that, in spite of the benefits of ‘enhanced HTML’ articles online, most papers are still read, and stored by researchers in personal archives, as PDF files. Several factors likely contribute to this reluctance to move entirely to reading articles online: PDFs can be ‘owned’ and stored locally, without concerns about web sites disappearing, papers being withdrawn or modified, or journal subscriptions expiring; as self-contained objects, PDFs are easy to read offline and share with peers (even if the legality of the latter may sometimes be dubious); and, centuries of typographic craft have led to convergence on journal formats that (on paper and in PDF) are familiar, broadly similar, aesthetically pleasing and easy to read.”
The authors have also dismissed reservations about the semantically limited nature of PDFs as a non-issue.
“We argue that PDFs are merely a mechanism for rendering words and figures, and are thus no more or less ‘semantic’ than the HTML used to generate web pages. Utopia Documents is hence an attempt to provide a semantic bridge that connects the benefits of both the static and the dynamic online incarnations of published texts.”
“Utopia Documents links scientific research papers to the data and to the community. It enables publishers to enhance their publications with additional material, interactive graphs and models. It allows the reader to access a wealth of data resources directly from the paper they are viewing, make private notes and start public conversations. It does all this on normal PDFs, and never alters the original file. We are targeting the PDF, since they still have around 80% readership over online viewing.”
Explore article content
An integrated semantic search bar enables users to explore the biological content of an article from within a PDF reader. This offers readers the opportunity to investigate aspects of a scientific article further or clarify given terms.
Discover published metadata
If a publisher has invested in the appropriate domain-specific ontologies and plugins, Utopia Documents can provide access to additional context, from database entries to glossary definitions. All new articles in the Semantic Biochemical Journal, for example, include publisher-curated annotations of the most salient facts.
Comment on articles
The software allows readers to annotate their PDFs, either privately for personal reference or publicly as part of an online discussion.
Interact with live data
Utopia Documents allows users to interact directly with curated database entries. Within the familiar setting of a PDF reader, they can play with molecular structures, edit sequence and alignment data, and even plot curated tabular data.
Many scholars of research behaviour argue that for electronic journals to survive and thrive, they must be different from their print antecedents. Although it is certainly true that online journals must offer added functionality, it would be more appropriate to refer to the printed versions as competitors rather than predecessors. Designers and publishers must therefore fully exploit the electronic medium’s basic properties, with ‘interactivity’ as the primary characteristic of new technologies. Utopia Documents allows the user to search through an integrated search bar, play with molecular structures and annotate documents for online collaboration. While reading electronic journals is not the same as reading a print copy, it’s time to fully exploit the opportunity of these electronic documents by offering users advanced features and novel forms of functionality beyond what is possible in print.
Following a previous post on the Semantic Web, this week we’ll be exploring the implications of this web of data for the publishing world. Semantic web technologies, as opposed to the grander idea of the Semantic Web itself, offer tools that can help publishers assemble and distribute their content more efficiently.
What is semantic publishing?
Fundamentally, semantic web publishing refers to information published on the web, accompanied by semantic markup. Semantic publication makes information search and data integration more effective by equipping computers with the ability to understand the structure and even the meaning of the published information. In the Semantic Web, published information is accompanied by metadata describing the information, thereby providing a ‘semantic’ context.
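To make the idea of metadata providing a ‘semantic’ context more concrete, here is a minimal Python sketch in which statements about articles are stored as subject, predicate, object triples that a program can query. It is an illustration only: the article identifiers, predicate names and values are all hypothetical, and real semantic publishing would normally rely on shared vocabularies and dedicated tooling rather than a hand-rolled list.

```python
# Hypothetical article metadata expressed as (subject, predicate, object)
# triples. Every identifier, predicate and value below is an illustrative
# assumption, not data from a real journal.
TRIPLES = [
    ("article:123", "hasTitle", "A case of drug-induced hepatitis"),
    ("article:123", "mentionsCondition", "hepatitis"),
    ("article:123", "mentionsDrug", "isoniazid"),
    ("article:456", "hasTitle", "Isoniazid dosing in children"),
    ("article:456", "mentionsDrug", "isoniazid"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def articles_mentioning_drug(triples, drug):
    """Return the sorted ids of articles with a 'mentionsDrug' statement for drug."""
    hits = match(triples, predicate="mentionsDrug", obj=drug)
    return sorted({s for (s, _, _) in hits})


if __name__ == "__main__":
    print(articles_mentioning_drug(TRIPLES, "isoniazid"))
```

Because each statement names its own relationship, a search for a drug returns only the articles explicitly described as mentioning it, rather than every article in which the word happens to appear.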
What difference could this make to the publishing world?
Many believe that semantic publishing has the potential to revolutionise scientific publishing. Tim Berners-Lee predicted in 2001 that the Semantic Web “will likely profoundly change the very nature of how scientific knowledge is produced and shared, in ways that we can now barely imagine”. Revisiting the Semantic Web in 2006, he and his colleagues argued that it “could bring about a revolution in how, for example, scientific content is managed throughout its life cycle”. Researchers could directly self-publish their experiment data in ‘semantic’ format on the web and semantic search engines could then make these data widely available.
Creating richer metadata – the technical bit
Metadata is used by most publishers in some capacity. The majority also use taxonomies (hierarchies of terms used to categorise content), although they might not be aware of this name. The next step towards richer metadata is the use of ontologies. Just as taxonomies add structure to plain metadata, ontologies add structure to taxonomies, making them look ‘flat’ by comparison. Ontologies describe more detailed relationships among concepts and provide a higher level of richness in the metadata.
Taxonomies are very similar to the animal and plant kingdom taxonomies, in which every species is located in a particular branch. However, more conceptual objects don’t always fit so nicely into this basic lineage. If a publisher created a taxonomy based on colours with the following—red, yellow, and blue—as the top nodes, purple would need to be related to both red and blue. In a simple taxonomy, the term ‘purple’ would probably be repeated under both, but in a technical sense they would actually be two distinct nodes that have the same name.
In an ontology, however, purple can be represented as a single concept that appears at multiple points on the tree. Strictly speaking, though, ontologies are not tree-like at all: they are a complex mapping of concepts with defined relationships between those concepts (such as ‘subclass of’ or ‘part of’).
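The colour example can be sketched in a few lines of Python. This is a rough illustration rather than a description of how any particular publisher models its metadata: the node ids, the ‘mixed_from’ relation and the ‘primary colour’ concept are assumptions introduced purely for the example.

```python
# Taxonomy: a strict tree in which every node has exactly one parent.
# node_id -> (label, parent_id); the ids are hypothetical.
TAXONOMY = {
    1: ("red", None),
    2: ("yellow", None),
    3: ("blue", None),
    4: ("purple", 1),  # 'purple' filed under red
    5: ("purple", 3),  # a second, separate 'purple' filed under blue
}


def taxonomy_nodes_labelled(label):
    """Return the ids of every taxonomy node carrying this label."""
    return sorted(nid for nid, (lab, _) in TAXONOMY.items() if lab == label)


# Ontology: a graph of concepts joined by named relationships.
# The relation names are illustrative assumptions.
ONTOLOGY = {
    ("purple", "mixed_from", "red"),
    ("purple", "mixed_from", "blue"),
    ("red", "subclass_of", "primary colour"),
    ("yellow", "subclass_of", "primary colour"),
    ("blue", "subclass_of", "primary colour"),
}


def related(concept, relation):
    """Return the sorted concepts that `concept` points to via `relation`."""
    return sorted(o for (s, r, o) in ONTOLOGY if s == concept and r == relation)


if __name__ == "__main__":
    print(taxonomy_nodes_labelled("purple"))  # two distinct nodes share one name
    print(related("purple", "mixed_from"))    # one concept, two relationships
```

In the taxonomy, the two ‘purple’ entries are unrelated nodes that merely share a label; in the ontology there is a single ‘purple’ concept, and its links to red and blue are explicit and named.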
In the video below, Louise Tutton, COO at Publishing Technology, talks about the Semantic Web and its opportunities at Online Information, London (30th November).