
A brave new world for PDFs? Utopia Documents explained.

21 Jan, 11 | by BMJ

Following on from last week’s discussion of information-seeking behaviour, today we’ll be exploring one way of transforming individual articles into portals to greater information: Utopia Documents.

What is Utopia Documents?

At its most basic level, Utopia Documents is a PDF reading tool that allows articles to be augmented with interactive content and helps the reader explore data associated with a particular paper. It’s a desktop application for reading and exploring papers, and functions in many respects like a normal PDF reader. Its real potential becomes clear when it is configured with appropriate domain-specific ontologies and plugins. Once these are in place, the software transforms PDF versions of articles from static facsimiles of their printed counterparts into dynamic gateways to additional knowledge. It links both explicit and implicit information embedded in the articles to online resources, and provides access to auxiliary data and interactive visualisation and analysis tools. For a thorough demonstration of the software, take a look at the video below:

Why concentrate on PDFs and not HTML?

Given the huge investment in XML/HTML versions of articles, are we taking a step backwards by semantically tagging our PDFs? Professor Teresa K. Attwood, who led the bio-informatics component of the EPSRC/DTI-funded UTOPIA(d) project, argues that:

“Utopia Documents was developed in response to the realization that, in spite of the benefits of ‘enhanced HTML’ articles online, most papers are still read, and stored by researchers in personal archives, as PDF files. Several factors likely contribute to this reluctance to move entirely to reading articles online: PDFs can be ‘owned’ and stored locally, without concerns about web sites disappearing, papers being withdrawn or modified, or journal subscriptions expiring; as self-contained objects, PDFs are easy to read offline and share with peers (even if the legality of the latter may sometimes be dubious); and, centuries of typographic craft have led to convergence on journal formats that (on paper and in PDF) are familiar, broadly similar, aesthetically pleasing and easy to read.”

The authors also dismiss reservations about the semantically limited nature of PDFs as a non-issue:

“We argue that PDFs are merely a mechanism for rendering words and figures, and are thus no more or less ‘semantic’ than the HTML used to generate web pages. Utopia Documents is hence an attempt to provide a semantic bridge that connects the benefits of both the static and the dynamic online incarnations of published texts.”

What are the main features of Utopia Documents?

In an interview at the Guardian, Utopia’s Phillip McDermott says:

“Utopia Documents links scientific research papers to the data and to the community. It enables publishers to enhance their publications with additional material, interactive graphs and models. It allows the reader to access a wealth of data resources directly from the paper they are viewing, make private notes and start public conversations. It does all this on normal PDFs, and never alters the original file. We are targeting the PDF, since they still have around 80% readership over online viewing.”

Explore article content

An integrated semantic search bar enables users to explore the biological content of an article from within a PDF reader. This offers readers the opportunity to investigate aspects of a scientific article further or clarify given terms.

Discover published metadata

If a publisher has invested in the appropriate domain-specific ontologies and plugins, Utopia Documents can provide access to additional context, from database entries to glossary definitions. All new articles in the Semantic Biochemical Journal, for example, include publisher-curated annotations of the most salient facts.

Comment on articles

The software allows readers to annotate their PDFs, either privately for personal reference or publicly as part of an online discussion.

Interact with live data

Utopia Documents allows users to interact directly with curated database entries. Within the familiar setting of a PDF reader, they can play with molecular structures, edit sequence and alignment data, and even plot curated tabular data.

Many scholars of research behaviour argue that for electronic journals to survive and thrive, they must be different from their print antecedents. Although it is certainly true that online journals must offer added functionality, it would be more appropriate to refer to the printed versions as competitors rather than predecessors. Designers and publishers must therefore fully exploit the electronic medium’s basic properties, with ‘interactivity’ as the primary characteristic of new technologies. Utopia Documents allows the user to search through an integrated search bar, play with molecular structures and annotate documents for online collaboration. Reading an electronic journal is not the same as reading a print copy, so it’s time to make full use of electronic documents by offering users advanced features and novel forms of functionality beyond what is possible in print.

Utopia Documents is free and can be downloaded here: http://getutopia.com/documents/

Semantic publishing: how to create richer metadata

10 Dec, 10 | by BMJ

Following a previous post on the Semantic Web, this week we’ll be exploring the implications of this web of data for the publishing world. Semantic web technologies, as opposed to the grander idea of the Semantic Web itself, offer tools that can help publishers assemble and distribute their content more efficiently.

What is semantic publishing?

Fundamentally, semantic web publishing refers to information published on the web, accompanied by semantic markup. Semantic publication makes information search and data integration more effective by equipping computers with the ability to understand the structure and even the meaning of the published information. In the Semantic Web, published information is accompanied by metadata describing the information, thereby providing a ‘semantic’ context.
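To make the idea of information ‘accompanied by metadata’ a little more concrete, here is a minimal sketch in Python. It assumes the metadata is held as simple subject, predicate, object statements; the identifier and the predicate names (‘title’, ‘publisher’, ‘about’) are invented for illustration and are not taken from any particular standard or publisher system.

```python
# Hypothetical identifier and predicate names, chosen for illustration only.
POST = "post:semantic-publishing"

# Each statement is a (subject, predicate, object) triple describing the post.
statements = [
    (POST, "title", "Semantic publishing: how to create richer metadata"),
    (POST, "publisher", "BMJ"),
    (POST, "about", "semantic web"),
    (POST, "about", "metadata"),
]


def subjects_matching(statements, predicate, value):
    """Return the subjects for which (subject, predicate, value) is stated."""
    return sorted({s for s, p, o in statements if p == predicate and o == value})


def describe(statements, subject):
    """Collect everything stated about one subject, grouped by predicate."""
    description = {}
    for s, p, o in statements:
        if s == subject:
            description.setdefault(p, []).append(o)
    return description


# A program can now answer "what is about metadata?" without reading the text.
print(subjects_matching(statements, "about", "metadata"))
print(describe(statements, POST))
```

Because the statements are structured, a search tool can match on the ‘about’ relationship directly rather than guessing from keywords in the body text.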

What difference could this make to the publishing world?

Many believe that semantic publishing has the potential to revolutionise scientific publishing. Tim Berners-Lee predicted in 2001 that the Semantic Web “will likely profoundly change the very nature of how scientific knowledge is produced and shared, in ways that we can now barely imagine”. Revisiting the Semantic Web in 2006, he and his colleagues argued that it “could bring about a revolution in how, for example, scientific content is managed throughout its life cycle”. Researchers could directly self-publish their experiment data in ‘semantic’ format on the web and semantic search engines could then make these data widely available.

Creating richer metadata – the technical bit

Metadata is used by most publishers in some capacity. The majority also use taxonomies (a hierarchy of terms used to categorise content), although they might not be aware of this name. The next step towards richer metadata is the use of ontologies. Just as taxonomies add structure to plain metadata, ontologies add structure to taxonomies, making them look ‘flat’ by comparison. Ontologies describe more detailed relationships among concepts and provide a higher level of richness in the metadata.

Taxonomies are very similar to the animal and plant kingdom taxonomies, in which every species is located in a particular branch. However, more conceptual objects don’t always fit so nicely into this basic lineage. If a publisher created a taxonomy based on colours with red, yellow, and blue as the top nodes, purple would need to be related to both red and blue. In a simple taxonomy, the term ‘purple’ would probably be repeated under both, but in a technical sense they would actually be two distinct nodes that have the same name.

In an ontology, however, purple can be represented as a single concept that appears at multiple points in the structure. Rather than being tree-like, ontologies are a complex mapping of concepts with defined relationships between those concepts (such as ‘subclass of’ or ‘part of’).
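The colour example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the taxonomy is modelled as a plain dictionary of parents to children, the Ontology class is a hypothetical helper written for this post, and the relationship name ‘mixed from’ is invented here (the ‘subclass of’ and ‘part of’ relationships mentioned above would be stored the same way).

```python
from collections import defaultdict

# Taxonomy: a strict tree, so 'purple' has to be listed under each parent.
taxonomy = {
    "red": ["purple"],
    "yellow": [],
    "blue": ["purple"],
}


def taxonomy_nodes(tree):
    """Return one path per child node; repeated names become separate nodes."""
    return [f"{parent}/{child}" for parent, children in tree.items() for child in children]


class Ontology:
    """Hypothetical helper: one node per concept, linked by named relationships."""

    def __init__(self):
        self._edges = defaultdict(set)  # (subject, relation) -> set of objects

    def relate(self, subject, relation, obj):
        self._edges[(subject, relation)].add(obj)

    def related(self, subject, relation):
        return sorted(self._edges.get((subject, relation), set()))

    def concepts(self):
        found = set()
        for (subject, _relation), objects in self._edges.items():
            found.add(subject)
            found.update(objects)
        return sorted(found)


colours = Ontology()
# 'mixed from' is an invented relationship name used only for this example.
colours.relate("purple", "mixed from", "red")
colours.relate("purple", "mixed from", "blue")

print(taxonomy_nodes(taxonomy))                 # two separate 'purple' nodes
print(colours.concepts())                       # 'purple' appears exactly once
print(colours.related("purple", "mixed from"))  # both parents, one concept
```

In the taxonomy, the two ‘purple’ entries share nothing but a name. In the ontology there is a single ‘purple’ concept, and its links to red and blue are explicit, named relationships that software can follow.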

In the video below, Louise Tutton, COO at Publishing Technology, talks about the Semantic Web and its opportunities at Online Information, London (30th November).

http://www.youtube.com/watch?v=Ky_JUDWXEDU
