Much of the data we use on a daily basis is not part of the Web. We can see bank statements and photographs online, as well as appointments in a calendar. But can we view our photos in a calendar to ascertain what we were doing when we took them? Or on a map so we know where we took them? Can we see bank statement lines in a calendar to help us put our purchases into context? The answer, currently, is no.
But why not? The simple answer is that we don’t have a web of data. Data is controlled by applications, and each application keeps its data to itself; applications don’t often like to share.
What’s different about the Semantic Web?
The Semantic Web (sometimes referred to as Web 3.0, Web 2.1 or Web 2.0++) is a web of data. The original Web concentrated mainly on the interchange of documents. The Semantic Web is about more than that. It concentrates on common formats for integrating and combining data drawn from diverse sources. It is also about a language for recording how the data relates to real-world objects. This allows a person, or a machine, to start off in one database and then move through an unending set of databases which are connected not by wires but by being about the same thing.
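To make the idea of databases connected "by being about the same thing" more concrete, here is a minimal sketch in Python. It assumes each application exposes its data as simple (subject, predicate, object) statements. The identifiers, predicate names and the helper events_for_photo are all hypothetical and chosen only for illustration.

```python
# Two independent "databases", each a set of (subject, predicate, object)
# statements. Every identifier and predicate here is a made-up example.
photo_app = {
    ("photo:123", "takenOn", "2010-06-12"),
    ("photo:123", "takenAt", "place:london"),
}
calendar_app = {
    ("event:42", "date", "2010-06-12"),
    ("event:42", "title", "Conference in London"),
}


def events_for_photo(photo_id, photos, calendar):
    """Return the titles of calendar events that share a date with the photo."""
    # Dates on which the photo was taken, according to the photo application.
    dates = {o for s, p, o in photos if s == photo_id and p == "takenOn"}
    # Calendar events that fall on one of those dates.
    event_ids = {s for s, p, o in calendar if p == "date" and o in dates}
    # Titles of the matching events.
    return sorted(o for s, p, o in calendar if s in event_ids and p == "title")


print(events_for_photo("photo:123", photo_app, calendar_app))
```

Neither application knows about the other. The link exists only because both describe the same date in the same way, which is roughly what a common data format is meant to provide.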
Tim Berners-Lee described the Semantic Web vision in the following terms:
I have a dream for the Web [in which computers] become capable of analysing all the data on the Web, the content, links, and transactions between people and computers. A Semantic Web, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The intelligent agents people have touted for ages will finally materialise. (1999)
Whereas Web 2.0 is focused on people, the Semantic Web is focused on machines. Today's Web requires a human operator, who uses computer systems to find, search and aggregate its information. A computer cannot do these tasks without human guidance because Web pages are designed specifically for human readers. The Semantic Web is a project that aims to change that by presenting Web page data in a way that computers can understand, enabling machines to search, aggregate and combine the Web’s information without a human operator. So what are the real benefits offered by this web of data?
Intelligent search results
A major advantage of the Semantic Web is more intelligent search, whether across the Web or within large-scale data repositories. ‘Intelligent’ is meant here in contrast to the conventional keyword-based methods employed by search engines. For instance, if you search Google for ‘medical publishing’, you will notice that the vast majority of the results on the first few pages contain the keywords ‘medical publishing’ in the page text. That is because the search engine does not process the available content semantically, so the results, though accurate, will be far from complete.
This is where the Semantic Web comes into play. The vision is to get a list of what you asked for even if your keyword does not appear within the web page. In the example above, a page with BMJ Group articles will not be considered relevant if the words ‘medical publishing’ do not appear within the page. In the Semantic Web world the system would ‘know’ that the BMJ Group publishes medical articles, and therefore our articles would be returned to the user performing the query.
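The difference between the two kinds of search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page identifiers, the page texts, the facts and the predicate names (publishedBy, activity) are invented for the example, and real search engines and Semantic Web systems are far more elaborate.

```python
# Hypothetical pages: identifier -> visible page text.
pages = {
    "bmj-articles": "Latest research articles from BMJ Group journals",
    "publishing-guide": "A guide to medical publishing for new authors",
}

# Hypothetical machine-readable facts about those pages.
facts = {
    ("bmj-articles", "publishedBy", "BMJ Group"),
    ("BMJ Group", "activity", "medical publishing"),
}


def keyword_search(query):
    """Return pages whose text literally contains the query string."""
    q = query.lower()
    return sorted(pid for pid, text in pages.items() if q in text.lower())


def semantic_search(query):
    """Return keyword matches plus pages linked to the query through facts."""
    q = query.lower()
    # Organisations whose stated activity matches the query.
    publishers = {s for s, p, o in facts if p == "activity" and o.lower() == q}
    # Pages published by one of those organisations.
    related = {s for s, p, o in facts if p == "publishedBy" and o in publishers}
    return sorted(set(keyword_search(query)) | related)


print(keyword_search("medical publishing"))
print(semantic_search("medical publishing"))
```

The keyword search finds only the page that contains the phrase, while the fact-based search also returns the BMJ Group page, even though the words ‘medical publishing’ never appear in its text.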
Inferring knowledge
Another benefit is the capacity to infer knowledge from existing data. A system built using Semantic Web technologies, with the support of reasoning procedures, can logically (and independently) deduce information. A classic example is that from the statements ‘all men are mortal’ and ‘Socrates is a man’, we can deduce that ‘Socrates is mortal’. This kind of inference, in which membership of one class implies membership of a broader class, can be combined with other logical properties (such as transitivity) to augment the knowledge inserted in a system without requiring a human to insert each and every fact, thereby reducing both error and workload.
For example, given five stated facts, an ontology (a formal vocabulary of concepts and the relationships between them) and a reasoner, a system might deduce fifteen facts by applying rules of logic (reasoning). This is precisely what allows the intelligent queries mentioned in the medical publishing example. Such a system, when asked “Is Socrates mortal?”, will return YES. Systems without reasoning would produce the answer NO (or UNKNOWN in other cases). Similarly, Socrates would be included in a search like “show me all the mortals in the system”. This is, in fact, what is meant by ‘machine-understandable’ information: the ability of a machine to process information independently.
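The Socrates example can be reproduced with a very small reasoner. The sketch below is an assumption-laden illustration, not how any particular Semantic Web toolkit works: facts are plain Python tuples, the predicate names type and subClassOf are borrowed loosely from common Semantic Web vocabulary, and the functions infer, ask and mortals are hypothetical helpers.

```python
def infer(facts):
    """Apply two simple rules repeatedly until no new facts appear.

    Rule 1: (x type A) and (A subClassOf B)        ->  (x type B)
    Rule 2: (A subClassOf B) and (B subClassOf C)  ->  (A subClassOf C)
    """
    known = set(facts)
    while True:
        new = set()
        for a, p1, b in known:
            for c, p2, d in known:
                if b != c:
                    continue
                if p1 == "type" and p2 == "subClassOf":
                    new.add((a, "type", d))
                if p1 == "subClassOf" and p2 == "subClassOf":
                    new.add((a, "subClassOf", d))
        if new <= known:  # nothing new was derived, so stop
            return known
        known |= new


def ask(facts, fact):
    """Answer YES if the fact is present, UNKNOWN otherwise."""
    return "YES" if fact in facts else "UNKNOWN"


def mortals(facts):
    """List everything stated or inferred to be of type Mortal."""
    return sorted(s for s, p, o in facts if p == "type" and o == "Mortal")


# 'Socrates is a man' and 'all men are mortal', written as stated facts.
stated = {
    ("Socrates", "type", "Man"),
    ("Man", "subClassOf", "Mortal"),
}
inferred = infer(stated)

print(ask(stated, ("Socrates", "type", "Mortal")))    # without reasoning
print(ask(inferred, ("Socrates", "type", "Mortal")))  # with reasoning
print(mortals(inferred))
```

Without the reasoning step the question cannot be answered from the stated facts alone. After inference, the derived fact is available both to the direct question and to the “show me all the mortals” style of query.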
For a good basic introduction to the Semantic Web, take a few minutes to watch the following video. No previous knowledge required!
http://www.youtube.com/watch?v=OGg8A2zfWKg