The Semantic Web

The web has primarily been used, up to now, as a tool for allowing humans to communicate with machines. When the Internet was designed (and this predates the Web), it consisted of a number of fairly low-level mechanisms such as sockets, ports, and IP connections. These were designed for machine-to-machine communication and were certainly not easy for humans to use. On top of these low-level protocols were built higher-level services like email and FTP, relatively easier for humans but still challenging by today's standards. Then the Web came along, also built on the same low-level protocols, but designed to make it as easy as possible for humans to interact with the Internet. (People who've been frustrated by slow response times when trying to access their favorite web sites may dispute its user friendliness, but compared to where we were a few years ago, it's massively improved.) We now find ourselves in a situation somewhat unusual in the technology field: in a way, the Web is now easier for humans to use than for computers.

What do we mean when we say the Web is easier for humans to use? There's an enormous amount of data out on the Web that is quite easy for people to get at with just a few minutes and a search engine. Everything from where to get a great pizza in Boise, Idaho, to who my congressman is and what his or her voting record looks like, to lots of less "family oriented" data, is readily available within a couple of mouse clicks. Ideally, this should create a golden opportunity for AI: an intelligent agent that could surf the Web automatically would have access to much of the same information a human does.

The problem is that the Web is not necessarily an easy place for an automated agent to get around. If it has a question it wants answered, it can probably find a list of related sites pretty quickly through a keyword search (and even that's amazing compared to where we were a few years ago). But how can it actually get the data it needs out of a web site? There are probably as many formats as there are web sites, and the formats within a given site often change every few months. What's an intelligent agent to do? The answer lies partly in ideas we've already discussed, like natural language processing: intelligent agents have to become more effective at understanding human conversation. But it also lies in creating protocols that make it easier for computers to access data on the Internet, and this is the basic idea of the Semantic Web.

The vision of the Semantic Web, which is new and still very much under construction, consists of three main parts: ontologies, markup languages, and inference (rule-based) engines. The first component, ontologies, consists of databases of information about particular domains likely to be relevant to particular intelligent agents. We've already seen one example of an ontology in the form of the Cyc knowledge base, an effort to build up information about general human knowledge that has been ongoing for a number of years. However, ontologies can and should be built for almost any area of human knowledge, especially business knowledge. Furthermore, these ontologies are likely to be distributed: an ontology related to medical information is likely to be maintained by a different institution than one related to tax knowledge, probably in a different city. This underscores the need for a means of communicating over the Web in an automated fashion, and that means is provided by markup languages.
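To make the idea of an ontology concrete, here is a minimal sketch in Python. Real Semantic Web ontologies are expressed in richer formalisms, but the essence is the same: terms connected by named relations. All of the terms and relation names below are invented for illustration, not drawn from any real medical vocabulary.

```python
# Toy ontology: each term points to its parent category via an "is_a" link.
# All terms here are hypothetical examples, not a real medical vocabulary.
medical_ontology = {
    "x-ray": {"is_a": "diagnostic_procedure"},
    "diagnostic_procedure": {"is_a": "medical_procedure"},
    "appendectomy": {"is_a": "surgical_procedure"},
    "surgical_procedure": {"is_a": "medical_procedure"},
}

def is_a(ontology, term, category):
    """Follow 'is_a' links upward to test category membership."""
    while term is not None:
        if term == category:
            return True
        term = ontology.get(term, {}).get("is_a")
    return False

print(is_a(medical_ontology, "appendectomy", "medical_procedure"))  # True
```

Even this tiny example shows why ontologies pay off: an agent that has never heard of an appendectomy can still conclude it is a medical procedure by following the links.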

We've already talked a little bit about markup languages in a previous edition. The basic idea is to provide XML-based markup languages that allow for ease of communication across the Internet. Perhaps there will be a one-to-one mapping between ontologies and markup languages. In any event, documents using particular markup languages are what will actually be exchanged over the Semantic Web between the ontologies and the inference engines.
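As a sketch of what exchanging such documents looks like in practice, here is a hypothetical XML claim document parsed with Python's standard library. The element names (`claim`, `procedure`, `amount`) are invented for this example; actual Semantic Web markup languages are far richer, but the principle, structured data that a program can pull fields out of reliably, is the same.

```python
import xml.etree.ElementTree as ET

# A hypothetical claim document in an invented XML markup.
claim_xml = """
<claim>
  <patient>Jane Doe</patient>
  <procedure code="appendectomy"/>
  <amount currency="USD">1200.00</amount>
</claim>
"""

root = ET.fromstring(claim_xml)
procedure = root.find("procedure").get("code")  # "appendectomy"
amount = float(root.find("amount").text)        # 1200.0
print(procedure, amount)
```

Contrast this with scraping an arbitrary web page: because the markup is agreed upon in advance, the agent knows exactly where to find the procedure code and the amount, no matter who produced the document.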

So this brings us to the final main component of the Semantic Web: the inference engines. We've touched on rule-based systems in the past. The inference engines on the Semantic Web will be similar, but they face some additional challenges. For one thing, they reason over information coming in from all over the Web, and they won't have control over when that information arrives, how often it is updated, or even how accurate or reliable it is. The engines will therefore need built-in mechanisms for dealing with these real-time uncertainties. On the other hand, the standardization of the data through the markup languages should make some aspects of building these systems easier than before.
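The core of such an engine can be sketched in a few lines. The following is a minimal forward-chaining inference loop over (subject, predicate, object) triples; the facts and the single rule are invented for illustration and leave out the uncertainty handling discussed above.

```python
# Minimal forward-chaining inference: apply rules to the fact base
# repeatedly until no new facts are derived. Facts are invented examples.
facts = {
    ("claim1", "procedure", "appendectomy"),
    ("appendectomy", "is_a", "surgical_procedure"),
    ("policy1", "covers", "surgical_procedure"),
}

def coverage_rule(kb):
    """If a claim's procedure is a kind of thing some policy covers,
    derive that the claim is covered by that policy."""
    for claim, p1, proc in kb:
        if p1 != "procedure":
            continue
        for s2, p2, kind in kb:
            if s2 != proc or p2 != "is_a":
                continue
            for policy, p3, covered in kb:
                if p3 == "covers" and covered == kind:
                    yield (claim, "covered_by", policy)

def forward_chain(kb, rules):
    """Run every rule until the fact base stops growing."""
    kb = set(kb)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for fact in list(rule(kb)):
                if fact not in kb:
                    kb.add(fact)
                    changed = True
    return kb

derived = forward_chain(facts, [coverage_rule])
print(("claim1", "covered_by", "policy1") in derived)  # True
```

A production engine would add many things (conflict resolution, provenance, handling of stale or contradictory facts), but the fire-rules-until-quiescence loop is the heart of rule-based inference.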

At a very high level, then, the relationship between these three components of the Semantic Web is roughly described by the following diagram:

The ontologies (in this simple medical-insurance example) are based in a number of different cities. As a result, they are updated asynchronously and under the control of different organizations. The markup languages are used to communicate between the inference engines and the ontologies. And the inference engines themselves (which may also be distributed across platforms, cities, and organizations) do most of the actual "work". Note that this diagram helps to explain why things must be structured in this manner. The medical ontology must be maintained by experts in the medical field; likewise, the insurance ontology must be maintained by insurance folks. Yet to process a claim, one needs to know a little about both medicine (what exactly is the claim for?) and insurance (what are the rules of this particular policy?). Therein lies the challenge of the Semantic Web.
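The point about needing both ontologies can be made concrete with a toy sketch: two independently maintained fact sets, merged only at claim-processing time. Every name below is hypothetical.

```python
# Two fact sets maintained by different organizations, merged for one query.
medical_facts = {("appendectomy", "is_a", "surgical_procedure")}   # medical org
insurance_facts = {("policy1", "covers", "surgical_procedure")}    # insurer
claim = ("claim1", "procedure", "appendectomy")

knowledge = medical_facts | insurance_facts | {claim}

def claim_covered(kb, claim_id):
    """Is the claim's procedure (or any category it belongs to) covered?"""
    procs = {o for s, p, o in kb if s == claim_id and p == "procedure"}
    kinds = procs | {o for s, p, o in kb if p == "is_a" and s in procs}
    return any(p == "covers" and o in kinds for _, p, o in kb)

print(claim_covered(knowledge, "claim1"))  # True
```

Neither fact set alone can answer the question: the medical side knows what an appendectomy is but nothing about policies, and the insurer knows what it covers but nothing about surgery. Only the combination, mediated by a shared vocabulary, yields the answer.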

More information about the Semantic Web can be found at the Semantic Web homepage or at the W3C site.

We've already had editions about inference (rule-based) engines and markup languages, so in the next edition we will focus on ontologies.





All copyrights are maintained by respective contributors and may not be reused without permission. Graphics and scripts may not be directly linked to. Site assets copyright © 2000 RamaLila.com and respective authors.