From URI to ontologies
It is some sort of well known pattern, the "
http://" stuff. It is so
frequently used that browsers just fill it in automatically by default.
But what does exactly mean? What's behind it? In reality, behind a so
called URI, there's more than meets the eye.
What is an URI?
First of all, we need some definition. an URI, or Uniform Resource Identifier, is a string identifier that, as the name implies, identifies some resource. A resource is "something" that can be referred to. It could be an internet document, a book, whatever. The official definition of an URI is detailed in its latest form in RFC 3986.
The URI is made of the following parts:
- A scheme
- A hierarchical part
- Optionally, a query
- Optionally, a fragment
Some URIs can be URLs (Locators) or URNs (Names). Both are URIs, but the
URL identifies a resource and the place where to find the resource,
while a URN describes a name of a resource through the special scheme
urn:. When you type an address in your browser, you are
specifying a URL. When you talk about
urn:isbn:0-395-36341-1, you are
using a URN to talk about a book with a specific
ISBN. Please note that the
relationship is not bijective: although every URL is a URI, not all URIs
are URLs, and the same can be said for URNs instead of URL.
Although many scheme names are named after protocols (eg. http, ftp, ldap) this is mostly incidental. There is no "technical correspondence" between the scheme and the protocol: for example by saying a URI with the scheme name http, trying to get the resource (therefore assuming the URI is also a location) will trigger not only HTTP, but also DNS. Moreover, if you have the document in cache, you will not use HTTP at all, but you will get a resource that is on your hard drive. "Resolving" a URI means finding an access strategy to "use" (in very broad term) the resource identified by that URI, an operation called "dereference". The most common dereferencing is retrieval, like downloading the resource.
The hierarchical part includes either an authority and a path, or just a
path. Please note that the "path" concept is very loose: for example,
firstname.lastname@example.org is a path.
urn:isbn:0-395-36341-1, the path is
(which corresponds to the Webster Dictionary, in case you are
wondering). When you have an authority involved for the resolution of
the path, and only in this case, then you have the "//" to indicate
the authority, for example
http://example.com/foo/bar, where the
example.com and the path is
/foo/bar. An apparent
telnet://example.com, but in any case you do have an
authority, and the path is always empty. Another apparent strange
mailto:email@example.com: in this case, be careful not to
confuse a path with an authority (maybe user qualified, as in
What can you use URIs for ?
So, what does the distinction of URI, URL and URN is really useful for? The fact is that, while a URL refers to a "place" where to find a resource, a URN refers to a "name"of a resource in the urn scheme, and a URI as a name of a resource in any scheme. When we talk about resource, the meaning is very broad. It could be anything, even an abstract concept. An example is the definition of namespaces in XML, something like
<h:html xmlns:h="http://www.w3.org/1999/xhtml"> <h:head><h:title>Hello!</h:title></h:head> <h:body> <h:h1>Big Title</h:h1> </h:body> </h:html>
http://www.w3.org/1999/xhtml is just an URI. It is not a
location (the fact that W3C is actually using the address to provide an
informative page can be seen as a coincidence). It is a symbol to mean
something, namely, the fact that some elements in that XML documents
belong to the vocabulary of XHTML.
In Chestnut package manager, the XML manifest starts with
<Package xmlns="urn:uuid:d195be0c-200a-40a4-9d05-35fdf42eb29f" version="1.0.0">
Where the URI
means something: the fact that the "grammar" used is the one of
Chestnut package manager. I created this URI with the utility uuidgen,
and by definition is a unique id. I could have used anything else, even
the URI of this post. The important point is that it has to be a URI,
and has to be unique for this specific use. In this particular case, the
URI is also a URN.
Another interesting usage of URIs is in ontologies and RDF descriptions. Briefly and roughly, an ontology is a "description of a world". Suppose that you want to describe the following information
my guitar is white
and you want a computer to be able to understand it. Moreover, you would like to make the computer understand that the guitar is a musical instrument
The guitar is a musical instrument
and that white is a color
white is a color
Now, if you ask the computer to search all the musical instruments that
are white, you would like to get my guitar, because it is white, and
because it is a musical instrument. Seems easy, but it's not. The fact
is that humans are very smart at interpreting those phrases, but a
computer is not. Quoting Bill
it depends on what the meaning of "is" is. By saying
my guitar is white we describe a property of the guitar, specifically
its color. We have a subject (
my guitar), a verb (
has color) and a predicate (
white). This is also known
as a triplet, and is the basis of
Of course, if we take
the guitar is a musical instrument we have again
a case of triplet:
the guitar is the subject,
is a in terms of
is a kind of is the verb, and
musical instrument is the predicate.
Please note that we have another interesting fact here: while
my guitar is a very specific guitar (that is, the guitar I own),
the guitar is an abstract concept that applies to any guitar. They are
not the same thing, they are two different concepts, but we can say
without doubt that
my guitar (the first concept) is a guitar (the second concept). This
is a case of instance/class relationship.
We can form very complex networks of subject-verb-predicate triplets
describing the digital and non-digital world we live in, so that a
computer can help us in doing complex search and analysis. How do we
differentiate all the concepts and ambiguities we just encountered? You
guessed it: with URIs. We will have a URI to express the concept of
my guitar, another URI to express the concept of
white, another URI
to express the concept of
we will also have different URIs expressing the concepts of
is (as a color) and
is (a kind of). All these concepts are part of
the description (also known as ontology) we want to grant to the
small world we created in this example. This description is rather
simple and loose, but we can define way more complex ontologies. For
example, we can create a color ontology, describing the colors and their
white is a color red is a color snow white is a kind of white blood red is a kind of red red is a warm color blue is a cold color
or an instrument ontology describing
guitar is a six-string instrument violin is a four-string instrument four-string instrument is a string instrument six-string instrument is a string instrument string instrument is a musical instrument
Each of these concepts (
will then be referred by means of a unique URI.
I hope I gave a reasonable and down-to-earth introduction to URIs and Ontologies, what they mean and represent, and what they can be used for. As usual, you are welcome to comment.