Semantic Web: Beyond Metadata

From: J. R. Molloy
Date: Fri Oct 19 2001 - 07:37:53 MDT

Smarter Web Update
You can't have a Semantic Web without metadata, but metadata alone won't
suffice. The metadata in Web pages will have to be linked to special documents
that define metadata terms and the relationships between the terms. These sets
of shared concepts and their interconnections are called "ontologies."

Say, for example, that you've made a Web page listing the members of a
faculty. You would tag the names of the different members with metadata terms
such as "chair," "associate professor," "professor" and so on. Then you'd link
the page to an ontology (one that you created yourself or one that someone else
has already made) that defines educational job positions and how they relate to
each other. An appropriate ontology would in this case define a chair as a
person, not a thing you sit on, and it would indicate that a chair is the most
senior position in a department.
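
A minimal sketch of the faculty example, assuming an ontology can be modeled
as a plain dictionary of terms. The ontology URI, the "is_a" and "rank" fields
and the faculty names are all invented for illustration; none of them come
from a real vocabulary or from the article.

```python
# Hypothetical ontology of academic positions. The URI and the field names
# ("is_a", "rank") are illustrative assumptions, not a real vocabulary.
EDU_ONTOLOGY = {
    "uri": "http://example.org/ontologies/edu-positions",
    "terms": {
        "chair": {"is_a": "person", "rank": 3},
        "professor": {"is_a": "person", "rank": 2},
        "associate professor": {"is_a": "person", "rank": 1},
    },
}

# Page metadata: each (made-up) name is tagged with a term, and the page
# declares which ontology gives those terms their meaning.
FACULTY_PAGE = {
    "ontology": EDU_ONTOLOGY["uri"],
    "members": [
        ("A. Rivera", "associate professor"),
        ("B. Chen", "chair"),
        ("C. Okafor", "professor"),
    ],
}


def kind_of(term, ontology):
    """Say what sort of thing a term denotes according to the ontology."""
    return ontology["terms"][term]["is_a"]


def most_senior(page, ontology):
    """Return the (name, term) pair whose term has the highest rank."""
    if page["ontology"] != ontology["uri"]:
        raise ValueError("page is not linked to this ontology")
    terms = ontology["terms"]
    return max(page["members"], key=lambda member: terms[member[1]]["rank"])


if __name__ == "__main__":
    print(kind_of("chair", EDU_ONTOLOGY))
    print(most_senior(FACULTY_PAGE, EDU_ONTOLOGY))
```

The tag "chair" on its own is ambiguous; only the link to the ontology tells
software that it names a person and that it outranks the other positions.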

By defining the relationships between terms, ontologies can then be used by
applications to infer new facts. Suppose you have created a Web page that
teaches schoolchildren about condors, and have added metadata to the content.
You could link to an ontology (or more likely, several ontologies) that define
the various terms and their relationships: "California condor is a type of
condor from California." "Condor is a member of the raptor family." "All
raptors are carnivores." "California is a state in the United States."
"Carnivores are meat eaters." By using both metadata and ontologies, a search
engine or other software agent could find your condor site based on a search
request for "carnivores in the U.S." even if your site made no mention of
carnivores or the United States.
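
The condor inference can be sketched with a handful of subject-relation-object
facts and a transitive lookup. This is only an illustration: the relation
names ("is_a", "lives_in", "part_of"), the page URL and the helper functions
are assumptions made for the example, not part of any Semantic Web standard.

```python
# Ontology facts from the condor example, written as (subject, relation, object).
# The relation names are illustrative assumptions.
FACTS = [
    ("California condor", "is_a", "condor"),
    ("California condor", "lives_in", "California"),
    ("condor", "is_a", "raptor"),
    ("raptor", "is_a", "carnivore"),
    ("carnivore", "is_a", "meat eater"),
    ("California", "part_of", "United States"),
]

# Page metadata: a hypothetical URL mapped to the subject the page describes.
PAGES = {"http://example.org/condors": "California condor"}


def closure(start, relation, facts):
    """Return `start` plus everything reachable by following `relation`."""
    seen = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for subj, rel, obj in facts:
            if subj == node and rel == relation and obj not in seen:
                seen.add(obj)
                frontier.append(obj)
    return seen


def matches(subject, kind, region, facts):
    """True if `subject` is (transitively) a `kind` living inside `region`."""
    if kind not in closure(subject, "is_a", facts):
        return False
    homes = [obj for subj, rel, obj in facts
             if subj == subject and rel == "lives_in"]
    return any(region in closure(home, "part_of", facts) for home in homes)


def search(kind, region):
    """Find pages whose subject satisfies the query, using inferred facts."""
    return [url for url, subject in PAGES.items()
            if matches(subject, kind, region, FACTS)]


if __name__ == "__main__":
    print(search("carnivore", "United States"))
```

The page itself only says "California condor"; the match on "carnivore" and
"United States" comes entirely from chaining the ontology facts.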

Because ontology development is a big undertaking, it's likely that site
creators will link to third-party ontologies. Some will be free, others will
be sold or licensed. One issue that will have to be confronted: just as with
dictionaries and atlases, political and cultural bias will creep into
ontologies. A geography-based ontology maintained by the Chinese government,
for instance, would probably not define Taiwan as a "country."

But that hardly impedes the vision. As the World Wide Web Consortium continues
to develop standards and technologies for the Semantic Web, hundreds of
organizations, companies and individuals are contributing to the effort by
creating tools, languages and ontologies.

One major contributor is DARPA, the folks responsible for a great deal of the
technology behind the Internet (see "DARPA's Disruptive Technologies," TR
October 2001). These days, DARPA is contributing tens of millions of dollars
to the Web consortium's Semantic Web project and has developed a semantic
language for the U.S. Department of Defense called DARPA Agent Markup Language
that allows users to add metadata to Web documents and relate it to
ontologies. University of Maryland computer science professor Jim Hendler, who
was until August manager of the DARPA program, has been working closely with
Berners-Lee and Miller to ensure consistency with the consortium's efforts.
Last December, Hendler announced the creation of a language that combines the
DARPA Agent Markup Language's capabilities with an ontology language,
developed in Europe, called OIL (which stands for both Ontology Inference
Layer and Ontology Interchange Language).

A developer of this new language, University of Manchester lecturer Ian
Horrocks, also advises the World Wide Web Consortium on the Semantic Web. In
January, he cofounded a company called Network Inference to develop technology
that uses ontologies and automated inference to give Semantic Web capabilities
to existing relational databases and large Web sites. Recently, an Isle of
Man-based data services company called PDMS began using Network Inference's
technology to add Semantic Web capabilities to corporate databases. Dozens of
other companies, from Hewlett-Packard to Nokia, are contributing to Semantic
Web development.

Too Much, Too Late?
Miller believes the seamless flow and integration of information resulting
from these moves will make it possible to process knowledge in a way "that
solves problems, brings people closer and spurs on new ideas that never could
happen before." Others, though, are not so optimistic about the Semantic Web.
"It's rather ambitious," says R. V. Guha, who led development of the Web
consortium's Resource Description Framework efforts in the late 1990s. (This
framework is an essential tool for describing and sharing metadata.) "It would
be nice if such things existed," he says, "but there are some really hard
research problems that need to be solved first."

One issue concerns inference. The time it takes a computer to draw new
conclusions from data, metadata and ontologies on the Web increases rapidly as
rules are added to a system. Inference falls into the same category as the
classic "traveling-salesman problem" of planning the shortest route through a
number of cities. It's not hard to figure out the best of all possible routes
when you're dealing with just a very few locations. But when you get up to
only 15 cities, there are more than 43 billion possible routes. The same kind
of runaway situation exists for inference, where brute-force searches for
answers could lead to time-wasting paradoxes or contradictions.
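
The "43 billion" figure can be checked with a short calculation. The article
does not say how routes are counted, so this sketch assumes round trips with a
fixed starting city, where a tour and its reverse count as one route; that
convention gives (n - 1)! / 2 and agrees with the number quoted above.

```python
from math import factorial


def round_trip_count(cities):
    """Count distinct round trips through `cities` cities.

    Assumes the start city is fixed and that a tour and its reverse are the
    same route, which gives (cities - 1)! / 2.
    """
    if cities < 3:
        raise ValueError("need at least 3 cities for distinct round trips")
    return factorial(cities - 1) // 2


if __name__ == "__main__":
    for n in (4, 5, 10, 15):
        print(n, round_trip_count(n))
```

For 15 cities this yields 43,589,145,600 routes, and each added city
multiplies the count again, which is the runaway growth the paragraph
describes.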

And even if Berners-Lee and his cohorts meet the technical challenges, that
won't be enough for the Semantic Web to click into place. There is a big
question as to whether people will think the benefits are worth the extra
effort of adding metadata to their content in the first place. One reason the
Web became so wildly successful, after all, was its sublime ease of creation.

"The Web today is the simplest, most primitive form of hypertext," says former
Sun Microsystems Distinguished Engineer Jakob Nielsen, cofounder of the
Nielsen Norman Group, a Web design firm in Fremont, CA. "And that's why it was
so easy to implement; that's why everybody could...start putting up their own
Web pages; that's why the Web is so big." However, while most people may be
comfortable doing simplistic editing, such as marking a text as "bold,"
Nielsen points out, "They cannot do semantic editing, where they say, 'This is
the author's name,' or 'This is the name of people I'm quoting.'"

Of course, such pessimism may be ignoring recent history. Not so long ago, the
notion of millions of people learning to write HTML code seemed
far-fetched, yet that's exactly what happened. Still, the hurdle of creating a
Semantic Web will be higher. People can use HTML any way they want. They
commonly use tables for nontabular purposes, for instance, and slap on the
"subhead" tag merely to apply boldface. These kluges and shortcuts usually
have only cosmetic consequences. But the same type of fudging (say, by
employing "bibliography" tags to list a DVD collection) could make a page's
metadata unusable.

The fact that metadata wasn't implemented right from the Web's start could
also make it harder for the Semantic Web to gain acceptance. One particularly
tough skeptic is Peter Merholz, cofounder of Adaptive Path, a San
Francisco-based user experience consultancy. "This stuff has to be baked in
from the beginning," says Merholz, who calls the Semantic Web "an interesting
academic pursuit" with little bearing on society. "The Semantic Web is getting
a lot of hype simply because Tim Berners-Lee, the inventor of the World Wide
Web, is so interested in it," he says. "If it were just some schmuck at some
university in Indiana, nobody would care."

Initial Threads
Even Berners-Lee admits that the path to the Semantic Web may be a bit slower
than that to the World Wide Web. "In a way we don't need to move too fast," he
says, "because the theory people need to look at it to make sure we're not too
crazy, and other people need to check out the ideas in practice before they're
picked up and used too extensively."

When asked to peek into his crystal ball, the evangelist of exchangeable data
predicts that some of the Semantic Web's first commercial applications will
aim to integrate the different information systems that typically coexist in
large organizations. (Wouldn't it be nice to take care of business at the
motor vehicle department or hospital without having to fill out a half-dozen
largely redundant forms? The Semantic Web can help here.)

And even though the Semantic Web still resides chiefly on the drawing board,
you can see hints of its power on some existing Web sites. Consider Moreover
Technologies' search engine that crawls thousands of news sites several times
a day, making it a favorite for news junkies. Moreover's software agents have
been programmed to look at the font tags (the HTML labels that tell Web
browsers how large or small to make the text appear on the screen) to
determine whether or not a particular page is a news story. If a Moreover
agent finds a string of six to 18 words tagged as large type near the top of a
page, it will assume it is a headline and place it in a database. Of course,
since the agent is only making a guess, sometimes it selects a page that isn't
news after all. So Moreover has to apply additional filtering to get rid of
pages that don't contain articles.
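
A rough sketch of a headline guesser in the spirit of the heuristic described
above. It is not Moreover's actual code: the six-to-18-word range comes from
the article, but the thresholds for "large type" (a font size of 4 or more, or
a relative "+N" size) and "near the top" (within the first 50 words) are
assumptions chosen for illustration.

```python
from html.parser import HTMLParser

MIN_WORDS, MAX_WORDS = 6, 18  # word range given in the article
LARGE_SIZE = 4                # assumption: <font size> of 4+ is "large"
TOP_WORDS = 50                # assumption: "near the top" = first 50 words


def is_large(size):
    """Interpret a <font size> value; relative '+N' sizes count as large."""
    if not size:
        return False
    size = size.strip()
    if size.startswith("+"):
        return True
    return size.isdigit() and int(size) >= LARGE_SIZE


class HeadlineFinder(HTMLParser):
    """Collect the first run of large type that looks like a headline."""

    def __init__(self):
        super().__init__()
        self.font_stack = []    # one flag per open <font>: is it large?
        self.words_seen = 0     # words read so far on the page
        self.current = []       # words in the large-type run being read
        self.current_start = 0  # words_seen when the current run began
        self.headline = None

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            large = is_large(dict(attrs).get("size"))
            if large and not any(self.font_stack):
                self.current, self.current_start = [], self.words_seen
            self.font_stack.append(large)

    def handle_endtag(self, tag):
        if tag == "font" and self.font_stack:
            was_large = self.font_stack.pop()
            if was_large and not any(self.font_stack):
                self._finish_run()

    def handle_data(self, data):
        words = data.split()
        if any(self.font_stack):
            self.current.extend(words)
        self.words_seen += len(words)

    def _finish_run(self):
        if (self.headline is None
                and MIN_WORDS <= len(self.current) <= MAX_WORDS
                and self.current_start <= TOP_WORDS):
            self.headline = " ".join(self.current)
        self.current = []


def guess_headline(html):
    """Return the guessed headline text, or None if nothing qualifies."""
    finder = HeadlineFinder()
    finder.feed(html)
    finder.close()
    return finder.headline
```

Because the font tag carries no meaning beyond appearance, any page that
happens to put a dozen words in large type near the top will fool this check,
which is why a guess like this needs the extra filtering mentioned above.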

That's still a far cry from the ultimate goal, but it's a good start. And even
the Semantic Web champions don't pretend to grasp exactly where such steps
will lead. After all, who predicted or eBay back when Berners-Lee
turned on the switch of the world's first Web server in December 1990?

But the point is that people want more intelligence from the Web than they're
getting, and a growing number of computer scientists share the twinkle in
Berners-Lee's eye, and the feeling that the Semantic Web holds the answer.
"It's great," says the inventor of the World Wide Web, "to have that
grass-roots enthusiasm around again."

--- --- --- --- ---

Useless hypotheses, etc.:
 consciousness, phlogiston, philosophy, vitalism, mind, free will, qualia,
analog computing, cultural relativism, GAC, Cyc, Eliza, cryonics, individual
uniqueness, ego, human values, scientific relinquishment

We move into a better future in proportion as science displaces superstition.

This archive was generated by hypermail 2b30 : Sat May 11 2002 - 17:44:14 MDT