Tim Anderson

Prepare now for tomorrow's web

Searching major sites will remain akin to wading through glue until semantic web standards are widely adopted

IT Week, 30 Sep 2003

The Google search site is just five years old and processes 200 million searches a day. That's a hard number to visualise, but there is a better way to judge the importance of Google, which is to watch how people interact with the web.

I've seen folk type web addresses into Google rather than use the address bar on their browser, and of course it works. What's crazy is that Google frequently offers a better, quicker way to find what you want even when you know the target web site. The other day I was looking for drivers for a Sony AIT tape drive. Navigating Sony's site is frustrating, but Google found the actual download page on the first page of results.

Google's clever search algorithms do a great but imperfect job of sifting out junk and identifying highly relevant web pages. What's sad is that the web itself does so little to help. Under the surface, a typical web site is a hotch-potch of scripts, images, Flash movies, Java applets and ActiveX controls, with a few scraps of meaningful content strewn hither and thither for Google to root through.

The person credited with inventing the web, Tim Berners-Lee, can't be blamed for this situation. He has long advocated the development of what is called the semantic web, with data that describes itself, either to you or to automated tools like Google.

An example of something approaching the semantic web can be found in newsfeeds and weblogs, which are generally delivered via an XML vocabulary called RSS.

The acronym is contentious and might stand for RDF Site Summary, or possibly Really Simple Syndication. RSS wraps a collection of web links in metadata that defines properties such as title, description, creator, date, and rights.

To see why this matters, consider that conventional search engines offer no way to find only the articles written by a certain author. You can search for the author's name, but the results will include any page that merely mentions it. If you are searching RSS files, on the other hand, the creator or author property gives you this information instantly.
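
As a rough illustration of the difference, here is a minimal Python sketch that filters feed items by their creator property. It assumes an RSS 2.0-style feed that records the author in the Dublin Core `dc:creator` element; the sample feed, its URLs, the second author's name and the `items_by_creator` helper are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed: the URLs and the second author are invented.
SAMPLE_FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <description>Sample feed for illustration</description>
    <item>
      <title>Prepare now for tomorrow's web</title>
      <link>http://example.com/articles/1</link>
      <dc:creator>Tim Anderson</dc:creator>
    </item>
    <item>
      <title>An interview that mentions Tim Anderson</title>
      <link>http://example.com/articles/2</link>
      <dc:creator>A. N. Other</dc:creator>
    </item>
  </channel>
</rss>"""

# Namespace map for the Dublin Core elements used in the feed.
DC = {"dc": "http://purl.org/dc/elements/1.1/"}


def items_by_creator(feed_xml, creator):
    """Return (title, link) pairs for items whose dc:creator equals `creator`."""
    root = ET.fromstring(feed_xml)
    matches = []
    for item in root.iter("item"):
        name = item.findtext("dc:creator", default="", namespaces=DC)
        if name.strip() == creator:
            matches.append((item.findtext("title"), item.findtext("link")))
    return matches


for title, link in items_by_creator(SAMPLE_FEED, "Tim Anderson"):
    print(title, link)
```

A plain text search for the name would match both items, because the second one mentions it in its title. The metadata query matches only the item that the author actually wrote.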

The key feature of RSS is that it enables aggregation. Bloggers use it to enable easy access on one page to all their favourite blogs. News sites use it to monitor and search the latest headlines.
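
Aggregation of this kind can be sketched in a few lines of Python. The example below assumes RSS 2.0-style feeds whose items carry a `pubDate` in the RFC 822 date format; the two feeds, their titles and URLs, and the `aggregate` helper are hypothetical, and a real aggregator would fetch the feeds over HTTP and cope with missing or malformed dates.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical feeds: blog names, headlines, and URLs are invented.
BLOG_A = """<rss version="2.0"><channel>
  <title>Blog A</title>
  <item>
    <title>Older post</title>
    <link>http://a.example.com/1</link>
    <pubDate>Sun, 28 Sep 2003 09:00:00 GMT</pubDate>
  </item>
  <item>
    <title>Newest post</title>
    <link>http://a.example.com/2</link>
    <pubDate>Tue, 30 Sep 2003 08:30:00 GMT</pubDate>
  </item>
</channel></rss>"""

BLOG_B = """<rss version="2.0"><channel>
  <title>Blog B</title>
  <item>
    <title>Middle post</title>
    <link>http://b.example.com/1</link>
    <pubDate>Mon, 29 Sep 2003 17:45:00 GMT</pubDate>
  </item>
</channel></rss>"""


def aggregate(feeds):
    """Merge the items of several RSS documents into one list, newest first."""
    entries = []
    for feed_xml in feeds:
        channel = ET.fromstring(feed_xml).find("channel")
        source = channel.findtext("title", default="")
        for item in channel.iter("item"):
            # pubDate is parsed into a datetime so items sort chronologically.
            published = parsedate_to_datetime(item.findtext("pubDate"))
            entries.append(
                (published, source, item.findtext("title"), item.findtext("link"))
            )
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return entries


for published, source, title, link in aggregate([BLOG_A, BLOG_B]):
    print(published.date(), source, title, link)
```

Because every feed labels its titles, links and dates in the same way, the merge needs no knowledge of how each site lays out its pages.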

Unfortunately RSS itself is a mess. The specification has forked into two strands, one more pragmatic, the other based on the more generic Resource Description Framework (RDF), which is a W3C work in progress. In its current form, RSS offers just a glimpse of what the semantic web could do.

The struggle most of us have in extracting the information we want from the web, and the success of Google, should make web authors pause for thought. I understand that the average company doesn't care much about the details of W3C protocols or about tracking the latest web standards. For such companies, what counts is that the web site looks good in most browsers, stays online, and does a reasonable job of promoting the business. However, they overlook the fact that moving towards the semantic web offers huge commercial advantages.

The Google of the future will search metadata, not just content. The time to start building that metadata is now.
