The Google search site is just five years old and processes 200 million searches a day. That's a hard number to visualise, but there is a better way to judge the importance of Google, which is to watch how people interact with the web.
I've seen folk type web addresses into Google rather than use the address bar on their browser, and of course it works. What's crazy is that Google frequently offers a better, quicker way to find what you want even when you know the target web site. The other day I was looking for drivers for a Sony AIT tape drive. Navigating Sony's site is frustrating, but Google found the actual download page on the first page of results.
Google's clever search algorithms do a great but imperfect job of sifting out junk and identifying highly relevant web pages. What's sad is that the web itself does so little to help. Under the surface, a typical web site is a hotch-potch of scripts, images, Flash movies, Java applets and ActiveX controls, with a few scraps of meaningful content strewn hither and thither for Google to root through.
The person credited with inventing the web, Tim Berners-Lee, can't be blamed for this situation. He has long advocated the development of what is called the semantic web, with data that describes itself, either to you or to automated tools like Google.
An example of something approaching the semantic web can be found in newsfeeds and weblogs, which are generally delivered via an XML vocabulary called RSS.
The acronym is contentious and might stand for RDF Site Summary, or possibly Really Simple Syndication. RSS wraps a collection of web links in metadata that defines properties such as title, description, creator, date, and rights.
With conventional search engines there's no way to find only the articles by a certain author, for example. You can search for the author's name, but that returns any page that mentions the name. If you are searching RSS files, on the other hand, the creator or author property gives you this information directly.
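To make the author example concrete, here is a minimal sketch in Python. It assumes an RSS 2.0 feed that carries the Dublin Core creator element on each item; the sample feed, the names in it and the items_by_creator function are all hypothetical, invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace, commonly used in RSS for the "creator" property.
DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical sample feed; every title, link and name here is made up.
SAMPLE_FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>Hypothetical sample data</description>
    <item>
      <title>First article</title>
      <link>http://example.com/1</link>
      <dc:creator>Alice Example</dc:creator>
    </item>
    <item>
      <title>Second article, which mentions Alice Example</title>
      <link>http://example.com/2</link>
      <dc:creator>Bob Example</dc:creator>
    </item>
  </channel>
</rss>"""


def items_by_creator(feed_xml, creator):
    """Return (title, link) pairs for items whose creator matches exactly."""
    root = ET.fromstring(feed_xml)
    matches = []
    for item in root.iter("item"):
        # Compare against the metadata property, not the free text.
        if item.findtext(DC + "creator") == creator:
            matches.append((item.findtext("title"), item.findtext("link")))
    return matches


if __name__ == "__main__":
    for title, link in items_by_creator(SAMPLE_FEED, "Alice Example"):
        print(title, link)
```

A plain text search for "Alice Example" would also hit the second item, because its title mentions the name. The metadata query returns only the item she is recorded as having written.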
The key feature of RSS is that it enables aggregation. Bloggers use it to enable easy access on one page to all their favourite blogs. News sites use it to monitor and search the latest headlines.
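Aggregation can be sketched in a few lines as well. The example below is an illustration, not how any particular aggregator works: it assumes RSS 2.0 feeds whose items carry a pubDate in the usual email-style date format, and the two sample feeds and the aggregate function are hypothetical.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two hypothetical feeds; the titles, links and dates are made up.
FEED_A = """<rss version="2.0"><channel>
<title>Feed A</title>
<item><title>A older</title><link>http://example.com/a1</link>
<pubDate>Mon, 06 Oct 2003 09:00:00 GMT</pubDate></item>
<item><title>A newer</title><link>http://example.com/a2</link>
<pubDate>Wed, 08 Oct 2003 09:00:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel>
<title>Feed B</title>
<item><title>B middle</title><link>http://example.com/b1</link>
<pubDate>Tue, 07 Oct 2003 09:00:00 GMT</pubDate></item>
</channel></rss>"""


def aggregate(feeds):
    """Merge the items of several feeds into one list, newest first."""
    entries = []
    for feed_xml in feeds:
        channel = ET.fromstring(feed_xml).find("channel")
        source = channel.findtext("title")
        for item in channel.iter("item"):
            entries.append({
                "source": source,
                "title": item.findtext("title"),
                "link": item.findtext("link"),
                # pubDate is machine-readable, so sorting needs no guesswork.
                "date": parsedate_to_datetime(item.findtext("pubDate")),
            })
    return sorted(entries, key=lambda e: e["date"], reverse=True)


if __name__ == "__main__":
    for entry in aggregate([FEED_A, FEED_B]):
        print(entry["date"].date(), entry["source"], entry["title"])
```

The merge works only because every item declares its own title, link and date. Doing the same with ordinary web pages would mean scraping each site's layout by hand.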
Unfortunately RSS itself is a mess. The specification has forked into two strands, one more pragmatic, the other based on the more generic Resource Description Framework (RDF), which is a W3C work in progress. In its current form, RSS offers just a glimpse of what the semantic web could do.
The struggle most of us have in extracting the information we want from the web, and the success of Google, should make web authors pause for thought. I understand that the average company doesn't care much about the details of W3C protocols or tracking the latest web standards. For them, what counts is that their web site looks good in most browsers, stays online, and does a reasonable job of promoting the business. However, they overlook the fact that moving towards the semantic web offers huge commercial advantages.
The Google of the future will search metadata, not just content. The time to start building that metadata is now.