Daniel Lemire's blog

Yahoo! to exploit more metadata

Long ago, search engines stopped using the metadata available in the header of HTML pages, because people would lie or enter misleading data by mistake. Many web sites still provide Dublin Core metadata as part of their HTML, but this data is known to be misleading, incomplete and wrong. There is no evidence that metadata can enhance search. Period.

Nevertheless, Yahoo! announced that it is going to enhance its search results with RDF metadata. They give LinkedIn as an example: apparently, LinkedIn pages are filled with metadata waiting to be exploited. Using this metadata is a great idea because LinkedIn can be trusted. Some other things would make sense, like GeoRSS. It would be great to know where some pages say they live.

Extracting metadata from one trusted web site is one thing. Exploiting the metadata out there is another.
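To make the distinction concrete, here is a minimal Python sketch of the "trusted site" approach, under my own assumptions: the `TRUSTED_HOSTS` allowlist, the `MetaExtractor` class and the `trusted_metadata` function are hypothetical names for illustration, and nothing here reflects how Yahoo! actually does it. It reads plain `<meta>` tags (the kind used for Dublin Core) rather than RDF, and it simply ignores whatever unknown sites say about themselves.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical allowlist: hosts whose self-described metadata we choose to believe.
TRUSTED_HOSTS = {"www.linkedin.com", "www.amazon.com"}


class MetaExtractor(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content:
            # Normalize the key so "DC.title" and "dc.title" are the same field.
            self.meta[name.lower()] = content


def trusted_metadata(url, html):
    """Return the page's metadata only if its host is on the allowlist."""
    host = urlparse(url).hostname
    if host not in TRUSTED_HOSTS:
        return {}  # ignore metadata from sites we have no reason to trust
    parser = MetaExtractor()
    parser.feed(html)
    return parser.meta
```

The hard part is not the parsing, it is deciding who goes on the allowlist. That decision does not scale to the whole web, which is the point.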

A few things should be pointed out:

  • As far as I can tell, Yahoo! is not talking about using metadata to improve its result sets in general. That would fail. They merely want to better describe the links found and maybe provide specialized services. If I were them, I would go around and entice various important web sites (Amazon to begin with!) to provide more trusted metadata. They probably have been doing just that.
  • Besides a few specific instances, I do not see how it will make their search engine better than Google. No matter what, the vast majority of web sites will contain no metadata, or wrong metadata.
  • There is no talk of non-trivial inference engines. Yahoo! still won’t be able to tell you whether G. W. Bush is a drunk or not.
  • Graduate students worldwide, stay calm. I could not find one occurrence of the word ontology in Yahoo!’s post. They are talking about RDF, not OWL. So you can stop describing the whole world in an RDF graph.