Daniel Lemire's blog

How do search engines handle special characters? Should you care?

Matt Cutts is Google’s search engine optimization expert. He runs a great YouTube channel called Google Webmaster Central. He was recently asked how Google handles special characters such as ligatures, soft hyphens, interpuncts and hyphenation points. His answer? He doesn’t know.

Being a scientist, I decided to compare Google to the young upstart (Bing). On Google, the result set for Kurt Goedel differs from the one for Kurt Gödel. For example, I could only find the Uncyclopedia article on Kurt Goedel when searching for Kurt Goedel. Similarly, Google fails to realize that cœur and coeur are the same word. However, Bing knows that Goedel and Gödel are the same person, and that cœur and coeur are the same word.

While the consequences are small, they are nevertheless real:

  • Students may fail to find great references on Kurt Gödel because they search for Kurt Goedel. Indeed, most academic papers seem to prefer Gödel to Goedel.
  • A writer who tries to be typographically correct and writes cœur may get penalized when people search for coeur.
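
One way an engine could treat Gödel and Goedel, or cœur and coeur, as equivalent is to fold both the indexed text and the query to a common key before comparing them. The following is a minimal Python sketch under stated assumptions, not a description of what Bing or Google actually do. The `EXPANSIONS` table and the `search_key` function are hypothetical names introduced for illustration, and the table is hand-picked: mapping ö to oe follows the German convention and would be wrong for other languages. Note that Unicode normalization alone is not enough here, since œ has no decomposition and stripping the diaeresis from ö yields o rather than oe.

```python
import unicodedata

# Hypothetical, hand-picked table (an assumption, not any engine's real rules).
# The umlaut expansions follow German transliteration and do not generalize
# to other languages that use the same letters.
EXPANSIONS = str.maketrans({
    "œ": "oe", "Œ": "OE",
    "æ": "ae", "Æ": "AE",
    "ö": "oe", "Ö": "OE",
    "ä": "ae", "Ä": "AE",
    "ü": "ue", "Ü": "UE",
    "\u00ad": None,  # soft hyphen: drop
    "\u2027": None,  # hyphenation point: drop
    "\u00b7": None,  # interpunct (middle dot): drop
})


def search_key(text: str) -> str:
    """Fold text to a comparison key for matching queries against documents."""
    # Compose first so that "o" + combining diaeresis becomes a single "ö"
    # that the expansion table can match.
    composed = unicodedata.normalize("NFC", text)
    expanded = composed.translate(EXPANSIONS)
    # Compatibility decomposition splits ligatures such as "ﬁ" into "fi"
    # and separates the remaining accents from their base letters.
    decomposed = unicodedata.normalize("NFKD", expanded)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-insensitive comparison.
    return stripped.casefold()


if __name__ == "__main__":
    print(search_key("Kurt Gödel") == search_key("Kurt Goedel"))
    print(search_key("cœur") == search_key("coeur"))
```

The trade-off is that such folding is lossy: distinct words in some languages would collapse to the same key, so a real engine would likely need language-aware rules rather than one global table.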

Score: Bing 1. Google 0.

Source: Will Fitzgerald. Special thanks to Mark Reid, Marek Krajewski, Jeff Erickson and Christer Ericson for the debate on Twitter motivating this blog post.