14th November 2008

Toward the Commoditization of Natural Language Processing

In a remarkable paper, Peter Turney shows that, using a simple family of algorithms and freely available software, one can automatically determine analogies, synonyms, antonyms, and relations between words. Here is the beginning of the abstract:

Recognizing analogies, synonyms, antonyms, and associations appear to be four distinct tasks, requiring distinct NLP algorithms. In the past, the four tasks have been treated independently, using a wide variety of algorithms. These four semantic classes, however, are a tiny sample of the full range of semantic phenomena, and we cannot afford to create ad hoc algorithms for each semantic phenomenon; we need to seek a unified approach.

I do not work in Natural Language Processing (NLP) per se, but this sounds like commoditization to me, in the sense that you no longer need to design, learn, and tweak a custom algorithm for each task. If you have enough data, you can do NLP after learning one (remarkably simple) family of algorithms. Peter Norvig might approve.

In the database research world, commoditization is already an accomplished fact. Database researchers have been wondering about their relevance for about ten years. Peter might argue that in such a context, researchers should become bold and daring. Computer Science researchers should choose crazy problems.

Reference: Peter Turney, A Uniform Approach to Analogies, Synonyms, Antonyms, and Associations, Coling 2008, August 2008.