Daniel Lemire's blog

, 1 min read

Analyzing Large Collections of Electronic Text Using OLAP at APICS 2005

Steven will be presenting our paper Analyzing Large Collections of Electronic Text Using OLAP at APICS 2005. This work is based on an idea by Owen Kaser:__ what happens if we apply multidimensional databases (OLAP) to literary research?__

Data Mining and Information Retrieval techniques are used routinely for literary research or processing text in general, but decision support techniques commonly used in the business world (sometimes called “Business Intelligence”) have not seen much use yet in text processing. The main difference between decision support systems and data mining is the fact that in decision support, the user remains in control, thus simple yet extremely efficient algorithms are favoured over sophisticated, but possibly expensive algorithms. Ideally, all decision support algorithms should be O(1) after accounting for precomputations. With infinite storage almost available now, decision support research is due for a technological and scientific boom.

Computer-assisted reading and analysis of text has various applications in the humanities and social sciences. The increasing size of many electronic text archives has the advantage of a more complete analysis but the disadvantage of taking longer to obtain results. On-Line Analytical Processing is a method used to store and quickly analyze multidimensional data. By storing text analysis information in an OLAP system, a user can obtain solutions to inquiries in a matter of seconds as opposed to minutes, hours, or even days. This analysis is user-driven allowing various users the freedom to pursue their own direction of research.