Daniel Lemire's blog

Google releases massive n-gram data set

All n-gram geeks rejoice! Google just released a massive n-gram data set: 1,146,580,664 five-word sequences that each appear at least 40 times in 1,011,582,453,213 words of running text.
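
To make the "appear at least 40 times" cutoff concrete, here is a minimal Python sketch of counting n-grams and keeping only the frequent ones. It is a toy illustration, not Google's pipeline: the function name `frequent_ngrams`, the whitespace tokenization, and the in-memory `Counter` are all assumptions, and none of this would scale to a trillion words.

```python
from collections import Counter


def frequent_ngrams(tokens, n=5, min_count=40):
    """Return a dict of n-grams (as tuples) that occur at least min_count times.

    Hypothetical helper for illustration only; it holds all counts in memory.
    """
    # Slide a window of width n over the token list and count each window.
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    # Keep only the n-grams that meet the frequency threshold.
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Tiny example with bigrams and a threshold of 3 instead of 5-grams and 40.
tokens = ("the cat sat on the mat " * 3).split()
result = frequent_ngrams(tokens, n=2, min_count=3)
```

In this toy run, bigrams such as `("the", "cat")` pass the threshold, while `("mat", "the")`, which only spans the repetition boundaries, does not.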