Daniel Lemire's blog

, 1 min read

External-Memory Sorting in Java

Update: this code is now obsolete. Please see the corresponding Github project.

Sometimes, you want to sort large file without first loading them into memory. The solution is to use External Sorting. Typically, you divide the files into small blocks, sort each block in RAM, and then merge the result.

Many database engines and the Unix sort command support external sorting. But what if you want to avoid a database? Or what if you want to sort in a non-lexicographic order? Or maybe you just want a simple external sorting example?

When I could not find such a simple program, I wrote one.

Please help me improve it. It needs:

  • To be easy to modify: the code must remain simple.
  • It must scale to very large files.
  • While scalability and simplicity are most important, it must also be sensibly efficient.

Once we have a good solution, I plan to post it on Google code and add a link in the External Sorting wikipedia entry. If you contribute to the code, please add your name so you can get credit.

Reference: ExternalSort.java, http://pastebin.com/H57TZF7e