Original post is here: eklausmeier.goip.de
Last year I used a drop-in replacement for the ordinary Linux sort command called nsort, from Ordinal Technology. Ordinal's nsort is free but not open-source. One thing is clear, however: it is very fast. nsort was written by Chris Nyberg.
The motivation for looking for a faster sort was as follows. I had to drop all duplicate records from a single Oracle database table. The table had more than 800 million records. Only later, after I already had the solution, did it turn out that just 3% of the initial records would remain, i.e., 97% of the records were indeed duplicates. The solution was basically to extract all data from the table with a small C program, sort the extracted data with sort -u, and then load the result into the database table again.
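The sort-and-deduplicate step in the middle can be sketched on a toy file as follows. This is only an illustration: the file names and the pipe-separated record layout are made up, and the real extract came from the C program mentioned above. It uses plain sort -u. Since nsort is described as a drop-in replacement, swapping the command name is presumably the main change, but the exact nsort options are not shown here and should be taken from its user guide.

```shell
# Hypothetical working directory and file names, for illustration only.
workdir=$(mktemp -d)

# Stand-in for the extract: five records, three of them distinct.
printf '%s\n' '42|alice' '17|bob' '42|alice' '17|bob' '99|carol' > "$workdir/extract.dat"

# Sort and drop duplicate lines in one pass.
# LC_ALL=C forces byte-wise comparison, which avoids locale-dependent ordering.
LC_ALL=C sort -u "$workdir/extract.dat" > "$workdir/unique.dat"

# Count records before and after; $(( )) strips any padding wc may add.
total=$(( $(wc -l < "$workdir/extract.dat") ))
kept=$(( $(wc -l < "$workdir/unique.dat") ))
echo "kept $kept of $total records"
```

The file unique.dat is what would then be loaded back into the table.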
Using nsort instead of plain sort cut the runtime to one-third. In my case the overall runtime went down from 60 minutes to 20 minutes.
The Nsort user guide is a very readable user's guide to nsort.
Benchmarks involving nsort can be found at sortbenchmark.org.