Dramatically Faster Sorting in Linux Using Nsort

Original post is here: eklausmeier.goip.de

Last year I used nsort from Ordinal Technology as a drop-in replacement for the ordinary Linux sort command. Ordinal's nsort is free but not open source. One thing is clear, however: it is very fast. nsort was written by Chris Nyberg.

The motivation for looking for a faster sort was as follows. I had to drop all duplicate records from a single Oracle database table holding more than 800 million records. Only later, after I already had the solution, did it turn out that just 3% of the initial records would remain, i.e., 97% of the records were duplicates. The solution was basically to extract all data from the table with a small C program, sort the extracted data with sort -u, and load the result back into the database table.
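The idea behind the sort -u step can be illustrated with a short sketch. The Python below is not the author's C program and not how nsort works internally; it is a minimal external merge sort with duplicate removal, assuming UTF-8 text with one record per line and plain byte-order (LC_ALL=C style) comparison. The names unique_sorted, _write_run, _read_run and the run_size parameter are hypothetical. Input is sorted in memory-sized chunks ("runs") that are written to temporary files; the runs are then merged with heapq.merge, and a record is dropped whenever it equals the one emitted just before it.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _write_run(sorted_lines: List[str], tmpdir: str) -> str:
    """Write one sorted run to a temporary file, one record per line; return its path."""
    fd, path = tempfile.mkstemp(dir=tmpdir, suffix=".run")
    # newline="\n" disables newline translation, so "\n" is the only terminator.
    with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as f:
        f.writelines(line + "\n" for line in sorted_lines)
    return path


def _read_run(path: str) -> Iterator[str]:
    """Stream a run back, stripping the "\n" terminator that _write_run added."""
    with open(path, encoding="utf-8", newline="\n") as f:
        for line in f:
            yield line[:-1]


def unique_sorted(lines: Iterable[str], run_size: int = 1_000_000) -> Iterator[str]:
    """Yield the distinct records in sorted order, similar in spirit to `sort -u`.

    Hypothetical helper, assuming UTF-8 text and code-point ordering
    (which matches byte order for UTF-8, i.e. LC_ALL=C).
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        run_paths: List[str] = []
        buf: List[str] = []
        for line in lines:
            # Compare records without their line terminator, as sort does.
            buf.append(line[:-1] if line.endswith("\n") else line)
            if len(buf) >= run_size:
                # Sort one memory-sized chunk; set() already drops in-chunk duplicates.
                run_paths.append(_write_run(sorted(set(buf)), tmpdir))
                buf = []
        if buf:
            run_paths.append(_write_run(sorted(set(buf)), tmpdir))

        # k-way merge of the sorted runs: equal records arrive adjacent,
        # so comparing with the last emitted record removes every duplicate.
        previous = None
        for record in heapq.merge(*(_read_run(p) for p in run_paths)):
            if record != previous:
                yield record
                previous = record
```

A hypothetical invocation on an extract file would be: for record in unique_sorted(open("extract.txt", encoding="utf-8")): print(record). A smaller run_size means less memory per run but more temporary files to merge. The sketch only shows the principle of sorting with duplicate removal; it says nothing about how nsort achieves its speed.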

With nsort instead of plain sort, the runtime dropped to one third. In my case, overall runtime went down from 60 minutes to 20 minutes.

The Nsort User Guide is a very readable introduction to nsort.

Benchmarks involving nsort can be found at sortbenchmark.org.

#neosort #nsort #Ordinal #Technology #unique #800 million records #Oracle #database #duplicate #deduplicate #Nyberg #sort

last updated: 2024-11-04