I intended 1.2 to be my “year-end” release, but I ended up cleaning up a few things in the days since, including fixing one bug, so I might as well call this 1.3 now.
A few pre-built binaries are again available in case that’s helpful.
I just tagged the release of dupd 1.2, enjoy hunting those duplicates!
This time I included pre-built binaries for a few platforms. They are probably most useful on OS X for those without dev tools installed.
Recently I’ve made a few performance improvements to dupd, motivated by one particular edge case file set I was working with a while back. That file set had a very large number of files (over 100K) of the same size. These were log files from a production system where the content was always different, but due to the structure of the files they tended to have the same size. This was a worst-case scenario for dupd given the way it grouped files of the same size as potential duplicates. With the latest changes (in dupd 1.2) this scenario is dramatically faster: scan time is reduced from about an hour to about five minutes (see below).
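To make the worst case concrete, here is a minimal Python sketch of the general size-grouping idea. It is not dupd’s actual code (dupd is not written in Python), and the function names `group_by_size` and `candidate_sets` are made up for illustration. The idea is simply that files with a unique size cannot have duplicates, so only groups of two or more same-size files need their contents compared.

```python
import os
from collections import defaultdict


def group_by_size(paths):
    """Map each file size (in bytes) to the list of paths with that size."""
    groups = defaultdict(list)
    for path in paths:
        groups[os.path.getsize(path)].append(path)
    return groups


def candidate_sets(paths):
    """Return only the size groups that contain two or more files.

    A file whose size is unique cannot be a duplicate of anything,
    so it is dropped without ever reading its contents.
    """
    return [sorted(group) for group in group_by_size(paths).values()
            if len(group) > 1]
```

With a typical file set most groups are tiny, so this filter eliminates most files cheaply. With 100K log files that all share one size, the filter eliminates nothing: everything lands in a single huge candidate set, and all the remaining work falls on whatever compares contents within that set.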
In more common scenarios these improvements don’t make a big difference but there is still some small benefit. Memory consumption is also reduced in dupd 1.2 (there is more room to reduce memory consumption that I might play with if I have time some day).
In a nutshell, dupd 1.2 should be either no slower, slightly faster or in some edge cases dramatically faster than dupd 1.1.
The three main changes were:
That said, do these changes translate to any benefit on more “normal” file sets? Nowhere near as dramatically, but it’s still faster and uses less memory so that’s all good.
All the numbers above are from machines with SSDs. I also tested on a couple machines with traditional hard drives and there was zero change in performance. No graph, it’s just a straight line ;-)
With normal hard drives, the file I/O time so completely dominates run time that there is no difference from any dupd improvements.
(I suspect the edge case file set would have seen improvement even on spinning rust, but I didn’t have the chance to test that scenario.)