On Tue, Nov 18, 2008 at 12:36:42PM +0000, Sam Mason wrote:
> On Mon, Nov 17, 2008 at 11:22:47AM -0800, Lothar Behrens wrote:
> > I have a problem to find as fast as possible files that are double or
> > in other words, identical.
> > Also identifying those files that are not identical.
>
> I'd probably just take a simple Unix command line approach, something
> like:
>
>   find /base/dir -type f -exec md5sum {} \; | sort | uniq -Dw 32

You can save a little time by letting xargs pass many files to each
md5sum call instead of starting one process per file:

  find /base/dir -type f -print0 | xargs -0 md5sum | sort | uniq -Dw 32

> this will give you a list of files whose contents are identical
> (according to an MD5 hash).  An alternative would be to put the hashes
> into a database and run the matching up there.
>
>
>   Sam

Gerhard
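
As a minimal sketch, assuming GNU coreutils and findutils (md5sum, uniq -D/-u/-w, xargs -0/-r), the pipeline above can be wrapped in two small functions. The names find_dupes and find_uniques and the throwaway demo directory are hypothetical, and the uniq -u variant is only one possible way to answer the original question about files that are not identical:

```shell
#!/bin/sh
# Hypothetical helpers; assume GNU md5sum, uniq and xargs are available.

# Print "hash  path" lines for every file whose content hash occurs more than once.
find_dupes() {
    # -print0 / -0 keep paths containing spaces intact; -r skips md5sum on empty input.
    # uniq -D prints all repeated lines, comparing only the first 32 characters
    # (the hex MD5 digest).
    find "$1" -type f -print0 | xargs -0 -r md5sum | sort | uniq -D -w 32
}

# Print "hash  path" lines for every file whose content hash occurs exactly once.
find_uniques() {
    find "$1" -type f -print0 | xargs -0 -r md5sum | sort | uniq -u -w 32
}

# Demo on a temporary directory: a.txt and b.txt are identical, c.txt differs.
demo=$(mktemp -d)
printf 'hello\n' > "$demo/a.txt"
printf 'hello\n' > "$demo/b.txt"
printf 'world\n' > "$demo/c.txt"

echo "Identical files:"
find_dupes "$demo"
echo "Files without a duplicate:"
find_uniques "$demo"

rm -rf "$demo"
```

Matching hashes only indicate identical content "according to an MD5 hash", as Sam puts it; a byte-wise comparison with cmp on each reported group would confirm it.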