On Tuesday 16 March 2010 14:41:41 Nathan Wayde wrote:
> On 16/03/10 00:48, Shridhar Daithankar wrote:
> > [...]
> > But as far as file system performance goes, the overhead should be
> > identical for both the runs, no?
>
> I'm not too sure about that. I'm guessing there is less seeking going on
> with Btrfs. Some files systems (reiserfs + reiserfs4 IIRC) are very good
> with many small files, better than the ext*fs, this may be another case
> of that.

Yes, btrfs does have tail packing, i.e. storing the inode and the file
data together in a single block. However, all the files I had in the
tree were 50-55K in size, and that definitely does not fit in a block,
so tail packing cannot explain the difference here.

> I still think you could achieve better times by not calling the external
> command that many times.
> Since you're already gonna store the checksums in a database, I'd just
> write a proper program in python or something.

The application I am developing already has copy/copytree and md5sum
built in. I mmap the whole file and do memcpy/memcmp/md5sum in a single
pass. That is already a bit faster than native cp, which uses write()
and buffer management.

I changed/refactored the tree copy code and created a new tree with it.
I wanted to verify, outside the application, that the tree copy had gone
well. Hence the find/md5sum run.

This was a one-time exercise only, but the results were drastic enough
to be worth publishing.

--
 Regards
 Shridhar
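
A rough sketch, in Python as suggested in the quoted reply, of how the
find/md5sum check could be done in one process instead of spawning
md5sum once per file. This is not the application's code: the function
names (md5_of_file, md5_of_tree, trees_match) are made up for
illustration, and it covers only the checksum part of the mmap pass
described above, not the memcpy/memcmp part.

```python
import hashlib
import mmap
import os


def md5_of_file(path):
    """Return the MD5 hex digest of one file, read through mmap."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        if os.fstat(fh.fileno()).st_size == 0:
            # mmap cannot map a zero-length file; the digest of no data is fine.
            return digest.hexdigest()
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # hashlib accepts any bytes-like object, so the mapping is hashed
            # directly without an intermediate read() buffer.
            digest.update(view)
    return digest.hexdigest()


def md5_of_tree(root):
    """Map each file's path (relative to root) to its MD5 hex digest."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sums[os.path.relpath(full, root)] = md5_of_file(full)
    return sums


def trees_match(src_root, dst_root):
    """True when both trees hold the same relative paths with equal digests."""
    return md5_of_tree(src_root) == md5_of_tree(dst_root)
```

Calling trees_match() with the original tree and the copied tree gives
a single yes/no answer; comparing the two md5_of_tree() dictionaries by
hand shows which files differ or are missing.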