On Sat, Mar 24, 2012 at 04:27:19PM +0000, Peter Grandi wrote:
> --------------------------------------------------------------
> # (cd /tmp/ext4; rm -rf linux-2.6.32; sync; time star -no-fsync -x -f /tmp/linux-2.6.32.tar; egrep 'Dirty|Writeback' /proc/meminfo; time sync)
> star: 37343 blocks + 0 bytes (total of 382392320 bytes = 373430.00k).
>
> real    0m1.204s
> user    0m0.139s
> sys     0m1.270s
> Dirty:        419456 kB
> Writeback:         0 kB
>
> real    0m5.012s
> user    0m0.000s
> sys     0m0.458s
> --------------------------------------------------------------
> # (cd /tmp/ext4; rm -rf linux-2.6.32; sync; time star -x -f /tmp/linux-2.6.32.tar; egrep 'Dirty|Writeback' /proc/meminfo; time sync)
> star: 37343 blocks + 0 bytes (total of 382392320 bytes = 373430.00k).
>
> real    23m29.346s
> user    0m0.327s
> sys     0m2.280s
> Dirty:           108 kB
> Writeback:         0 kB
>
> real    0m0.236s
> user    0m0.000s
> sys     0m0.199s

But as a user, what guarantee do I *want* from tar? I think the only
meaningful one is: "if tar returns successfully, all the files are
persisted to disk". And of course that's what your final "sync" does,
although with the unfortunate side-effect of flushing every other
dirty block in the system too.

Calling fsync() after every single file is unpacked also achieves the
desired guarantee, but at a very high cost. This is partly because you
have to wait for each fsync() to return [although I guess you could
spawn threads to do them], but also because the disk can't aggregate
lots of small writes into one larger write, even when the filesystem
has carefully allocated them in adjacent blocks.

I think what's needed is a group fsync which says "please ensure this
set of files is all persisted to disk", issued at the end, or after
every N files (a rough sketch of what I mean is in the P.S. at the end
of this mail). If such an API exists, I don't know of it.

On the flip side, does fsync()ing each individual file buy you
anything over and above the desired guarantee? Possibly: in theory you
could safely restart an aborted untar even across a system crash. You
would have to be aware that the last file unpacked may have been only
partially written to disk, so you'd restart by overwriting the last
item in the archive which already exists on disk. Maybe star has this
feature, I don't know. And unlike zip, tarfiles aren't indexed, so
you'd still have to read the archive from the beginning.

If the above benchmark is typical, fsyncing after every file is over
200 times slower than untar followed by sync (23m29s versus about 6.2s
including the final sync). So I reckon you'd be better off using the
fast/unsafe version and simply restarting it from the beginning if the
system crashed while it was running. That's unless you expect the
system to crash hundreds of times while you untar this single archive.

Just my 2¢, as a user and definitely not a filesystem expert.

Regards,

Brian.
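
P.S. To make the "group fsync" idea concrete, here's a minimal sketch
in C of what I have in mind: write a batch of files without syncing,
keep the descriptors open, and fsync them all in a second pass, so the
kernel and disk get a chance to aggregate the writeback. The batch
size, file names and payload are all made up for illustration; a real
unpacker would presumably also fsync the containing directory so the
new directory entries themselves are persisted.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define BATCH 64    /* sync after every N files; N is arbitrary here */

    static int fds[BATCH];
    static int nfds;

    /* The "group fsync": flush every file accumulated so far. */
    static void flush_batch(void)
    {
        for (int i = 0; i < nfds; i++) {
            if (fsync(fds[i]) != 0)
                perror("fsync");
            close(fds[i]);
        }
        nfds = 0;
    }

    static void write_file(const char *name, const char *buf, size_t len)
    {
        int fd = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0 || write(fd, buf, len) != (ssize_t)len) {
            perror(name);
            exit(1);
        }
        fds[nfds++] = fd;   /* defer the fsync; keep the fd open */
        if (nfds == BATCH)
            flush_batch();
    }

    int main(void)
    {
        char payload[4096], name[32];
        memset(payload, 'x', sizeof payload);
        for (int i = 0; i < 1000; i++) {
            snprintf(name, sizeof name, "file.%04d", i);
            write_file(name, payload, sizeof payload);
        }
        flush_batch();      /* the final group fsync */
        return 0;
    }

This still issues one fsync() per file, but only after a whole batch
has been written, which at least gives the elevator a chance to merge
the writes instead of forcing them out one file at a time.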
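
P.P.S. For what it's worth, recent Linux kernels do have syncfs(2)
(since 2.6.39, with glibc 2.14), which flushes a single filesystem
rather than every dirty block in the system. That's not the
per-file-set primitive described above, but an untar-then-sync could
use it to avoid penalising unrelated filesystems. A sketch, with the
benchmark's /tmp/ext4 directory standing in for the target:

    #define _GNU_SOURCE     /* for syncfs() */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* /tmp/ext4 here is just the directory from the benchmark above;
         * any fd on the target filesystem will do. */
        int fd = open("/tmp/ext4", O_RDONLY | O_DIRECTORY);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        if (syncfs(fd) != 0) {  /* flush only this one filesystem */
            perror("syncfs");
            return 1;
        }
        close(fd);
        return 0;
    }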