On Tue, 6 Jan 2015, Dave Chinner wrote:
> On Mon, Jan 05, 2015 at 02:26:30PM -0800, Sage Weil wrote:
> > On Tue, 6 Jan 2015, Dave Chinner wrote:
> > > Again, this is probably more a misunderstanding of FIEMAP than
> > > anything. FIEMAP is *advisory* and gives no output accuracy
> > > guarantees as userspace cannot prevent the extent maps from changing
> > > at any time. As an example, see the aborted attempt by the 'cp'
> > > utility to use FIEMAP to detect holes when copying sparse files....
> >
> > Where did the cp vs FIEMAP discussion play out? I missed that one.
>
> Oh, there were several issues - different filesystems exposed
> different issues, but the main one is that extent maps don't reflect
> newly written cached data that do not have extents allocated for
> them, hence the need for SEEK_DATA/SEEK_HOLE for optimal sparse file
> traversal:
>
> http://lwn.net/Articles/429345/
> http://lwn.net/Articles/440255/
>
> Not to mention race conditions between extent walking and background
> writeback started to be noticed:
>
> http://lists.openwall.net/linux-ext4/2012/11/13/8
>
> But then there were also corruption bugs in the cp FIEMAP code as
> well:
>
> http://gnu-coreutils.7620.n7.nabble.com/bug-12656-cp-since-8-11-corrupts-files-td20710.html

Sigh, I didn't look far enough back, it seems.

> > We only use fiemap to determine which file regions are holes, only after
> > fsync, and only when there are no other processes or threads accessing the
> > same file (and only when explicitly enabled by the admin since many users
> > still have buggy implementations deployed). Under those circumstances I
> > thought it should be reliable...
>
> And when the filesystem does background defragmentation or block
> trimming or some other re-organisation of recently accessed files?

I wouldn't expect any of those things to change whether the file system
reports a file extent as allocated or a hole, but now that you mention it,
and given what we've seen so far, that's probably not the safest bet to
make.

In any case, SEEK_DATA/HOLE is clearly a more appropriate interface and
appears to be well supported. We'll switch to that and probably leave it
off by default again until we've confirmed there are tests in xfstests
that match what ceph is doing.

Thanks, Dave!

As for the original point about converging on power fail testing
approaches, I'd say it's worth a time slot at LSF. :)

sage
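
For illustration, here is a minimal sketch of the kind of hole detection that SEEK_DATA/SEEK_HOLE allows. It is not ceph's code, and the helper name `data_extents` is hypothetical. It walks a file with `os.lseek` and collects the byte ranges the filesystem reports as data; anything between those ranges is a hole. It assumes a platform where Python exposes `os.SEEK_DATA` and `os.SEEK_HOLE` (Linux, for example). A filesystem without real support may simply report the whole file as a single data range.

```python
import errno
import os


def data_extents(fd):
    """Return a list of (start, end) byte ranges reported as data.

    Hypothetical helper for illustration. Note that lseek() moves the
    file offset of fd as a side effect.
    """
    size = os.fstat(fd).st_size
    extents = []
    offset = 0
    while offset < size:
        try:
            # Find the next offset >= offset that contains data.
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:
                break  # no data past offset: the rest of the file is a hole
            raise
        # Find the end of this data range. There is always an implicit
        # hole at end of file, so this returns at most the file size.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        extents.append((start, end))
        offset = end
    return extents
```

The result is still only a snapshot: as with FIEMAP, another writer can change the layout right after the call, so this is only meaningful under the same "no concurrent access" conditions described above.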