On Tue, 6 Jan 2015, Dave Chinner wrote: > On Mon, Jan 05, 2015 at 10:34:57AM -0800, Sage Weil wrote: > > On Wed, 10 Dec 2014, Josef Bacik wrote: > > > On 12/10/2014 06:27 AM, Jan Kara wrote: > > The first instance was with the use of sync_file_range to try to > > control/limit the amount of dirty data in the page cache. This, possibly > > in combination with posix_fadvise(DONTNEED), managed to break the > > writeback sequence in XFS and led to data corruption after power loss. > > Corruption after power loss is not brilliant behaviour from XFS > here, but I'll point out for the wider audience that > sync_file_range() provides absolutely no data integrity guarantees > for power loss situations. It's really, really badly named because > it doesn't give the same guarantees as other "sync" functions > filesystems provide. IOWs, if you value your data, the only > interface you can rely for data integrity is fsync/fdatasync... Agreed. In our case, we used syncfs(2) for data integrity. sync_file_range(2) was used only to limit dirty data in the page cache. > > The other issue we saw was just a general raft of FIEMAP bugs over the > > last year or two. We saw cases where even after fsync a fiemap result > > would not include all extents, and (not unexpectedly) lots of corner cases > > in several file systems, e.g., around partial blocks at end of file. (As > > far as I know everything we saw is resolved in current kernels.) > > Again, this is probably more a misunderstanding of FIEMAP than > anything. FIEMAP is *advisory* and gives no output accuracy > guarantees as userspace cannot prevent the extent maps from changing > at any time. As an example, see the aborted attempt by the 'cp' > utility to use FIEMAP to detect holes when copying sparse files.... Where did the cp vs FIEMAP discussion play out? I missed that one. We only use fiemap to determine which file regions are holes, only after fsync, and only when there are no other processes or threads accessing the same file (and only when explicitly enabled by the admin since many users still have buggy implementations deployed). Under those circumstances I thought it should be reliable... In retrospect the SEEK_HOLE/SEEK_DATA interface is simpler and better suited, but I'm hesitant to fall into the same trap. > > I'm not so concerned with these specific bugs, but worried that we > > (perhaps naively) expected them to be pretty safe. Perhaps for FIEMAP > > this is a general case where a newish syscall/ioctl should be tested > > carefully with our workloads before being relied upon, and we could have > > worked to make sure e.g. xfstests has appropriate tests. > > Oh, it does - that's why it mostly works now across all filesystems > that are regularly tested with xfstests. > > > For power fail > > testing in particular, though, right now it isn't clear who is testing > > what under what workloads, so the only really "safe" approach is to stick > > to whatever syscall combinations we think the rest of the world is using, > > or make sure we test ourselves. > > Write tests for the regression test suite that filesystem developers > run all the time. ;) Yes (and I assume that you specifically mean xfstests here). I hope we can get some consensus on what that testing approach will be for power failure. I don't much care whether it's an ioctl each fs implements or a dm layer that does about the same thing; I see advantages to both approaches. As long as there is some convergence... sage -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html