Hi folks,

Brian, Eric and I have been tracking down a set of data corruption problems on XFS over the past couple of days. The one that is important to the wider developer community is the truncate/mmap write issue that Eric isolated from a real-world application that was triggering it.

The corruption only affects block size smaller than page size configurations and is caused by mmapped writes to the EOF page which has been partially truncated. If we then extend the file again, the region of the page that was truncated and had blocks punched out of it can be written to via mapped writes without blocks being allocated for the hole.

Hence while the page is in the page cache, the contents of the file look OK. Unmount/mount the filesystem, then re-read the page from disk and it will contain zeros because there is a hole rather than data blocks.

In the XFS case, the bug was that the filesystem truncate code is not cleaning the partial page fully during the truncate down or up, and hence the pte remains mapped dirty in the TLB. Hence when new data is written to the page, it doesn't trigger a write fault, ->page_mkwrite is not called and hence blocks are not allocated over the hole. I chose to fix it on the truncate up as it was the lesser of two evils - we can't actually fix the problem entirely because we can't serialise page faults against truncate.

Initially I couldn't reproduce the data corruptions on ext4, but Eric came to my rescue and provided me with an updated mremap test that triggered corruptions. I also added another variant to the plain truncate/mwrite test and so now that test also reliably produces data corruptions on ext4.

I suspect the ext4 issue is similar to the XFS case (i.e. no page_mkwrite call), but I can't follow the ext4 code with any level of cluefulness.... And so: practise what I preach and post a heads-up to -fsdevel.
That is, if two filesystems that support block size smaller than page size have similar data corruptions when exercising the same generic code paths in similar ways, then it is likely that other filesystems have similar problems and need to be checked.

While the tests I packaged for xfstests are not yet reviewed, they do work and expose the corruptions on both XFS and ext4. Hence I've pushed them to a git tree branch so that everyone can test their filesystems against the reproducers. The tests in question are generic/029 and generic/030 and can be found here:

git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git mmap-truncate

FWIW, any filesystem that supports FALLOC_FL_COLLAPSE_RANGE should also have generic/031 run against it. This is the test case that Brian isolated from an fsx failure that exposed a different partial page truncation data corruption issue in XFS with block size smaller than page size. However, it's a similar situation with ext4: the exact same underlying partial page writeback bug was found in ext4 back in May and fixed in 3.16....

Most importantly, all the credit must go to Eric and Brian for doing the hard work of turning application failures into simple, reproducible test cases. Finding bugs is easy when you are provided with a 100% reliable reproducer and a bunch of analysis about where the bug most likely lies. :)

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
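
For anyone who wants to see the shape of the truncate/mwrite sequence described above without pulling the branch, here is a minimal sketch in Python. It is not the code from generic/029 or generic/030. The function name replay_truncate_mwrite, the BLOCK constant (a quarter of a page, standing in for a filesystem block smaller than the page size) and the fill bytes are all assumptions made for illustration. It only replays the sequence and reads the result back through the mapping, so on its own it shows what the page cache contains and cannot detect the on-disk corruption.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE      # page size of the running system
BLOCK = PAGE // 4         # hypothetical "filesystem block" smaller than a page


def replay_truncate_mwrite(path):
    """Replay truncate down, truncate up, then a mapped write into the old tail.

    Returns the file contents as seen through the mapping (the page cache).
    """
    size = 2 * PAGE
    with open(path, "w+b") as f:
        fd = f.fileno()
        f.write(b"X" * size)
        f.flush()
        os.fsync(fd)                      # write the initial data back

        with mmap.mmap(fd, size) as m:    # shared, writable mapping by default
            # Dirty the second page through the mapping before truncating.
            m[PAGE:PAGE + BLOCK] = b"Y" * BLOCK

            # Truncate down into the middle of that page: it becomes the
            # EOF page and everything beyond the new EOF is dropped.
            os.ftruncate(fd, PAGE + BLOCK)

            # Extend the file again; the old tail of the page is now a hole.
            os.ftruncate(fd, size)

            # Mapped write into the region that was truncated away.
            m[PAGE + 2 * BLOCK:PAGE + 3 * BLOCK] = b"Z" * BLOCK
            m.flush()                     # msync the mapping

            return m[:]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        data = replay_truncate_mwrite(os.path.join(d, "testfile"))
    hole = data[PAGE + BLOCK:PAGE + 2 * BLOCK]
    print("untouched part of the hole reads as zeros:", hole == b"\0" * BLOCK)
```

To actually look for the corruption, the file would have to live on the filesystem under test (made with a block size smaller than the page size), and the "Z" region would have to be re-read after an unmount/mount cycle. The sketch deliberately leaves that part out, and the xfstests reproducers remain the thing to run.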