On Wed, Nov 01, 2017 at 10:05:40PM +0800, Eryu Guan wrote: > On truncate down, if new size is not block size aligned, we zero the > rest of block to avoid exposing stale data to user, and > iomap_truncate_page() skips zeroing if the range is already in > unwritten state or a hole. Then we writeback from on-disk i_size to > the new size if this range hasn't been written to disk yet, and > truncate page cache beyond new EOF and set in-core i_size. > > The problem is that we could write data between di_size and newsize > before removing the page cache beyond newsize, as the extents may > still be in unwritten state right after a buffer write. As such, the > page of data that newsize lies in has not been zeroed by page cache > invalidation before it is written, and xfs_do_writepage() hasn't > triggered it's "zero data beyond EOF" case because we haven't > updated in-core i_size yet. Then a subsequent mmap read could see > non-zeros past EOF. > > I occasionally see this in fsx runs in fstests generic/112, a > simplified fsx operation sequence is like (assuming 4k block size > xfs): > > fallocate 0x0 0x1000 0x0 keep_size > write 0x0 0x1000 0x0 > truncate 0x0 0x800 0x1000 > punch_hole 0x0 0x800 0x800 > mapread 0x0 0x800 0x800 > > where fallocate allocates unwritten extent but doesn't update > i_size, buffer write populates the page cache and extent is still > unwritten, truncate skips zeroing page past new EOF and writes the > page to disk, punch_hole invalidates the page cache, at last mapread > reads the block back and sees non-zero beyond EOF. > > Fix it by moving truncate_setsize() to before writeback so the page > cache invalidation zeros the partial page at the new EOF. This also > triggers "zero data beyond EOF" in xfs_do_writepage() at writeback > time, because newsize has been set and page straddles the newsize. > > Also fixed the wrong 'end' param of filemap_write_and_wait_range() > call while we're at it, the 'end' is inclusive and should be > 'newsize - 1'. > > Suggested-by: Dave Chinner <dchinner@xxxxxxxxxx> > Signed-off-by: Eryu Guan <eguan@xxxxxxxxxx> Looks sane, but I haven't tested it yet. Acked-by: Dave Chinner <dchinner@xxxxxxxxxx> -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html