On truncate down, if new size is not block size aligned, we zero the rest of block via iomap_truncate_page() to avoid exposing stale data to user, and iomap_truncate_page() skips zeroing if the range is already in unwritten status or a hole. But it's possible that a buffer write overwrites the unwritten extent, which won't be converted to a normal extent until I/O completion, and iomap_truncate_page() skips zeroing wrongly because of the not-converted unwritten extent. This would cause a subsequent mmap read sees non-zeros beyond EOF. I occasionally see this in fsx runs in fstests generic/112, a simplified fsx operation sequence is like (assuming 4k block size xfs): fallocate 0x0 0x1000 0x0 keep_size write 0x0 0x1000 0x0 truncate 0x0 0x800 0x1000 punch_hole 0x0 0x800 0x800 mapread 0x0 0x800 0x800 This is introduced along with the switch to iomap based buffer write support, commit 68a9f5e7007c ("xfs: implement iomap based buffered write path"), because previously, block_truncate_page() did fs block based check and only skipped holes. Fix it by flushing the range we're going to zero to ensure iomap_truncate_page() sees the final extent state and zeros the range correctly. Also fixed the wrong 'end' param of filemap_write_and_wait_range() call while we're at it, the 'end' is inclusive and should be 'newsize - 1'. Signed-off-by: Eryu Guan <eguan@xxxxxxxxxx> --- An fstests test case followed, I've verified it could reproduce the bug reliably and quickly. fs/xfs/xfs_iops.c | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index 17081c77ef86..cd4c612cc23c 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -824,6 +824,7 @@ xfs_setattr_size( { struct xfs_mount *mp = ip->i_mount; struct inode *inode = VFS_I(ip); + struct address_space *mapping = inode->i_mapping; xfs_off_t oldsize, newsize; struct xfs_trans *tp; int error; @@ -878,8 +879,22 @@ xfs_setattr_size( if (newsize > oldsize) { error = xfs_zero_eof(ip, newsize, oldsize, &did_zeroing); } else { - error = iomap_truncate_page(inode, newsize, &did_zeroing, - &xfs_iomap_ops); + /* + * Firstly write out the range that will be zeroed, otherwise + * iomap_truncate_page() may skip zeroing this range wrongly. + * Because the range could still be dirty after buffer writes + * over unwritten extents, and the extent still stays in + * UNWRITTEN state, because the conversion only happens at I/O + * completion. + */ + unsigned int blocksize = i_blocksize(inode); + unsigned int off = newsize & (blocksize - 1); + if (off && mapping->nrpages) + error = filemap_write_and_wait_range(mapping, newsize, + newsize + (blocksize - off) - 1); + if (!error) + error = iomap_truncate_page(inode, newsize, + &did_zeroing, &xfs_iomap_ops); } if (error) @@ -895,8 +910,8 @@ xfs_setattr_size( */ if (did_zeroing || (newsize > ip->i_d.di_size && oldsize != ip->i_d.di_size)) { - error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping, - ip->i_d.di_size, newsize); + error = filemap_write_and_wait_range(mapping, ip->i_d.di_size, + newsize - 1); if (error) return error; } -- 2.13.6 -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html