Re: [RFC PATCH] xfs: flush posteof zeroing before reflink truncation

Brian Foster <bfoster@xxxxxxxxxx> · Fri, 16 Nov 2018 14:31:24 -0500

On Fri, Nov 16, 2018 at 11:07:24AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> 
> If we're remapping into a range that starts beyond EOF, we have to zero
> the memory between EOF and the start of the target range, as established
> in 410fdc72b05af.  However, in 4918ef4ea008, we extended the pagecache
> truncation range downwards to a page boundary to guarantee that
> pagecache pages are removed and that there's no possibility that we end
> up zeroing subpage blocks within a page.  Unfortunately, we never commit
> the posteof zeroing to disk, so on a filesystem where page size > block
> size the truncation partially undoes the zeroing and we end up with
> stale disk contents.
> 
> Brian and I reproduced this problem by running generic/091 on a 1k block
> xfs filesystem, assuming fsx in fstests supports clone/dedupe/copyrange.
> 

Note that I think we might have hit different manifestations of the same
problem. I actually reproduced an assert failure via shared/010 with the
writeback patch I just posted that checks that we don't write to shared
extents. A trace focused on the problematic range showed:

        fsstress-12966 [002] 14464.404582: xfs_zero_eof:         dev 253:3 ino 0x80009b isize 0x14070 disize 0x14070 offset 0x14070 count 655248
        fsstress-12966 [002] 14464.404583: xfs_ilock:            dev 253:3 ino 0x80009b flags ILOCK_EXCL caller xfs_file_iomap_begin
        fsstress-12966 [002] 14464.404585: xfs_reflink_trim_around_shared: dev 253:3 ino 0x80009b lblk 0x14 len 0x1 pblk 1048635 st 0
        fsstress-12966 [002] 14464.404641: xfs_iomap_found:      dev 253:3 ino 0x80009b size 0x14070 offset 0x14070 count 655248 type 0x0 startoff 0x14 startblock 1048635 blockcount 0x1
        fsstress-12966 [002] 14464.404642: xfs_iunlock:          dev 253:3 ino 0x80009b flags ILOCK_EXCL caller xfs_file_iomap_begin
        fsstress-12966 [002] 14464.404704: writeback_dirty_page: bdi 253:3: ino=8388763 index=20

... which implies that the eof zeroing associated with the remap dirties
the page that ends up being written back to a shared block. As a quick
hack, I just swapped the order of the zeroing and the existing flush
calls and that made the problem disappear.

> Fixes: 410fdc72b05a ("xfs: zero posteof blocks when cloning above eof")
> Fixes: 4918ef4ea008 ("xfs: fix pagecache truncation prior to reflink")
> Simultaneously-diagnosed-by: Brian Foster <bfoster@xxxxxxxxxx>
> Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> ---
> Note: I haven't tested this thoroughly but wanted to push this out for
> everyone to look at ASAP.
> ---

This addresses the problem that I reproduce with shared/010 as well, and
looks Ok to me pending further testing:

Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>

>  fs/xfs/xfs_reflink.c |    8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index c56bdbfcf7ae..8ea09a7e550c 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1255,13 +1255,19 @@ xfs_reflink_zero_posteof(
>  	loff_t			pos)
>  {
>  	loff_t			isize = i_size_read(VFS_I(ip));
> +	int			error;
>  
>  	if (pos <= isize)
>  		return 0;
>  
>  	trace_xfs_zero_eof(ip, isize, pos - isize);
> -	return iomap_zero_range(VFS_I(ip), isize, pos - isize, NULL,
> +	error = iomap_zero_range(VFS_I(ip), isize, pos - isize, NULL,
>  			&xfs_iomap_ops);
> +	if (error)
> +		return error;
> +
> +	return filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
> +			isize, pos - 1);

Random nit: any particular reason we use ->i_data in the page truncate
call in xfs_reflink_remap_prep() and ->i_mapping everywhere else?

Brian

>  }
>  
>  /*