Re: [PATCH] xfs: stabilize insert range start boundary to avoid COW writeback race

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Mon, 16 Dec 2019 21:59:27 -0800

On Tue, Dec 10, 2019 at 08:23:40AM -0500, Brian Foster wrote:
> generic/522 (fsx) occasionally fails with a file corruption due to
> an insert range operation. The primary characteristic of the
> corruption is a misplaced insert range operation that differs from
> the requested target offset. The reason for this behavior is a race
> between the extent shift sequence of an insert range and a COW
> writeback completion that causes a front merge with the first extent
> in the shift.
> 
> The shift preparation function flushes and unmaps from the target
> offset of the operation to the end of the file to ensure no
> modifications can be made and page cache is invalidated before file
> data is shifted. An insert range operation then splits the extent at
> the target offset, if necessary, and begins to shift the start
> offset of each extent starting from the end of the file to the start
> offset. The shift sequence operates at extent level and so depends
> on the preparation sequence to guarantee no changes can be made to
> the target range during the shift. If the block immediately prior to
> the target offset was dirty and shared, however, it can undergo
> writeback and move from the COW fork to the data fork at any point
> during the shift. If the block is contiguous with the block at the
> start offset of the insert range, it can front merge and alter the
> start offset of the extent. Once the shift sequence reaches the
> target offset, it shifts based on the latest start offset and
> silently changes the target offset of the operation and corrupts the
> file.
> 
> To address this problem, update the shift preparation code to
> stabilize the start boundary along with the full range of the
> insert. Also update the existing corruption check to fail if any
> extent is shifted with a start offset behind the target offset of
> the insert range. This prevents insert from racing with COW
> writeback completion and fails loudly in the event of an unexpected
> extent shift.
> 
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> ---
> 
> This has survived a couple fstests runs (upstream) so far as well as an
> overnight loop test of generic/522 (on RHEL). The RHEL based kernel just
> happened to be where this was originally diagnosed and provides a fairly
> reliable failure rate of within 30 iterations or so. The current test is
> at almost 70 iterations and still running.
> 
> Brian

After a week of testing I declare that 2.5 billion fsxops should be
enough for anyone. :P

This /also/ seems to have fixed the separate corruption problems I was
seeing, so...

Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>

--D

> 
>  fs/xfs/libxfs/xfs_bmap.c |  3 +--
>  fs/xfs/xfs_bmap_util.c   | 12 ++++++++++++
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index a9ad1f991ba3..4a802b3abe77 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -5972,8 +5972,7 @@ xfs_bmap_insert_extents(
>  		goto del_cursor;
>  	}
>  
> -	if (XFS_IS_CORRUPT(mp,
> -			   stop_fsb >= got.br_startoff + got.br_blockcount)) {
> +	if (XFS_IS_CORRUPT(mp, stop_fsb > got.br_startoff)) {
>  		error = -EFSCORRUPTED;
>  		goto del_cursor;
>  	}
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 2efd78a9719e..e62fb5216341 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -992,6 +992,7 @@ xfs_prepare_shift(
>  	struct xfs_inode	*ip,
>  	loff_t			offset)
>  {
> +	struct xfs_mount	*mp = ip->i_mount;
>  	int			error;
>  
>  	/*
> @@ -1004,6 +1005,17 @@ xfs_prepare_shift(
>  			return error;
>  	}
>  
> +	/*
> +	 * Shift operations must stabilize the start block offset boundary along
> +	 * with the full range of the operation. If we don't, a COW writeback
> +	 * completion could race with an insert, front merge with the start
> +	 * extent (after split) during the shift and corrupt the file. Start
> +	 * with the block just prior to the start to stabilize the boundary.
> +	 */
> +	offset = round_down(offset, 1 << mp->m_sb.sb_blocklog);
> +	if (offset)
> +		offset -= (1 << mp->m_sb.sb_blocklog);
> +
>  	/*
>  	 * Writeback and invalidate cache for the remainder of the file as we're
>  	 * about to shift down every extent from offset to EOF.
> -- 
> 2.20.1
>