On Tue, Dec 10, 2019 at 08:23:40AM -0500, Brian Foster wrote: > generic/522 (fsx) occasionally fails with a file corruption due to > an insert range operation. The primary characteristic of the > corruption is a misplaced insert range operation that differs from > the requested target offset. The reason for this behavior is a race > between the extent shift sequence of an insert range and a COW > writeback completion that causes a front merge with the first extent > in the shift. > > The shift preparation function flushes and unmaps from the target > offset of the operation to the end of the file to ensure no > modifications can be made and page cache is invalidated before file > data is shifted. An insert range operation then splits the extent at > the target offset, if necessary, and begins to shift the start > offset of each extent starting from the end of the file to the start > offset. The shift sequence operates at extent level and so depends > on the preparation sequence to guarantee no changes can be made to > the target range during the shift. If the block immediately prior to > the target offset was dirty and shared, however, it can undergo > writeback and move from the COW fork to the data fork at any point > during the shift. If the block is contiguous with the block at the > start offset of the insert range, it can front merge and alter the > start offset of the extent. Once the shift sequence reaches the > target offset, it shifts based on the latest start offset and > silently changes the target offset of the operation and corrupts the > file. > > To address this problem, update the shift preparation code to > stabilize the start boundary along with the full range of the > insert. Also update the existing corruption check to fail if any > extent is shifted with a start offset behind the target offset of > the insert range. This prevents insert from racing with COW > writeback completion and fails loudly in the event of an unexpected > extent shift. > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > --- > > This has survived a couple fstests runs (upstream) so far as well as an > overnight loop test of generic/522 (on RHEL). The RHEL based kernel just > happened to be where this was originally diagnosed and provides a fairly > reliable failure rate of within 30 iterations or so. The current test is > at almost 70 iterations and still running. > > Brian After a week of testing I declare that 2.5 billion fsxops should be enough for anyone. :P This /also/ seems to have fixed the separate corruption problems I was seeing, so... Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> --D > > fs/xfs/libxfs/xfs_bmap.c | 3 +-- > fs/xfs/xfs_bmap_util.c | 12 ++++++++++++ > 2 files changed, 13 insertions(+), 2 deletions(-) > > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c > index a9ad1f991ba3..4a802b3abe77 100644 > --- a/fs/xfs/libxfs/xfs_bmap.c > +++ b/fs/xfs/libxfs/xfs_bmap.c > @@ -5972,8 +5972,7 @@ xfs_bmap_insert_extents( > goto del_cursor; > } > > - if (XFS_IS_CORRUPT(mp, > - stop_fsb >= got.br_startoff + got.br_blockcount)) { > + if (XFS_IS_CORRUPT(mp, stop_fsb > got.br_startoff)) { > error = -EFSCORRUPTED; > goto del_cursor; > } > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c > index 2efd78a9719e..e62fb5216341 100644 > --- a/fs/xfs/xfs_bmap_util.c > +++ b/fs/xfs/xfs_bmap_util.c > @@ -992,6 +992,7 @@ xfs_prepare_shift( > struct xfs_inode *ip, > loff_t offset) > { > + struct xfs_mount *mp = ip->i_mount; > int error; > > /* > @@ -1004,6 +1005,17 @@ xfs_prepare_shift( > return error; > } > > + /* > + * Shift operations must stabilize the start block offset boundary along > + * with the full range of the operation. If we don't, a COW writeback > + * completion could race with an insert, front merge with the start > + * extent (after split) during the shift and corrupt the file. Start > + * with the block just prior to the start to stabilize the boundary. > + */ > + offset = round_down(offset, 1 << mp->m_sb.sb_blocklog); > + if (offset) > + offset -= (1 << mp->m_sb.sb_blocklog); > + > /* > * Writeback and invalidate cache for the remainder of the file as we're > * about to shift down every extent from offset to EOF. > -- > 2.20.1 >