Re: [PATCH] xfs: stabilize insert range start boundary to avoid COW writeback race

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 11 Dec 2019 08:41:00 +1100

On Tue, Dec 10, 2019 at 08:23:40AM -0500, Brian Foster wrote:
> generic/522 (fsx) occasionally fails with a file corruption due to
> an insert range operation. The primary characteristic of the
> corruption is a misplaced insert range operation that differs from
> the requested target offset. The reason for this behavior is a race
> between the extent shift sequence of an insert range and a COW
> writeback completion that causes a front merge with the first extent
> in the shift.

How is the COW writeback completion modifying the extent list while
an extent shift is modifying the extent list?  Both should be
running under XFS_ILOCK_EXCL contexts so there shouldn't be a race
condition here unless we've screwed up the extent list modification
atomicity...

> 
> The shift preparation function flushes and unmaps from the target
> offset of the operation to the end of the file to ensure no
> modifications can be made and page cache is invalidated before file
> data is shifted. An insert range operation then splits the extent at
> the target offset, if necessary, and begins to shift the start
> offset of each extent starting from the end of the file to the start
> offset. The shift sequence operates at extent level and so depends
> on the preparation sequence to guarantee no changes can be made to
> the target range during the shift.

Oh... shifting extents is not an atomic operation w.r.t. other
inode modifications - both insert and collapse run individual
modification transactions and lock/unlock the inode around each
transaction. So, essentially, they aren't atomic when faced with
other *metadata* modifications to the inode.

> If the block immediately prior to
> the target offset was dirty and shared, however, it can undergo
> writeback and move from the COW fork to the data fork at any point
> during the shift. If the block is contiguous with the block at the
> start offset of the insert range, it can front merge and alter the
> start offset of the extent. Once the shift sequence reaches the
> target offset, it shifts based on the latest start offset and
> silently changes the target offset of the operation and corrupts the
> file.

Yup, that's exactly the landmine that non-atomic, multi-transaction
extent range operations have. It might be a COW operation, it might
be something else that ends up manipulating the extent list. But
because the ILOCK is not held across the entire extent shift,
insert/collapse are susceptible to corruption when any other XFs
code concurrently modifies the extent list.

I think insert/collapse need to be converted to work like a
truncate operation instead of a series on individual write
operations. That is, they are a permanent transaction that locks the
inode once and is rolled repeatedly until the entire extent listi
modification is done and then the inode is unlocked.

> To address this problem, update the shift preparation code to
> stabilize the start boundary along with the full range of the
> insert. Also update the existing corruption check to fail if any
> extent is shifted with a start offset behind the target offset of
> the insert range. This prevents insert from racing with COW
> writeback completion and fails loudly in the event of an unexpected
> extent shift.

It looks ok to avoid this particular symptom (backportable point
fix), but I really think we should convert insert/collapse to be
atomic w.r.t other extent list modifications....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx