On Thu, Oct 21, 2021 at 12:33:30PM -0400, Brian Foster wrote: > If writeback I/O to a COW extent fails, the COW fork blocks are > punched out and the data fork blocks left alone. It is possible for > COW fork blocks to overlap non-shared data fork blocks (due to > cowextsz hint prealloc), however, and writeback unconditionally maps > to the COW fork whenever blocks exist at the corresponding offset of > the page undergoing writeback. This means it's quite possible for a > COW fork extent to overlap delalloc data fork blocks, writeback to > convert and map to the COW fork blocks, writeback to fail, and > finally for ioend completion to cancel the COW fork blocks and leave > stale data fork delalloc blocks around in the inode. The blocks are > effectively stale because writeback failure also discards dirty page > state. > > If this occurs, it is likely to trigger assert failures, free space > accounting corruption and failures in unrelated file operations. For > example, a subsequent reflink attempt of the affected file to a new > target file will trip over the stale delalloc in the source file and > fail. Several of these issues are occasionally reproduced by > generic/648, but are reproducible on demand with the right sequence > of operations and timely I/O error injection. > > To fix this problem, update the ioend failure path to also punch out > underlying data fork delalloc blocks on I/O error. This is analogous > to the writeback submission failure path in xfs_discard_page() where > we might fail to map data fork delalloc blocks and consistent with > the successful COW writeback completion path, which is responsible > for unmapping from the data fork and remapping in COW fork blocks. > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > --- > > Note that I have an fstest to reproduce this problem reliably that I'll > post to the list shortly. Yay! > Brian > > fs/xfs/xfs_aops.c | 15 ++++++++++++--- > 1 file changed, 12 insertions(+), 3 deletions(-) > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > index 34fc6148032a..c8c15c3c3147 100644 > --- a/fs/xfs/xfs_aops.c > +++ b/fs/xfs/xfs_aops.c > @@ -82,6 +82,7 @@ xfs_end_ioend( > struct iomap_ioend *ioend) > { > struct xfs_inode *ip = XFS_I(ioend->io_inode); > + struct xfs_mount *mp = ip->i_mount; > xfs_off_t offset = ioend->io_offset; > size_t size = ioend->io_size; > unsigned int nofs_flag; > @@ -97,18 +98,26 @@ xfs_end_ioend( > /* > * Just clean up the in-memory structures if the fs has been shut down. > */ > - if (xfs_is_shutdown(ip->i_mount)) { > + if (xfs_is_shutdown(mp)) { > error = -EIO; > goto done; > } > > /* > - * Clean up any COW blocks on an I/O error. > + * Clean up all COW blocks and underlying data fork delalloc blocks on > + * I/O error. The delalloc punch is required because this ioend was > + * mapped to blocks in the COW fork and the associated pages are no > + * longer dirty. If we don't remove delalloc blocks here, they become > + * stale and can corrupt free space accounting on unmount. > */ > error = blk_status_to_errno(ioend->io_bio->bi_status); > if (unlikely(error)) { > - if (ioend->io_flags & IOMAP_F_SHARED) > + if (ioend->io_flags & IOMAP_F_SHARED) { > xfs_reflink_cancel_cow_range(ip, offset, size, true); > + xfs_bmap_punch_delalloc_range(ip, > + XFS_B_TO_FSBT(mp, offset), > + XFS_B_TO_FSB(mp, size)); This looks correct to me. I'll give this a spin with your testcase and report back. Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx> --D > + } > goto done; > } > > -- > 2.31.1 >