Re: [PATCH] xfs: don't reuse busy extents on extent trim

Chandan Babu R <chandanrlinux@xxxxxxxxx> · Tue, 23 Feb 2021 11:54:32 +0530

On 22 Feb 2021 at 21:04, Brian Foster wrote:
> Freed extents are marked busy from the point the freeing transaction
> commits until the associated CIL context is checkpointed to the log.
> This prevents reuse and overwrite of recently freed blocks before
> the changes are committed to disk, which can lead to corruption
> after a crash. The exception to this rule is that metadata
> allocation is allowed to reuse busy extents because metadata changes
> are also logged.
>
> As of commit 97d3ac75e5e0 ("xfs: exact busy extent tracking"), XFS
> has allowed modification or complete invalidation of outstanding
> busy extents for metadata allocations. This implementation assumes
> that use of the associated extent is imminent, which is not always
> the case. For example, the trimmed extent might not satisfy the
> minimum length of the allocation request, or the allocation
> algorithm might be involved in a search for the optimal result based
> on locality.
>
> generic/019 reproduces a corruption caused by this scenario. First,
> a metadata block (usually a bmbt or symlink block) is freed from an
> inode. A subsequent bmbt split on an unrelated inode attempts a near
> mode allocation request that invalidates the busy block during the
> search, but does not ultimately allocate it. Due to the busy state
> invalidation, the block is no longer considered busy to subsequent
> allocation. A direct I/O write request immediately allocates the
> block and writes to it. Finally, the filesystem crashes while in a
> state where the initial metadata block free had not committed to the
> on-disk log. After recovery, the original metadata block is in its
> original location as expected, but has been corrupted by the
> aforementioned dio.
>
> This demonstrates that it is fundamentally unsafe to modify busy
> extent state for extents that are not guaranteed to be allocated.
> This applies to pretty much all of the code paths that currently
> trim busy extents for one reason or another. Therefore to address
> this problem, drop the reuse mechanism from the busy extent trim
> path. This code already knows how to return partial non-busy ranges
> of the targeted free extent and higher level code tracks the busy
> state of the allocation attempt. If a block allocation fails where
> one or more candidate extents is busy, we force the log and retry
> the allocation.
>

The changes look good to me.

Reviewed-by: Chandan Babu R <chandanrlinux@xxxxxxxxx>

> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> ---
>  fs/xfs/xfs_extent_busy.c | 14 --------------
>  1 file changed, 14 deletions(-)
>
> diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c
> index 3991e59cfd18..ef17c1f6db32 100644
> --- a/fs/xfs/xfs_extent_busy.c
> +++ b/fs/xfs/xfs_extent_busy.c
> @@ -344,7 +344,6 @@ xfs_extent_busy_trim(
>  	ASSERT(*len > 0);
>  
>  	spin_lock(&args->pag->pagb_lock);
> -restart:
>  	fbno = *bno;
>  	flen = *len;
>  	rbp = args->pag->pagb_tree.rb_node;
> @@ -363,19 +362,6 @@ xfs_extent_busy_trim(
>  			continue;
>  		}
>  
> -		/*
> -		 * If this is a metadata allocation, try to reuse the busy
> -		 * extent instead of trimming the allocation.
> -		 */
> -		if (!(args->datatype & XFS_ALLOC_USERDATA) &&
> -		    !(busyp->flags & XFS_EXTENT_BUSY_DISCARDED)) {
> -			if (!xfs_extent_busy_update_extent(args->mp, args->pag,
> -							  busyp, fbno, flen,
> -							  false))
> -				goto restart;
> -			continue;
> -		}
> -
>  		if (bbno <= fbno) {
>  			/* start overlap */

-- 
chandan