On 22 Feb 2021 at 21:04, Brian Foster wrote: > Freed extents are marked busy from the point the freeing transaction > commits until the associated CIL context is checkpointed to the log. > This prevents reuse and overwrite of recently freed blocks before > the changes are committed to disk, which can lead to corruption > after a crash. The exception to this rule is that metadata > allocation is allowed to reuse busy extents because metadata changes > are also logged. > > As of commit 97d3ac75e5e0 ("xfs: exact busy extent tracking"), XFS > has allowed modification or complete invalidation of outstanding > busy extents for metadata allocations. This implementation assumes > that use of the associated extent is imminent, which is not always > the case. For example, the trimmed extent might not satisfy the > minimum length of the allocation request, or the allocation > algorithm might be involved in a search for the optimal result based > on locality. > > generic/019 reproduces a corruption caused by this scenario. First, > a metadata block (usually a bmbt or symlink block) is freed from an > inode. A subsequent bmbt split on an unrelated inode attempts a near > mode allocation request that invalidates the busy block during the > search, but does not ultimately allocate it. Due to the busy state > invalidation, the block is no longer considered busy to subsequent > allocation. A direct I/O write request immediately allocates the > block and writes to it. Finally, the filesystem crashes while in a > state where the initial metadata block free had not committed to the > on-disk log. After recovery, the original metadata block is in its > original location as expected, but has been corrupted by the > aforementioned dio. > > This demonstrates that it is fundamentally unsafe to modify busy > extent state for extents that are not guaranteed to be allocated. > This applies to pretty much all of the code paths that currently > trim busy extents for one reason or another. Therefore to address > this problem, drop the reuse mechanism from the busy extent trim > path. This code already knows how to return partial non-busy ranges > of the targeted free extent and higher level code tracks the busy > state of the allocation attempt. If a block allocation fails where > one or more candidate extents is busy, we force the log and retry > the allocation. > The changes look good to me. Reviewed-by: Chandan Babu R <chandanrlinux@xxxxxxxxx> > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > --- > fs/xfs/xfs_extent_busy.c | 14 -------------- > 1 file changed, 14 deletions(-) > > diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c > index 3991e59cfd18..ef17c1f6db32 100644 > --- a/fs/xfs/xfs_extent_busy.c > +++ b/fs/xfs/xfs_extent_busy.c > @@ -344,7 +344,6 @@ xfs_extent_busy_trim( > ASSERT(*len > 0); > > spin_lock(&args->pag->pagb_lock); > -restart: > fbno = *bno; > flen = *len; > rbp = args->pag->pagb_tree.rb_node; > @@ -363,19 +362,6 @@ xfs_extent_busy_trim( > continue; > } > > - /* > - * If this is a metadata allocation, try to reuse the busy > - * extent instead of trimming the allocation. > - */ > - if (!(args->datatype & XFS_ALLOC_USERDATA) && > - !(busyp->flags & XFS_EXTENT_BUSY_DISCARDED)) { > - if (!xfs_extent_busy_update_extent(args->mp, args->pag, > - busyp, fbno, flen, > - false)) > - goto restart; > - continue; > - } > - > if (bbno <= fbno) { > /* start overlap */ -- chandan