Re: [PATCH] xfs: fix extent busy updating

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Thu, 5 Jan 2023 11:58:22 -0800

On Wed, Jan 04, 2023 at 05:41:15PM +0000, Wengang Wang wrote:
> Darrick,
> 
> > On Jan 4, 2023, at 8:24 AM, Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> > 
> > On Tue, Jan 03, 2023 at 11:32:17AM -0800, Wengang Wang wrote:
> >> In xfs_extent_busy_update_extent() case 6 and 7, whenever bno is modified on
> >> extent busy, the relavent length has to be modified accordingly.
> >> 
> >> Signed-off-by: Wengang Wang <wen.gang.wang@xxxxxxxxxx>
> >> ---
> >> fs/xfs/xfs_extent_busy.c | 1 +
> >> 1 file changed, 1 insertion(+)
> >> 
> >> diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c
> >> index ad22a003f959..f3d328e4a440 100644
> >> --- a/fs/xfs/xfs_extent_busy.c
> >> +++ b/fs/xfs/xfs_extent_busy.c
> >> @@ -236,6 +236,7 @@ xfs_extent_busy_update_extent(
> >> 		 *
> >> 		 */
> >> 		busyp->bno = fend;
> >> +		busyp->length = bend - fend;
> > 
> > Looks correct to me, but how did you find this?
> 
> I was working with a UEK5 XFS bug where busy blocks (contained in
> extent busy) are allocated to regular files unexpectedly.  When I was
> trying to fix that problem (still reuse busy blocks for directories),
> the problem here is exposed.
> 
> 
> >  Is there some sort of
> > test case we could attach to this?
> 
> Hm.. I can only reproduce this with my patch. Well, the idea is that,
> for example,
> 
> 1) we have an extent busy in the busy tree:    (bno=100, len=200)
> 2) allocate blocks for directories from above extent busy (multiple times)
> 3) after the allocations, above extent busy finally becomes  (bno=300, len=200)  though it should become (bno=300, len=0) and should be removed from the busy tree.
> 4) the block 300 (in that AG) is used as metadata (directory blocks containing dir entries) and then that block is freed
> 5) insert the new extent busy (bno=300, len=1) to the busy tree,
> in function xfs_extent_busy_insert():
> 
>  61         while (*rbp) {
>  62                 parent = *rbp;
>  63                 busyp = rb_entry(parent, struct xfs_extent_busy, rb_node);
>  64
>  65                 if (new->bno < busyp->bno) {
>  66                         rbp = &(*rbp)->rb_left;
>  67                         ASSERT(new->bno + new->length <= busyp->bno);
>  68                 } else if (new->bno > busyp->bno) {
>  69                         rbp = &(*rbp)->rb_right;
>  70                         ASSERT(bno >= busyp->bno + busyp->length);
>  71                 } else {
>  72                         ASSERT(0);
>  73                 }
>  74         }
> 
> Note that node (bno=300, len=200) already exists in the tree, the code
> hits line 72, the “else” case, and enters infinite loop.

Hm.  I /think/ it would be possible but fairly difficult to write an
fstest -- you'd have to create a storage with a very slow DISCARD
command, mount the fs with -o discard, fill the fs, free all the blocks,
make a large directory structure, and then rmdir the directory.  Those
last three steps would have to happen before the discards finish.

Uh... do you think that's worth the effort?  I don't think the kernel
even has a dm driver to simulate a device with only slow discards.

In the meantime,
Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>

--D

> 
> thanks,
> wengang
> 
> > 
> > --D
> > 
> >> 	} else if (bbno < fbno) {
> >> 		/*
> >> 		 * Case 8:
> >> -- 
> >> 2.21.0 (Apple Git-122.2)
> >> 
>