Re: regressions in xfs/168?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Darrick and Dave,

On Wed, May 19, 2021 at 05:08:02PM -0700, Darrick J. Wong wrote:
> On Thu, May 20, 2021 at 08:20:06AM +1000, Dave Chinner wrote:
> > On Wed, May 19, 2021 at 02:02:05PM -0700, Darrick J. Wong wrote:
> > > Hm.  Does anyone /else/ see failures with the new test xfs/168 (the fs
> > > shrink tests) on a 1k blocksize?  It looks as though we shrink the AG so
> > > small that we trip the assert at the end of xfs_ag_resv_init that checks
> > > that the reservations for an AG don't exceed the free space in that AG,
> > > but tripping that doesn't return any error code, so xfs_ag_shrink_space
> > > commits the new fs size and presses on with even more shrinking until
> > > we've depleted AG 1 so thoroughly that the fs won't mount anymore.
> > 
> > Yup, now that I've got the latest fstests I see that failure, too.
> > 
> > [58972.431760] Call Trace:
> > [58972.432467]  xfs_ag_resv_init+0x1d3/0x240
> > [58972.433611]  xfs_ag_shrink_space+0x1bf/0x360
> > [58972.434801]  xfs_growfs_data+0x413/0x640
> > [58972.435894]  xfs_file_ioctl+0x32f/0xd30
> > [58972.439289]  __x64_sys_ioctl+0x8e/0xc0
> > [58972.440337]  do_syscall_64+0x3a/0x70
> > [58972.441347]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [58972.442741] RIP: 0033:0x7f7021755d87
> > 
> > > At a bare minimum we probably need to check the same thing the assert
> > > does and bail out of the shrink; or maybe we just need to create a
> > > function to adjust an AG's reservation to make that function less
> > > complicated.
> > 
> > So if I'm reading xfs_ag_shrink_space() correctly, it doesn't
> > check what the new reservation will be and so it's purely looking at
> > whether the physical range can be freed or not? And when freeing
> > that physical range results in less free space in the AG than the
> > reservation requires, we pop an assert failure rather than failing
> > the reservation and undoing the shrink like the code is supposed to
> > do?
> 
> Yes.  I've wondered for a while now if that assert in xfs_ag_resv_init
> should get turned into an ENOSPC return so that callers can decide what
> they want to do with it.

Thanks for the detailed analysis (sorry that I didn't check the 1k blocksize
case before), I'm now renting a department in a new city, no xfstests env
available for now.

But if I read/understand correctly, the following code might resolve the issue?

diff --git a/fs/xfs/libxfs/xfs_ag_resv.c b/fs/xfs/libxfs/xfs_ag_resv.c
index 6c5f8d10589c..1f918afd5e91 100644
--- a/fs/xfs/libxfs/xfs_ag_resv.c
+++ b/fs/xfs/libxfs/xfs_ag_resv.c
@@ -312,10 +312,12 @@ xfs_ag_resv_init(
 	if (error)
 		return error;
 
-	ASSERT(xfs_perag_resv(pag, XFS_AG_RESV_METADATA)->ar_reserved +
-	       xfs_perag_resv(pag, XFS_AG_RESV_RMAPBT)->ar_reserved <=
-	       pag->pagf_freeblks + pag->pagf_flcount);
 #endif
+	if (xfs_perag_resv(pag, XFS_AG_RESV_METADATA)->ar_reserved +
+	    xfs_perag_resv(pag, XFS_AG_RESV_RMAPBT)->ar_reserved >
+	    pag->pagf_freeblks + pag->pagf_flcount)
+		return -ENOSPC;
+
 out:
 	return error;
 }

If that works, could you kindly send out it (or some better/sane solution),
many thanks in advance!

Thanks,
Gao Xiang

> 
> --D
> 
> > IOWs, the problem is the ASSERT firing on debug kernels, not the
> > actual shrink code that does handle this reservation ENOSPC error
> > case properly? i.e. we've got something like an uncaught overflow
> > in xfs_ag_resv_init() that is tripping the assert? (e.g. used >
> > ask)
> > 
> > So I'm not sure that the problem is the shrink code here - it should
> > undo a reservation failure just fine, but the reservation code is
> > failing before we get there on a debug kernel...
> > 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@xxxxxxxxxxxxx



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux