On Thu, May 20, 2021 at 08:20:06AM +1000, Dave Chinner wrote: > On Wed, May 19, 2021 at 02:02:05PM -0700, Darrick J. Wong wrote: > > Hm. Does anyone /else/ see failures with the new test xfs/168 (the fs > > shrink tests) on a 1k blocksize? It looks as though we shrink the AG so > > small that we trip the assert at the end of xfs_ag_resv_init that checks > > that the reservations for an AG don't exceed the free space in that AG, > > but tripping that doesn't return any error code, so xfs_ag_shrink_space > > commits the new fs size and presses on with even more shrinking until > > we've depleted AG 1 so thoroughly that the fs won't mount anymore. > > Yup, now that I've got the latest fstests I see that failure, too. > > [58972.431760] Call Trace: > [58972.432467] xfs_ag_resv_init+0x1d3/0x240 > [58972.433611] xfs_ag_shrink_space+0x1bf/0x360 > [58972.434801] xfs_growfs_data+0x413/0x640 > [58972.435894] xfs_file_ioctl+0x32f/0xd30 > [58972.439289] __x64_sys_ioctl+0x8e/0xc0 > [58972.440337] do_syscall_64+0x3a/0x70 > [58972.441347] entry_SYSCALL_64_after_hwframe+0x44/0xae > [58972.442741] RIP: 0033:0x7f7021755d87 > > > At a bare minimum we probably need to check the same thing the assert > > does and bail out of the shrink; or maybe we just need to create a > > function to adjust an AG's reservation to make that function less > > complicated. > > So if I'm reading xfs_ag_shrink_space() correctly, it doesn't > check what the new reservation will be and so it's purely looking at > whether the physical range can be freed or not? And when freeing > that physical range results in less free space in the AG than the > reservation requires, we pop an assert failure rather than failing > the reservation and undoing the shrink like the code is supposed to > do? Yes. I've wondered for a while now if that assert in xfs_ag_resv_init should get turned into an ENOSPC return so that callers can decide what they want to do with it. --D > IOWs, the problem is the ASSERT firing on debug kernels, not the > actual shrink code that does handle this reservation ENOSPC error > case properly? i.e. we've got something like an uncaught overflow > in xfs_ag_resv_init() that is tripping the assert? (e.g. used > > ask) > > So I'm not sure that the problem is the shrink code here - it should > undo a reservation failure just fine, but the reservation code is > failing before we get there on a debug kernel... > > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx