On Wed, Nov 13, 2024 at 09:05:13AM +1100, Dave Chinner wrote:
> These are three bug fixes for recent issues.
>
> The first is a repost of the original patch to prevent allocation of
> sparse inode clusters at the end of an unaligned runt AG. There
> was plenty of discussion over that fix here:
>
> https://lore.kernel.org/linux-xfs/20241024025142.4082218-1-david@xxxxxxxxxxxxx/
>
> And the outcome of that discussion is that we can't allow sparse
> inode clusters overlapping the end of the runt AG without an on disk
> format definition change. Hence this patch to ensure the check is
> done correctly is the only change we need to make to the kernel to
> avoid this problem in the future.
>
> Filesystems that have this problem on disk will need to run
> xfs_repair to remove the bad cluster, but no data loss is possible
> from this because the kernel currently disallows inode allocation
> from the bad cluster and so none of the inodes in the sparse cluster
> can actually be used. Hence there is no possible data loss or other
> metadata corruption possible from this situation, all we need to do
> is ensure that it doesn't happen again once repair has done it's
> work.

<shrug> How many systems are in this state? Would those users rather we
fix the validation code in repair/scrub/wherever to allow ichunks that
overrun the end of a runt AG?

--D

> The other two patches are for issues I've recently hit when running
> lots of fstests in parallel. That changes loading and hence timing
> of events during tests, exposing latent race conditions in the code.
> The quota fix removes racy debug code that has been there since the
> quota code was first committed in 1996.
>
> The log shutdown race fix is a much more recent issue created by
> trying to ensure shutdowns operate in a sane and predictable manner.
> The logic flaw is that we allow multiple log shutdowns to start and
> force the log before selecting on a single log shutdown task. This
> leads to a situation where shutdown log item callback processing
> gets stuck waiting on a task holding a buffer lock that is waiting
> on a log force that is waiting on shutdown log item callback
> processing to complete...
>
> Thoughts?
>
>