On Tue, Apr 02, 2024 at 09:40:21PM -0700, Christoph Hellwig wrote:
> On Wed, Apr 03, 2024 at 08:38:17AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> >
> > xfs_alloc_file_space ends up in an endless loop when
> > xfs_bmapi_write() returns nimaps == 0 at ENOSPC. The process is
> > unkillable, and so just runs around in a tight circle burning CPU
> > until the system is rebooted.
>
> What is your reproducer? Let's just fix this for real.

Run the reproducer in this bug report on a TOT kernel, and the
XFS_IOC_RESVSP call will livelock:

https://lore.kernel.org/linux-xfs/CAEJPjCvT3Uag-pMTYuigEjWZHn1sGMZ0GCjVVCv29tNHK76Cgg@xxxxxxxxxxxxxx/

That has nothing to do with delalloc - free space accounting was
screwed up by a reserve blocks ioctl, and so when allocation fails
it just runs around in a tight circle and cannot be broken out of.

Regardless of the reproducer that corrupts free space accounting,
there is no guarantee that allocations will succeed even if there is
free space available. Hence this loop must have a way to break out
when allocation fails.

This becomes even more apparent with the forced alignment feature -
as soon as we run out of contiguous free space for aligned
allocation, allocations will persistently fail when there is plenty
of free space still available.

Given that the fix was for something that doesn't currently exist
(RT delalloc), the only sane thing to do right now is revert the fix
and push that revert back to the stable kernels that are susceptible
to this livelock.

I don't know exactly how the original delalloc issue was triggered,
let alone had the time to understand how to actually fix it
properly. The code as it stands contains a regression, and so the
first thing we need to do is revert the change so we can backport
it....

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx