On Tue, Apr 25, 2023 at 08:20:52AM -0700, Darrick J. Wong wrote:
> On Sat, Apr 22, 2023 at 08:24:40AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> >
> > On a filesystem with a non-zero stripe unit and a large sequential
> > write, delayed allocation will set a minimum allocation length of
> > the stripe unit. If allocation fails because there are no extents
> > long enough for an aligned minlen allocation, it is supposed to
> > fall back to unaligned allocation which allows single block extents
> > to be allocated.
> >
> > When the allocator code was rewritten in the 6.3 cycle, this
> > fallback was broken - the old code used args->fsbno as both the
> > allocation target and the allocation result, the new code passes the
> > target as a separate parameter. The conversion didn't handle the
> > aligned->unaligned fallback path correctly - it reset args->fsbno to
> > the target fsbno on failure which broke allocation failure detection
> > in the high level code and so it never fell back to unaligned
> > allocations.
> >
> > This resulted in a loop in writeback trying to allocate an aligned
> > block, getting a false positive success, trying to insert the result
> > in the BMBT. This did nothing because the extent already was in the
> > BMBT (merge results in an unchanged extent) and so it returned the
> > prior extent to the conversion code as the current iomap.
> >
> > Because the iomap returned didn't cover the offset we tried to map,
> > xfs_convert_blocks() then retries the allocation, which fails in the
> > same way and now we have a livelock.
> >
> > Reported-by: Brian Foster <bfoster@xxxxxxxxxx>
> > Fixes: 85843327094f ("xfs: factor xfs_bmap_btalloc()")
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
>
> Insofar as this has revealed a whole ton of *more* problems in mkfs,
> Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>

Thanks, I've added this to for-next and I'll include it in the pull
req to Linus tomorrow because I don't want to expose everyone using
merge window kernels to this ENOSPC issue even for a short while.

> Specifically: if I set su=128k,sw=4, some tests will try to format a
> 512M filesystem. This results in an 8-AG filesystem with a log that
> fills up almost but not all of an entire AG. The AG then ends up with
> an empty bnobt and an empty AGFL, and 25 missing blocks...

I used su=64k,sw=2 so I didn't see those specific issues. Mostly I
see failures due to mkfs warnings like this:

+Warning: AG size is a multiple of stripe width. This can cause performance
+problems by aligning all AGs on the same disk. To avoid this, run mkfs with
+an AG size that is one stripe unit smaller or larger, for example 129248.

> ...oh and the new test vms that run this config failed to finish for
> some reason. Sigh.

Yeah, I've had xfs_repair hang in xfs/155 a couple of times. Killing
the xfs_repair process allows everything to keep going. I suspect
it's a prefetch race/deadlock...

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
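
For anyone reading the thread later, the failure mode described in the quoted commit message can be sketched as a toy model. This is not the kernel code: every name here (toy_args, toy_try_alloc, toy_alloc_with_fallback, TOY_NULLBLOCK) and the fixed free-space state are hypothetical, made up for illustration. It only shows why overwriting a failed result with the target block hides the failure from the caller, so the unaligned fallback never runs.

```c
#include <stdbool.h>

/* Hypothetical stand-in for "no block was allocated". */
#define TOY_NULLBLOCK (-1L)

/* Hypothetical allocation arguments; not the kernel's struct xfs_alloc_arg. */
struct toy_args {
    long result;   /* allocated block, or TOY_NULLBLOCK on failure */
    long minlen;   /* minimum acceptable extent length in blocks */
};

/* Assumed free-space state: one free extent, a single block long. */
static const long toy_longest_free = 1;
static const long toy_free_start = 100;

/* Succeeds only if the free extent is at least minlen blocks long. */
static void toy_try_alloc(struct toy_args *args)
{
    if (toy_longest_free >= args->minlen)
        args->result = toy_free_start;
    else
        args->result = TOY_NULLBLOCK;
}

/*
 * Aligned attempt first (minlen = stripe unit), then unaligned fallback
 * (minlen = 1). With buggy == true, a failed aligned attempt has its
 * result reset to the target block, mimicking the behaviour the commit
 * message describes.
 */
static long toy_alloc_with_fallback(long target, long stripe_unit, bool buggy)
{
    struct toy_args args = { .result = TOY_NULLBLOCK, .minlen = stripe_unit };

    toy_try_alloc(&args);
    if (args.result == TOY_NULLBLOCK && buggy)
        args.result = target;   /* failure is now indistinguishable from success */

    if (args.result != TOY_NULLBLOCK)
        return args.result;     /* caller believes the allocation succeeded */

    args.minlen = 1;            /* fall back to single-block allocation */
    toy_try_alloc(&args);
    return args.result;
}
```

In this model, calling toy_alloc_with_fallback(50, 16, true) hands back the target block 50 even though nothing was allocated, which is the false positive success; with buggy set to false the same call falls through to the unaligned path and returns the real free block 100.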