On Sat, Apr 23, 2022 at 08:01:20AM +1000, Dave Chinner wrote:
> On Thu, Apr 14, 2022 at 03:54:31PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> >
> > In commit e1a4e37cc7b6, we clamped the length of bunmapi calls on the
> > data forks of shared files to avoid two failure scenarios: one where the
> > extent being unmapped is so sparsely shared that we exceed the
> > transaction reservation with the sheer number of refcount btree updates
> > and EFI intent items; and the other where we attach so many deferred
> > updates to the transaction that we pin the log tail and later the log
> > head meets the tail, causing the log to livelock.
> >
> > We avoid triggering the first problem by tracking the number of ops in
> > the refcount btree cursor and forcing a requeue of the refcount intent
> > item any time we think that we might be close to overflowing. This has
> > been baked into XFS since before the original e1a4 patch.
> >
> > A recent patchset fixed the second problem by changing the deferred ops
> > code to finish all the work items created by each round of trying to
> > complete a refcount intent item, which eliminates the long chains of
> > deferred items (27dad); and causing long-running transactions to relog
> > their intent log items when space in the log gets low (74f4d).
> >
> > Because this clamp affects /any/ unmapping request regardless of the
> > sharing factors of the component blocks, it degrades the performance of
> > all large unmapping requests -- whereas with an unshared file we can
> > unmap millions of blocks in one go, shared files are limited to
> > unmapping a few thousand blocks at a time, which causes the upper level
> > code to spin in a bunmapi loop even if it wasn't needed.
> >
> > This also eliminates one more place where log recovery behavior can
> > differ from online behavior, because bunmapi operations no longer need
> > to requeue.
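The requeue mechanism described above can be pictured with a small, self-contained model. It counts the refcount ops logged so far and stops before the reservation overflows. This is a hypothetical sketch, not the kernel code. The names struct refc_budget, refc_still_have_space(), refc_process() and refc_rolls_needed() are illustrative only, and the model assumes the transaction reservation can be treated as a plain byte count. The only number it borrows from this thread is the 32-byte per-op figure of XFS_REFCOUNT_ITEM_OVERHEAD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same value as XFS_REFCOUNT_ITEM_OVERHEAD in the xfs_refcount.h hunk. */
#define REFC_ITEM_OVERHEAD 32UL

/* Hypothetical stand-in for the op counter kept in the refcount cursor. */
struct refc_budget {
	unsigned long log_res;   /* transaction log reservation, in bytes */
	unsigned long overhead;  /* fixed cost already charged, e.g. btree splits */
	unsigned long nr_ops;    /* refcount record updates logged so far */
};

/* True if another refcount update still fits in this transaction. */
static bool refc_still_have_space(const struct refc_budget *b)
{
	if (b->nr_ops == 0)
		return true;         /* always allow one op so we make progress */
	if (b->overhead > b->log_res)
		return false;
	return b->log_res - b->overhead > b->nr_ops * REFC_ITEM_OVERHEAD;
}

/*
 * Do up to `want` refcount updates in one transaction and return how many
 * were done.  The caller requeues the intent item for whatever is left,
 * instead of clamping the unmap length before starting.
 */
static size_t refc_process(struct refc_budget *b, size_t want)
{
	size_t done = 0;

	while (done < want && refc_still_have_space(b)) {
		b->nr_ops++;         /* stands in for logging one record update */
		done++;
	}
	return done;
}

/*
 * Number of transactions needed for `total` updates when every transaction
 * starts with a fresh reservation (one requeue per extra transaction).
 */
static size_t refc_rolls_needed(unsigned long log_res, size_t total)
{
	size_t rolls = 0;

	while (total > 0) {
		struct refc_budget b = { .log_res = log_res };
		size_t done = refc_process(&b, total);

		assert(done > 0);    /* the nr_ops == 0 case guarantees progress */
		total -= done;
		rolls++;
	}
	return rolls;
}
```

Take an assumed 1000-byte reservation as an example. Each transaction admits 32 updates, so refc_rolls_needed(1000, 100) returns 4. The work is split by requeueing only when the budget actually runs out. The clamp from e1a4e37cc7b6 instead capped every unmap request up front, whatever the sharing factor of its blocks.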
> >
> > Partial-revert-of: e1a4e37cc7b6 ("xfs: try to avoid blowing out the transaction reservation when bunmaping a shared extent")
> > Depends: 27dada070d59 ("xfs: change the order in which child and parent defer ops ar finished")
> > Depends: 74f4d6a1e065 ("xfs: only relog deferred intent items if free space in the log gets low")
> > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > ---
> >  fs/xfs/libxfs/xfs_bmap.c     | 22 +---------------------
> >  fs/xfs/libxfs/xfs_refcount.c |  5 ++---
> >  fs/xfs/libxfs/xfs_refcount.h |  8 ++------
> >  3 files changed, 5 insertions(+), 30 deletions(-)
>
> This looks reasonable, but I'm wondering how the original problem
> was discovered and whether this has been tested against that
> original problem situation to ensure we aren't introducing a
> regression here....

generic/447, and yes, I have forced it to run a deletion of 1 million
extents without incident. :)

I should probably amend that test to note that it's an exerciser for
e1a4e37cc7b6.

> > diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h
> > index 9eb01edbd89d..6b265f6075b8 100644
> > --- a/fs/xfs/libxfs/xfs_refcount.h
> > +++ b/fs/xfs/libxfs/xfs_refcount.h
> > @@ -66,15 +66,11 @@ extern int xfs_refcount_recover_cow_leftovers(struct xfs_mount *mp,
> >   * reservation and crash the fs. Each record adds 12 bytes to the
> >   * log (plus any key updates) so we'll conservatively assume 32 bytes
> >   * per record. We must also leave space for btree splits on both ends
> > - * of the range and space for the CUD and a new CUI.
> > + * of the range and space for the CUD and a new CUI. Each EFI that we
> > + * attach to the transaction also consumes ~32 bytes.
> >   */
> >  #define XFS_REFCOUNT_ITEM_OVERHEAD	32
>
> FWIW, I think this is a low-ball number - each EFI also consumes an
> ophdr (12 bytes) for the region identifier in the log, so it's
> actually 44 bytes, not 32 bytes that will be consumed. It is not
> necessary to address this in this patchset, though.

<Nod>

--D

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
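Dave's point about the per-EFI cost comes down to simple arithmetic. The sketch below takes two figures at face value: the patch comment's ~32 bytes per EFI and Dave's 12-byte ophdr. It then shows how far a 32-byte estimate undercounts. EFI_ITEM_BYTES, OPHDR_BYTES and the helper names are hypothetical. Treating every EFI as the same size is itself a simplification.

```c
#include <assert.h>

#define EFI_ITEM_BYTES 32UL  /* "~32 bytes" per EFI, from the patch comment */
#define OPHDR_BYTES    12UL  /* op header per log region, per Dave's note */

/* Log space consumed by n EFIs when the op header is counted (44 bytes each). */
static unsigned long efi_log_bytes(unsigned long n)
{
	return n * (EFI_ITEM_BYTES + OPHDR_BYTES);
}

/* How far a 32-bytes-per-EFI estimate falls short for n EFIs (12 * n). */
static unsigned long efi_undercount(unsigned long n)
{
	return efi_log_bytes(n) - n * EFI_ITEM_BYTES;
}

/* How many items of per_item bytes fit in a budget of budget bytes. */
static unsigned long items_that_fit(unsigned long budget, unsigned long per_item)
{
	assert(per_item > 0);    /* guard against division by zero */
	return budget / per_item;
}
```

Take an assumed 44000-byte budget. items_that_fit() admits 1375 EFIs at 32 bytes each, but only 1000 at 44 bytes each. The 32-byte figure would therefore let in 37.5% more EFIs than the log space actually covers. Whether that matters in practice depends on how much headroom the real reservation calculation leaves, and this model does not try to capture that.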