Re: [PATCH 1/6] xfs: stop artificially limiting the length of bunmap calls

Dave Chinner <david@xxxxxxxxxxxxx> · Sat, 23 Apr 2022 09:51:33 +1000



On Fri, Apr 22, 2022 at 03:18:20PM -0700, Darrick J. Wong wrote:
> On Sat, Apr 23, 2022 at 08:01:20AM +1000, Dave Chinner wrote:
> > On Thu, Apr 14, 2022 at 03:54:31PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > 
> > > In commit e1a4e37cc7b6, we clamped the length of bunmapi calls on the
> > > data forks of shared files to avoid two failure scenarios: one where the
> > > extent being unmapped is so sparsely shared that we exceed the
> > > transaction reservation with the sheer number of refcount btree updates
> > > and EFI intent items; and the other where we attach so many deferred
> > > updates to the transaction that we pin the log tail and later the log
> > > head meets the tail, causing the log to livelock.
> > > 
> > > We avoid triggering the first problem by tracking the number of ops in
> > > the refcount btree cursor and forcing a requeue of the refcount intent
> > > item any time we think that we might be close to overflowing.  This has
> > > been baked into XFS since before the original e1a4 patch.
> > > 
> > > A recent patchset fixed the second problem by changing the deferred ops
> > > code to finish all the work items created by each round of trying to
> > > complete a refcount intent item, which eliminates the long chains of
> > > deferred items (27dad); and causing long-running transactions to relog
> > > their intent log items when space in the log gets low (74f4d).
> > > 
> > > Because this clamp affects /any/ unmapping request regardless of the
> > > sharing factors of the component blocks, it degrades the performance of
> > > all large unmapping requests -- whereas with an unshared file we can
> > > unmap millions of blocks in one go, shared files are limited to
> > > unmapping a few thousand blocks at a time, which causes the upper level
> > > code to spin in a bunmapi loop even if it wasn't needed.
> > > 
> > > This also eliminates one more place where log recovery behavior can
> > > differ from online behavior, because bunmapi operations no longer need
> > > to requeue.
> > > 
> > > Partial-revert-of: e1a4e37cc7b6 ("xfs: try to avoid blowing out the transaction reservation when bunmaping a shared extent")
> > > Depends: 27dada070d59 ("xfs: change the order in which child and parent defer ops are finished")
> > > Depends: 74f4d6a1e065 ("xfs: only relog deferred intent items if free space in the log gets low")
> > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > ---
> > >  fs/xfs/libxfs/xfs_bmap.c     |   22 +---------------------
> > >  fs/xfs/libxfs/xfs_refcount.c |    5 ++---
> > >  fs/xfs/libxfs/xfs_refcount.h |    8 ++------
> > >  3 files changed, 5 insertions(+), 30 deletions(-)
> > 
> > This looks reasonable, but I'm wondering how the original problem
> > was discovered and whether this has been tested against that
> > original problem situation to ensure we aren't introducing a
> > regression here....
> 
> generic/447, and yes, I have forced it to run a deletion of 1 million
> extents without incident. :)

Ok, that's all I wanted to know :)

Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>

-- 
Dave Chinner
david@xxxxxxxxxxxxx