Re: [RFC PATCH 0/4] bringing back the AGFL reserve

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 14 Jun 2024 10:45:37 +1000

On Thu, Jun 13, 2024 at 01:27:09PM -0700, Krister Johansen wrote:
> Hi,
> One of the teams that I work with hits WARNs in
> xfs_bmap_extents_to_btree() on a database workload of theirs.  The last
> time the subject came up on linux-xfs, it was suggested[1] to try
> building an AG reserve pool for the AGFL.
> 
> I managed to work out a reproducer for the problem.  Debugging that, the
> steps Gao outlined turned out to be essentially what was necessary to
> get the problem to happen repeatably.
> 
> 1. Allocate almost all of the space in an AG
> 2. Free and reallocate that space to fragment it so the freespace
> b-trees are just about to split.
> 3. Allocate blocks in a file such that the next extent allocated for
> that file will cause its bmbt to get converted from an inline extent to
> a b-tree.
> 4. Free space such that the free-space btrees have a contiguous extent
> with a busy portion on either end
> 5. Allocate the portion in the middle, splitting the extent and
> triggering a b-tree split.

Do you have a script that sets up this precondition reliably?
It sounds like it can be done from a known filesystem config. If you
do have a script, can you share it? Or maybe even better, turn it
into an fstest?

> On older kernels this is all it takes.  After the AG-aware allocator
> changes I also need to start the allocation in the highest numbered AG
> available while inducing lock contention in the lower numbered AGs.

Ah, so you have to perform a DoS on the lower AGFs so that, once
xfs_alloc_vextent_start_ag() finds it can no longer allocate in the
highest AG, its attempts to trylock the lower AGFs also fail.

That was one of the changes made in the perag aware allocator
rework; it added full-range AG iteration when XFS_ALLOC_FLAG_TRYLOCK
is set because we can't deadlock on reverse order AGF locking when
using trylocks.

However, if the trylock iteration fails, it then sets the restart AG
to the minimum AG we can wait for without deadlocking, clears the
trylock flag and restarts the iteration. Hence you've had to create AGF
lock contention to force the allocator back to being restricted by
the AGF locking order.
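
To make the two-pass behaviour described above concrete, here is a
rough userspace sketch. It is a toy model, not the kernel code: struct
toy_ag, toy_scan() and toy_pick_ag() are made-up names, "waiting for
the lock" is modelled as always succeeding, and the free space check is
reduced to a single boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for per-AG state; not the kernel's struct xfs_perag. */
struct toy_ag {
	bool agf_contended;	/* another task currently holds the AGF lock */
	bool has_space;		/* an allocation attempt here would succeed */
};

/*
 * Walk AGs from start, wrapping from the last AG back to wrap_to, and
 * stop once we come back around to start. With trylock set, contended
 * AGs are skipped; without it we assume we block and get the lock.
 * Returns the chosen AG number, or -1 if nothing was usable.
 */
static int toy_scan(const struct toy_ag *ags, size_t agcount,
		    size_t start, size_t wrap_to, bool trylock)
{
	size_t agno = start;

	assert(wrap_to <= start && start < agcount);
	do {
		bool skip = trylock && ags[agno].agf_contended;

		if (!skip && ags[agno].has_space)
			return (int)agno;
		agno = (agno + 1 < agcount) ? agno + 1 : wrap_to;
	} while (agno != start);
	return -1;
}

/*
 * Pass 1: trylock over the full AG range, wrapping to AG 0.
 * Pass 2: blocking, but only over [minimum, agcount) so that AGF
 * locks are never waited on in reverse order.
 */
static int toy_pick_ag(const struct toy_ag *ags, size_t agcount,
		       size_t start, size_t minimum)
{
	int agno = toy_scan(ags, agcount, start, 0, true);

	if (agno >= 0)
		return agno;
	return toy_scan(ags, agcount, start, minimum, false);
}
```

With four AGs, start == minimum == 3, no space left in AG 3 and AGs 0-2
all contended, both passes come up empty and toy_pick_ag() returns -1.
That is the situation the reproducer has to construct. If any one of the
lower AGFs is uncontended, the first pass picks it instead.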

Is this new behaviour sufficient to mitigate the problem being seen
with this database workload? Has it been tested with kernels that
have those changes, and if so did it have any impact on the
frequency of the issue occurring?

> In order to ensure that AGs have enough space to complete transactions
> with multiple allocations, I've taken a stab at implementing an AGFL
> reserve pool.

OK. I'll comment directly on the code from here; hopefully I'll
address your other questions in those comments.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx