Re: [PATCH 2/2] xfs: Prevent deadlock when allocating blocks for AGFL

Chandan Babu R <chandanrlinux@xxxxxxxxx> · Wed, 05 May 2021 18:12:41 +0530

On 04 May 2021 at 05:33, Dave Chinner wrote:
> On Mon, May 03, 2021 at 03:22:10PM +0530, Chandan Babu R wrote:
>> On 01 May 2021 at 04:14, Dave Chinner wrote:
>> >> that end, I have tried to slightly simplify the patch that I had originally
>> >> sent (i.e. [PATCH 2/2] xfs: Prevent deadlock when allocating blocks for
>> >> AGFL). The new patch removes one the boolean variables
>> >> (i.e. alloc_small_extent) and also skips redundant searching of extent records
>> >> when backtracking in preparation for searching smaller extents.
>> >
>> > I still don't think this is right approach because it tries to
>> > correct a bad decision (use a busy extent instead of trying the next
>> > free extent) with another bad decision (log force might not unbusy
>> > the extent we are trying to allocate). We should not do either of
>> > these things in this situation, nor do we need to mark busy extents
>> > as being in a transaction to avoid deadlocks.
>> >
>> > That is, if all free extents are busy and there is nothing we can
>> > allocate in the AG for the AGFL, then flush the busy extents and try
>> > again while we hold the AGF locked. Because we hold the AGF locked,
>> > nobody else can create new busy extents in the AG while we wait.
>> > That means after a busy extent flush any remaining busy extents in
>> > this AG are ones that we hold busy in this transaction and are the
>> > ones we need to avoid allocating from in the first place.
>> >
>> > IOWs, we don't need to mark busy extents as being in a transaction
>> > at all - we know that this is the only way we can have a busy extent
>> > in the AG after we flush busy extents while holding the AGF locked.
>> > And that means if we still can't find a free extent after a busy
>> > extent flush, then we're definitely at ENOSPC in that AG as there
>> > are no free extents we can safely allocate from in the AG....
>>
>> ... Assume that there is one free busy extent in an AG and that it is 1 block
>> in length. Also assume that the free extent is busy in the current
>> transaction.
>
> ISTR that this won't happen during extent allocation because the
> transaction reservation and the AG selection code is supposed to
> ensure there are sufficient free blocks both globally and in the AG
> for the entire operation, not just one part of it.
>
> Also, the extent freeing path is this:
>
> ...
>   __xfs_free_extent()
>     xfs_free_extent_fix_freelist()
>       xfs_alloc_fix_freelist(XFS_ALLOC_FLAG_FREEING)
>
> And that XFS_ALLOC_FLAG_FREEING is special - it means that we:
>
> a) always say there is space available in the AG for the freeing
> operation to take place, and
> b) only perform best effort allocation to fill up the free list.
>
> Case b) triggers this code:
>
>                 /*
>                  * Stop if we run out.  Won't happen if callers are obeying
>                  * the restrictions correctly.  Can happen for free calls
>                  * on a completely full ag.
>                  */
>                 if (targs.agbno == NULLAGBLOCK) {
>                         if (flags & XFS_ALLOC_FLAG_FREEING)
>                                 break;
>                         goto out_agflbp_relse;
>                 }
>
>
> That is, if we fail to fix up the free list, we still go ahead with
> the operation because freeing extents when we are at ENOSPC means
> that, by definition, we don't need to allocate blocks to track the
> new free space because the new free space records will fit inside
> the root btree blocks that are already allocated.
>
> Hence when doing allocation for the free list, we need to fail the
> allocation rather than block on the only remaining free extent in
> the AG. If we are freeing extents, the AGFL not being full is not an
> issue at all. And if we are allocating extents, the transaction
> reservations should have ensured that the AG had sufficient space in
> it to complete the entire operation without deadlocking waiting for
> space.
>
> Either way, I don't see a problem with making sure the AGFL
> allocations just skip busy extents and fail if the only free extents
> are ones this transaction has freed itself.
>

Hmm. In the scenario where *all* free extents in the AG were originally freed
by the current transaction (and hence busy in the transaction), we would need
to be able to recognize this situation and skip invoking
xfs_extent_busy_flush() altogether. Otherwise, xfs_extent_busy_flush() invokes
xfs_log_force() and keeps waiting for busy generation number to change.

Hence, IMHO we would need an extent busy flag (e.g. XFS_EXTENT_BUSY_IN_TRANS)
to correctly determine if all the busy extents are indeed busy in the current
transaction.

--
chandan