Re: [PATCH 1/2] xfs: IO time one extent per EFI

Wengang Wang <wen.gang.wang@xxxxxxxxxx> · Fri, 21 Apr 2023 00:24:49 +0000

> On Apr 20, 2023, at 4:22 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> 
> On Thu, Apr 20, 2023 at 05:31:14PM +0000, Wengang Wang wrote:
>> 
>> 
>>> On Apr 19, 2023, at 4:55 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>>> 
>>> On Fri, Apr 14, 2023 at 03:58:35PM -0700, Wengang Wang wrote:
>>>> At IO time, make sure an EFI contains only one extent. Transaction rolling in
>>>> xfs_defer_finish() would commit the busy blocks for previous EFIs. By that we
>>>> avoid holding busy extents (for previously extents in the same EFI) in current
>>>> transaction when allocating blocks for AGFL where we could be otherwise stuck
>>>> waiting the busy extents held by current transaction to be flushed (thus a
>>>> deadlock).
>>>> 
>>>> The log changes
>>>> 1) before change:
>>>> 
>>>>   358 rbbn 13 rec_lsn: 1,12 Oper 5: tid: ee327fd2  len: 48 flags: None
>>>>   359 EFI  nextents:2 id:ffff9fef708ba940
>>>>   360 EFI id=ffff9fef708ba940 (0x21, 7)
>>>>   361 EFI id=ffff9fef708ba940 (0x18, 8)
>>>>   362 -----------------------------------------------------------------
>>>>   363 rbbn 13 rec_lsn: 1,12 Oper 6: tid: ee327fd2  len: 48 flags: None
>>>>   364 EFD  nextents:2 id:ffff9fef708ba940
>>>>   365 EFD id=ffff9fef708ba940 (0x21, 7)
>>>>   366 EFD id=ffff9fef708ba940 (0x18, 8)
>>>> 
>>>> 2) after change:
>>>> 
>>>>   830 rbbn 31 rec_lsn: 1,30 Oper 5: tid: 319f015f  len: 32 flags: None
>>>>   831 EFI  nextents:1 id:ffff9fef708b9b80
>>>>   832 EFI id=ffff9fef708b9b80 (0x21, 7)
>>>>   833 -----------------------------------------------------------------
>>>>   834 rbbn 31 rec_lsn: 1,30 Oper 6: tid: 319f015f  len: 32 flags: None
>>>>   835 EFI  nextents:1 id:ffff9fef708b9d38
>>>>   836 EFI id=ffff9fef708b9d38 (0x18, 8)
>>>>   837 -----------------------------------------------------------------
>>>>   838 rbbn 31 rec_lsn: 1,30 Oper 7: tid: 319f015f  len: 32 flags: None
>>>>   839 EFD  nextents:1 id:ffff9fef708b9b80
>>>>   840 EFD id=ffff9fef708b9b80 (0x21, 7)
>>>>   841 -----------------------------------------------------------------
>>>>   842 rbbn 31 rec_lsn: 1,30 Oper 8: tid: 319f015f  len: 32 flags: None
>>>>   843 EFD  nextents:1 id:ffff9fef708b9d38
>>>>   844 EFD id=ffff9fef708b9d38 (0x18, 8)
>>>> 
>>>> Signed-off-by: Wengang Wang <wen.gang.wang@xxxxxxxxxx>
>>>> ---
>>>> fs/xfs/xfs_extfree_item.h | 9 ++++++++-
>>>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>>> 
>>>> diff --git a/fs/xfs/xfs_extfree_item.h b/fs/xfs/xfs_extfree_item.h
>>>> index da6a5afa607c..ae84d77eaf30 100644
>>>> --- a/fs/xfs/xfs_extfree_item.h
>>>> +++ b/fs/xfs/xfs_extfree_item.h
>>>> @@ -13,8 +13,15 @@ struct kmem_cache;
>>>> 
>>>> /*
>>>> * Max number of extents in fast allocation path.
>>>> + *
>>>> + * At IO time, make sure an EFI contains only one extent. Transaction rolling
>>>> + * in xfs_defer_finish() would commit the busy blocks for previous EFIs. By
>>>> + * that we avoid holding busy extents (for previously extents in the same EFI)
>>>> + * in current transaction when allocating blocks for AGFL where we could be
>>>> + * otherwise stuck waiting the busy extents held by current transaction to be
>>>> + * flushed (thus a deadlock).
>>>> */
>>>> -#define XFS_EFI_MAX_FAST_EXTENTS 16
>>>> +#define XFS_EFI_MAX_FAST_EXTENTS 1
>>> 
>>> IIRC, this doesn't have anything to do with the number of extents an
>>> EFI can hold. All it does is control how the memory for the EFI
>>> allocated.
>> 
>> Yes, it ensures that one EFI contains at most one extent. And because each
>> deferred intent goes with one transaction roll, it would solve the AGFL allocation
>> deadlock (because no busy extents held by the process when it is doing the
>> AGFL allocation).
>> 
>>> Oh, at some point it got overloaded code to define the max items in
>>> a defer ops work item. Ok, I now see why you changed this, but I
>>> don't think this is right way to solve the problem. We can handle
>>> processing multiple extents per EFI just fine, we just need to
>>> update the EFD and roll the transaction on each extent we process,
>>> yes?
>>> 
>> 
>> I am not quite sure what does “update the EFD” mean.
> 
> Historical terminology, see below.
> 
>> My original concern is that (without your updated EFD), the extents in original EFI can be partially done before a crush. And during the recovery, the already done extents would also be replayed and hit error (because the in-place metadata could be flushed since the transaction is rolled.).
>> 
>> Now consider your “update the EFD”, you meant the following?
>> 
>> EFI:  ID:  THISISID1   extent1 extent2
>> free extent extent1
>> EFD: ID: THISISID1  extent1
>> free extent extent2
>> another EFD: ID: THISISID1 (same ID as above)  extent2
> 
> Yes, that's pretty much how multi-extent EFIs used to work, except
> the second and subsequent EFDs recorded all the extents that had
> been freed.  That way recovery could simply find the EFD with the
> highest LSN in the log to determine what part of the EFI had not
> been replayed.

Good to know it.

> 
> We don't do that anymore for partially processed multi-extent
> intents anymore. Instead, we use deferred ops to chain updates. i.e.
> we log a complete intent done items alongside a new intent
> containing the remaining work to be done in the same transaction.
> This cancels the original intent and atomically replaces it with a
> new intent containing the remaining work to be done.
> 
> So when I say "update the EFD" I'm using historic terminology for
> processing and recovering multi-extent intents. In modern terms,
> what I mean is "update the deferred work intent chain to reflect the
> work remaining to be done".

OK. so let’s see the difference between your implementation from mine.
Say, there are three extents to free.

My patch would result in:

EFI 1  with extent1
free extent1
EFD 1 with extent1
transaction roll
EFI 2 with extent2
free extent2
EFD 2 with extent2
transaction roll
EFI 3 with extent3
free extent3
EFD3 with extent3
transaction commit

The EFI/EFD log item pairs should not be written to log as they appear in same checkpoint.

Your idea yields this:

EFI 1 with extent1 extent2 extent3
free extent1
EFI 2 with extent2 extent3
EFD 1 with extent1 extent2 extent3
transaction commit
create transaction
free extent2
EFI 3 with extent3
EFD 2 with extent extent2 extent3
transaction commit
create transaction
free extent3
EFD 3 with extent3
transaction commit.

Your implementation also includes three EFI/EFD pairs, that’s the same as mine.
So actually it’s still one extent per EFI per transaction. Though you are not changing
XFS_EFI_MAX_FAST_EXTENTS.

And your implementation may use more log space than mine in case the EFI
(and EFD pair) can’t be cancelled.  :D

The only difference if that you use transaction commit and I am using transaction roll
which is not safe as you said. 

Is my understanding correct?

One question is that if only one EFI is safe per transaction, how we ensure that there
is only one EFI per transaction in case there are more than 16
(current XFS_EFI_MAX_FAST_EXTENTS) extents to free in current code?

thanks,
wengang