Re: [PATCH 2/2] xfs: log recovery stage split EFIs with multiple extents

Wengang Wang <wen.gang.wang@xxxxxxxxxx> · Fri, 21 Apr 2023 00:32:15 +0000

> On Apr 20, 2023, at 3:54 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> 
> On Thu, Apr 20, 2023 at 05:10:42PM +0000, Wengang Wang wrote:
>> 
>> 
>>> On Apr 19, 2023, at 5:30 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>>> 
>>> On Fri, Apr 14, 2023 at 03:58:36PM -0700, Wengang Wang wrote:
>>>> At log recovery stage, we need to split EFIs with multiple extents. For each
>>>> orginal multiple-extent EFI, split it into new EFIs each including one extent
>>>> from the original EFI. By that we avoid deadlock when allocating blocks for
>>>> AGFL waiting for the held busy extents by current transaction to be flushed.
>>>> 
>>>> For the original EFI, the process is
>>>> 1. Create and log new EFIs each covering one extent from the
>>>>   original EFI.
>>>> 2. Don't free extent with the original EFI.
>>>> 3. Log EFD for the original EFI.
>>>>   Make sure we log the new EFIs and original EFD in this order:
>>>>     new EFI 1
>>>>     new EFI 2
>>>>     ...
>>>>     new EFI N
>>>>     original EFD
>>>> The original extents are freed with the new EFIs.
>>> 
>>> We may not have the log space available during recovery to explode a
>>> single EFI out into many EFIs like this. The EFI only had enough
>>> space reserved for processing a single EFI, and exploding a single
>>> EFI out like this requires an individual log reservation for each
>>> new EFI. Hence this de-multiplexing process risks running out of log
>>> space and deadlocking before we've been able to process anything.
>>> 
>> 
>> Oh, yes, got it.
>> 
>>> Hence the only option we really have here is to replicate how CUIs
>>> are handled.  We must process the first extent with a whole EFD and
>>> a new EFI containing the remaining unprocessed extents as defered
>>> operations.  i.e.
>>> 
>>> 1. free the first extent in the original EFI
>>> 2. log an EFD for the original EFI
>>> 3. Add all the remaining extents in the original EFI to an xefi chain
>>> 4. Call xfs_defer_ops_capture_and_commit() to create a new EFI from
>>>  the xefi chain and commit the current transaction.
>>> 
>>> xfs_defer_ops_capture_and_commit() will then add a work item to the
>>> defered list which will come back to the new EFI and process it
>>> through the normal runtime deferred ops intent processing path.
>>> 
>> 
>> So you meant this?
>> 
>> Orig EFI with extent1 extent2 extent3
>> free first extent1
>> Full EFD to orig EFI
>> transaction roll,
>> xfs_defer_ops_capture_and_commit() to take care of extent2 and extent3
> 
> No. We do not need a transaction roll there if we rebuild a new
> xefi list with the remaining extents from the original efi. At that
> point, we call:
> 
> <create transaction>
> <free first extent>
> <create new xefi list>
> <create and log EFD>
> xfs_defer_ops_capture_and_commit()
>   xfs_defer_ops_capture()
>     xfs_defer_create_intents()
>       for each tp->t_dfops
>         ->create intent
>   xfs_extent_free_create_intent()
>     create new EFI
>     walk each xefi and add it to the new intent
>     <captures remaining defered work>
>   xfs_trans_commit()
> 
> i.e. xfs_defer_ops_capture_and_commit() can builds new EFI for us
> from the xefi list as part of defering the work that remains to be
> done. Once it has done that, it queues the remaining work and
> commits the transaction. Hence all we need to do in recovery of the
> first extent is free it, create the xefi list and log the full EFD.
> xfs_defer_ops_capture_and_commit() does the rest.

Yes, xfs_defer_ops_capture_and_commit will commit the transaction so we
don’t need a roll.

> 
>> If so, I don’t think it’s safe.
>> Consider that case that kernel panic happened after the transaction roll,
>> during next log replay, the original EFI has the matching EFD, so this EFI
>> is ignored, but actually extent2 and extent3 are not freed.
>> 
>> If you didn’t mean above, but instead this:
>> 
>> Orig EFI with extent1 extent2 extent3
>> free first extent1
>> New EFI extent2 extent3
>> Full EFD to orig EFI
>> transaction roll,
>> xfs_defer_ops_capture_and_commit() to take care of extent2 and extent3
>> 
>> The problem will comeback to the log space issue, are we ensured we have
>> the space for the new EFI? 
> 
> Yes, because logging the full EFD cancels the original EFI and so it
> no longer consumes log space. hence we can log a new EFI using the
> space the original EFI consumed.
> 

Make sense.

>>> The first patch changed that path to only create intents with a
>>> single extent, so the continued defer ops would then do the right
>>> thing with that change in place. However, I think that we also need
>>> the runtime code to process a single extent per intent per commit in
>>> the same manner as above. i.e. we process the first extent in the
>>> intent, then relog all the remaining unprocessed extents as a single
>>> new intent.
>>> 
>>> Note that this is similar to how we already relog intents to roll
>>> them forward in the journal. The only difference for single extent
>>> processing is that an intent relog duplicates the entire extent list
>>> in the EFD and the new EFI, whilst what we want is the new EFI to
>>> contain all the extents except the one we just processed...
>>> 
>> 
>> The problem to me is that where we place the new EFI, it can’t be after the EFD.
>> I explained why above.
> 
> Yes it can. The only thing that matters is that the EFD and new EFI
> are committed in the same transaction. Remember: transactions are
> atomic change sets - either all the changes in the transaction are
> replayed on recovery, or none of them are.

Yes, will try this way.

thanks,
wengang