Re: [PATCH 2/2] xfs: log recovery stage split EFIs with multiple extents

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 21 Apr 2023 08:54:18 +1000

On Thu, Apr 20, 2023 at 05:10:42PM +0000, Wengang Wang wrote:
> 
> 
> > On Apr 19, 2023, at 5:30 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > 
> > On Fri, Apr 14, 2023 at 03:58:36PM -0700, Wengang Wang wrote:
> >> At log recovery stage, we need to split EFIs with multiple extents. For each
> >> orginal multiple-extent EFI, split it into new EFIs each including one extent
> >> from the original EFI. By that we avoid deadlock when allocating blocks for
> >> AGFL waiting for the held busy extents by current transaction to be flushed.
> >> 
> >> For the original EFI, the process is
> >> 1. Create and log new EFIs each covering one extent from the
> >>    original EFI.
> >> 2. Don't free extent with the original EFI.
> >> 3. Log EFD for the original EFI.
> >>    Make sure we log the new EFIs and original EFD in this order:
> >>      new EFI 1
> >>      new EFI 2
> >>      ...
> >>      new EFI N
> >>      original EFD
> >> The original extents are freed with the new EFIs.
> > 
> > We may not have the log space available during recovery to explode a
> > single EFI out into many EFIs like this. The EFI only had enough
> > space reserved for processing a single EFI, and exploding a single
> > EFI out like this requires an individual log reservation for each
> > new EFI. Hence this de-multiplexing process risks running out of log
> > space and deadlocking before we've been able to process anything.
> > 
> 
> Oh, yes, got it.
> 
> > Hence the only option we really have here is to replicate how CUIs
> > are handled.  We must process the first extent with a whole EFD and
> > a new EFI containing the remaining unprocessed extents as defered
> > operations.  i.e.
> > 
> > 1. free the first extent in the original EFI
> > 2. log an EFD for the original EFI
> > 3. Add all the remaining extents in the original EFI to an xefi chain
> > 4. Call xfs_defer_ops_capture_and_commit() to create a new EFI from
> >   the xefi chain and commit the current transaction.
> > 
> > xfs_defer_ops_capture_and_commit() will then add a work item to the
> > defered list which will come back to the new EFI and process it
> > through the normal runtime deferred ops intent processing path.
> > 
> 
> So you meant this?
> 
> Orig EFI with extent1 extent2 extent3
> free first extent1
> Full EFD to orig EFI
> transaction roll,
> xfs_defer_ops_capture_and_commit() to take care of extent2 and extent3

No. We do not need a transaction roll there if we rebuild a new
xefi list with the remaining extents from the original efi. At that
point, we call:

	<create transaction>
	<free first extent>
	<create new xefi list>
	<create and log EFD>
	xfs_defer_ops_capture_and_commit()
	  xfs_defer_ops_capture()
	    xfs_defer_create_intents()
	      for each tp->t_dfops
	        ->create intent
		  xfs_extent_free_create_intent()
		    create new EFI
		    walk each xefi and add it to the new intent
	    <captures remaining defered work>
	  xfs_trans_commit()

i.e. xfs_defer_ops_capture_and_commit() can builds new EFI for us
from the xefi list as part of defering the work that remains to be
done. Once it has done that, it queues the remaining work and
commits the transaction. Hence all we need to do in recovery of the
first extent is free it, create the xefi list and log the full EFD.
xfs_defer_ops_capture_and_commit() does the rest.

> If so, I don’t think it’s safe.
> Consider that case that kernel panic happened after the transaction roll,
> during next log replay, the original EFI has the matching EFD, so this EFI
> is ignored, but actually extent2 and extent3 are not freed.
> 
> If you didn’t mean above, but instead this:
> 
> Orig EFI with extent1 extent2 extent3
> free first extent1
> New EFI extent2 extent3
> Full EFD to orig EFI
> transaction roll,
> xfs_defer_ops_capture_and_commit() to take care of extent2 and extent3
> 
> The problem will comeback to the log space issue, are we ensured we have
> the space for the new EFI? 

Yes, because logging the full EFD cancels the original EFI and so it
no longer consumes log space. hence we can log a new EFI using the
space the original EFI consumed.

> > The first patch changed that path to only create intents with a
> > single extent, so the continued defer ops would then do the right
> > thing with that change in place. However, I think that we also need
> > the runtime code to process a single extent per intent per commit in
> > the same manner as above. i.e. we process the first extent in the
> > intent, then relog all the remaining unprocessed extents as a single
> > new intent.
> > 
> > Note that this is similar to how we already relog intents to roll
> > them forward in the journal. The only difference for single extent
> > processing is that an intent relog duplicates the entire extent list
> > in the EFD and the new EFI, whilst what we want is the new EFI to
> > contain all the extents except the one we just processed...
> > 
> 
> The problem to me is that where we place the new EFI, it can’t be after the EFD.
> I explained why above.

Yes it can. The only thing that matters is that the EFD and new EFI
are committed in the same transaction. Remember: transactions are
atomic change sets - either all the changes in the transaction are
replayed on recovery, or none of them are.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx