> On Apr 20, 2023, at 3:54 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > On Thu, Apr 20, 2023 at 05:10:42PM +0000, Wengang Wang wrote: >> >> >>> On Apr 19, 2023, at 5:30 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote: >>> >>> On Fri, Apr 14, 2023 at 03:58:36PM -0700, Wengang Wang wrote: >>>> At log recovery stage, we need to split EFIs with multiple extents. For each >>>> orginal multiple-extent EFI, split it into new EFIs each including one extent >>>> from the original EFI. By that we avoid deadlock when allocating blocks for >>>> AGFL waiting for the held busy extents by current transaction to be flushed. >>>> >>>> For the original EFI, the process is >>>> 1. Create and log new EFIs each covering one extent from the >>>> original EFI. >>>> 2. Don't free extent with the original EFI. >>>> 3. Log EFD for the original EFI. >>>> Make sure we log the new EFIs and original EFD in this order: >>>> new EFI 1 >>>> new EFI 2 >>>> ... >>>> new EFI N >>>> original EFD >>>> The original extents are freed with the new EFIs. >>> >>> We may not have the log space available during recovery to explode a >>> single EFI out into many EFIs like this. The EFI only had enough >>> space reserved for processing a single EFI, and exploding a single >>> EFI out like this requires an individual log reservation for each >>> new EFI. Hence this de-multiplexing process risks running out of log >>> space and deadlocking before we've been able to process anything. >>> >> >> Oh, yes, got it. >> >>> Hence the only option we really have here is to replicate how CUIs >>> are handled. We must process the first extent with a whole EFD and >>> a new EFI containing the remaining unprocessed extents as defered >>> operations. i.e. >>> >>> 1. free the first extent in the original EFI >>> 2. log an EFD for the original EFI >>> 3. Add all the remaining extents in the original EFI to an xefi chain >>> 4. Call xfs_defer_ops_capture_and_commit() to create a new EFI from >>> the xefi chain and commit the current transaction. >>> >>> xfs_defer_ops_capture_and_commit() will then add a work item to the >>> defered list which will come back to the new EFI and process it >>> through the normal runtime deferred ops intent processing path. >>> >> >> So you meant this? >> >> Orig EFI with extent1 extent2 extent3 >> free first extent1 >> Full EFD to orig EFI >> transaction roll, >> xfs_defer_ops_capture_and_commit() to take care of extent2 and extent3 > > No. We do not need a transaction roll there if we rebuild a new > xefi list with the remaining extents from the original efi. At that > point, we call: > > <create transaction> > <free first extent> > <create new xefi list> > <create and log EFD> > xfs_defer_ops_capture_and_commit() > xfs_defer_ops_capture() > xfs_defer_create_intents() > for each tp->t_dfops > ->create intent > xfs_extent_free_create_intent() > create new EFI > walk each xefi and add it to the new intent > <captures remaining defered work> > xfs_trans_commit() > > i.e. xfs_defer_ops_capture_and_commit() can builds new EFI for us > from the xefi list as part of defering the work that remains to be > done. Once it has done that, it queues the remaining work and > commits the transaction. Hence all we need to do in recovery of the > first extent is free it, create the xefi list and log the full EFD. > xfs_defer_ops_capture_and_commit() does the rest. Yes, xfs_defer_ops_capture_and_commit will commit the transaction so we don’t need a roll. > >> If so, I don’t think it’s safe. >> Consider that case that kernel panic happened after the transaction roll, >> during next log replay, the original EFI has the matching EFD, so this EFI >> is ignored, but actually extent2 and extent3 are not freed. >> >> If you didn’t mean above, but instead this: >> >> Orig EFI with extent1 extent2 extent3 >> free first extent1 >> New EFI extent2 extent3 >> Full EFD to orig EFI >> transaction roll, >> xfs_defer_ops_capture_and_commit() to take care of extent2 and extent3 >> >> The problem will comeback to the log space issue, are we ensured we have >> the space for the new EFI? > > Yes, because logging the full EFD cancels the original EFI and so it > no longer consumes log space. hence we can log a new EFI using the > space the original EFI consumed. > Make sense. >>> The first patch changed that path to only create intents with a >>> single extent, so the continued defer ops would then do the right >>> thing with that change in place. However, I think that we also need >>> the runtime code to process a single extent per intent per commit in >>> the same manner as above. i.e. we process the first extent in the >>> intent, then relog all the remaining unprocessed extents as a single >>> new intent. >>> >>> Note that this is similar to how we already relog intents to roll >>> them forward in the journal. The only difference for single extent >>> processing is that an intent relog duplicates the entire extent list >>> in the EFD and the new EFI, whilst what we want is the new EFI to >>> contain all the extents except the one we just processed... >>> >> >> The problem to me is that where we place the new EFI, it can’t be after the EFD. >> I explained why above. > > Yes it can. The only thing that matters is that the EFD and new EFI > are committed in the same transaction. Remember: transactions are > atomic change sets - either all the changes in the transaction are > replayed on recovery, or none of them are. Yes, will try this way. thanks, wengang