Re: [PATCH] xfs: Don't block in xfs_extent_busy_flush

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On May 22, 2023, at 3:40 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> 
> On Mon, May 22, 2023 at 06:20:11PM +0000, Wengang Wang wrote:
>>> On May 21, 2023, at 6:17 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>>> On Fri, May 19, 2023 at 10:18:29AM -0700, Wengang Wang wrote:
>>>> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
>>>> index 011b50469301..3c5a9e9952ec 100644
>>>> --- a/fs/xfs/xfs_extfree_item.c
>>>> +++ b/fs/xfs/xfs_extfree_item.c
>>>> @@ -336,6 +336,25 @@ xfs_trans_get_efd(
>>>> return efdp;
>>>> }
>>>> 
>>>> +/*
>>>> + * Fill the EFD with all extents from the EFI and set the counter.
>>>> + * Note: the EFD should comtain at least one extents already.
>>>> + */
>>>> +static void xfs_fill_efd_with_efi(struct xfs_efd_log_item *efdp)
>>>> +{
>>>> + struct xfs_efi_log_item *efip = efdp->efd_efip;
>>>> + uint                    i;
>>>> +
>>>> + if (efdp->efd_next_extent == efip->efi_format.efi_nextents)
>>>> + return;
>>>> +
>>>> + for (i = 0; i < efip->efi_format.efi_nextents; i++) {
>>>> +        efdp->efd_format.efd_extents[i] =
>>>> +        efip->efi_format.efi_extents[i];
>>>> + }
>>>> + efdp->efd_next_extent = efip->efi_format.efi_nextents;
>>>> +}
>>>> +
>>> 
>>> Ok, but it doesn't dirty the transaction or the EFD, which means....
>> 
>> Actually EAGAIN shouldn’t happen with the first record in EFIs because
>> the trans->t_busy is empty in AGFL block allocation for the first record.
>> So the dirtying work should already done with the first one.
> 
> You're assuming that the only thing we are going to want to return
> -EAGAIN for freeing attamps for is busy extents. Being able to
> restart btree operations by "commit and retry" opens up a
> a whole new set of performance optimisations we can make to the
> btree code.
> 
> IOWs, I want this functionality to be generic in nature, not
> tailored specifically to one situation where an -EAGAIN needs to be
> returned to trigger a commit an retry.

Yes, I assumed that because I didn’t see relevant EAGAIN handlers
in existing code. It’s reasonable to make it generic for existing or planed
EAGAINs.

> 
>>>> @@ -369,6 +388,10 @@ xfs_trans_free_extent(
>>>> error = __xfs_free_extent(tp, xefi->xefi_startblock,
>>>> xefi->xefi_blockcount, &oinfo, XFS_AG_RESV_NONE,
>>>> xefi->xefi_flags & XFS_EFI_SKIP_DISCARD);
>>>> + if (error == -EAGAIN) {
>>>> + xfs_fill_efd_with_efi(efdp);
>>>> + return error;
>>>> + }
>>> 
>>> .... this is incorrectly placed.
>>> 
>>> The very next lines say:
>>> 
>>>> /*
>>>> * Mark the transaction dirty, even on error. This ensures the
>>>> * transaction is aborted, which:
>>> 
>>> i.e. we have to make the transaction and EFD log item dirty even if
>>> we have an error. In this case, the error is not fatal, but we still
>>> have to ensure that we commit the EFD when we roll the transaction.
>>> Hence the transaction and EFD still need to be dirtied on -EAGAIN...
>> 
>> see above.
> 
> See above :)
> 
>>>> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
>>>> index 322eb2ee6c55..00bfe9683fa8 100644
>>>> --- a/fs/xfs/xfs_log_recover.c
>>>> +++ b/fs/xfs/xfs_log_recover.c
>>>> @@ -2540,30 +2540,27 @@ xlog_recover_process_intents(
>>>> struct xfs_log_item *lip;
>>>> struct xfs_ail *ailp;
>>>> int error = 0;
>>>> -#if defined(DEBUG) || defined(XFS_WARN)
>>>> - xfs_lsn_t last_lsn;
>>>> -#endif
>>>> + xfs_lsn_t threshold_lsn;
>>>> 
>>>> ailp = log->l_ailp;
>>>> + threshold_lsn = xfs_ail_max_lsn(ailp);
>>>> spin_lock(&ailp->ail_lock);
>>>> -#if defined(DEBUG) || defined(XFS_WARN)
>>>> - last_lsn = xlog_assign_lsn(log->l_curr_cycle, log->l_curr_block);
>>> 
>>> xfs_ail_max_lsn() and l_curr_cycle/l_curr_block are not the same
>>> thing.  max_lsn points to the lsn of the last entry in the AIL (in
>>> memory state), whilst curr_cycle/block points to the current
>>> physical location of the log head in the on-disk journal.
>>> 
>> 
>> Yes, I intended to use the lsn of the last entry in the AIL.
> 
> Again, they are not the same thing: using the last entry in the
> AIL here is incorrect. We want to replay all the items in the AIL
> that were active in the log, not up to the last item in the AIL. The
> actively recovered log region ends at last_lsn as per above, whilst
> xfs_ail_max_lsn() is not guaranteed to be less than last_lsn before
> we start walking it.

OK, got it.

> 
>> For the problem with xlog_recover_process_intents(), please see my reply to
>> Darrick. On seeing the problem, my first try was to use “last_lsn” to stop
>> the iteration but that didn’t help.  last_lsn was found quite bigger than even
>> the new EFI lsn. While use xfs_ail_max_lsn() it solved the problem.
> 
> In what case are we queuing a *new* intent into the AIL that has a
> LSN less than xlog_assign_lsn(log->l_curr_cycle, log->l_curr_block)?
> If we are doing that *anywhere*, then we have a likely journal
> corruption bug in the code because it indicates we committed that
> item to the journal over something in the log we are currently
> replaying.

I made a mistake to say last_lsn is quite bigger than the new EFI lsn,
it’s actually much bigger than the max_lsn (because cycle increased).
The lsn of the new EFI is exactly same as last_lsn.

> 
>>> In this case, we can't use in-memory state to determine where to
>>> stop the initial intent replay - recovery of other items may have
>>> inserted new intents beyond the end of the physical region being
>>> recovered, in which case using xfs_ail_max_lsn() will result in
>>> incorrect behaviour here.
>> 
>> Yes, this patch is one of those (if some exist) introduce new intents (EFIs here).
>> We add the new intents to the transaction first (xfs_defer_create_intent()), add
>> the deferred operations to ‘capture_list’. And finally the deferred options in
>> ‘capture_list’ is processed after the intent-iteration on the AIL.
> 
> The changes made in that transaction, including the newly logged
> EFI, get committed before the rest of the work gets deferred via
> xfs_defer_ops_capture_and_commit(). That commits the new efi (along
> with all the changes that have already been made in the transaction)
> to the CIL, and eventually the journal checkpoints and the new EFI
> gets inserted into the AIL at the LSN of the checkpoint.
> 
> The LSN of the checkpoint is curr_cycle/block - the log head -
> because that's where the start record of the checkpoint is
> physically written.  As each iclog is filled, the log head moves
> forward - it always points at the location that the next journal
> write will be written to. At the end of a checkpoint, the LSN of the
> start record is used for AIL insertion.
> 

Thanks for explanation!

> Hence if a new log item created by recovery has a LSN less than
> last_lsn, then we have a serious bug somewhere that needs to be
> found and fixed. The use of last_lsn tells us something has gone
> badly wrong during recovery, the use of xfs_ail_max_lsn() removes
> the detection of the issue and now we don't know that something has
> gone badly wrong...
> 

I made a mistake.. the (first) new EFI lsn is the same as last_lsn, sorry
for confusing.

>> For existing other cases (if there are) where new intents are added,
>> they don’t use the capture_list for delayed operations? Do you have example then? 
>> if so I think we should follow their way instead of adding the defer operations
>> (but reply on the intents on AIL).
> 
> All of the intent recovery stuff uses
> xfs_defer_ops_capture_and_commit() to commit the intent being
> replayed and cause all further new intent processing in that chain
> to be defered until after all the intents recovered from the journal
> have been iterated. All those new intents end up in the AIL at a LSN
> index >= last_lsn.

Yes. So we break the AIL iteration on seeing an intent with lsn equal to
or bigger than last_lsn and skip the iop_recover() for that item?
and shall we put this change to another separated patch as it is to fix
an existing problem (not introduced by my patch)?

thanks,
wengang




[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux