Re: Question: reserve log space at IO time for recover

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Aug 24, 2023, at 12:28 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> 
> On Wed, Aug 23, 2023 at 09:52:34PM -0700, Darrick J. Wong wrote:
>> On Fri, Aug 18, 2023 at 03:25:46AM +0000, Wengang Wang wrote:
>> 
>> Since xfs_efi_item_recover is only performing one step of what could be
>> a chain of deferred updates, it never rolls the transaction that it
>> creates.  It therefore only requires the amount of grant space that
>> you'd get with tr_logcount == 1.  It is therefore a bit silly that we
>> ask for more than that, and in bad cases like this, hang log recovery
>> needlessly.
> 
> But this doesn't fix the whatever problem lead to the recovery not
> having the same full tr_itruncate reservation available as was held
> by the transaction that logged the EFI and was running the extent
> free at the time the system crashed. There should -always- be enough
> transaction reservation space in the journal to reserve space for an
> intent replay if the intent recovery reservation uses the same
> reservation type as runtime.
> 
> Hence I think this is still just a band-aid over whatever went wrong
> at runtime that lead to the log not having enough space for a
> reservation that was clearly held at runtime and hadn't yet used.

Hi Dave,
I was thinking there is a problem previously, but I didn’t find anything wrong.
Per my test and analysis, the changing (adding and subtracting) to/from
xlog->l_reserve_head and xlog->l_write_head matches the changes of
xlog->l_curr_cycle/l_curr_block.  And xlog->l_curr_cycle/l_curr_block matches
the disk log head change.

Actually there is no problem at runtime. What’s wrong is just the recovery part.

As I said previously, it’s not true that we always have the full tr_itruncate reservation
at any time during the process. Let me show the detailed steps:

1) We do full tr_itruncate reservation for the two-transaction chain (half each),
and get the first transaction.

2) work with the first transaction, adding log items to it. Now we have full tr_itruncate
reservation.

3) in transaction roll, we get the second transaction. The second transaction
also has half tr_itruncate reservation.  Now the first transaction and the second
transaction together have full tr_itruncate reservation.

4) still in transaction roll, we commit the first transaction, including EFI log item(s)
    in it. The reservation belonging to the first transaction went in three different
    ways:
   1. some are used to store the log items (and the necessary headers).
   2. a small part is stolen by the ticket on checkpoint context.
   3. the left is released after the log items are committed to CIL.
       xlog_cil_commit() -> xfs_log_ticket_regrant() -> xlog_grant_sub_space().

   Now we have only half tr_itruncate reservation (for the second trans in chain),
   NOT full!

5) Work with the second transaction (it’s slow some how). Now we have half
   tr_itruncate reservation.

6) Parallel transitions come and go, using up other free log bytes. Now we have
   half tr_itruncate reservation. And on disk log, the free log bytes is only a bit
   more than half tr_itruncate reservation, but less than full tr_itruncate reservation.

7) Still work with the second transaction in the chain, freeing the extents, system
    crashed. Now half tr_itruncate reservation.

So you see, since step 5) there is only half tr_itruncate reservation exist.  The idea
that from step 1) to step 7) there should be full tr_itruncate reservation exist is wrong.

You can’t after the first transaction is committed, the reservation for that transaction
is till there, right.

Thanks for your following comments, but they look based on the wrong idea.

Thanks,
Wengang

> 
>> Which is exactly what you theorized above.  Ok, I'm starting to be
>> convinced... :)
> 
> I'm not. :/
> 
>> I wonder, if you add this to the variable declarations in
>> xfs_efi_item_recover (or xfs_efi_recover if we're actually talking about
>> UEK5 here):
>> 
>> struct xfs_trans_resv resv = M_RES(mp)->tr_itruncate;
>> 
>> and then change the xfs_trans_alloc call to:
>> 
>> resv.tr_logcount = 1;
>> error = xfs_trans_alloc(mp, &resv, 0, 0, 0, &tp);
>> 
>> Does that solve the problem?
> 
> It might fix this specific case given the state of the log at the
> time the system crashed. However, it doesn't help the general
> case where whatever went wrong at runtime results in there being
> less than a single tr_itruncate reservation unit available in the
> log.
> 
> One of the recent RH custom cases I looked at had much less than a
> single tr_itruncate unit reservation when it came to recovering the
> EFI, so I'm not just making this up.
> 
> I really don't think this problem is not solvable at recovery time.
> if the runtime hang/crash leaves the EFI pinning the tail of the log
> and something has allowed the log to be overfull (i.e. log space
> used plus granted EFI reservation > physical log space), then we may
> not have any space at all for any sort of reservation during
> recovery.
> 
> I suspect that the cause of this recovery issue is the we require an
> overcommit of the log space accounting before we throttle incoming
> transaction reservations (i.e. the new reservation has to overrun
> before we throttle). I think that the reservation accounting overrun
> detection can race to the first item being placed on the wait list,
> which might allow transactions that should be throttled to escape
> resulting in a temporary log space overcommit. IF we crash at jsut
> the wrong time and an intent-in-progress pins the tail of the log,
> the overcommit situation can lead to recovery not having enough
> space for intent recovery reservations...
> 
> This subtle overcommit is one of the issues I have been trying to
> correct with the byte-based grant head accounting patches (which I'm
> -finally- getting back to). I know that replaying the log from the
> metadump that repoduces the hang with the byte-based grant head
> accounting patchset allowed log recovery to succeed. It has a
> different concept of where the head and tail are and hence how much
> log space is actually available at any given time, and that
> difference was just enough to allow a tr_itruncate reservation to
> succced. It also has different reservation grant overrun detection
> logic, but I'm not 100% sure if it solves this underlying runtime
> overcommit problem or not yet...
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx





[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux