Re: [PATCH 1/1] xfs: reserve less log space when recovering log intent items

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Aug 29, 2023 at 04:11:00PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@xxxxxxxxxx>
> 
> Wengang Wang reports that a customer's system was running a number of
> truncate operations on a filesystem with a very small log.  Contention
> on the reserve heads lead to other threads stalling on smaller updates
> (e.g.  mtime updates) long enough to result in the node being rebooted
> on account of the lack of responsivenes.  The node failed to recover
> because log recovery of an EFI became stuck waiting for a grant of
> reserve space.  From Wengang's report:
> 
> "For the file deletion, log bytes are reserved basing on
> xfs_mount->tr_itruncate which is:
> 
>     tr_logres = 175488,
>     tr_logcount = 2,
>     tr_logflags = XFS_TRANS_PERM_LOG_RES,
> 
> "You see it's a permanent log reservation with two log operations (two
> transactions in rolling mode).  After calculation (xlog_calc_unit_res()
> adds space for various log headers), the final log space needed per
> transaction changes from  175488 to 180208 bytes.  So the total log
> space needed is 360416 bytes (180208 * 2).  [That quantity] of log space
> (360416 bytes) needs to be reserved for both run time inode removing
> (xfs_inactive_truncate()) and EFI recover (xfs_efi_item_recover())."
> 
> In other words, runtime pre-reserves 360K of space in anticipation of
> running a chain of two transactions in which each transaction gets a
> 180K reservation.
> 
> Now that we've allocated the transaction, we delete the bmap mapping,
> log an EFI to free the space, and roll the transaction as part of
> finishing the deferops chain.  Rolling creates a new xfs_trans which
> shares its ticket with the old transaction.  Next, xfs_trans_roll calls
> __xfs_trans_commit with regrant == true, which calls xlog_cil_commit
> with the same regrant parameter.
> 
> xlog_cil_commit calls xfs_log_ticket_regrant, which decrements t_cnt and
> subtracts t_curr_res from the reservation and write heads.
> 
> If the filesystem is fresh and the first transaction only used (say)
> 20K, then t_curr_res will be 160K, and we give that much reservation
> back to the reservation head.  Or if the file is really fragmented and
> the first transaction actually uses 170K, then t_curr_res will be 10K,
> and that's what we give back to the reservation.
> 
> Having done that, we're now headed into the second transaction with an
> EFI and 180K of reservation.  Other threads apparently consumed all the
> reservation for smaller transactions, such as timestamp updates.
> 
> Now let's say the first transaction gets written to disk and we crash
> without ever completing the second transaction.  Now we remount the fs,
> log recovery finds the unfinished EFI, and calls xfs_efi_recover to
> finish the EFI.  However, xfs_efi_recover starts a new tr_itruncate
> tranasction, which asks for 360K log reservation.  This is a lot more
> than the 180K that we had reserved at the time of the crash.  If the
> first EFI to be recovered is also pinning the tail of the log, we will
> be unable to free any space in the log, and recovery livelocks.
> 
> Wengang confirmed this:
> 
> "Now we have the second transaction which has 180208 log bytes reserved
> too. The second transaction is supposed to process intents including
> extent freeing.  With my hacking patch, I blocked the extent freeing 5
> hours. So in that 5 hours, 180208 (NOT 360416) log bytes are reserved.
> 
> "With my test case, other transactions (update timestamps) then happen.
> As my hacking patch pins the journal tail, those timestamp-updating
> transactions finally use up (almost) all the left available log space
> (in memory in on disk).  And finally the on disk (and in memory)
> available log space goes down near to 180208 bytes.  Those 180208 bytes
> are reserved by [the] second (extent-free) transaction [in the chain]."
> 
> Wengang and I noticed that EFI recovery starts a transaction, completes
> one step of the chain, and commits the transaction without completing
> any other steps of the chain.  Those subsequent steps are completed by
> xlog_finish_defer_ops, which allocates yet another transaction to
> finish the rest of the chain.  That transaction gets the same tr_logres
> as the head transaction, but with tr_logcount = 1 to force regranting
> with every roll to avoid livelocks.
> 
> In other words, we already figured this out in commit 929b92f64048d
> ("xfs: xfs_defer_capture should absorb remaining transaction
> reservation"), but should have applied that logic to each intent item's
> recovery function.  For Wengang's case, the xfs_trans_alloc call in the
> EFI recovery function should only be asking for a single transaction's
> worth of log reservation -- 180K, not 360K.
> 
> Quoting Wengang again:
> 
> "With log recovery, during EFI recovery, we use tr_itruncate again to
> reserve two transactions that needs 360416 log bytes.  Reserving 360416
> bytes fails [stalls] because we now only have about 180208 available.
> 
> "Actually during the EFI recover, we only need one transaction to free
> the extents just like the 2nd transaction at RUNTIME.  So it only needs
> to reserve 180208 rather than 360416 bytes.  We have (a bit) more than
> 180208 available log bytes on disk, so [if we decrease the reservation
> to 180K] the reservation goes and the recovery [finishes].  That is to
> say: we can fix the log recover part to fix the issue. We can introduce
> a new xfs_trans_res xfs_mount->tr_ext_free
> 
> {
>   tr_logres = 175488,
>   tr_logcount = 0,
>   tr_logflags = 0,
> }
> 
> "and use tr_ext_free instead of tr_itruncate in EFI recover."
> 
> However, I don't think it quite makes sense to create an entirely new
> transaction reservation type to handle single-stepping during log
> recovery.  Instead, we should copy the transaction reservation
> information in the xfs_mount, change tr_logcount to 1, and pass that
> into xfs_trans_alloc.  We know this won't risk changing the min log size
> computation since we always ask for a fraction of the reservation for
> all known transaction types.
> 
> This looks like it's been lurking in the codebase since commit
> 3d3c8b5222b92, which changed the xfs_trans_reserve call in
> xlog_recover_process_efi to use the tr_logcount in tr_itruncate.
> That changed the EFI recovery transaction from making a
> non-XFS_TRANS_PERM_LOG_RES request for one transaction's worth of log
> space to a XFS_TRANS_PERM_LOG_RES request for two transactions worth.
> 
> Fixes: 3d3c8b5222b92 ("xfs: refactor xfs_trans_reserve() interface")
> Complements: 929b92f64048d ("xfs: xfs_defer_capture should absorb remaining transaction reservation")
> Suggested-by: Wengang Wang <wen.gang.wang@xxxxxxxxxx>
> Cc: Srikanth C S <srikanth.c.s@xxxxxxxxxx>
> [djwong: apply the same transformation to all log intent recovery]
> Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>

Looks fine.

Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
-- 
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux