Re: [PATCH] xfs: explicitly call cond_resched in xfs_itruncate_extents_flags

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 10, 2024 at 09:45:01AM -0800, Darrick J. Wong wrote:
> On Wed, Jan 10, 2024 at 03:13:47PM +0800, Jian Wen wrote:
> > From: Jian Wen <wenjianhn@xxxxxxxxx>
> > 
> > Deleting a file with lots of extents may cause a soft lockup if the
> > preemption model is none(CONFIG_PREEMPT_NONE=y or preempt=none is set
> > in the kernel cmdline). Alibaba cloud kernel and Oracle UEK container
> > kernel are affected by the issue, since they select CONFIG_PREEMPT_NONE=y.
> > 
> > Explicitly call cond_resched in xfs_itruncate_extents_flags avoid
> > the below softlockup warning.
> > watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:13:139]
> 
> Wowwee, how many extents does this file have, anyway?

IIRC, extent removal runs at about 50,000 extents/s on a typical
modern CPU on a production kernel build when everything is cached.
23s then equates to a bit above a million extents....

> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index c0f1c89786c2..194381e10472 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -4,6 +4,7 @@
> >   * All Rights Reserved.
> >   */
> >  #include <linux/iversion.h>
> > +#include <linux/sched.h>
> >  
> >  #include "xfs.h"
> >  #include "xfs_fs.h"
> > @@ -1383,6 +1384,8 @@ xfs_itruncate_extents_flags(
> >  		error = xfs_defer_finish(&tp);
> >  		if (error)
> >  			goto out;
> > +
> > +		cond_resched();
> 
> Is there a particular reason why truncation uses a giant chained
> transaction for all extents (with xfs_defer_finish) instead of, say, one
> chain per extent?

IIUC the question correctly, extent removal for a given range ibeing
truncated has to been seen externally as an atomic change so the
ILOCK has to held across the entire removal operation.

This means it has to be executed as a single rolling transaction as
transaction reservation cannot be done with the ILOCK held and the
atomicity of extent removal prevents the ILOCK from being dropped.

This atomicity only really matters for active operations
like truncate, hole punching, etc when other operations may also be
trying to act on the extent list. However, in inodegc, we are
essentially guaranteed that nobody else will need to access the
extent list whilst we are truncating them away, so it could be done
as a bunch of fine grained transactions.

However, this doesn't solve the problem the CPU never being yeilded
if everything is cached and no locks or transaction reservations are
ever contended. It will still hold the CPU for the same length of
time. Hence I'm not sure changing the way the transactions are run
will really change anything material w.r.t. the issue at hand.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx




[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux