Re: [PATCH] xfs: gut error handling in xfs_trans_unreserve_and_mod_sb()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Nov 21, 2019 at 03:00:23PM +1100, Dave Chinner wrote:
> On Wed, Nov 20, 2019 at 06:38:36PM -0800, Darrick J. Wong wrote:
> > On Thu, Nov 21, 2019 at 11:44:37AM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > 
> > > Shaokun Zhang reported that XFs was using substantial CPU time in
> > > percpu_count_sum() when running a single threaded benchmark on
> > > a high CPU count (128p) machine from xfs_mod_ifree(). The issue
> > > is that the filesystem is empty when the benchmark runs, so inode
> > > allocation is running with a very low inode free count.
> > > 
> > > With the percpu counter batching, this means comparisons when the
> > > counter is less that 128 * 256 = 32768 use the slow path of adding
> > > up all the counters across the CPUs, and this is expensive on high
> > > CPU count machines.
> > > 
> > > The summing in xfs_mod_ifree() is only used to fire an assert if an
> > > underrun occurs. The error is ignored by the higher level code.
> > > Hence this is really just debug code. Hence we don't need to run it
> > > on production kernels, nor do we need such debug checks to return
> > > error values just to trigger an assert.
> > > 
> > > Further, the error handling in xfs_trans_unreserve_and_mod_sb() is
> > > largely incorrect - Rolling back the changes in the transaction if
> > > only one counter underruns makes all the other counters
> > > incorrect.
> > 
> > Separate change, separate patch...
> 
> Yeah, i can split it up, just wanted to see what people thought
> about the approach...

<nod>

> > >  	if (idelta) {
> > > -		error = xfs_mod_icount(mp, idelta);
> > > -		if (error)
> > > -			goto out_undo_fdblocks;
> > > +		percpu_counter_add_batch(&mp->m_icount, idelta,
> > > +					 XFS_ICOUNT_BATCH);
> > > +		if (idelta < 0)
> > > +			ASSERT(__percpu_counter_compare(&mp->m_icount, 0,
> > > +							XFS_ICOUNT_BATCH) >= 0);
> > >  	}
> > >  
> > >  	if (ifreedelta) {
> > > -		error = xfs_mod_ifree(mp, ifreedelta);
> > > -		if (error)
> > > -			goto out_undo_icount;
> > > +		percpu_counter_add(&mp->m_ifree, ifreedelta);
> > > +		if (ifreedelta < 0)
> > > +			ASSERT(percpu_counter_compare(&mp->m_ifree, 0) >= 0);
> > 
> > Since the whole thing is a debug statement, why not shove everything
> > into a single assert?
> > 
> > ASSERT(ifreedelta >= 0 || percpu_computer_compare() >= 0); ?
> 
> I could, but it still needs to be split over two lines and I find
> unnecessarily complex ASSERT checks hinder understanding. I can look
> at what I wrote at a glance and immediately understand that the
> assert is conditional on the counter being negative, but the single
> line compound assert form requires me to stop, read and think about
> the logic before I can identify that the ifreedelta check is just a
> conditional that reduces the failure scope rather than is a failure
> condition itself.
> 
> I like simple logic with conditional behaviour being obvious via
> pattern matching - it makes my brain hurt less because I'm really
> good at visual pattern matching and I'm really bad at reading
> and writing code.....

Fair enough.  I'm not a paragon of correctness wrt. boolean logic either.
I'm ok if you leave it as is.

> > > -out_undo_frextents:
> > > -	if (rtxdelta)
> > > -		xfs_sb_mod64(&mp->m_sb.sb_frextents, -rtxdelta);
> > > -out_undo_ifree:
> > > +	xfs_sb_mod64(&mp->m_sb.sb_frextents, rtxdelta);
> > 
> > As for these bits... why even bother with a three line helper?  I think
> > this is clearer about what's going on:
> > 
> > 	mp->m_sb.sb_frextents += rtxdelta;
> > 	mp->m_sb.sb_dblocks += tp->t_dblocks_delta;
> > 	...
> > 	ASSERT(!rtxdelta || mp->m_sb.sb_frextents >= 0);
> > 	ASSERT(!tp->t_dblocks_delta || mp->m_sb.sb.dblocks >= 0);
> 
> That required writing more code and adding more logic I'd have to
> think about to write, and then think about again every time I read
> it.

OTOH it's an opportunity to make the asserts more useful, because right
now they just say:

XFS (sda): Assertion failed: counter >= 0, file: xfs_trans.c, line XXX

*Which* counter just tripped the assert?  At least it could say:

XFS (sda): Assertion failed: mp->m_sb.sb_dblocks >= 0, file: xfs_trans.c, line XXX

> > I also wonder if we should be shutting down the fs here if the counts
> > go negative, but <shrug> that would be yet a different patch. :)
> 
> I also thought about that, but all this accounting should have
> already been bounds checked. i.e. We should never get an error here,
> and I don't think I've *ever* seen an assert in this code fire.
> Hence I just went for the dead simple nuke-it-from-orbit patch...

<nod> I have, but only after seriously fubaring some code. :)

--D

> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux