Re: [PATCH] xfs: gut error handling in xfs_trans_unreserve_and_mod_sb()

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 21 Nov 2019 15:00:23 +1100

On Wed, Nov 20, 2019 at 06:38:36PM -0800, Darrick J. Wong wrote:
> On Thu, Nov 21, 2019 at 11:44:37AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > 
> > Shaokun Zhang reported that XFS was using substantial CPU time in
> > percpu_counter_sum() when running a single threaded benchmark on
> > a high CPU count (128p) machine from xfs_mod_ifree(). The issue
> > is that the filesystem is empty when the benchmark runs, so inode
> > allocation is running with a very low inode free count.
> > 
> > With the percpu counter batching, this means comparisons when the
> > counter is less than 128 * 256 = 32768 use the slow path of adding
> > up all the counters across the CPUs, and this is expensive on high
> > CPU count machines.
> > 
> > The summing in xfs_mod_ifree() is only used to fire an assert if an
> > underrun occurs. The error is ignored by the higher level code.
> > Hence this is really just debug code, so we don't need to run it
> > on production kernels, nor do we need such debug checks to return
> > error values just to trigger an assert.
> > 
> > Further, the error handling in xfs_trans_unreserve_and_mod_sb() is
> > largely incorrect - rolling back the changes in the transaction if
> > only one counter underruns makes all the other counters
> > incorrect.
> 
> Separate change, separate patch...

Yeah, I can split it up, just wanted to see what people thought
about the approach...
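
For anyone following along, here is a rough userspace model of the
batching behaviour the commit message describes. It is only a sketch
under stated assumptions: struct model_counter, model_add(),
model_sum() and model_compare() are made-up names, not the kernel's
percpu_counter API, and NR_CPUS and BATCH are hardcoded from the
128 * 256 figure quoted above.

```c
#include <stdint.h>

#define NR_CPUS 128	/* assumption: the 128p machine from the report */
#define BATCH   256	/* assumption: per-CPU batch size from the commit message */

/* Hypothetical userspace model, not the kernel's struct percpu_counter. */
struct model_counter {
	int64_t		count;		/* approximate global count */
	int32_t		pcp[NR_CPUS];	/* per-CPU deltas not yet folded in */
	unsigned long	slow_sums;	/* times we had to walk every CPU */
};

/* Add on one CPU; fold into the global count once the local delta hits the batch. */
static void model_add(struct model_counter *c, int cpu, int64_t amount)
{
	int64_t v = c->pcp[cpu] + amount;

	if (v >= BATCH || v <= -BATCH) {
		c->count += v;
		c->pcp[cpu] = 0;
	} else {
		c->pcp[cpu] = (int32_t)v;
	}
}

/* Slow path: exact sum, cost proportional to the number of CPUs. */
static int64_t model_sum(struct model_counter *c)
{
	int64_t sum = c->count;

	c->slow_sums++;
	for (int i = 0; i < NR_CPUS; i++)
		sum += c->pcp[i];
	return sum;
}

/* Returns 1, 0 or -1 for counter >, == or < rhs. */
static int model_compare(struct model_counter *c, int64_t rhs)
{
	int64_t count = c->count;
	int64_t diff = count > rhs ? count - rhs : rhs - count;

	/* Fast path: the per-CPU error cannot change the answer. */
	if (diff > (int64_t)BATCH * NR_CPUS)
		return count > rhs ? 1 : -1;

	/* Too close to call: fall back to the exact, expensive sum. */
	count = model_sum(c);
	if (count > rhs)
		return 1;
	return count < rhs ? -1 : 0;
}
```

With a nearly empty counter, every model_compare() against zero lands
in model_sum(), which is the behaviour the report complains about.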

> >  	if (idelta) {
> > -		error = xfs_mod_icount(mp, idelta);
> > -		if (error)
> > -			goto out_undo_fdblocks;
> > +		percpu_counter_add_batch(&mp->m_icount, idelta,
> > +					 XFS_ICOUNT_BATCH);
> > +		if (idelta < 0)
> > +			ASSERT(__percpu_counter_compare(&mp->m_icount, 0,
> > +							XFS_ICOUNT_BATCH) >= 0);
> >  	}
> >  
> >  	if (ifreedelta) {
> > -		error = xfs_mod_ifree(mp, ifreedelta);
> > -		if (error)
> > -			goto out_undo_icount;
> > +		percpu_counter_add(&mp->m_ifree, ifreedelta);
> > +		if (ifreedelta < 0)
> > +			ASSERT(percpu_counter_compare(&mp->m_ifree, 0) >= 0);
> 
> Since the whole thing is a debug statement, why not shove everything
> into a single assert?
> 
> ASSERT(ifreedelta >= 0 || percpu_counter_compare() >= 0); ?

I could, but it still needs to be split over two lines and I find
unnecessarily complex ASSERT checks hinder understanding. I can look
at what I wrote at a glance and immediately understand that the
assert is conditional on the delta being negative, but the single
line compound assert form requires me to stop, read and think about
the logic before I can identify that the ifreedelta check is just a
conditional that reduces the failure scope rather than being a failure
condition itself.

I like simple logic with conditional behaviour being obvious via
pattern matching - it makes my brain hurt less because I'm really
good at visual pattern matching and I'm really bad at reading
and writing code.....
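
To be clear, the two forms are logically the same check, so this is
purely about readability. A minimal sketch of both, using made-up
helper names (check_split() and check_compound() are illustrative
only, and "cmp" stands in for the result of the counter comparison):

```c
#include <stdbool.h>

/*
 * Hypothetical helpers: each returns true when the debug check passes.
 * 'cmp' stands in for the result of comparing the counter against zero.
 */

/* Form used in the patch: the condition guards the check. */
static bool check_split(long delta, int cmp)
{
	if (delta < 0)
		return cmp >= 0;
	return true;
}

/* Suggested form: guard and check folded into one expression. */
static bool check_compound(long delta, int cmp)
{
	return delta >= 0 || cmp >= 0;
}
```

Both return false only when the delta is negative and the comparison
says the counter went below zero.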

> > -out_undo_frextents:
> > -	if (rtxdelta)
> > -		xfs_sb_mod64(&mp->m_sb.sb_frextents, -rtxdelta);
> > -out_undo_ifree:
> > +	xfs_sb_mod64(&mp->m_sb.sb_frextents, rtxdelta);
> 
> As for these bits... why even bother with a three line helper?  I think
> this is clearer about what's going on:
> 
> 	mp->m_sb.sb_frextents += rtxdelta;
> 	mp->m_sb.sb_dblocks += tp->t_dblocks_delta;
> 	...
> 	ASSERT(!rtxdelta || mp->m_sb.sb_frextents >= 0);
> 	ASSERT(!tp->t_dblocks_delta || mp->m_sb.sb_dblocks >= 0);

That required writing more code and adding more logic I'd have to
think about to write, and then think about again every time I read
it.
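
As a rough illustration of why a small helper keeps the logic in one
place: the sketch below assumes the superblock fields are unsigned
64-bit values, in which case an open-coded "field >= 0" check would
always be true and the underrun test has to be done on a signed copy.
sb_mod64_sketch() is a hypothetical stand-in, not the actual
xfs_sb_mod64(), and it reports the result instead of asserting.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a helper like xfs_sb_mod64().
 * Applies a signed delta to an unsigned field and returns true
 * if the result did not underrun.
 */
static bool sb_mod64_sketch(uint64_t *field, int64_t delta)
{
	int64_t counter = (int64_t)*field;

	counter += delta;
	*field = (uint64_t)counter;

	/* Check the signed value; the unsigned field is never < 0. */
	return counter >= 0;
}
```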

> I also wonder if we should be shutting down the fs here if the counts
> go negative, but <shrug> that would be yet a different patch. :)

I also thought about that, but all this accounting should have
already been bounds checked. i.e. We should never get an error here,
and I don't think I've *ever* seen an assert in this code fire.
Hence I just went for the dead simple nuke-it-from-orbit patch...

Cheers,

Dave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx