Re: [PATCH 1/3] xfs: simplify extent allocation alignment

On Tue, Mar 26, 2024 at 04:08:04PM +0000, John Garry wrote:
> On 20/03/2024 04:35, Dave Chinner wrote:
> 
> For some reason I never received this mail. I just noticed it on
> lore.kernel.org today by chance.
> 
> > On Wed, Mar 13, 2024 at 11:03:18AM +0000, John Garry wrote:
> > > On 06/03/2024 05:20, Dave Chinner wrote:
> > > >    		return false;
> > > > diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
> > > > index 0b956f8b9d5a..aa2c103d98f0 100644
> > > > --- a/fs/xfs/libxfs/xfs_alloc.h
> > > > +++ b/fs/xfs/libxfs/xfs_alloc.h
> > > > @@ -46,7 +46,7 @@ typedef struct xfs_alloc_arg {
> > > >    	xfs_extlen_t	minleft;	/* min blocks must be left after us */
> > > >    	xfs_extlen_t	total;		/* total blocks needed in xaction */
> > > >    	xfs_extlen_t	alignment;	/* align answer to multiple of this */
> > > > -	xfs_extlen_t	minalignslop;	/* slop for minlen+alignment calcs */
> > > > +	xfs_extlen_t	alignslop;	/* slop for alignment calcs */
> > > >    	xfs_agblock_t	min_agbno;	/* set an agbno range for NEAR allocs */
> > > >    	xfs_agblock_t	max_agbno;	/* ... */
> > > >    	xfs_extlen_t	len;		/* output: actual size of extent */
> > > > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > > > index 656c95a22f2e..d56c82c07505 100644
> > > > --- a/fs/xfs/libxfs/xfs_bmap.c
> > > > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > > > @@ -3295,6 +3295,10 @@ xfs_bmap_select_minlen(
> > > >    	xfs_extlen_t		blen)
> > > 
> > > Hi Dave,
> > > 
> > > >    {
> > > > +	/* Adjust best length for extent start alignment. */
> > > > +	if (blen > args->alignment)
> > > > +		blen -= args->alignment;
> > > > +
> > > 
> > > This change seems to be causing or exposing some issue, in that I find that
> > > I am being allocated an extent which is aligned to but not a multiple of
> > > args->alignment.
> > 
> > Entirely possible the logic isn't correct ;)
> 
> Out of curiosity, how do you guys normally test all this sort of logic?

With difficulty.

Exercising all the weird corner cases is really hard. The
combinatorial explosion that occurs when you have 20 control
parameters, up to 5 different failure fallback strategies,
behavioural variations with delayed allocation, ENOSPC and AGFL
refilling accounting variations, etc, means it's basically
impossible to enumerate and iterate the behaviour space fully.
And then we have filesystem geometry and application concurrency
to consider, too.

All of the behaviours up to this point in time are best effort - we
don't guarantee allocation policy is followed when there is not
enough free space to execute the preferred policy - we slowly fall
back to mechanisms that are further from the policy but more likely
to succeed. i.e. as we approach ENOSPC, the allocation policies get
"looser" - they are less restrictive and more variable and don't
give as good results as when there is plenty of free space for the
allocation policy to make good decisions from.
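
To make the "looser as we approach ENOSPC" idea concrete, here is a
toy userspace sketch. It is not XFS code: every name in it
(toy_alloc, toy_policy, toy_extent) is hypothetical, it only
considers a single free extent, and the three fallback steps are a
simplified assumption about what "best effort" means, not the real
allocator's strategy list.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: one free extent, three progressively looser policies. */
enum toy_policy {
	TOY_ALIGNED_FULL,	/* aligned start, full requested length */
	TOY_ALIGNED_SHORT,	/* aligned start, shorter than requested */
	TOY_UNALIGNED,		/* alignment given up, length >= minlen */
	TOY_ENOSPC		/* not even minlen blocks are free */
};

struct toy_extent {
	uint32_t start;
	uint32_t len;
};

static uint32_t toy_roundup(uint32_t x, uint32_t align)
{
	return (x + align - 1) / align * align;
}

/* Try the preferred policy first, then fall back to looser ones. */
static enum toy_policy
toy_alloc(struct toy_extent free, uint32_t want, uint32_t minlen,
	  uint32_t align, struct toy_extent *out)
{
	uint32_t astart, end, alen;

	assert(align > 0);
	astart = toy_roundup(free.start, align);
	end = free.start + free.len;
	alen = astart < end ? end - astart : 0;

	if (alen >= want) {
		out->start = astart;
		out->len = want;
		return TOY_ALIGNED_FULL;
	}
	if (alen >= minlen) {
		out->start = astart;
		out->len = alen;
		return TOY_ALIGNED_SHORT;
	}
	if (free.len >= minlen) {
		out->start = free.start;
		out->len = free.len < want ? free.len : want;
		return TOY_UNALIGNED;
	}
	return TOY_ENOSPC;
}
```

With plenty of free space the first branch always wins and the
result is predictable. As the free extent shrinks, the same request
silently lands in the later branches, which is why the results get
more variable near ENOSPC.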

As such, I only check that macro-level behaviour is largely correct
when there is lots of free space, e.g. by doing something like
copying a kernel tree onto a new filesystem, then checking inode
locality follows directories, block locality follows inodes, large
files are stripe aligned, extent size hint based inodes appear to
have the correct extent sizes, etc.

I then rely on the ENOSPC tests in fstests to find regressions that
might occur when the filesystem is stressed with little free space
available. These are a whole lot better than they used to be; root
cause analysis of ENOSPC corner case bugs has consumed months of my
working life over the past 20 years....

> I found this issue with the small program which I wrote to generate traffic.
> I could not find anything similar.

That's because it's largely impossible to write a test that is
deterministic and works on all possible test configurations.
Changing the size of the filesystem even slightly can result in
vastly different but still 100% correct allocation
behaviour....

> > > Firstly, in this same scenario, in xfs_alloc_space_available() we calculate
> > > alloc_len = args->minlen + (args->alignment - 1) + args->alignslop = 76 + (4
> > > - 1) + 0 = 79, and then args->maxlen = 79.
> > 
> > That seems OK, we're doing aligned allocation and this is an ENOSPC
> > corner case so the aligned allocation should get rounded down in
> > xfs_alloc_fix_len() or rejected.
> > 
> > One thought I just had is that the args->maxlen adjustment shouldn't
> > be to "available space" - it should probably be set to args->minlen
> > because that's the aligned 'alloc_len' we checked available space
> > against. That would fix this, because then we'd have args->minlen =
> > args->maxlen = 76.
> > 
> > However, that only addresses this specific case, not the general
> > case of xfs_alloc_fix_len() failing to tail align the allocated
> > extent.
> > 
> > > Then xfs_alloc_fix_len() allows
> > > this as args->len == args->maxlen (=79), even though args->prod, mod = 4, 0.
> > 
> > Yeah, that smells wrong.
> 
> Would it be worth adding a debug assert for prod and mod being honoured from
> the allocator? xfs_alloc_fix_len() does have an assert later on and it does
> not help here.

I don't see any value in that because it's not actually a "fatal"
issue. See above about trading off policy strictness for allocation
success.

Again, this force alignment stuff is a fundamental change in this
behaviour - it wants "hard failure" rather than "trade off" and so
there isn't a general case for asserting that allocation must be
mod/prod aligned. Extent size hints are a -hint-, not a requirement,
and I don't want random assert failures in test systems because
debug kernels start treating hints as "must not fail" requirements.

> > I'd suggest that we've never noticed this until now because we
> > have never guaranteed extent alignment. Hence the occasional
> > short/unaligned extent being allocated in dark ENOSPC corners was
> > never an issue for anyone.
> > 
> > However, introducing a new alignment guarantee turns these sorts of
> > latent non-issues into bugs that need to be fixed. i.e. This is
> > exactly the sort of rare corner case behaviour I expected to be
> > flushed out by guaranteeing and then extensively testing allocation
> > alignments.
> > 
> > If you drop the rlen == args->maxlen check from
> > xfs_alloc_space_available(),
> 
> I assume that you mean xfs_alloc_fix_len()

Yes.

> > the problem should go away and the
> > extent gets trimmed to 76 blocks.
> 
> ..if so, then, yes, it does. We end up with this:
> 
>    0: [0..14079]:      42432..56511      0 (42432..56511)   14080
>    1: [14080..14687]:  177344..177951    0 (177344..177951)   608
>    2: [14688..14719]:  350720..350751    1 (171520..171551)    32

Good, that's how it should work. :) 
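
For anyone following along, the arithmetic in this thread can be
reproduced with a small userspace model. This is a sketch under
assumptions, not the kernel function: struct model_args,
model_alloc_len() and model_fix_len() are hypothetical names, the
trimming logic is written from my reading of xfs_alloc_fix_len(),
and the skip_if_maxlen flag exists only to compare behaviour with
and without the rlen == args->maxlen early return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few allocation arg fields used here. */
struct model_args {
	uint32_t minlen, maxlen, len;
	uint32_t prod, mod;
	uint32_t alignment, alignslop;
};

/* Worst-case length checked against free space in this thread. */
static uint32_t model_alloc_len(const struct model_args *a)
{
	return a->minlen + (a->alignment - 1) + a->alignslop;
}

/* Trim a->len so that len % prod == mod, unless an early return hits. */
static void model_fix_len(struct model_args *a, bool skip_if_maxlen)
{
	int64_t rlen = a->len;	/* signed, so underflow is easy to detect */
	uint32_t k;

	assert(a->mod < a->prod);
	if (a->prod <= 1 || rlen < a->mod ||
	    (skip_if_maxlen && rlen == a->maxlen) ||
	    (a->mod == 0 && rlen < a->prod))
		return;

	k = (uint32_t)(rlen % a->prod);
	if (k == a->mod)
		return;		/* already honours prod/mod */
	if (k > a->mod)
		rlen -= k - a->mod;
	else
		rlen -= a->prod - (a->mod - k);

	if (rlen < (int64_t)a->minlen)
		return;		/* trimming would go below minlen, keep len */
	a->len = (uint32_t)rlen;
}
```

Feeding in the numbers from the report (minlen = 76, alignment = 4,
alignslop = 0, prod = 4, mod = 0) gives an alloc_len of 79. With
len = maxlen = 79 and the early return enabled, len stays at 79;
with it disabled, 79 % 4 = 3 is trimmed off and len becomes 76.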

I'll update the patchset I have with these fixes.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



