Re: [PATCH] xfs: account extra freespace btree splits for multiple allocations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Dave,

Sorry for some delay again (I was applying some workaround to our
production to mitigate this and some other stuffs around there.)

On Tue, Nov 22, 2022 at 12:17:57PM +1100, Dave Chinner wrote:
> On Mon, Nov 21, 2022 at 06:32:48AM +0800, Gao Xiang wrote:
> > On Wed, Nov 16, 2022 at 01:51:06PM +1100, Dave Chinner wrote:
> > > On Tue, Nov 15, 2022 at 03:57:54PM +0800, Gao Xiang wrote:
> > > > On Sun, Nov 13, 2022 at 08:45:45AM +1100, Dave Chinner wrote:
> > > ..... because the assumption is that AGFL blocks come from free
> > > space and so when we are at ENOSPC bno/cnt btrees *do no require
> > > splits* so will not consume extra space. Hence allocation at ENOSPC
> > > doesn't need to take into account AGFL block usage because the AGFL
> > > will not be consumed.
> > 
> > I noticed another thing.  I think the reason why the first allocation
> > in this case caused a cntbt split is that Zorro's workload set
> > sunit/swidth.  Therefore, due to align requirement, I assume it
> > called xfs_alloc_fix_len() to fix up agbno and len.
> > 
> > Actually I found our workload has the similar sunit/swidth setup and
> > I am thinking about this these days.  One thing is that why we need
> > freespace btree splits when consuming free blocks.
> 
> stripe alignment does not affect AGFL behaviour, not free space
> btree allocation requirements. stripe alignment only affects the
> initial user data allocations from xfs_bmap_btalloc() where the
> stripe alignment variables (e.g. min align, align slop, etc). None
> of these parameters are set for AGFL or btree block allocations, so
> they ignore all alignment constraints.
> 
> > Another thing is that considering we're near ENOSPC, and bno/cnt
> > btrees has only a few records.  If we allocates without alignment,
> > I also think bno/cnt btrees do no require splits so it will not
> > consume extra space since the overall extents only decrease.
> > 
> > Yet how about allocating with alignment? It seems that it can add
> > another free extent in order to fulfill the alignment.  I'm not sure
> > if it can cause some corner cases here.
> 
> Alignment never requires an extra allocation - it reserves extra
> space to select a larger freespace that an aligned extent can be
> carved out of. If an aligned extent cannot be found, we fall back to
> unaligned allocation....

As we talked on IRC, I skip this part now.  In brief, I think stripe
allocation can make it reproduce more frequently.

> 
> > > Similarly, if we have enough free space records to split a free
> > > space btree block, we have enough free space to refill the AGFL
> > > multiple times and we don't have to reserve space for them.
> > > 
> > > IOWs, the allocation code has, historically, never had to care about
> > > AGFL refilling when the AG is near ENOSPC as nothing will consume
> > > AGFL blocks when the AG is near empty.
> > > 
> > > This is the design assumption that AG reservations broke. This is
> > > why I'm asking you to look into taking blocks that are supposedly
> > > reserved for the AGFL, because as reserved space is used, the
> > > bno/cnt btrees will shrink and return those blocks to free space and
> > > hence they are still available for reserved allocations to use as
> > > the real physical ENOSPC condition approaches.
> > 
> > Yeah, intuitively I also imagine like what you said.  However, does it
> > have strictly monotonicity, especially with stripe alignment setup?
> > 
> > > 
> > > The more I look at this, the more I think overall answer to this
> > > problem is to allow AGFL refilling to ignore AG reserves rather than
> > > causing ENOSPC....
> > 
> > Could you give more details how to fit this case?  Also we have a
> > short talk last Wednesday (sorry that I had an urgent thing at that
> > time).  You mentioned "the simple solution is something like
> > min(ag reservation blocks, needed AGFL blocks) instead of accounting
> > them separately", could you give an example for this case as well?
> 
> Go read the head comment in xfs_ag_resv.c. Specifically, this bit:
> 
>  * Reserved blocks can be managed by passing one of the enum xfs_ag_resv_type
>  * values via struct xfs_alloc_arg or directly to the xfs_free_extent
>  * function.  It might seem a little funny to maintain a reservoir of blocks
>  * to feed another reservoir, but the AGFL only holds enough blocks to get
>  * through the next transaction.  The per-AG reservation is to ensure (we
>  * hope) that each AG never runs out of blocks.  Each data structure wanting
>  * to use the reservation system should update ask/used in xfs_ag_resv_init.
> 
> This was originally written with RMAPBT updates in mind (rmap btree
> blocks come from the AGFL, just like the bno/cnt btrees). SInce this
> was written, AGFL blocks have been carved out of this reservation by
> XFS_AG_RESV_AGFL, and so this reservation space no longer reserves
> or accounts for refilling the AGFL for non-RMAPBT operations.
> 
> My point is, however, that the reservation space was intended for
> ensuring the AGFL could be refilled without triggering ENOSPC in
> certain circumstances. And here we are with a situation where
> refilling the AGFL triggers ENOSPC because of the reservation
> space.
> 
> The "available" calculation in xfs_alloc_space_available() does:
> 
> 	available = (int)(pag->pagf_freeblks + agflcount -
>                           reservation - min_free - args->minleft);
> 
> 
> Which is effectively:
> 
> 	available =	(free space) -
> 			(reserved space) -
> 			(minimum AG requires to be left free) -
> 			(minimum allocation requires to left free)
> 
> But what we have to consider is that three of these parameters have
> a component of "AGFL blocks" in them:
> 
> free space	= indexed free space + current AGFL blocks
> reserved space	= space reserved for future AGFL block allocation
> minimum AG free = AGFL blocks needed for this allocation
> 
> Looked at a different way (as a timeline):
> 
> free space	= Previously allocated AGFL blocks
> reserved space  = future allocation AGFL block pool
> minimum free	= present allocation AGFL needs
> 
> So the problem you are seeing is that on the second allocation of a
> chain, the AGFL blocks previously allocated by the initial
> allocation in the chain are not sufficient for present AGFL needs
> and we do not allow the present allocation to use space from the
> future AGFL block pool to fill the AGFL....

I agree with your word here.

> 
> Also, we need to keep in mind that the initial allocation uses
> args->resv = XFS_AG_RESV_NONE, so the reservation space returned for
> the initial allocation is the full metadata (finobt) + reserved
> (RMAPBT) reservation that is being made.
> 
> The second allocation in the chain (where minleft is zero) really
> needs to have a reserve pool for AGFL filling. But we don't have a
> reserve pool for general AGFL allocations anymore, and this looks
> like we need it. i.e. instead of XFS_AG_RESV_AGFL just being used to
> avoid accounting AGFL block usage, perhaps it should actually manage
> a reserve pool for ensuring the AGFL can be refilled near ENOSPC
> (due to outstanding RMAP/FINOBT reservations) in a single
> transaction allocation chain....
> 
> i.e. so long as the pool has more blocks in it than the current
> allocation requires to refill the AGFL, the allocation can
> proceed...

Ok, let me think out a way to use ag_resv framework to resolve this
issue now.

Thanks,
Gao Xiang




[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux