Re: [PATCH 031/119] xfs: rmap btree requires more reserved free space

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 13, 2016 at 02:32:17PM -0400, Brian Foster wrote:
> On Wed, Jul 13, 2016 at 09:50:08AM -0700, Darrick J. Wong wrote:
> > On Fri, Jul 08, 2016 at 09:21:55AM -0400, Brian Foster wrote:
> > > >  /*
> > > > + * In order to avoid ENOSPC-related deadlock caused by out-of-order locking of
> > > > + * AGF buffer (PV 947395), we place constraints on the relationship among
> > > > + * actual allocations for data blocks, freelist blocks, and potential file data
> > > > + * bmap btree blocks. However, these restrictions may result in no actual space
> > > > + * allocated for a delayed extent, for example, a data block in a certain AG is
> > > > + * allocated but there is no additional block for the additional bmap btree
> > > > + * block due to a split of the bmap btree of the file. The result of this may
> > > > + * lead to an infinite loop when the file gets flushed to disk and all delayed
> > > > + * extents need to be actually allocated. To get around this, we explicitly set
> > > > + * aside a few blocks which will not be reserved in delayed allocation.
> > > > + *
> > > > + * The minimum number of needed freelist blocks is 4 fsbs _per AG_ when we are
> > > > + * not using rmap btrees a potential split of file's bmap btree requires 1 fsb,
> > > > + * so we set the number of set-aside blocks to 4 + 4*agcount when not using
> > > > + * rmap btrees.
> > > > + *
> > > 
> > > That's a bit wordy.
> > 
> > Yikes, that whole thing is a single sentence!
> > 
> > One thing I'm not really sure about is how "a potential split of file's bmap
> > btree requires 1 fsb" seems to translate to 4 in the actual formula.  I'd
> > have thought it would be m_bm_maxlevels or something... not just 4.
> > 
> 
> I'm not sure about that either, tbh.

So, a trip down memory lane. 

<wavy line dissolve>

Back in 2006, I fixed a bug that changed XFS_ALLOC_SET_ASIDE from
a  fixed value of 8 blocks to 4 blocks + 4 AGFL blocks per AG in
commit 4be536de ("[XFS] Prevent free space
oversubscription and xfssyncd looping."). The original value of
8 was for 4 blocks for the bmbt split, and 4 blocks from the current
AG for the AGFL (commit message explains the reason this was a
problem (Yay for writing good commit messages 10 years ago!)). The
oringinal comment text was:

- * reserved in delayed allocation. Considering the minimum number of
- * needed freelist blocks is 4 fsbs, a potential split of file's bmap
- * btree requires 1 fsb, so we set the number of set-aside blocks to 8.
-*/

So we need to go back further. We have an obvious git log search
target in the comment (PV#947395), and that points to:

commit d210a28cd851082cec9b282443f8cc0e6fc09830
Author: Yingping Lu <yingping@xxxxxxx>
Date:   Fri Jun 9 14:55:18 2006 +1000

    [XFS] In actual allocation of file system blocks and freeing extents, the
    transaction within each such operation may involve multiple locking of AGF
    buffer. While the freeing extent function has sorted the extents based on
    AGF number before entering into transaction, however, when the file system
    space is very limited, the allocation of space would try every AGF to get
    space allocated, this could potentially cause out-of-order locking, thus
    deadlock could happen. This fix mitigates the scarce space for allocation
    by setting aside a few blocks without reservation, and avoid deadlock by
    maintaining ascending order of AGF locking.
    
    SGI-PV: 947395
    SGI-Modid: xfs-linux-melb:xfs-kern:210801a
    
    Signed-off-by: Yingping Lu <yingping@xxxxxxx>
    Signed-off-by: Nathan Scott <nathans@xxxxxxx>

Which tells us nothing about why 1 fsb for the bmbt split was
actually reserved as 4fsbs. IIRC, I ended up having to find and fix
the problem because Yingping left SGI soon after this fix was made,
and at the time nobody understood or could work out why that was
done. It worked, however, so we left it that way, and just fixed the
per-ag reservation problem this bug fix had.


> > /* 
> >  * When rmap is disabled, we need to reserve 4 fsbs _per AG_ for the freelist
> >  * and 4 more to handle a potential split of the file's bmap btree.

As such, I'm not sure that is any more correct than the original
comment.

Looking back on this now with 10 years more time working on XFS, my
suspicion is that a single level bmap btree split requires 1
block to be allocated, but that allocation will call
xfs_alloc_fix_freelist() to refill the freelist to the minimum
(which is 4 blocks), and so we need at least 4 blocks for the
allocation to succeed (4 blocks for the freelist fill, and if we are
at ENOSPC then the bmap btree block will be allocated from the
AGFL).

Whether the value of 4 is correct or not for this purpose is just a
guess based on the per-ag AGFL requirements, so my only comment
right now is: it's worked for 10 years, so let's not change it until
there's at least some evidence that is it wrong.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux