Re: block allocations for the refcount btree

On Fri, Feb 12, 2016 at 11:10:46AM -0800, Christoph Hellwig wrote:
> On Thu, Feb 11, 2016 at 08:40:58AM +1100, Dave Chinner wrote:
> > I run into that from time to time (maybe once a month) on a vanilla
> > kernel.
> > 
> > IIRC, the problem is the delayed allocation extent split runs out of
> > its reserved block count if you split it enough times. The case
> > I've seen is that the indlen calculated in xfs_bmap_worst_indlen()
> > ends up too small for a subsequent allocation after we've called
> > xfs_bmap_del_extent() to delete the middle of a delalloc extent too
> > many times.
> > 
> > Brian had some patches that attempted to solve it - we may have
> > simply dropped the ball on this (again).
> > 
> > http://oss.sgi.com/archives/xfs/2014-09/msg00337.html
> 
> I'm pretty sure that is a separate issue.  With the refcount btree we may
> allocate an extent (or rather just a single block) in xfs_alloc_ag_vextent
> as called from xfs_refcountbt_alloc_block.  The reservation helps us to
> ensure this block is always available, but we still need to account for
> that in xfs_trans_reserve(), which we currently don't do for itruncate
> transactions.  

One side effect of the per-ag block reservation code is that it reserves all
the blocks that the refcountbt will ever need at mount time, which includes
decreasing the incore fdblocks counter at mount and putting it back at unmount
time.  This /should/ eliminate the need for reserving blocks in truncate
transactions, though clearly the accounting isn't being done correctly.  The
AGresv code as of a couple weeks ago tried to adjust the transaction block
reservation counts after the allocator had done its usual accounting, which,
as you observe, doesn't work.
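
To make the mount/unmount accounting concrete, here is a toy userspace model.
It is not the AGresv code; the struct and function names are made up for
illustration, and the only thing it tries to show is that the reserved blocks
disappear from the free counter at mount and come back at unmount.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the incore free block counter (fdblocks). */
struct toy_mount {
	uint64_t fdblocks;	/* blocks everyone else sees as free */
};

/* Toy per-AG reservation; the field name is illustrative only. */
struct toy_agresv {
	uint64_t reserved;	/* blocks still held back for the btree */
};

/* At mount: hide 'want' blocks from the free counter. */
static bool toy_agresv_init(struct toy_mount *mp, struct toy_agresv *r,
			    uint64_t want)
{
	if (mp->fdblocks < want)
		return false;	/* not enough free space to reserve */
	mp->fdblocks -= want;
	r->reserved = want;
	return true;
}

/* At unmount: put back whatever the btree did not consume. */
static void toy_agresv_free(struct toy_mount *mp, struct toy_agresv *r)
{
	mp->fdblocks += r->reserved;
	r->reserved = 0;
}
```

With 1000 free blocks and a reservation of 100, the model reports 900 free
blocks while mounted and 1000 again after toy_agresv_free().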

Dave suggested that I embed the AGresv structures directly into xfs_perag, and
I realized that we'll only ever need two of these things -- one to feed the
AGFL (rmapbt) and another to feed the higher level btrees (refcountbt).  At the
same time, I decided that because the AGresv code ultimately knows whether an
allocation request was satisfied from a reservation or from the free space
btree, it should also have a hand in deciding whether or not to update the
transaction's block reservation.
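
Roughly, the idea looks like the sketch below.  This is a simplified
userspace model, not the rewritten code: the enum, the structs and
toy_alloc_block() are all hypothetical names, and the real allocator does far
more than decrement counters.  The point is only that the code which knows
where the block came from is also the code that decides whether the
transaction gets charged.

```c
#include <stdint.h>

/* Hypothetical reservation types; only two are ever needed. */
enum toy_resv_type {
	TOY_RESV_AGFL,		/* feeds the AGFL (rmapbt) */
	TOY_RESV_METADATA,	/* feeds higher level btrees (refcountbt) */
	TOY_RESV_NR
};

/* Reservations embedded directly in the per-AG structure. */
struct toy_perag {
	uint64_t resv[TOY_RESV_NR];	/* blocks left in each reservation */
	uint64_t free_blocks;		/* unreserved free space in this AG */
};

struct toy_trans {
	uint64_t blk_res;	/* blocks the transaction reserved up front */
};

/*
 * Allocate one block on behalf of the given reservation type.
 * Returns 0 on success, -1 if neither source can supply a block.
 */
static int toy_alloc_block(struct toy_perag *pag, struct toy_trans *tp,
			   enum toy_resv_type type)
{
	if (pag->resv[type] > 0) {
		/* Paid for at mount time: leave the transaction alone. */
		pag->resv[type]--;
		return 0;
	}

	/* Reservation used up: take free space, charge the transaction. */
	if (pag->free_blocks == 0 || tp->blk_res == 0)
		return -1;
	pag->free_blocks--;
	tp->blk_res--;
	return 0;
}
```

In this model a transaction with blk_res == 0 (like itruncate today) is fine
as long as the reservation still has blocks, and only fails once the
reservation is empty.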

So what I'm saying is that I think this problem was with the AGresv code not
doing accounting correctly, and that I've fixed it in a subsequent rewrite of
the AGresv code.  I'll post it later, after I figure out why generic/333
regresses with the new code.

However, there's one thing to be aware of -- if the AGresv uses up all the
blocks that were preallocated at mount time, the allocator will grab any free
blocks available and charge the blocks to the transaction, just like before.
In theory this shouldn't happen, since we reserve enough blocks to hold a
refcount record for every block in the AG, but if it ever does, we'll still
have this problem.

The most cautious thing to do, I think, is to combine the AGresv fixes with
this patch that adds a block reservation to truncate transactions in case the
AGresv can't supply a block to the refcount btree.  The problem here is that
for most cases we'll have both the AGresv and the transaction reserving blocks
for the same purpose, which seems excessive.  Moreover, it introduces the
possibility of userspace seeing ENOSPC while truncating files even if there's
actually sufficient space to handle a refcountbt split.
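
A toy illustration of that last point, with made-up names and numbers (this
is a sketch of the arithmetic, not of xfs_trans_reserve() itself): the
transaction reservation is checked against the free counter only, so blocks
already set aside by the AGresv don't help it succeed.

```c
#include <errno.h>
#include <stdint.h>

struct demo_fs {
	uint64_t fdblocks;	/* free blocks not held by any reservation */
	uint64_t agresv;	/* blocks held back at mount for the refcountbt */
};

/* Reserve 'blocks' for a truncate transaction from the free counter. */
static int demo_truncate_reserve(struct demo_fs *fs, uint64_t blocks)
{
	if (fs->fdblocks < blocks)
		return -ENOSPC;	/* fails even if agresv could cover a split */
	fs->fdblocks -= blocks;
	return 0;
}
```

With fdblocks == 0 and agresv == 8, asking for a single block returns -ENOSPC
even though the refcountbt split could have been fed from the eight reserved
blocks.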

<shrug> What does everyone else think?

--D
