Re: block allocations for the refcount btree

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 12 Feb 2016 07:21:48 +1100

On Thu, Feb 11, 2016 at 09:09:37AM -0500, Brian Foster wrote:
> On Thu, Feb 11, 2016 at 08:40:58AM +1100, Dave Chinner wrote:
> > On Wed, Feb 10, 2016 at 11:07:38AM -0800, Christoph Hellwig wrote:
> > > On Wed, Feb 10, 2016 at 01:50:10AM -0800, Darrick J. Wong wrote:
> > > > That's odd... I'd have thought that the AG reservation would always be able
> > > > to handle a refcount btree expansion, since it calculates how many blocks
> > > > are needed to handle the worst case of 1 record per extent.  There's also
> > > > a bug where we undercount the number of blocks already used, so it should
> > > > have an extra big reservation.
> > > > 
> > > > OTOH I've seen occasional ENOSPCs in generic/186 and generic/168 too, so I
> > > > guess something's going wrong.  Maybe the xfs_ag_resv* tracepoints can help?
> > > 
> > > I'm not seeing an ENOSPC, I run into:
> > > 
> > > [  640.924891] XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, file: fs/xfs/xfs_trans.c, line: 315
> > 
> > I run into that from time to time (maybe once a month) on a vanilla
> > kernel.
> > 
> 
> Any idea which test reproduces? I see that generic/033 resulted from the
> discussion below on the rfc. I don't currently reproduce with that test,
> however. The test mentions it uses fzero because zero range doesn't do
> writeback (comments ftw :) and thus allows splitting of delalloc
> extents, but it looks like that might no longer be the case in the
> kernel (since zero range was simplified to reuse punch/alloc).

It's usually one of the fsstress tests that triggers it. For some
reason generic/233 sticks in my mind, but it's a pretty rare failure
these days...

> > IIRC, the problem is the delayed allocation extent split runs out of
> > its reserved block count if you split it enough times. The case
> > I've seen is that the indlen calculated in xfs_bmap_worst_indlen()
> > ends up too small for a subsequent allocation after we've called
> > xfs_bmap_del_extent() to delete the middle of a delalloc extent too
> > many times.
> > 
> > Brian had some patches that attempted to solve it - we may have
> > simply dropped the ball on this (again).
> > 
> > http://oss.sgi.com/archives/xfs/2014-09/msg00337.html
> > 
> 
> I recall working on this, but not quite where it left off. If I dig back
> to my old tree from before the oss.sgi.com->vger switchover, I have a v1
> branch for this work that was posted here:
> 
> http://oss.sgi.com/archives/xfs/2014-10/msg00294.html
> 
> It looks like we just never got it reviewed and I since lost track of
> it. I can resurrect it if warranted. I would like to nail down a current
> reproducer though...

*nod*. Not sure what we can use to trigger it, though.
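
For anyone following along, the shortfall described in the quoted text above can be illustrated with a toy model. This is only a sketch under stated assumptions: worst_indlen() here is a hypothetical stand-in loosely modelled on the idea behind xfs_bmap_worst_indlen(), not the kernel code, and the maxrecs/maxlevels values are made up so the numbers stay small. It shows why punching the middle out of a delalloc extent can leave two pieces whose combined worst-case indirect block need is larger than what was reserved for the original extent.

```python
def worst_indlen(length, maxrecs=4, maxlevels=3):
    """Toy worst-case count of btree blocks needed to map `length` blocks.

    Hypothetical model: each btree block holds `maxrecs` records and the
    tree can be at most `maxlevels` deep. Not the kernel implementation.
    """
    total = 0
    for level in range(maxlevels):
        # Blocks needed at this level: ceil(length / maxrecs).
        length = -(-length // maxrecs)
        total += length
        if length == 1:
            # A single block at this level: assume one more block per
            # remaining level up to the root.
            return total + maxlevels - level - 1
    return total


def punch_middle(length, start, count, **kw):
    """Return (reserved, needed) after deleting `count` blocks at `start`.

    `reserved` is the toy reservation for the original extent; `needed`
    is the sum of the toy reservations for the two remaining pieces.
    """
    reserved = worst_indlen(length, **kw)
    left = start
    right = length - start - count
    needed = worst_indlen(left, **kw) + worst_indlen(right, **kw)
    return reserved, needed


if __name__ == "__main__":
    reserved, needed = punch_middle(10, 4, 2)
    print("reserved:", reserved, "needed after split:", needed)
```

In this model the per-extent minimum (one block per level) is paid again for every piece, so repeated splits keep widening the gap between what was reserved and what the pieces could need. Whether that matches the exact path that trips the assert is an assumption, not something established in this thread.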

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
