Re: [RFC PATCH 07/11] xfs: retry trans reservation on ENOSPC in xfs_inactive()

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 6 Sep 2013 10:17:18 +1000

On Thu, Sep 05, 2013 at 12:18:31PM -0400, Brian Foster wrote:
> On 09/04/2013 09:40 PM, Dave Chinner wrote:
> > On Tue, Sep 03, 2013 at 02:25:04PM -0400, Brian Foster wrote:
> >> An ifree data block reservation can fail with ENOSPC. Flush inodes
> >> to try and free up space or attempt without a data block
> >> reservation to avoid failing out of xfs_inactive().
> >>
> >> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> >> ---
> >>  fs/xfs/xfs_inode.c | 11 +++++++++++
> >>  1 file changed, 11 insertions(+)
> >>
> >> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> >> index 56cbf63..92de4b7 100644
> >> --- a/fs/xfs/xfs_inode.c
> >> +++ b/fs/xfs/xfs_inode.c
> >> @@ -1779,7 +1779,18 @@ xfs_inactive(
> >>  	tp = xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
> >>  	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ifree,
> >>  				  XFS_IFREE_SPACE_RES(mp), 0);
> >> +	if (error == ENOSPC) {
> >> +		/* flush outstanding delalloc blocks and retry */
> >> +		xfs_flush_inodes(mp);
> >> +		error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ifree,
> >> +					  XFS_IFREE_SPACE_RES(mp), 0);
> >> +	}
> > 
> > We don't want to be blocking for inode flushes here. We might be in
> > a shrinker context, for example, and blocking those for a filesystem
> > sync is going to be unfriendly.
> > 
> 
> Ok.
> 
> > If this really is a problem, then the right thing to do is to allow
> > this transaction to dip into the reserve block pool so the
> > transaction can complete and make progress - other write operations
> > will trigger the flushing of the filesystem, and freeing of whole
> > inode chunks should return more free space than we need for the
> > finobt modifications in the removing lots of zero length inodes
> > at ENOSPC case....
> > 
> 
> I did have one of the enospc xfstests lead to this situation, though I
> don't have the particular test in my notes. It initially manifested as
> an assert failure due to the fs not being shutdown after an
> xfs_trans_reserve() ENOSPC failure.

Ok. I can see how ENOSPC might occur here :)

> Subsequent to avoiding that, I
> believe there were inconsistent fs issues called out due to the unlinked
> lists being populated after umount.

That sounds like a recovery failure, not so much an ENOSPC failure.
i.e. that recovery only looks at the log to see if it's clean, and
only recovers unlinked lists if it's dirty. There is the
*possibility* of having a clean log with inodes on the unlinked
list, and log recovery doesn't run the unlinked list processing in
that case.

This is one of the issues we'll need to fix for O_TMPFILE support
as it will actively use inodes on unlinked list for potentially long
periods of time.

> Taking a further look, I missed the XFS_TRANS_RESERVE flag and whole
> m_resblks mechanism. I'll take a closer look at that and see if that
> works to resolve the problem instead of the flush.

It should - the only time it won't is if we exhaust the pool, but
that doesn't happen in normal ENOSPC situations and any blocks we do
end up freeing will immediately refill the reserve pool...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs