Re: [PATCH RFC 1/2] xfs: return committed status from xfs_trans_roll()

Brian Foster <bfoster@xxxxxxxxxx> · Wed, 29 Jul 2015 07:47:52 -0400

On Wed, Jul 29, 2015 at 07:51:40AM +1000, Dave Chinner wrote:
> On Tue, Jul 28, 2015 at 09:40:06AM -0400, Brian Foster wrote:
> > On Tue, Jul 28, 2015 at 10:40:09AM +1000, Dave Chinner wrote:
> > > [ reply to both patches in one reply, because it's related. ]
> > > 
> > > On Thu, Jul 23, 2015 at 04:13:29PM -0400, Brian Foster wrote:
...
> > > >  			free->xbfi_blockcount);
> > > >  
> > > > -	error = xfs_trans_roll(tp, NULL);
> > > > -	*committed = 1;
> > > > +	error = __xfs_trans_roll(tp, NULL, committed);
> > > > +
> > > >  	/*
> > > > -	 * We have a new transaction, so we should return committed=1,
> > > > -	 * even though we're returning an error.
> > > > +	 * We have a new transaction, so we should return committed=1, even
> > > > +	 * though we're returning an error. If an error was returned after the
> > > > +	 * original transaction was committed, defer the error handling until
> > > > +	 * the EFD is logged. We do this because a committed EFI requires an EFD
> > > > +	 * transaction to be processed to ensure the EFI is released.
> > > >  	 */
> > > > -	if (error)
> > > > +	if (error && *committed == 0) {
> > > > +		*committed = 1;
> > > >  		return error;
> > > > +	}
> > > 
> > > So if we failed to commit the EFI, we say we did and then return.
> > > Why do we need to do that?
> > > 
> > > /me goes an looks at all the callers.
> > > 
> > > Hmm - only xfs_itruncate_extents() relies on that behaviour, but it
> > > could just as easily do an inode join on error, because on success
> > > the inode has already been joined to the new transaction by
> > > xfs_trans_roll().
> > > 
> > 
> > Interesting, though I don't follow why this caller even depends on it.
> > It doesn't transfer lock ownership to the transaction. What difference
> > does it make in the error path if the inode is joined?
> 
> Callers of xfs_itruncate_extents() expect it to be locked and joined
> on return, even on error.
> 

All of the callers I see cancel the transaction and unlock the inode
separately on error. I could be glossing over some obvious point here,
but so far I'm not seeing it...

> > > Looking further, we have quite a bit of inconsistency in the error
> > > handling of xfs_bmap_finish() - some callers issue a
> > > xfs_bmap_cancel() in the error path, and some don't. failing to
> > > cancel the freelist on error looks to me like a memory leak, because
> > > we don't free the extents from the free list until the EFD for the
> > > extent has been logged. If we error out earlier, we still have items
> > > on the free list that haven't been processed.
> > > 
> > > So it looks to me like we need fixes there.
> > > 
> > 
> > Heh, not too surprising. I'll make a note to make a pass through these.
> > 
> > > Further, it appears to me that there is really one xfs_bmap_finish()
> > > caller that requires the committed flag: xfs_qm_dqalloc(). All the
> > > others either use it for an assert or joining the inode to the
> > > transaction when committed = 1, which xfs_trans_roll() will have
> > > already done if we return committed = 1....
> > > 
> > 
> > Assuming xfs_trans_reserve() hasn't failed, which could cause *committed
> > == 1 without the inode joined. We could probably change this in
> > __xfs_trans_roll() since the inode is presumably already locked.
> 
> I don't think we can join the inode until after the reservation is
> done. It could still be done in __xfs_trans_roll() regardless.
> 

Hmm, I don't see anything obvious that prevents it. It probably defies
convention though. Anyways, we're probably a few levels of indirection
away from the bug fixes and work that we know needs to happen at this
point. I'll try to get the EFI/EFD stuff worked out, along with some of
the other issues I'm reproducing, then we can go from there...

Brian

> > > > +	return error;
> > > 
> > > This loop doesn't obviously do the right thing now. It
> > > will log the first extent into the EFD and then trigger a shutdown
> > > and return. The extent count in the EFD may not match the
> > > extent count on the EFI, so releasing the EFD at this point may not
> > > release all the extents in the EFI and hence not release the EFI.
> > > 
> > 
> > The EFD unlock handler forcibly releases the EFI on abort. It drops the
> > EFI extent count reference by whatever the extent count on the EFD is,
> > and that is determined on EFD initialization (xfs_trans_get_efd())
> > regardless of how many extents were logged to the EFD.
> > 
> > That said, the error handling here is certainly not obvious because it
> > depends on the lifecycle of the associated log items. The broader goal
> > is to reduce that dependency so the code here is more straightforward...
> 
> *nod*
> 
> > > I think I'd prefer to see a xfs_trans_cancel_efi() call to handle
> > > this error path rather than having to go through the efd to release
> > > the reference on the EFI. i.e.
> > > 
> > > 	error = __xfs_trans_roll(tp, NULL, &committed);
> > > 	if (error) {
> > > 		if (committed) {
> > > 			if (!XFS_FORCED_SHUTDOWN(mp))
> > > 				xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > > 			xfs_efi_cancel(tp, efi);
> > > 		}
> > > 		return error;
> > > 	}
> > > 
> > 
> > That's a nice idea. It pulls some error handling out of the log item
> > handling code explicitly. An EFD version might be useful for the
> > unlogged EFD error case as well. IMO, the more of these cases that are
> > handled explicitly in xfs_bmap_finish() rather than implicitly via the
> > transaction management code, the more reliable and robust to future
> > change it will be. I'll explore it further...
> 
> Yes, that mirrors my thinking exactly - the EFI/EFD error handling
> has always been problematic with the reliance on reference counting
> via transaction commit/abort callbacks to handle it.
> 
> > > Hmmm - something I just noticed: if we only have one EFD per EFI,
> > > why do we do we have that layer of extent counting before dropping
> > > real references?
> > > 
> > 
> > I wondered this myself, but hadn't made it deep enough to see if we used
> > the reference count elsewhere.
> > 
> > > >  xfs_efd_item_unlock(
> > > >  	struct xfs_log_item	*lip)
> > > >  {
> > > > -	if (lip->li_flags & XFS_LI_ABORTED)
> > > > -		xfs_efd_item_free(EFD_ITEM(lip));
> > > > +	struct xfs_efd_log_item	*efdp = EFD_ITEM(lip);
> > > > +
> > > > +	if (lip->li_flags & XFS_LI_ABORTED) {
> > > > +		xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents);
> > > > +		xfs_efd_item_free(efdp);
> > > > +	}
> > > >  }
> > > 
> > > i.e. we always call xfs_efi_release() with efi_nextents or
> > > efd_nextents, which are always the same, and so we never partially
> > > complete an EFI. Should we just kill that layer, as it does tend to
> > > complicate the EFI release code?
> > > 
> > 
> > Yeah, that might be a good idea if we don't use the reference count
> > elsewhere. I'll look into that as a subsequent cleanup as well.
> 
> Excellent!
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs