On Tue, Jul 28, 2015 at 09:40:06AM -0400, Brian Foster wrote: > On Tue, Jul 28, 2015 at 10:40:09AM +1000, Dave Chinner wrote: > > [ reply to both patches in one reply, because it's related. ] > > > > On Thu, Jul 23, 2015 at 04:13:29PM -0400, Brian Foster wrote: > > > Some callers need to make error handling decisions based on whether the > > > current transaction successfully committed or not. Rename > > > xfs_trans_roll(), add a new parameter and provide a wrapper to preserve > > > existing callers. > > > > > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > > > --- > > > fs/xfs/xfs_trans.c | 15 +++++++++++++-- > > > fs/xfs/xfs_trans.h | 1 + > > > 2 files changed, 14 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c > > > index 0582a27..a0ab1da 100644 > > > --- a/fs/xfs/xfs_trans.c > > > +++ b/fs/xfs/xfs_trans.c > > > @@ -1019,9 +1019,10 @@ xfs_trans_cancel( > > > * chunk we've been working on and get a new transaction to continue. > > > */ > > > int > > > -xfs_trans_roll( > > > +__xfs_trans_roll( > > > struct xfs_trans **tpp, > > > - struct xfs_inode *dp) > > > + struct xfs_inode *dp, > > > + int *committed) > > > { > > > struct xfs_trans *trans; > > > struct xfs_trans_res tres; > > > @@ -1052,6 +1053,7 @@ xfs_trans_roll( > > > if (error) > > > return error; > > > > So here we return committed = 0, error != 0. > > > > > > > > + *committed = 1; > > > trans = *tpp; > > > > And from here on in we return committed = 1 regardless of the error. > > > > So looking at the next patch in xfs_bmap_finish(): > > > > > free->xbfi_blockcount); > > > > > > - error = xfs_trans_roll(tp, NULL); > > > - *committed = 1; > > > + error = __xfs_trans_roll(tp, NULL, committed); > > > + > > > /* > > > - * We have a new transaction, so we should return committed=1, > > > - * even though we're returning an error. > > > + * We have a new transaction, so we should return committed=1, even > > > + * though we're returning an error. If an error was returned after the > > > + * original transaction was committed, defer the error handling until > > > + * the EFD is logged. We do this because a committed EFI requires an EFD > > > + * transaction to be processed to ensure the EFI is released. > > > */ > > > - if (error) > > > + if (error && *committed == 0) { > > > + *committed = 1; > > > return error; > > > + } > > > > So if we failed to commit the EFI, we say we did and then return. > > Why do we need to do that? > > > > /me goes an looks at all the callers. > > > > Hmm - only xfs_itruncate_extents() relies on that behaviour, but it > > could just as easily do an inode join on error, because on success > > the inode has already been joined to the new transaction by > > xfs_trans_roll(). > > > > Interesting, though I don't follow why this caller even depends on it. > It doesn't transfer lock ownership to the transaction. What difference > does it make in the error path if the inode is joined? Callers of xfs_itruncate_extents() expect it to be locked and joined on return, even on error. > > Looking further, we have quite a bit of inconsistency in the error > > handling of xfs_bmap_finish() - some callers issue a > > xfs_bmap_cancel() in the error path, and some don't. failing to > > cancel the freelist on error looks to me like a memory leak, because > > we don't free the extents from the free list until the EFD for the > > extent has been logged. If we error out earlier, we still have items > > on the free list that haven't been processed. > > > > So it looks to me like we need fixes there. > > > > Heh, not too surprising. I'll make a note to make a pass through these. > > > Further, it appears to me that there is really one xfs_bmap_finish() > > caller that requires the committed flag: xfs_qm_dqalloc(). All the > > others either use it for an assert or joining the inode to the > > transaction when committed = 1, which xfs_trans_roll() will have > > already done if we return committed = 1.... > > > > Assuming xfs_trans_reserve() hasn't failed, which could cause *committed > == 1 without the inode joined. We could probably change this in > __xfs_trans_roll() since the inode is presumably already locked. I don't think we can join the inode until after the reservation is done. It could still be done in __xfs_trans_roll() regardless. > > > + return error; > > > > This loop doesn't obviously do the right thing now. It > > will log the first extent into the EFD and then trigger a shutdown > > and return. The extent count in the EFD may not match the > > extent count on the EFI, so releasing the EFD at this point may not > > release all the extents in the EFI and hence not release the EFI. > > > > The EFD unlock handler forcibly releases the EFI on abort. It drops the > EFI extent count reference by whatever the extent count on the EFD is, > and that is determined on EFD initialization (xfs_trans_get_efd()) > regardless of how many extents were logged to the EFD. > > That said, the error handling here is certainly not obvious because it > depends on the lifecycle of the associated log items. The broader goal > is to reduce that dependency so the code here is more straightforward... *nod* > > I think I'd prefer to see a xfs_trans_cancel_efi() call to handle > > this error path rather than having to go through the efd to release > > the reference on the EFI. i.e. > > > > error = __xfs_trans_roll(tp, NULL, &committed); > > if (error) { > > if (committed) { > > if (!XFS_FORCED_SHUTDOWN(mp)) > > xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); > > xfs_efi_cancel(tp, efi); > > } > > return error; > > } > > > > That's a nice idea. It pulls some error handling out of the log item > handling code explicitly. An EFD version might be useful for the > unlogged EFD error case as well. IMO, the more of these cases that are > handled explicitly in xfs_bmap_finish() rather than implicitly via the > transaction management code, the more reliable and robust to future > change it will be. I'll explore it further... Yes, that mirrors my thinking exactly - the EFI/EFD error handling has always been problematic with the reliance on reference counting via transaction commit/abort callbacks to handle it. > > Hmmm - something I just noticed: if we only have one EFD per EFI, > > why do we do we have that layer of extent counting before dropping > > real references? > > > > I wondered this myself, but hadn't made it deep enough to see if we used > the reference count elsewhere. > > > > xfs_efd_item_unlock( > > > struct xfs_log_item *lip) > > > { > > > - if (lip->li_flags & XFS_LI_ABORTED) > > > - xfs_efd_item_free(EFD_ITEM(lip)); > > > + struct xfs_efd_log_item *efdp = EFD_ITEM(lip); > > > + > > > + if (lip->li_flags & XFS_LI_ABORTED) { > > > + xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); > > > + xfs_efd_item_free(efdp); > > > + } > > > } > > > > i.e. we always call xfs_efi_release() with efi_nextents or > > efd_nextents, which are always the same, and so we never partially > > complete an EFI. Should we just kill that layer, as it does tend to > > complicate the EFI release code? > > > > Yeah, that might be a good idea if we don't use the reference count > elsewhere. I'll look into that as a subsequent cleanup as well. Excellent! Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs