On Mon, Aug 27, 2018 at 10:25:48AM -0400, Brian Foster wrote: > The xfs_buf_log_item structure has a reference counter with slightly > tricky semantics. In the common case, a buffer is logged and > committed in a transaction, committed to the on-disk log (added to > the AIL) and then finally written back and removed from the AIL. The > bli refcount covers two potentially overlapping timeframes: > > 1. the bli is held in an active transaction > 2. the bli is pinned by the log > > The caveat to this approach is that the reference counter does not > purely dictate the lifetime of the bli. IOW, when a dirty buffer is > physically logged and unpinned, the bli refcount may go to zero as > the log item is inserted into the AIL. Only once the buffer is > written back can the bli finally be freed. > > The above semantics means that it is not enough for the various > refcount decrementing contexts to release the bli on decrement to > zero. xfs_trans_brelse(), transaction commit (->iop_unlock()) and > unpin (->iop_unpin()) must all drop the associated reference and > make additional checks to determine if the current context is > responsible for freeing the item. > > For example, if a transaction holds but does not dirty a particular > bli, the commit may drop the refcount to zero. If the bli itself is > clean, it is also not AIL resident and must be freed at this time. > The same is true for xfs_trans_brelse(). If the transaction dirties > a bli and then aborts or an unpin results in an abort due to a log > I/O error, the last reference count holder is expected to explicitly > remove the item from the AIL and release it (since an abort means > filesystem shutdown and metadata writeback will never occur). > > This leads to fairly complex checks being replicated in a few > different places. Since ->iop_unlock() and xfs_trans_brelse() are > nearly identical, refactor the logic into a common helper that > implements and documents the semantics in one place. This patch does > not change behavior. I think this improves the code further, because now we don't have to care about the buf item cleanup when dropping references. > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > --- > fs/xfs/xfs_buf_item.c | 85 ++++++++++++++++++++++++------------------ > fs/xfs/xfs_buf_item.h | 1 + > fs/xfs/xfs_trans_buf.c | 22 +---------- > 3 files changed, 51 insertions(+), 57 deletions(-) > > diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c > index b39d1b5320e7..d0191451fe18 100644 > --- a/fs/xfs/xfs_buf_item.c > +++ b/fs/xfs/xfs_buf_item.c > @@ -556,13 +556,12 @@ xfs_buf_item_unlock( > { > struct xfs_buf_log_item *bip = BUF_ITEM(lip); > struct xfs_buf *bp = bip->bli_buf; > - bool freed; > - bool aborted; > + bool released; > bool hold = bip->bli_flags & XFS_BLI_HOLD; > - bool dirty = bip->bli_flags & XFS_BLI_DIRTY; > bool stale = bip->bli_flags & XFS_BLI_STALE; > #if defined(DEBUG) || defined(XFS_WARN) > bool ordered = bip->bli_flags & XFS_BLI_ORDERED; > + bool dirty = bip->bli_flags & XFS_BLI_DIRTY; > #endif > trace_xfs_buf_item_unlock(bip); > > @@ -574,8 +573,6 @@ xfs_buf_item_unlock( > (ordered && dirty && !xfs_buf_item_dirty_format(bip))); > ASSERT(!stale || (bip->__bli_format.blf_flags & XFS_BLF_CANCEL)); > > - aborted = test_bit(XFS_LI_ABORTED, &lip->li_flags); > - > /* > * Clear the buffer's association with this transaction and > * per-transaction state from the bli, which has been copied above. > @@ -584,40 +581,16 @@ xfs_buf_item_unlock( > bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD | XFS_BLI_ORDERED); > > /* > - * Drop the transaction's bli reference and deal with the item if we had > - * the last one. We must free the item if clean or aborted since it > - * wasn't pinned by the log and this is the last chance to do so. If the > - * bli is freed and dirty (but non-aborted), the buffer was not dirty in > - * this transaction but modified by a previous one and still awaiting > - * writeback. In that case, the bli is freed on buffer writeback > - * completion. > + * Unref the item and unlock the buffer unless held or stale. Stale > + * buffers remain locked until final unpin unless the bli is freed by > + * the unref call. The latter implies shutdown because buffer > + * invalidation dirties the bli and transaction. > */ > - freed = atomic_dec_and_test(&bip->bli_refcount); > - if (freed) { > - ASSERT(!aborted || XFS_FORCED_SHUTDOWN(lip->li_mountp)); > - /* > - * An aborted item may be in the AIL regardless of dirty state. > - * For example, consider an aborted transaction that invalidated > - * a dirty bli and cleared the dirty state. > - */ > - if (aborted) > - xfs_trans_ail_remove(lip, SHUTDOWN_LOG_IO_ERROR); > - if (aborted || !dirty) > - xfs_buf_item_relse(bp); > - } else if (stale) { > - /* > - * Stale buffers remain locked until final unpin unless the bli > - * was freed in the branch above. A freed stale bli implies an > - * abort because buffer invalidation dirties the bli and > - * transaction. > - */ > - ASSERT(!freed); > + released = xfs_buf_item_unref(bip); > + if ((stale && !released) || hold) > return; Personal preference, but I'd put the hold check first because it's not a compound logic statement. > +bool > +xfs_buf_item_unref( "+xfs_buf_item_put()" to be consistent with dropping references on other reference counted objects (e.g. iput, dput, bio_put, xfs_perag_put, xfs_log_ticket_put, etc). > + struct xfs_buf_log_item *bip) > +{ > + struct xfs_log_item *lip = &bip->bli_item; > + bool aborted; > + bool dirty; > + bool freed; > + > + /* drop the bli ref and return if it wasn't the last one */ > + freed = atomic_dec_and_test(&bip->bli_refcount); > + if (!freed) > + return false; We don't really need the freed variable here. This if (!atomic_dec_and_test(refcount)) return false is a common enough "do something only when we drop the last reference" pattern that everyone should understand it by now. > + * If the bli is dirty and non-aborted, the buffer was clean in the > + * transaction but still awaiting writeback from previous changes. In > + * that case, the bli is freed on buffer writeback completion. > + */ > + aborted = test_bit(XFS_LI_ABORTED, &lip->li_flags) || > + XFS_FORCED_SHUTDOWN(lip->li_mountp); > + dirty = bip->bli_flags & XFS_BLI_DIRTY; > + if (!aborted && dirty) > + return false; Personal preference (again) is to check dirty first - same as the comment text mentions "dirty and not-aborted"... These are all minor things - I think it's a good improvement overall. Cheers, dave. -- Dave Chinner david@xxxxxxxxxxxxx