On Tue, Aug 21, 2018 at 10:01:16AM -0400, Brian Foster wrote: > xfstests generic/388,475 occasionally reproduce assertion failures > in xfs_buf_item_unpin() when the final bli reference is dropped on > an invalidated buffer and the buffer is not locked as it is expected > to be. Invalidated buffers should remain locked on transaction > commit until the final unpin, at which point the buffer is removed > from the AIL and the bli is freed since stale buffers are not > written back. > > The assert failures are associated with filesystem shutdown, > typically due to log I/O errors injected by the test. The > problematic situation can occur if the shutdown happens to cause a > race between an active transaction that has invalidated a particular > buffer and an I/O error on a log buffer that contains the bli > associated with the same (now stale) buffer. > > Both transaction and log contexts acquire a bli reference. If the > transaction has already invalidated the buffer by the time the I/O > error occurs and ends up aborting due to shutdown, the transaction > and log hold the last two references to a stale bli. If the > transaction cancel occurs first, it treats the buffer as non-stale > due to the aborted state: the bli reference is dropped and the > buffer is released/unlocked. The log buffer I/O error handling > eventually calls into xfs_buf_item_unpin(), drops the final > reference to the bli and treats it as stale. The buffer wasn't left > locked by xfs_buf_item_unlock(), however, so the assert fails and > the buffer is double unlocked. The latter problem is mitigated by > the fact that the fs is shutdown and no further damage is possible. > > ->iop_unlock() of an invalidated buffer should behave consistently > with respect to the bli refcount, regardless of aborted state. If > the refcount remains elevated on commit, we know the bli is awaiting > an unpin (since it can't be in another transaction) and will be > handled appropriately on log buffer completion. If the final bli > reference of an invalidated buffer is dropped in ->iop_unlock(), we > can assume the transaction has aborted because invalidation implies > a dirty transaction. In the non-abort case, the log would have > acquired a bli reference in ->iop_pin() and prevented bli release at > ->iop_unlock() time. In the abort case the item must be freed and > buffer unlocked because it wasn't pinned by the log. > > Rework xfs_buf_item_unlock() to simplify the currently circuitous > and duplicate logic and leave invalidated buffers locked based on > bli refcount, regardless of aborted state. This ensures that a > pinned, stale buffer is always found locked when eventually > unpinned. > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > --- > > Hi all, > > This survives the assert reproducer, several xfstests runs with v4/v5 > superblock filesystems, and fsstress and fsx workloads over a few hours. > > Brian So far, this patch resolves the problem I had seen looping g/388,475 on debug kernels. I never encountered the issue on non-debug builds. Thanks! Reviewed-by: Bill O'Donnell <billodo@xxxxxxxxxx> > > fs/xfs/xfs_buf_item.c | 85 +++++++++++++++++++------------------------ > 1 file changed, 37 insertions(+), 48 deletions(-) > > diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c > index 1c9d1398980b..5e8b91543d93 100644 > --- a/fs/xfs/xfs_buf_item.c > +++ b/fs/xfs/xfs_buf_item.c > @@ -556,73 +556,62 @@ xfs_buf_item_unlock( > { > struct xfs_buf_log_item *bip = BUF_ITEM(lip); > struct xfs_buf *bp = bip->bli_buf; > + int freed; > bool aborted; > bool hold = !!(bip->bli_flags & XFS_BLI_HOLD); > bool dirty = !!(bip->bli_flags & XFS_BLI_DIRTY); > + bool stale = !!(bip->bli_flags & XFS_BLI_STALE); > #if defined(DEBUG) || defined(XFS_WARN) > bool ordered = !!(bip->bli_flags & XFS_BLI_ORDERED); > #endif > + /* > + * The bli dirty state should match whether the blf has logged segments > + * except for ordered buffers, where only the bli should be dirty. > + */ > + ASSERT((!ordered && dirty == xfs_buf_item_dirty_format(bip)) || > + (ordered && dirty && !xfs_buf_item_dirty_format(bip))); > + ASSERT(!stale || (bip->__bli_format.blf_flags & XFS_BLF_CANCEL)); > > - aborted = test_bit(XFS_LI_ABORTED, &lip->li_flags); > + trace_xfs_buf_item_unlock(bip); > > - /* Clear the buffer's association with this transaction. */ > - bp->b_transp = NULL; > + aborted = test_bit(XFS_LI_ABORTED, &lip->li_flags); > > /* > - * The per-transaction state has been copied above so clear it from the > - * bli. > + * Clear the buffer's association with this transaction and > + * per-transaction state from the bli, which has been copied above. > */ > + bp->b_transp = NULL; > bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD | XFS_BLI_ORDERED); > > /* > - * If the buf item is marked stale, then don't do anything. We'll > - * unlock the buffer and free the buf item when the buffer is unpinned > - * for the last time. > + * Drop the transaction bli reference and free the item if clean or > + * aborted and we had the last reference. In either case this is the > + * last opportunity to free the item since it won't be written back. > + * Otherwise, the bli is still referenced or dirty and will be freed on > + * final unpin of the buffer (if stale) or writeback completion. > */ > - if (bip->bli_flags & XFS_BLI_STALE) { > - trace_xfs_buf_item_unlock_stale(bip); > - ASSERT(bip->__bli_format.blf_flags & XFS_BLF_CANCEL); > - if (!aborted) { > - atomic_dec(&bip->bli_refcount); > - return; > - } > + freed = atomic_dec_and_test(&bip->bli_refcount); > + if (freed && (aborted || !dirty)) { > + ASSERT(!aborted || XFS_FORCED_SHUTDOWN(lip->li_mountp)); > + ASSERT(!stale || aborted); > + /* an aborted item may be in the AIL, remove it first */ > + if (aborted) > + xfs_trans_ail_remove(lip, SHUTDOWN_LOG_IO_ERROR); > + xfs_buf_item_relse(bp); > } > > - trace_xfs_buf_item_unlock(bip); > - > - /* > - * If the buf item isn't tracking any data, free it, otherwise drop the > - * reference we hold to it. If we are aborting the transaction, this may > - * be the only reference to the buf item, so we free it anyway > - * regardless of whether it is dirty or not. A dirty abort implies a > - * shutdown, anyway. > - * > - * The bli dirty state should match whether the blf has logged segments > - * except for ordered buffers, where only the bli should be dirty. > - */ > - ASSERT((!ordered && dirty == xfs_buf_item_dirty_format(bip)) || > - (ordered && dirty && !xfs_buf_item_dirty_format(bip))); > - > /* > - * Clean buffers, by definition, cannot be in the AIL. However, aborted > - * buffers may be in the AIL regardless of dirty state. An aborted > - * transaction that invalidates a buffer already in the AIL may have > - * marked it stale and cleared the dirty state, for example. > - * > - * Therefore if we are aborting a buffer and we've just taken the last > - * reference away, we have to check if it is in the AIL before freeing > - * it. We need to free it in this case, because an aborted transaction > - * has already shut the filesystem down and this is the last chance we > - * will have to do so. > + * If the buffer was invalidated, leave it locked on transaction commit > + * unless we just dropped the final reference. The latter case should > + * only ever happen on abort because invalidation dirties the > + * transaction and the log would have grabbed another bli reference when > + * the buffer was pinned. In the non-abort case, the buffer is unlocked > + * on final unpin and the bli freed since stale buffers are not written > + * back. > */ > - if (atomic_dec_and_test(&bip->bli_refcount)) { > - if (aborted) { > - ASSERT(XFS_FORCED_SHUTDOWN(lip->li_mountp)); > - xfs_trans_ail_remove(lip, SHUTDOWN_LOG_IO_ERROR); > - xfs_buf_item_relse(bp); > - } else if (!dirty) > - xfs_buf_item_relse(bp); > - } > + if (stale && !freed) > + return; > + ASSERT(!stale || (aborted && freed)); > > if (!hold) > xfs_buf_relse(bp); > -- > 2.17.1 >