On Wed, Aug 22, 2018 at 10:28:40AM +1000, Dave Chinner wrote: > On Tue, Aug 21, 2018 at 10:01:16AM -0400, Brian Foster wrote: > > xfstests generic/388,475 occasionally reproduce assertion failures > > in xfs_buf_item_unpin() when the final bli reference is dropped on > > an invalidated buffer and the buffer is not locked as it is expected > > to be. Invalidated buffers should remain locked on transaction > > commit until the final unpin, at which point the buffer is removed > > from the AIL and the bli is freed since stale buffers are not > > written back. > > > > The assert failures are associated with filesystem shutdown, > > typically due to log I/O errors injected by the test. The > > problematic situation can occur if the shutdown happens to cause a > > race between an active transaction that has invalidated a particular > > buffer and an I/O error on a log buffer that contains the bli > > associated with the same (now stale) buffer. > > > > Both transaction and log contexts acquire a bli reference. If the > > transaction has already invalidated the buffer by the time the I/O > > error occurs and ends up aborting due to shutdown, the transaction > > and log hold the last two references to a stale bli. If the > > transaction cancel occurs first, it treats the buffer as non-stale > > due to the aborted state: the bli reference is dropped and the > > buffer is released/unlocked. The log buffer I/O error handling > > eventually calls into xfs_buf_item_unpin(), drops the final > > reference to the bli and treats it as stale. The buffer wasn't left > > locked by xfs_buf_item_unlock(), however, so the assert fails and > > the buffer is double unlocked. The latter problem is mitigated by > > the fact that the fs is shutdown and no further damage is possible. > > > > ->iop_unlock() of an invalidated buffer should behave consistently > > with respect to the bli refcount, regardless of aborted state. If > > the refcount remains elevated on commit, we know the bli is awaiting > > an unpin (since it can't be in another transaction) and will be > > handled appropriately on log buffer completion. If the final bli > > reference of an invalidated buffer is dropped in ->iop_unlock(), we > > can assume the transaction has aborted because invalidation implies > > a dirty transaction. In the non-abort case, the log would have > > acquired a bli reference in ->iop_pin() and prevented bli release at > > ->iop_unlock() time. In the abort case the item must be freed and > > buffer unlocked because it wasn't pinned by the log. > > > > Rework xfs_buf_item_unlock() to simplify the currently circuitous > > and duplicate logic and leave invalidated buffers locked based on > > bli refcount, regardless of aborted state. This ensures that a > > pinned, stale buffer is always found locked when eventually > > unpinned. > > Sounds reasonable to me. > > BTW, can you follow this up by checking the freeing logic in > xfs_trans_brelse()? There's a commented out assert that the pincount > should be zero, implying that at some point there was a similar > free within transaction context vs unpin from log io completion > race occurring there. > Sure, I'll look into it once I get this patch fixed up. > > > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > > --- > > > > Hi all, > > > > This survives the assert reproducer, several xfstests runs with v4/v5 > > superblock filesystems, and fsstress and fsx workloads over a few hours. > > > > Brian > > > > fs/xfs/xfs_buf_item.c | 85 +++++++++++++++++++------------------------ > > 1 file changed, 37 insertions(+), 48 deletions(-) > > > > diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c > > index 1c9d1398980b..5e8b91543d93 100644 > > --- a/fs/xfs/xfs_buf_item.c > > +++ b/fs/xfs/xfs_buf_item.c > > @@ -556,73 +556,62 @@ xfs_buf_item_unlock( > > { > > struct xfs_buf_log_item *bip = BUF_ITEM(lip); > > struct xfs_buf *bp = bip->bli_buf; > > + int freed; > > atomic_dec_and_test() returns a bool. > Ok. > > bool aborted; > > bool hold = !!(bip->bli_flags & XFS_BLI_HOLD); > > bool dirty = !!(bip->bli_flags & XFS_BLI_DIRTY); > > + bool stale = !!(bip->bli_flags & XFS_BLI_STALE); > > #if defined(DEBUG) || defined(XFS_WARN) > > bool ordered = !!(bip->bli_flags & XFS_BLI_ORDERED); > > #endif > > Can we get rid of the "!!" here? The variables are booleans, so > there's not need to do a double negation to squash the value down to > 0 or 1 as the compiler will do that implicitly... > Yeah, I'll remove those. > > + /* > > + * The bli dirty state should match whether the blf has logged segments > > + * except for ordered buffers, where only the bli should be dirty. > > + */ > > + ASSERT((!ordered && dirty == xfs_buf_item_dirty_format(bip)) || > > + (ordered && dirty && !xfs_buf_item_dirty_format(bip))); > > + ASSERT(!stale || (bip->__bli_format.blf_flags & XFS_BLF_CANCEL)); > > + trace_xfs_buf_item_unlock(bip); > > Can you put the trace before the asserts? If an assert fires, we > want the ibuf item state in the event trace record... > Ok. > Speaking of trace events, the trace_xfs_buf_item_unlock_stale() > event is no longer called by this function. Can you remove it or > or re-add it as per below? > I removed it intentionally because the existing tracepoint already expresses stale state, I just forgot to delete it. > > + aborted = test_bit(XFS_LI_ABORTED, &lip->li_flags); > > > > /* > > + * Clear the buffer's association with this transaction and > > + * per-transaction state from the bli, which has been copied above. > > */ > > + bp->b_transp = NULL; > > bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD | XFS_BLI_ORDERED); > > > > /* > > + * Drop the transaction bli reference and free the item if clean or > > + * aborted and we had the last reference. In either case this is the > > + * last opportunity to free the item since it won't be written back. > > + * Otherwise, the bli is still referenced or dirty and will be freed on > > + * final unpin of the buffer (if stale) or writeback completion. > > */ > > + freed = atomic_dec_and_test(&bip->bli_refcount); > > + if (freed && (aborted || !dirty)) { > > + ASSERT(!aborted || XFS_FORCED_SHUTDOWN(lip->li_mountp)); > > + ASSERT(!stale || aborted); > > + /* an aborted item may be in the AIL, remove it first */ > > + if (aborted) > > + xfs_trans_ail_remove(lip, SHUTDOWN_LOG_IO_ERROR); > > + xfs_buf_item_relse(bp); > > } > > > > /* > > + * If the buffer was invalidated, leave it locked on transaction commit > > + * unless we just dropped the final reference. The latter case should > > + * only ever happen on abort because invalidation dirties the > > + * transaction and the log would have grabbed another bli reference when > > + * the buffer was pinned. In the non-abort case, the buffer is unlocked > > + * on final unpin and the bli freed since stale buffers are not written > > + * back. > > */ > > + if (stale && !freed) > > + return; > > This logic is still a bit circuitous. We have one action if the item > is freed, and another action if it not freed. But they are separated > by large comments and it's not clear that we do one or the other, > but not both. Maybe it would be clearer if it were structured like > this: > I mainly wanted to condense some of the comments and duplicate code because the existing function seemed far longer than necessary, adding to the confusion. Beyond that, I made the above structure changes because I found it more simple to isolate the cases where we were responsible to free the item from the stale specific check (and comments) around unlocking it or not. That said... > /* > * Comment explaining life, the universe and everything. > */ > freed = atomic_dec_and_test(&bip->bli_refcount); > if (freed) { > if (aborted || !dirty) { > .... > } else if (stale) { > trace_xfs_buf_item_unlock_stale(bip); > return; > } ... this looks equivalent (minus the stale tp that I still want to kill) to me and facilitates the other changes. I'll give it a try. Brian > > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx