Re: [PATCH] xfs: don't unlock invalidated buf on aborted tx commit

"Bill O'Donnell" <billodo@xxxxxxxxxx> · Tue, 21 Aug 2018 14:56:01 -0500

On Tue, Aug 21, 2018 at 10:01:16AM -0400, Brian Foster wrote:
> xfstests generic/388,475 occasionally reproduce assertion failures
> in xfs_buf_item_unpin() when the final bli reference is dropped on
> an invalidated buffer and the buffer is not locked as it is expected
> to be. Invalidated buffers should remain locked on transaction
> commit until the final unpin, at which point the buffer is removed
> from the AIL and the bli is freed since stale buffers are not
> written back.
> 
> The assert failures are associated with filesystem shutdown,
> typically due to log I/O errors injected by the test. The
> problematic situation can occur if the shutdown happens to cause a
> race between an active transaction that has invalidated a particular
> buffer and an I/O error on a log buffer that contains the bli
> associated with the same (now stale) buffer.
> 
> Both transaction and log contexts acquire a bli reference. If the
> transaction has already invalidated the buffer by the time the I/O
> error occurs and ends up aborting due to shutdown, the transaction
> and log hold the last two references to a stale bli. If the
> transaction cancel occurs first, it treats the buffer as non-stale
> due to the aborted state: the bli reference is dropped and the
> buffer is released/unlocked. The log buffer I/O error handling
> eventually calls into xfs_buf_item_unpin(), drops the final
> reference to the bli and treats it as stale. The buffer wasn't left
> locked by xfs_buf_item_unlock(), however, so the assert fails and
> the buffer is double unlocked. The latter problem is mitigated by
> the fact that the fs is shutdown and no further damage is possible.
> 
> ->iop_unlock() of an invalidated buffer should behave consistently
> with respect to the bli refcount, regardless of aborted state. If
> the refcount remains elevated on commit, we know the bli is awaiting
> an unpin (since it can't be in another transaction) and will be
> handled appropriately on log buffer completion. If the final bli
> reference of an invalidated buffer is dropped in ->iop_unlock(), we
> can assume the transaction has aborted because invalidation implies
> a dirty transaction. In the non-abort case, the log would have
> acquired a bli reference in ->iop_pin() and prevented bli release at
> ->iop_unlock() time. In the abort case the item must be freed and
> buffer unlocked because it wasn't pinned by the log.
> 
> Rework xfs_buf_item_unlock() to simplify the currently circuitous
> and duplicate logic and leave invalidated buffers locked based on
> bli refcount, regardless of aborted state. This ensures that a
> pinned, stale buffer is always found locked when eventually
> unpinned.
> 
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> ---
> 
> Hi all,
> 
> This survives the assert reproducer, several xfstests runs with v4/v5
> superblock filesystems, and fsstress and fsx workloads over a few hours.
> 
> Brian

So far, this patch resolves the problem I had seen looping g/388,475
on debug kernels. I never encountered the issue on non-debug builds.
Thanks!

Reviewed-by: Bill O'Donnell <billodo@xxxxxxxxxx>
> 
>  fs/xfs/xfs_buf_item.c | 85 +++++++++++++++++++------------------------
>  1 file changed, 37 insertions(+), 48 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 1c9d1398980b..5e8b91543d93 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -556,73 +556,62 @@ xfs_buf_item_unlock(
>  {
>  	struct xfs_buf_log_item	*bip = BUF_ITEM(lip);
>  	struct xfs_buf		*bp = bip->bli_buf;
> +	int			freed;
>  	bool			aborted;
>  	bool			hold = !!(bip->bli_flags & XFS_BLI_HOLD);
>  	bool			dirty = !!(bip->bli_flags & XFS_BLI_DIRTY);
> +	bool			stale = !!(bip->bli_flags & XFS_BLI_STALE);
>  #if defined(DEBUG) || defined(XFS_WARN)
>  	bool			ordered = !!(bip->bli_flags & XFS_BLI_ORDERED);
>  #endif
> +	/*
> +	 * The bli dirty state should match whether the blf has logged segments
> +	 * except for ordered buffers, where only the bli should be dirty.
> +	 */
> +	ASSERT((!ordered && dirty == xfs_buf_item_dirty_format(bip)) ||
> +	       (ordered && dirty && !xfs_buf_item_dirty_format(bip)));
> +	ASSERT(!stale || (bip->__bli_format.blf_flags & XFS_BLF_CANCEL));
>  
> -	aborted = test_bit(XFS_LI_ABORTED, &lip->li_flags);
> +	trace_xfs_buf_item_unlock(bip);
>  
> -	/* Clear the buffer's association with this transaction. */
> -	bp->b_transp = NULL;
> +	aborted = test_bit(XFS_LI_ABORTED, &lip->li_flags);
>  
>  	/*
> -	 * The per-transaction state has been copied above so clear it from the
> -	 * bli.
> +	 * Clear the buffer's association with this transaction and
> +	 * per-transaction state from the bli, which has been copied above.
>  	 */
> +	bp->b_transp = NULL;
>  	bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD | XFS_BLI_ORDERED);
>  
>  	/*
> -	 * If the buf item is marked stale, then don't do anything.  We'll
> -	 * unlock the buffer and free the buf item when the buffer is unpinned
> -	 * for the last time.
> +	 * Drop the transaction bli reference and free the item if clean or
> +	 * aborted and we had the last reference. In either case this is the
> +	 * last opportunity to free the item since it won't be written back.
> +	 * Otherwise, the bli is still referenced or dirty and will be freed on
> +	 * final unpin of the buffer (if stale) or writeback completion.
>  	 */
> -	if (bip->bli_flags & XFS_BLI_STALE) {
> -		trace_xfs_buf_item_unlock_stale(bip);
> -		ASSERT(bip->__bli_format.blf_flags & XFS_BLF_CANCEL);
> -		if (!aborted) {
> -			atomic_dec(&bip->bli_refcount);
> -			return;
> -		}
> +	freed = atomic_dec_and_test(&bip->bli_refcount);
> +	if (freed && (aborted || !dirty)) {
> +		ASSERT(!aborted || XFS_FORCED_SHUTDOWN(lip->li_mountp));
> +		ASSERT(!stale || aborted);
> +		/* an aborted item may be in the AIL, remove it first */
> +		if (aborted)
> +			xfs_trans_ail_remove(lip, SHUTDOWN_LOG_IO_ERROR);
> +		xfs_buf_item_relse(bp);
>  	}
>  
> -	trace_xfs_buf_item_unlock(bip);
> -
> -	/*
> -	 * If the buf item isn't tracking any data, free it, otherwise drop the
> -	 * reference we hold to it. If we are aborting the transaction, this may
> -	 * be the only reference to the buf item, so we free it anyway
> -	 * regardless of whether it is dirty or not. A dirty abort implies a
> -	 * shutdown, anyway.
> -	 *
> -	 * The bli dirty state should match whether the blf has logged segments
> -	 * except for ordered buffers, where only the bli should be dirty.
> -	 */
> -	ASSERT((!ordered && dirty == xfs_buf_item_dirty_format(bip)) ||
> -	       (ordered && dirty && !xfs_buf_item_dirty_format(bip)));
> -
>  	/*
> -	 * Clean buffers, by definition, cannot be in the AIL. However, aborted
> -	 * buffers may be in the AIL regardless of dirty state. An aborted
> -	 * transaction that invalidates a buffer already in the AIL may have
> -	 * marked it stale and cleared the dirty state, for example.
> -	 *
> -	 * Therefore if we are aborting a buffer and we've just taken the last
> -	 * reference away, we have to check if it is in the AIL before freeing
> -	 * it. We need to free it in this case, because an aborted transaction
> -	 * has already shut the filesystem down and this is the last chance we
> -	 * will have to do so.
> +	 * If the buffer was invalidated, leave it locked on transaction commit
> +	 * unless we just dropped the final reference. The latter case should
> +	 * only ever happen on abort because invalidation dirties the
> +	 * transaction and the log would have grabbed another bli reference when
> +	 * the buffer was pinned. In the non-abort case, the buffer is unlocked
> +	 * on final unpin and the bli freed since stale buffers are not written
> +	 * back.
>  	 */
> -	if (atomic_dec_and_test(&bip->bli_refcount)) {
> -		if (aborted) {
> -			ASSERT(XFS_FORCED_SHUTDOWN(lip->li_mountp));
> -			xfs_trans_ail_remove(lip, SHUTDOWN_LOG_IO_ERROR);
> -			xfs_buf_item_relse(bp);
> -		} else if (!dirty)
> -			xfs_buf_item_relse(bp);
> -	}
> +	if (stale && !freed)
> +		return;
> +	ASSERT(!stale || (aborted && freed));
>  
>  	if (!hold)
>  		xfs_buf_relse(bp);
> -- 
> 2.17.1
>