Re: [PATCH 1/4] xfs: buffer pins need to hold a buffer reference

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, May 17, 2023 at 10:04:46AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> When a buffer is unpinned by xfs_buf_item_unpin(), we need to access
> the buffer after we've dropped the buffer log item reference count.
> This opens a window where we can have two racing unpins for the
> buffer item (e.g. shutdown checkpoint context callback processing
> racing with journal IO iclog completion processing) and both attempt
> to access the buffer after dropping the BLI reference count.  If we
> are unlucky, the "BLI freed" context wins the race and frees the
> buffer before the "BLI still active" case checks the buffer pin
> count.
> 
> This results in a use after free that can only be triggered
> in active filesystem shutdown situations.
> 
> To fix this, we need to ensure that buffer existence extends beyond
> the BLI reference count checks and until the unpin processing is
> complete. This implies that a buffer pin operation must also take a
> buffer reference to ensure that the buffer cannot be freed until the
> buffer unpin processing is complete.
> 
> Reported-by: yangerkun <yangerkun@xxxxxxxxxx>
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>

LGTM
Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>

--D

> ---
>  fs/xfs/xfs_buf_item.c | 88 ++++++++++++++++++++++++++++++++-----------
>  1 file changed, 65 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index df7322ed73fa..b2d211730fd2 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -452,10 +452,18 @@ xfs_buf_item_format(
>   * This is called to pin the buffer associated with the buf log item in memory
>   * so it cannot be written out.
>   *
> - * We also always take a reference to the buffer log item here so that the bli
> - * is held while the item is pinned in memory. This means that we can
> - * unconditionally drop the reference count a transaction holds when the
> - * transaction is completed.
> + * We take a reference to the buffer log item here so that the BLI life cycle
> + * extends at least until the buffer is unpinned via xfs_buf_item_unpin() and
> + * inserted into the AIL.
> + *
> + * We also need to take a reference to the buffer itself as the BLI unpin
> + * processing requires accessing the buffer after the BLI has dropped the final
> + * BLI reference. See xfs_buf_item_unpin() for an explanation.
> + * If unpins race to drop the final BLI reference and only the
> + * BLI owns a reference to the buffer, then the loser of the race can have the
> + * buffer fgreed from under it (e.g. on shutdown). Taking a buffer reference per
> + * pin count ensures the life cycle of the buffer extends for as
> + * long as we hold the buffer pin reference in xfs_buf_item_unpin().
>   */
>  STATIC void
>  xfs_buf_item_pin(
> @@ -470,13 +478,30 @@ xfs_buf_item_pin(
>  
>  	trace_xfs_buf_item_pin(bip);
>  
> +	xfs_buf_hold(bip->bli_buf);
>  	atomic_inc(&bip->bli_refcount);
>  	atomic_inc(&bip->bli_buf->b_pin_count);
>  }
>  
>  /*
> - * This is called to unpin the buffer associated with the buf log item which
> - * was previously pinned with a call to xfs_buf_item_pin().
> + * This is called to unpin the buffer associated with the buf log item which was
> + * previously pinned with a call to xfs_buf_item_pin().  We enter this function
> + * with a buffer pin count, a buffer reference and a BLI reference.
> + *
> + * We must drop the BLI reference before we unpin the buffer because the AIL
> + * doesn't acquire a BLI reference whenever it accesses it. Therefore if the
> + * refcount drops to zero, the bli could still be AIL resident and the buffer
> + * submitted for I/O at any point before we return. This can result in IO
> + * completion freeing the buffer while we are still trying to access it here.
> + * This race condition can also occur in shutdown situations where we abort and
> + * unpin buffers from contexts other that journal IO completion.
> + *
> + * Hence we have to hold a buffer reference per pin count to ensure that the
> + * buffer cannot be freed until we have finished processing the unpin operation.
> + * The reference is taken in xfs_buf_item_pin(), and we must hold it until we
> + * are done processing the buffer state. In the case of an abort (remove =
> + * true) then we re-use the current pin reference as the IO reference we hand
> + * off to IO failure handling.
>   */
>  STATIC void
>  xfs_buf_item_unpin(
> @@ -493,24 +518,18 @@ xfs_buf_item_unpin(
>  
>  	trace_xfs_buf_item_unpin(bip);
>  
> -	/*
> -	 * Drop the bli ref associated with the pin and grab the hold required
> -	 * for the I/O simulation failure in the abort case. We have to do this
> -	 * before the pin count drops because the AIL doesn't acquire a bli
> -	 * reference. Therefore if the refcount drops to zero, the bli could
> -	 * still be AIL resident and the buffer submitted for I/O (and freed on
> -	 * completion) at any point before we return. This can be removed once
> -	 * the AIL properly holds a reference on the bli.
> -	 */
>  	freed = atomic_dec_and_test(&bip->bli_refcount);
> -	if (freed && !stale && remove)
> -		xfs_buf_hold(bp);
>  	if (atomic_dec_and_test(&bp->b_pin_count))
>  		wake_up_all(&bp->b_waiters);
>  
> -	 /* nothing to do but drop the pin count if the bli is active */
> -	if (!freed)
> +	 /*
> +	  * Nothing to do but drop the buffer pin reference if the BLI is
> +	  * still active
> +	  */
> +	if (!freed) {
> +		xfs_buf_rele(bp);
>  		return;
> +	}
>  
>  	if (stale) {
>  		ASSERT(bip->bli_flags & XFS_BLI_STALE);
> @@ -522,6 +541,15 @@ xfs_buf_item_unpin(
>  
>  		trace_xfs_buf_item_unpin_stale(bip);
>  
> +		/*
> +		 * The buffer has been locked and referenced since it was marked
> +		 * stale so we own both lock and reference exclusively here. We
> +		 * do not need the pin reference any more, so drop it now so
> +		 * that we only have one reference to drop once item completion
> +		 * processing is complete.
> +		 */
> +		xfs_buf_rele(bp);
> +
>  		/*
>  		 * If we get called here because of an IO error, we may or may
>  		 * not have the item on the AIL. xfs_trans_ail_delete() will
> @@ -538,16 +566,30 @@ xfs_buf_item_unpin(
>  			ASSERT(bp->b_log_item == NULL);
>  		}
>  		xfs_buf_relse(bp);
> -	} else if (remove) {
> +		return;
> +	}
> +
> +	if (remove) {
>  		/*
> -		 * The buffer must be locked and held by the caller to simulate
> -		 * an async I/O failure. We acquired the hold for this case
> -		 * before the buffer was unpinned.
> +		 * We need to simulate an async IO failures here to ensure that
> +		 * the correct error completion is run on this buffer. This
> +		 * requires a reference to the buffer and for the buffer to be
> +		 * locked. We can safely pass ownership of the pin reference to
> +		 * the IO to ensure that nothing can free the buffer while we
> +		 * wait for the lock and then run the IO failure completion.
>  		 */
>  		xfs_buf_lock(bp);
>  		bp->b_flags |= XBF_ASYNC;
>  		xfs_buf_ioend_fail(bp);
> +		return;
>  	}
> +
> +	/*
> +	 * BLI has no more active references - it will be moved to the AIL to
> +	 * manage the remaining BLI/buffer life cycle. There is nothing left for
> +	 * us to do here so drop the pin reference to the buffer.
> +	 */
> +	xfs_buf_rele(bp);
>  }
>  
>  STATIC uint
> -- 
> 2.40.1
> 



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux