Re: [PATCH 8/9] xfs: don't run shutdown callbacks on active iclogs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 14, 2021 at 01:19:57PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> When the log is shutdown, it currently walks all the iclogs and runs
> callbacks that are attached to the iclogs, regardless of whether the
> iclog is queued for IO completion or not. This creates a problem for
> contexts attaching callbacks to iclogs in that a racing shutdown can
> run the callbacks even before the attaching context has finished
> processing the iclog and releasing it for IO submission.
> 
> If the callback processing of the iclog frees the structure that is
> attached to the iclog, then this leads to an UAF scenario that can
> only be protected against by holding the icloglock from the point
> callbacks are attached through to the release of the iclog. While we
> currently do this, it is not practical or sustainable.
> 
> Hence we need to make shutdown processing the responsibility of the
> context that holds active references to the iclog. We know that the
> contexts attaching callbacks to the iclog must have active
> references to the iclog, and that means they must be in either
> ACTIVE or WANT_SYNC states. xlog_state_do_callback() will skip over
> iclogs in these states -except- when the log is shut down.
> 
> xlog_state_do_callback() checks the state of the iclogs while
> holding the icloglock, therefore the reference count/state change
> that occurs in xlog_state_release_iclog() after the callbacks are
> atomic w.r.t. shutdown processing.
> 
> We can't push the responsibility of callback cleanup onto the CIL
> context because we can have ACTIVE iclogs that have callbacks
> attached that have already been released. Hence we really need to
> internalise the cleanup of callbacks into xlog_state_release_iclog()
> processing.
> 
> Indeed, we already have that internalisation via:
> 
> xlog_state_release_iclog
>   drop last reference
>     ->SYNCING
>   xlog_sync
>     xlog_write_iclog
>       if (log_is_shutdown)
>         xlog_state_done_syncing()
> 	  xlog_state_do_callback()
> 	    <process shutdown on iclog that is now in SYNCING state>
> 
> The problem is that xlog_state_release_iclog() aborts before doing
> anything if the log is already shut down. It assumes that the
> callbacks have already been cleaned up, and it doesn't need to do
> any cleanup.
> 
> Hence the fix is to remove the xlog_is_shutdown() check from
> xlog_state_release_iclog() so that reference counts are correctly
> released from the iclogs, and when the reference count is zero we
> always transition to SYNCING if the log is shut down. Hence we'll
> always enter the xlog_sync() path in a shutdown and eventually end
> up erroring out the iclog IO and running xlog_state_do_callback() to
> process the callbacks attached to the iclog.
> 
> This allows us to stop processing referenced ACTIVE/WANT_SYNC iclogs
> directly in the shutdown code, and in doing so gets rid of the UAF
> vector that currently exists. This then decouples the adding of
> callbacks to the iclogs from xlog_state_release_iclog() as we
> guarantee that xlog_state_release_iclog() will process the callbacks
> if the log has been shut down before xlog_state_release_iclog() has
> been called.
> 
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> Reviewed-by: Christoph Hellwig <hch@xxxxxx>

Seems reasonable...
Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>

--D

> ---
>  fs/xfs/xfs_log.c     | 43 +++++++++++++++++++++++++++++++++++++------
>  fs/xfs/xfs_log_cil.c | 15 +++++++--------
>  2 files changed, 44 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 4d72d9efed7c..01c20b42b2fc 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -41,6 +41,8 @@ xlog_dealloc_log(
>  /* local state machine functions */
>  STATIC void xlog_state_done_syncing(
>  	struct xlog_in_core	*iclog);
> +STATIC void xlog_state_do_callback(
> +	struct xlog		*log);
>  STATIC int
>  xlog_state_get_iclog_space(
>  	struct xlog		*log,
> @@ -492,6 +494,11 @@ xfs_log_reserve(
>   * space waiters so they can process the newly set shutdown state. We really
>   * don't care what order we process callbacks here because the log is shut down
>   * and so state cannot change on disk anymore.
> + *
> + * We avoid processing actively referenced iclogs so that we don't run callbacks
> + * while the iclog owner might still be preparing the iclog for IO submssion.
> + * These will be caught by xlog_state_iclog_release() and call this function
> + * again to process any callbacks that may have been added to that iclog.
>   */
>  static void
>  xlog_state_shutdown_callbacks(
> @@ -503,7 +510,12 @@ xlog_state_shutdown_callbacks(
>  	spin_lock(&log->l_icloglock);
>  	iclog = log->l_iclog;
>  	do {
> +		if (atomic_read(&iclog->ic_refcnt)) {
> +			/* Reference holder will re-run iclog callbacks. */
> +			continue;
> +		}
>  		list_splice_init(&iclog->ic_callbacks, &cb_list);
> +		wake_up_all(&iclog->ic_write_wait);
>  		wake_up_all(&iclog->ic_force_wait);
>  	} while ((iclog = iclog->ic_next) != log->l_iclog);
>  
> @@ -514,7 +526,7 @@ xlog_state_shutdown_callbacks(
>  }
>  
>  static bool
> -__xlog_state_release_iclog(
> +xlog_state_want_sync(
>  	struct xlog		*log,
>  	struct xlog_in_core	*iclog)
>  {
> @@ -537,27 +549,46 @@ __xlog_state_release_iclog(
>  }
>  
>  /*
> - * Flush iclog to disk if this is the last reference to the given iclog and the
> - * it is in the WANT_SYNC state.
> + * Release the active reference to the iclog. If this is the last reference to
> + * the iclog being dropped, check if the caller wants to be synced to disk and
> + * initiate IO submission. If the log has been shut down, then we need to run
> + * callback processing on this iclog as shutdown callback processing skips
> + * actively referenced iclogs.
>   */
>  int
>  xlog_state_release_iclog(
>  	struct xlog		*log,
>  	struct xlog_in_core	*iclog)
> +		__releases(&log->l_icloglock)
> +		__acquires(&log->l_icloglock)
>  {
>  	lockdep_assert_held(&log->l_icloglock);
>  
>  	trace_xlog_iclog_release(iclog, _RET_IP_);
> -	if (xlog_is_shutdown(log))
> +	if (!atomic_dec_and_test(&iclog->ic_refcnt))
> +		goto out_check_shutdown;
> +
> +	if (xlog_is_shutdown(log)) {
> +		/*
> +		 * No more references to this iclog, so process the pending
> +		 * iclog callbacks that were waiting on the release of this
> +		 * iclog.
> +		 */
> +		spin_unlock(&log->l_icloglock);
> +		xlog_state_shutdown_callbacks(log);
> +		spin_lock(&log->l_icloglock);
>  		return -EIO;
> +	}
>  
> -	if (atomic_dec_and_test(&iclog->ic_refcnt) &&
> -	    __xlog_state_release_iclog(log, iclog)) {
> +	if (xlog_state_want_sync(log, iclog)) {
>  		spin_unlock(&log->l_icloglock);
>  		xlog_sync(log, iclog);
>  		spin_lock(&log->l_icloglock);
>  	}
>  
> +out_check_shutdown:
> +	if (xlog_is_shutdown(log))
> +		return -EIO;
>  	return 0;
>  }
>  
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 2c9d9bcd25cb..62967449fe8c 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -874,11 +874,10 @@ xlog_cil_push_work(
>  	xfs_log_ticket_ungrant(log, tic);
>  
>  	/*
> -	 * Once we attach the ctx to the iclog, a shutdown can process the
> -	 * iclog, run the callbacks and free the ctx. The only thing preventing
> -	 * this potential UAF situation here is that we are holding the
> -	 * icloglock. Hence we cannot access the ctx once we have attached the
> -	 * callbacks and dropped the icloglock.
> +	 * Once we attach the ctx to the iclog, it is effectively owned by the
> +	 * iclog and we can only use it while we still have an active reference
> +	 * to the iclog. i.e. once we call xlog_state_release_iclog() we can no
> +	 * longer safely reference the ctx.
>  	 */
>  	spin_lock(&log->l_icloglock);
>  	if (xlog_is_shutdown(log)) {
> @@ -910,9 +909,6 @@ xlog_cil_push_work(
>  	 * wakeup until this commit_iclog is written to disk.  Hence we use the
>  	 * iclog header lsn and compare it to the commit lsn to determine if we
>  	 * need to wait on iclogs or not.
> -	 *
> -	 * NOTE: It is not safe to reference the ctx after this check as we drop
> -	 * the icloglock if we have to wait for completion of other iclogs.
>  	 */
>  	if (ctx->start_lsn != commit_lsn) {
>  		xfs_lsn_t	plsn;
> @@ -942,6 +938,9 @@ xlog_cil_push_work(
>  	 */
>  	commit_iclog->ic_flags |= XLOG_ICL_NEED_FUA;
>  	xlog_state_release_iclog(log, commit_iclog);
> +
> +	/* Not safe to reference ctx now! */
> +
>  	spin_unlock(&log->l_icloglock);
>  	return;
>  
> -- 
> 2.31.1
> 



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux