Re: [PATCH 06/11] xfs: don't leave EFIs on AIL on mount failure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Aug 06, 2015 at 01:44:27PM -0400, Brian Foster wrote:
> Log recovery occurs in two phases at mount time. In the first phase,
> EFIs and EFDs are processed and potentially cancelled out. EFIs without
> EFD objects are inserted into the AIL for processing and recovery in the
> second phase. xfs_mountfs() runs various other operations between the
> phases and is thus subject to failure. If failure occurs after the first
> phase but before the second, pending EFIs sit on the AIL, pin it and
> cause the mount to hang.
> 
> Update the mount sequence to ensure that pending EFIs are cancelled in
> the event of failure. Add a recovery cancellation mechanism to iterate
> the AIL and cancel all EFI items when requested. Plumb cancellation
> support through the log mount finish helper and update xfs_mountfs() to
> invoke cancellation in the event of failure after recovery has started.
> 
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> ---
>  fs/xfs/xfs_log.c         | 16 +++++++++++-----
>  fs/xfs/xfs_log.h         |  2 +-
>  fs/xfs/xfs_log_priv.h    |  2 ++
>  fs/xfs/xfs_log_recover.c | 41 ++++++++++++++++++++++++++++++++++++-----
>  fs/xfs/xfs_mount.c       | 35 +++++++++++++++++++++++------------
>  5 files changed, 73 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 6b5a84a..522286c 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -740,19 +740,25 @@ out:
>   * it.
>   */
>  int
> -xfs_log_mount_finish(xfs_mount_t *mp)
> +xfs_log_mount_finish(
> +	struct xfs_mount	*mp,
> +	bool			cancel)
>  {
>  	int	error = 0;
>  
> -	if (!(mp->m_flags & XFS_MOUNT_NORECOVERY)) {
> +	if (mp->m_flags & XFS_MOUNT_NORECOVERY) {
> +		ASSERT(mp->m_flags & XFS_MOUNT_RDONLY);
> +		return 0;
> +	}
> +
> +	if (cancel) {
> +		error = xlog_recover_cancel(mp->m_log);
> +	} else {
>  		error = xlog_recover_finish(mp->m_log);
>  		if (!error)
>  			xfs_log_work_queue(mp);
> -	} else {
> -		ASSERT(mp->m_flags & XFS_MOUNT_RDONLY);
>  	}
>  
> -
>  	return error;
>  }

As I mentioned on #xfs, I don't think the error/cancel path needs to
go through this function. Add a new function xfs_log_mount_cancel(),
and put the contents of xlog_recover_cancel() inside it, along with
the log unmount function...

(General rule: prefix "xfs_log_....()" is for functions that call
into the log from outside, passing a struct xfs_mount. prefix
"xlog_...()" is for internal calls, passing a struct xlog.)

> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index a74ff68..a7ba078 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -3823,10 +3823,11 @@ abort_error:
>   */
>  STATIC int
>  xlog_recover_process_efis(
> -	struct xlog	*log)
> +	struct xlog		*log,
> +	bool			cancel)
>  {
> -	xfs_log_item_t		*lip;
> -	xfs_efi_log_item_t	*efip;
> +	struct xfs_log_item	*lip;
> +	struct xfs_efi_log_item	*efip;
>  	int			error = 0;
>  	struct xfs_ail_cursor	cur;
>  	struct xfs_ail		*ailp;
> @@ -3847,10 +3848,24 @@ xlog_recover_process_efis(
>  			break;
>  		}
>  
> +		efip = (struct xfs_efi_log_item *) lip;
> +
> +		/*
> +		 * A cancel occurs when the mount has failed and we're bailing
> +		 * out. Release all pending EFIs so they don't pin the AIL.
> +		 */
> +		if (cancel) {
> +			spin_unlock(&ailp->xa_lock);
> +			xfs_efi_release(efip);
> +			spin_lock(&ailp->xa_lock);
> +
> +			lip = xfs_trans_ail_cursor_next(ailp, &cur);
> +			continue;
> +		}

This might be better to separate into a xlog_recover_cancel_efis()
function because this is much simpler than the rest of the loop.
It may be more code, but its simpler to read and understand.

> +
>  		/*
>  		 * Skip EFIs that we've already processed.
>  		 */
> -		efip = (xfs_efi_log_item_t *)lip;
>  		if (test_bit(XFS_EFI_RECOVERED, &efip->efi_flags)) {
>  			lip = xfs_trans_ail_cursor_next(ailp, &cur);
>  			continue;
> @@ -4617,8 +4632,13 @@ xlog_recover_finish(
>  	 */
>  	if (log->l_flags & XLOG_RECOVERY_NEEDED) {
>  		int	error;
> -		error = xlog_recover_process_efis(log);
> +		error = xlog_recover_process_efis(log, false);
>  		if (error) {
> +			/*
> +			 * We could still have pending EFIs. Cancel them so we
> +			 * don't hang...
> +			 */
> +			xlog_recover_process_efis(log, true);

Not actually necessary. We don't clear the XLOG_RECOVERY_NEEDED
flag, so when the error stack is run in xfs_mountfs(), we will
call xfs_log_mount_cancel(), it will see the XLOG_RECOVERY_NEEDED
and run xlog_recover_cancel_efis()....

>  			xfs_alert(log->l_mp, "Failed to recover EFIs");
>  			return error;
>  		}
> @@ -4644,6 +4664,17 @@ xlog_recover_finish(
>  	return 0;
>  }
>  
> +int
> +xlog_recover_cancel(
> +	struct xlog	*log)
> +{
> +	int		error = 0;
> +
> +	if (log->l_flags & XLOG_RECOVERY_NEEDED)
> +		error = xlog_recover_process_efis(log, true);
> +
> +	return error;
> +}

void
xfs_log_mount_cancel(
	struct xfs_mount	*mp)
{
	int		error = 0;

	if (log->l_flags & XLOG_RECOVERY_NEEDED)
		xlog_recover_cancel_efis(log);

	xfs_log_unmount(mp);
	return error;
}

> @@ -799,7 +800,10 @@ xfs_mountfs(
>  	}
>  
>  	/*
> -	 * log's mount-time initialization. Perform 1st part recovery if needed
> +	 * Log's mount-time initialization. The first part of recovery can place
> +	 * some items on the AIL, to be handled when recovery is finished. Set
> +	 * the cancel flag to ensure that the recovery is cancelled should we
> +	 * fail before then.
>  	 */
>  	error = xfs_log_mount(mp, mp->m_logdev_targp,
>  			      XFS_FSB_TO_DADDR(mp, sbp->sb_logstart),
> @@ -808,6 +812,7 @@ xfs_mountfs(
>  		xfs_warn(mp, "log mount failed");
>  		goto out_fail_wait;
>  	}
> +	log_mount_cancel = true;

At this point, XLOG_RECOVERY_NEEDED will be set if EFI processing is
required, so a variable to track this here is not needed.

> @@ -910,11 +915,15 @@ xfs_mountfs(
>  	}
>  
>  	/*
> -	 * Finish recovering the file system.  This part needed to be
> -	 * delayed until after the root and real-time bitmap inodes
> -	 * were consistently read in.
> +	 * Finish recovering the file system.  This part needed to be delayed
> +	 * until after the root and real-time bitmap inodes were consistently
> +	 * read in.
> +	 *
> +	 * Reset the cancel flag as the finish cleans up after itself on
> +	 * failure.
>  	 */
> -	error = xfs_log_mount_finish(mp);
> +	log_mount_cancel = false;
> +	error = xfs_log_mount_finish(mp, false);
>  	if (error) {
>  		xfs_warn(mp, "log mount finish failed");
>  		goto out_rtunmount;

xfs_log_mount_finish() will clear the XLOG_RECOVERY_NEEDED. On
error, we jump into the error stack above the call to
xfs_log_mount_cancel(), so it will handle the cleanup....

> @@ -956,6 +965,8 @@ xfs_mountfs(
>   out_rele_rip:
>  	IRELE(rip);
>   out_log_dealloc:
> +	if (log_mount_cancel)
> +		xfs_log_mount_finish(mp, true);
>  	xfs_log_unmount(mp);

And this just becomes a call to xfs_log_mount_cancel().

I hope this is a bit clearer that mud ;)

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs



[Index of Archives]     [Linux XFS Devel]     [Linux Filesystem Development]     [Filesystem Testing]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux