Re: [PATCH] xfs_buf_iodone_callbacks to force shutdown & resubmit buf in case of permanent failure

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 27 Apr 2016 08:46:12 +1000

On Tue, Apr 26, 2016 at 07:31:59PM +0530, Shyam Kaushik wrote:
> When XFS underlying disk fails, it could take several milliseconds for the
> FS to be marked
> shutdown. xfs_buf_iodone_callbacks() retries buf upon first failure by
> submitting it once
> again. But if the buf fails 2nd time before FS is marked for shutdown, it
> just releases the
> buf with xfs_buf_relse(). This is flawed that nobody is releasing the
> XFS_IFLOCK on the inode.
> Because of this AIL tasks repeated effort to xfs_inode_item_push() will
> see that xfs_iflock()
> cannot be acquired. This blocks XFS from being unmounted as
> xfs_ail_push_all_sync() will keep
> looping without progress.

Patch formatting first: commit messages should wrap at 68-72
columns...

> Fix this by marking the FS for shutdown if we have a permanent failure &
> resubmit the buf.
> xfs_buf_submit() will see FS marked for shutdown & invoke the callback
> which releases
> XFS_IFLOCK.
> 
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 1a6c9b9..6f73ee0 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -1100,7 +1100,12 @@ xfs_buf_iodone_callbacks(
>                                        XBF_DONE | XBF_WRITE_FAIL;
>                         xfs_buf_submit(bp);
>                 } else {
> -                       xfs_buf_relse(bp);
> +                       /*
> +                       * if we have the buf fail 2nd time, force a FS
> shutdown & resubmit
> +                       * the buf for it to be failed back immediately
> +                       */
> +                       xfs_force_shutdown(mp, SHUTDOWN_LOG_IO_ERROR);
> +                       xfs_buf_submit(bp);
>                 }
> 
>                 return;

... and patches should not be wrapped or have tabs converted to
spaces. See Documentation/SubmittingPatches.

Ok, so yes, it would seem there is a bug here, but I don't think
this is correct solution: this will shut down the filesystem
immediately on inode write IO errors. As i've said *many* times
before: this is might seem like a fix, but it is incorrect behaviour
as it does not give transient failures (e.g. multipath failover
causing timeouts) a chance to be resolved before shutting down the
filesystem.

That's the reason for all the retry behaviour on XFS metadata - the
system can continue to run logging new changes even if it can't
write back the changes immediately. As long as the log writes
continue to work, the filesystem can continue to function correctly.
Hence shutting down the filesystem immediately on any metadata write
error (other than a journal write) is premature and will lead to
spurious errors causing shutdowns rather than just logging a
warning.

We are in the process of making this error behaviour configurable -
Carlos is going to finish off the patchset I originally wrote to do
this, so the "shutdown immediately" option will be avaialable
through that set of interfaces. We'll still need to fix the unlock
case for retry here, though...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs