Re: [PATCH] xfs: Introduce permanent async buffer write IO failures

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 20 Feb 2015 08:34:50 +1100

On Thu, Feb 19, 2015 at 09:28:42AM -0500, Brian Foster wrote:
> On Thu, Feb 19, 2015 at 09:32:20AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > 
> > Currently we consider all async metadata buffer write failures as
> > transient failures and retry the IO immediately and then again at a
> > a later time. The issue with this is that we can hang forever is the
> > error is not recoverable.
> > 
> > An example of this is a device that has been pulled and so is
> > returning ENODEV to all IO. Attempting to unmount the filesystem
> > with the device in this state will result in a hang waiting for the
> > async metadata IO to be flushed and written successfully. In this
> > case, we need metadata writeback to error out and trigger a shutdown
> > state so the unmount can complete.
> > 
> > Put simply, we need to consider some errors as permanent and
> > unrecoverable rather than transient.
> > 
> > Hence add infrastructure that allows us to classify the async write
> > errors and hence apply different policies to the different failures.
> > In this patch, classify ENODEV as a permanent error but leave
> > all the other types of error handling unchanged.
> > 
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
....
> > +	trace_xfs_buf_item_iodone_async(bp, _RET_IP_);
> > +	ASSERT(bp->b_iodone != NULL);
> > +	xfs_buf_ioerror(bp, 0);
> > +
> 
> Otherwise it's async... we clear the error here.

Ah, well spotted, that shouldn't be there.

> 
> >  	/*
> >  	 * If the write was asynchronous then no one will be looking for the
> > -	 * error.  Clear the error state and write the buffer out again.
> > -	 *
> > -	 * XXX: This helps against transient write errors, but we need to find
> > -	 * a way to shut the filesystem down if the writes keep failing.
> > -	 *
> > -	 * In practice we'll shut the filesystem down soon as non-transient
> > -	 * errors tend to affect the whole device and a failing log write
> > -	 * will make us give up.  But we really ought to do better here.
> > +	 * error.  If this is the first failure, clear the error state and write
> > +	 * the buffer out again.
> >  	 */
> > -	if (XFS_BUF_ISASYNC(bp)) {
> > -		ASSERT(bp->b_iodone != NULL);
> > -
> > -		trace_xfs_buf_item_iodone_async(bp, _RET_IP_);
> > -
> > -		xfs_buf_ioerror(bp, 0); /* errno of 0 unsets the flag */
> > -
> > -		if (!(bp->b_flags & (XBF_STALE|XBF_WRITE_FAIL))) {
> > -			bp->b_flags |= XBF_WRITE | XBF_ASYNC |
> > -				       XBF_DONE | XBF_WRITE_FAIL;
> > -			xfs_buf_submit(bp);
> > -		} else {
> > -			xfs_buf_relse(bp);
> > -		}
> > -
> > -		return;
> > +	if (!(bp->b_flags & (XBF_STALE|XBF_WRITE_FAIL))) {
> > +		bp->b_flags |= XBF_WRITE | XBF_ASYNC |
> > +			       XBF_DONE | XBF_WRITE_FAIL;
> > +		xfs_buf_submit(bp);
> > +		return true;
> 
> ... and retry once via the XBF_WRITE_FAIL flag.

It should be in this branch.

> 
> >  	}
> >  
> >  	/*
> > -	 * If the write of the buffer was synchronous, we want to make
> > -	 * sure to return the error to the caller of xfs_bwrite().
> > +	 * Repeated failure on an async write.
> > +	 *
> > +	 * Now we need to classify the error and determine the correct action to
> > +	 * take. Different types of errors will require different processing,
> > +	 * but make the default classification "transient" so that we keep
> > +	 * retrying in these cases.  If this is the wrog thing to do, then we'll
> > +	 * get reports that the filesystem hung in the presence of that type of
> > +	 * error and we can take appropriate action to remedy the issue for that
> > +	 * type of error.
> >  	 */
> > +	switch (bp->b_error) {
> 
> When is this ever true if we clear the error above?
> 
> > +	case ENODEV:
> > +		/* permanent error, no recovery possible */
> > +		xfs_force_shutdown(mp, SHUTDOWN_META_IO_ERROR);
> > +		goto out_stale;
> 
> If we can't retry, do the error-specific processing. Is doing this after
> a retry really the intended behavior? E.g., if ENODEV is permanent, the
> retry seems pointless.

I'm leaving the "retry once" logic there on purpose - as a just in
case we had some other transient problem. It doesn't hurt to retry
the IO if there was an error, and if we get another error we can be
pretty certain it wasn't a transient problem.

> > +	default:
> > +		/* do nothing, higher layers will retry */
> > +		break;
> > +	}
> > +

bp->b_error should be cleared here....

> > +	xfs_buf_relse(bp);
> > +	return true;
> > +
> > +out_stale:
> >  	xfs_buf_stale(bp);
> >  	XFS_BUF_DONE(bp);
> > -
> >  	trace_xfs_buf_error_relse(bp, _RET_IP_);
> > +	return false;
> > +}
> > +
> > +/*
> > + * This is the iodone() function for buffers which have had callbacks attached
> > + * to them by xfs_buf_attach_iodone(). We need to iterate the items on the
> > + * callback list, mark the buffer as having no more callbacks and then push the
> > + * buffer through IO completion processing.
> > + */
> > +void
> > +xfs_buf_iodone_callbacks(
> > +	struct xfs_buf		*bp)
> > +{
> > +	/*
> > +	 * If there is an error, process it. Some errors require us
> > +	 * to run callbacks after failure processing is done so we
> > +	 * detect that and take appropriate action.
> > +	 */
> 
> 80 columns?

*nod*

That's what I get for hand editing patches. :)

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs