Re: [PATCH 5/7] xfs: add configuration of error failure speed

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 6 May 2016 10:04:33 +1000

On Wed, May 04, 2016 at 05:43:18PM +0200, Carlos Maiolino wrote:
> On reception of an error, we can fail immediately, perform some
> bound amount of retries or retry indefinitely. The current behaviour
> we have is to retry forever.
> 
> However, we'd like the ability to choose how long the filesystem should try
> after an error, it can either fail immediately, retry a few times, or retry
> forever. This is implemented by using max_retries sysfs attribute, to hold the
> amount of times we allow the filesystem to retry after an error. Being -1 a
> special case where the filesystem will retry indefinitely.
> 
> Add both a maximum retry count and a retry timeout so that we can bound by
> time and/or physical IO attempts.
> 
> Finally, plumb these into xfs_buf_iodone error processing so that
> the error behaviour follows the selected configuration.
> 
> Changelog:
> 
> V3:
> 	- In xfs_buf_iodone_callback_error, use max_retries to decide how long
> 	  the filesystem should retry on errors instead of XFS_ERR_FAIL enums
> 	  and fail_speed
> 
> 	- Remove all code implementing fail_speed attribute from the original
> 	  patch
> 		-- failure_speed_show/store attributes function implementation
> 		-- max_retries_store() now accepts values from -1 up to INT_MAX
> 
> 	- retry_timeout_seconds_show() print fixed:
> 		-- jiffies_to_msecs() should be divided by MSEC_PER_SEC
> 		-- trailing whitespace removed

Where's XFS_ERR_RETRY_FOREVER?

> @@ -1095,8 +1098,12 @@ xfs_buf_iodone_callback_error(
>  	 * Repeated failure on an async write. Take action according to the
>  	 * error configuration we have been set up to use.
>  	 */
> -	if (!cfg->max_retries)
> -		goto permanent_error;
> +	if ((cfg->max_retries >= 0) &&
> +	    (++bp->b_retries > cfg->max_retries))
> +			goto permanent_error;

I suggested:

        if (cfg->max_retries != XFS_ERR_RETRY_FOREVER &&
            ++bp->b_retries > cfg->max_retries)
                goto permanent_error;

so that we document that there is a "retry forever" case being
handled here. I really don't like magic "-1", ">= 0" or other
implicit comparisons that don't document that it is valid to retry
forever in these cases.
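
To illustrate, here is a minimal userspace sketch of that check. It
assumes XFS_ERR_RETRY_FOREVER is defined as -1 (the value the patch
comment already implies). The struct and function names below are
hypothetical stand-ins, not the kernel definitions:

```c
#include <stdbool.h>

/* Assumed value: the patch comment says "-1 = retry forever". */
#define XFS_ERR_RETRY_FOREVER	(-1)

/* Hypothetical stand-in for the relevant part of struct xfs_error_cfg. */
struct error_cfg {
	int	max_retries;	/* XFS_ERR_RETRY_FOREVER or a count >= 0 */
};

/* Hypothetical stand-in for the relevant part of struct xfs_buf. */
struct buf {
	int	b_retries;
};

/*
 * Return true once the retry budget is used up and the error should be
 * treated as permanent. The named constant makes the "retry forever"
 * case visible at the point of comparison. Note that b_retries is only
 * incremented when a finite limit is configured.
 */
static bool retries_exhausted(const struct error_cfg *cfg, struct buf *bp)
{
	if (cfg->max_retries != XFS_ERR_RETRY_FOREVER &&
	    ++bp->b_retries > cfg->max_retries)
		return true;
	return false;
}
```

With max_retries = 0 the first failure is already permanent, and with
XFS_ERR_RETRY_FOREVER the function never reports exhaustion.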

> +	if (cfg->retry_timeout &&
> +	    time_after(jiffies, cfg->retry_timeout + bp->b_first_retry_time))
> +			goto permanent_error;
>  
>  	/* still a transient error, higher layers will retry */
>  	xfs_buf_ioerror(bp, 0);
> @@ -1139,6 +1146,7 @@ xfs_buf_iodone_callbacks(
>  	 * retry state here in preparation for the next error that may occur.
>  	 */
>  	bp->b_last_error = 0;
> +	bp->b_retries = 0;
>  
>  	xfs_buf_do_callbacks(bp);
>  	bp->b_fspriv = NULL;
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 0c5a976..0382140 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -54,7 +54,8 @@ enum {
>  
>  struct xfs_error_cfg {
>  	struct xfs_kobj	kobj;
> -	int		max_retries;
> +	int		max_retries;	/* -1 = retry forever */

As per my last review: remove the comment, add XFS_ERR_RETRY_FOREVER
to document that "-1 = retry forever", and use that in the code so
it's explicit that the code is intended to handle this case.

Cheers,

Dave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx