Re: [PATCH 5/7] xfs: add configuration of error failure speed

Carlos Maiolino <cmaiolino@xxxxxxxxxx> · Fri, 6 May 2016 12:59:44 +0200



On Fri, May 06, 2016 at 10:04:33AM +1000, Dave Chinner wrote:
> On Wed, May 04, 2016 at 05:43:18PM +0200, Carlos Maiolino wrote:
> > On reception of an error, we can fail immediately, perform some
> > bound amount of retries or retry indefinitely. The current behaviour
> > we have is to retry forever.
> > 
> > However, we'd like the ability to choose how long the filesystem should try
> > after an error, it can either fail immediately, retry a few times, or retry
> > forever. This is implemented by using max_retries sysfs attribute, to hold the
> > amount of times we allow the filesystem to retry after an error. Being -1 a
> > special case where the filesystem will retry indefinitely.
> > 
> > Add both a maximum retry count and a retry timeout so that we can bound by
> > time and/or physical IO attempts.
> > 
> > Finally, plumb these into xfs_buf_iodone error processing so that
> > the error behaviour follows the selected configuration.
> > 
> > Changelog:
> > 
> > V3:
> > 	- In xfs_buf_iodone_callback_error, use max_retries to decide how long
> > 	  the filesystem should retry on errors instead of XFS_ERR_FAIL enums
> > 	  and fail_speed
> > 
> > 	- Remove all code implementing fail_speed attribute from the original
> > 	  patch
> > 		-- failure_speed_show/store attributes function implementation
> > 		-- max_retries_store() now accepts values from -1 up to INT_MAX
> > 
> > 	- retry_timeout_seconds_show() print fixed:
> > 		-- jiffies_to_msecs() should be divided by MSEC_PER_SEC
> > 		-- trailing whitespace removed
> 
> Where's XFS_ERR_RETRY_FOREVER?

Hi, I didn't understand you literally meant to have XFS_ERR_RETRY_FOREVER macro
implemented, sorry about that, I'll implement it, add fail_at_unmount to
<dev>/error and re-send the patchset

> 
> > @@ -1095,8 +1098,12 @@ xfs_buf_iodone_callback_error(
> >  	 * Repeated failure on an async write. Take action according to the
> >  	 * error configuration we have been set up to use.
> >  	 */
> > -	if (!cfg->max_retries)
> > -		goto permanent_error;
> > +	if ((cfg->max_retries >= 0) &&
> > +	    (++bp->b_retries > cfg->max_retries))
> > +			goto permanent_error;
> 
> I suggested:
> 
>         if (cfg->max_retries != XFS_ERR_RETRY_FOREVER &&
>             ++bp->b_retries > cfg->max_retries)
>                 goto permanent_error;
> 
> so that we document that there is a "retry forever" case being
> handled here. I really don't like magic "-1", ">= 0" or other
> implicit comparisions that don't document that it is valid to retry
> forever in these cases.
> 
> > +	if (cfg->retry_timeout &&
> > +	    time_after(jiffies, cfg->retry_timeout + bp->b_first_retry_time))
> > +			goto permanent_error;
> >  
> >  	/* still a transient error, higher layers will retry */
> >  	xfs_buf_ioerror(bp, 0);
> > @@ -1139,6 +1146,7 @@ xfs_buf_iodone_callbacks(
> >  	 * retry state here in preparation for the next error that may occur.
> >  	 */
> >  	bp->b_last_error = 0;
> > +	bp->b_retries = 0;
> >  
> >  	xfs_buf_do_callbacks(bp);
> >  	bp->b_fspriv = NULL;
> > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> > index 0c5a976..0382140 100644
> > --- a/fs/xfs/xfs_mount.h
> > +++ b/fs/xfs/xfs_mount.h
> > @@ -54,7 +54,8 @@ enum {
> >  
> >  struct xfs_error_cfg {
> >  	struct xfs_kobj	kobj;
> > -	int		max_retries;
> > +	int		max_retries;	/* -1 = retry forever */
> 
> as per my last review, remove the comment, add XFS_ERR_RETRY_FOREVER
> to document that "-1 = retry forever" and use that in the code so
> it's explicit that the code is intended to handle this case.
> 
> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs

-- 
Carlos

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs