On Fri, May 06, 2016 at 10:04:33AM +1000, Dave Chinner wrote: > On Wed, May 04, 2016 at 05:43:18PM +0200, Carlos Maiolino wrote: > > On reception of an error, we can fail immediately, perform some > > bound amount of retries or retry indefinitely. The current behaviour > > we have is to retry forever. > > > > However, we'd like the ability to choose how long the filesystem should try > > after an error, it can either fail immediately, retry a few times, or retry > > forever. This is implemented by using max_retries sysfs attribute, to hold the > > amount of times we allow the filesystem to retry after an error. Being -1 a > > special case where the filesystem will retry indefinitely. > > > > Add both a maximum retry count and a retry timeout so that we can bound by > > time and/or physical IO attempts. > > > > Finally, plumb these into xfs_buf_iodone error processing so that > > the error behaviour follows the selected configuration. > > > > Changelog: > > > > V3: > > - In xfs_buf_iodone_callback_error, use max_retries to decide how long > > the filesystem should retry on errors instead of XFS_ERR_FAIL enums > > and fail_speed > > > > - Remove all code implementing fail_speed attribute from the original > > patch > > -- failure_speed_show/store attributes function implementation > > -- max_retries_store() now accepts values from -1 up to INT_MAX > > > > - retry_timeout_seconds_show() print fixed: > > -- jiffies_to_msecs() should be divided by MSEC_PER_SEC > > -- trailing whitespace removed > > Where's XFS_ERR_RETRY_FOREVER? Hi, I didn't understand you literally meant to have XFS_ERR_RETRY_FOREVER macro implemented, sorry about that, I'll implement it, add fail_at_unmount to <dev>/error and re-send the patchset > > > @@ -1095,8 +1098,12 @@ xfs_buf_iodone_callback_error( > > * Repeated failure on an async write. Take action according to the > > * error configuration we have been set up to use. > > */ > > - if (!cfg->max_retries) > > - goto permanent_error; > > + if ((cfg->max_retries >= 0) && > > + (++bp->b_retries > cfg->max_retries)) > > + goto permanent_error; > > I suggested: > > if (cfg->max_retries != XFS_ERR_RETRY_FOREVER && > ++bp->b_retries > cfg->max_retries) > goto permanent_error; > > so that we document that there is a "retry forever" case being > handled here. I really don't like magic "-1", ">= 0" or other > implicit comparisions that don't document that it is valid to retry > forever in these cases. > > > + if (cfg->retry_timeout && > > + time_after(jiffies, cfg->retry_timeout + bp->b_first_retry_time)) > > + goto permanent_error; > > > > /* still a transient error, higher layers will retry */ > > xfs_buf_ioerror(bp, 0); > > @@ -1139,6 +1146,7 @@ xfs_buf_iodone_callbacks( > > * retry state here in preparation for the next error that may occur. > > */ > > bp->b_last_error = 0; > > + bp->b_retries = 0; > > > > xfs_buf_do_callbacks(bp); > > bp->b_fspriv = NULL; > > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h > > index 0c5a976..0382140 100644 > > --- a/fs/xfs/xfs_mount.h > > +++ b/fs/xfs/xfs_mount.h > > @@ -54,7 +54,8 @@ enum { > > > > struct xfs_error_cfg { > > struct xfs_kobj kobj; > > - int max_retries; > > + int max_retries; /* -1 = retry forever */ > > as per my last review, remove the comment, add XFS_ERR_RETRY_FOREVER > to document that "-1 = retry forever" and use that in the code so > it's explicit that the code is intended to handle this case. > > Cheers, > > Dave. > > -- > Dave Chinner > david@xxxxxxxxxxxxx > > _______________________________________________ > xfs mailing list > xfs@xxxxxxxxxxx > http://oss.sgi.com/mailman/listinfo/xfs -- Carlos _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs