On Wed, May 04, 2016 at 05:43:18PM +0200, Carlos Maiolino wrote: > On reception of an error, we can fail immediately, perform some > bound amount of retries or retry indefinitely. The current behaviour > we have is to retry forever. > > However, we'd like the ability to choose how long the filesystem should try > after an error, it can either fail immediately, retry a few times, or retry > forever. This is implemented by using max_retries sysfs attribute, to hold the > amount of times we allow the filesystem to retry after an error. Being -1 a > special case where the filesystem will retry indefinitely. > > Add both a maximum retry count and a retry timeout so that we can bound by > time and/or physical IO attempts. > > Finally, plumb these into xfs_buf_iodone error processing so that > the error behaviour follows the selected configuration. > > Changelog: > > V3: > - In xfs_buf_iodone_callback_error, use max_retries to decide how long > the filesystem should retry on errors instead of XFS_ERR_FAIL enums > and fail_speed > > - Remove all code implementing fail_speed attribute from the original > patch > -- failure_speed_show/store attributes function implementation > -- max_retries_store() now accepts values from -1 up to INT_MAX > > - retry_timeout_seconds_show() print fixed: > -- jiffies_to_msecs() should be divided by MSEC_PER_SEC > -- trailing whitespace removed Where's XFS_ERR_RETRY_FOREVER? > @@ -1095,8 +1098,12 @@ xfs_buf_iodone_callback_error( > * Repeated failure on an async write. Take action according to the > * error configuration we have been set up to use. > */ > - if (!cfg->max_retries) > - goto permanent_error; > + if ((cfg->max_retries >= 0) && > + (++bp->b_retries > cfg->max_retries)) > + goto permanent_error; I suggested: if (cfg->max_retries != XFS_ERR_RETRY_FOREVER && ++bp->b_retries > cfg->max_retries) goto permanent_error; so that we document that there is a "retry forever" case being handled here. I really don't like magic "-1", ">= 0" or other implicit comparisions that don't document that it is valid to retry forever in these cases. > + if (cfg->retry_timeout && > + time_after(jiffies, cfg->retry_timeout + bp->b_first_retry_time)) > + goto permanent_error; > > /* still a transient error, higher layers will retry */ > xfs_buf_ioerror(bp, 0); > @@ -1139,6 +1146,7 @@ xfs_buf_iodone_callbacks( > * retry state here in preparation for the next error that may occur. > */ > bp->b_last_error = 0; > + bp->b_retries = 0; > > xfs_buf_do_callbacks(bp); > bp->b_fspriv = NULL; > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h > index 0c5a976..0382140 100644 > --- a/fs/xfs/xfs_mount.h > +++ b/fs/xfs/xfs_mount.h > @@ -54,7 +54,8 @@ enum { > > struct xfs_error_cfg { > struct xfs_kobj kobj; > - int max_retries; > + int max_retries; /* -1 = retry forever */ as per my last review, remove the comment, add XFS_ERR_RETRY_FOREVER to document that "-1 = retry forever" and use that in the code so it's explicit that the code is intended to handle this case. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs