On Wed, May 04, 2016 at 09:24:19AM +1000, Dave Chinner wrote: > On Tue, May 03, 2016 at 07:55:38PM +0200, Carlos Maiolino wrote: > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > On reception of an error, we can fail immediately, perform some > > bound amount of retries or retry indefinitely. The current behaviour > > we have is to retry forever. > > > > However, we'd like the ability to choose what behaviour we have, and > > that requires the ability to configure the behaviour through the new > > sysfs interfaces. > > > > max_retries is used to set the number of retries to do until permanently fail, > > the only special case is when it is set to -1, which, will cause the system to > > retry indefinitely. > > This doesn't read particularly well, and its more a description of > the mechanism (which shoul dbe in the code) rather than the reason > for implementing it.. The original commit message talked about > different failure characteristics, which still need to be mentioned > here. e.g: > > We'd like to be able to configure errors to either fail > immediately (fail fast), retry for some time before > failing (fail slow) or retry forever (fail never). This is > implemented by using specific values for the number of > retries we allow. Fair enough, I will re-work the patch descriptions in the next version. > > > > Add both a maximum retry count and a retry timeout so that we can > > bound by time and/or physical IO attempts. > > > > Finally, plumb these into xfs_buf_iodone error processing so that > > the error behaviour follows the selected configuration. > > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> Signed-off-by: > > Carlos Maiolino <cmaiolino@xxxxxxxxxx> > > You also need to document what changes you've made to the original > patches. I can't tell what you've changed in each patch just by > looking at them - this is the first one where I noticed something > while reading it. i.e. the commit message should have: > > [ cmaiolino: removed fail-fast/slow/never and instead use retry count > to determine behaviour ] > Ok > > --- > > fs/xfs/xfs_buf.h | 23 ++++++++++++++- > > fs/xfs/xfs_buf_item.c | 13 +++++++-- > > fs/xfs/xfs_mount.h | 3 +- > > fs/xfs/xfs_sysfs.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++--- > > 4 files changed, 112 insertions(+), 8 deletions(-) > > > > diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h > > index adef116..bd978f0 100644 > > --- a/fs/xfs/xfs_buf.h > > +++ b/fs/xfs/xfs_buf.h > > @@ -183,7 +183,28 @@ typedef struct xfs_buf { > > unsigned int b_page_count; /* size of page array */ > > unsigned int b_offset; /* page offset in first page */ > > int b_error; /* error code on I/O */ > > - int b_last_error; /* previous async I/O error */ > > + > > + /* > > + * async write failure retry count. Initialised to zero on the first > > + * failure, then when it exceeds the maximum configured without a > > + * success the write is considered to be failed permanently and the > > + * iodone handler will take appropriate action. > > + * > > + * For retry timeouts, we record the jiffie of the first failure. This > > + * means that we can change the retry timeout and it on the next error > > + * it will be checked against the newly configured timeout. This > > + * prevents buffers getting stuck in retry loops with a really long > > + * timeout. > > + * > > + * last_error is used to ensure that we are getting repeated errors, not > > + * different errors. e.g. a block device might change ENOSPC to EIO when > > + * a failure timeout occurs, so we want to re-initialise the error > > + * retry behaviour appropriately when that happens. > > + */ > > + int b_retries; > > + unsigned long b_first_retry_time; /* in jiffies */ > > + int b_last_error; > > + > > const struct xfs_buf_ops *b_ops; > > > > #ifdef XFS_BUF_LOCK_TRACKING > > diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c > > index 3a30a88..68ed2a0 100644 > > --- a/fs/xfs/xfs_buf_item.c > > +++ b/fs/xfs/xfs_buf_item.c > > @@ -1086,6 +1086,9 @@ xfs_buf_iodone_callback_error( > > bp->b_flags |= (XBF_WRITE | XBF_ASYNC | > > XBF_DONE | XBF_WRITE_FAIL); > > bp->b_last_error = bp->b_error; > > + bp->b_retries = 0; > > + bp->b_first_retry_time = jiffies; > > + > > xfs_buf_ioerror(bp, 0); > > xfs_buf_submit(bp); > > return true; > > @@ -1095,8 +1098,13 @@ xfs_buf_iodone_callback_error( > > * Repeated failure on an async write. Take action according to the > > * error configuration we have been set up to use. > > */ > > - if (!cfg->max_retries) > > - goto permanent_error; > > + if (cfg->max_retries >= 0) > > + if (++bp->b_retries > cfg->max_retries) > > + goto permanent_error; > > + if (cfg->retry_timeout) > > + if (time_after(jiffies, > > + cfg->retry_timeout + bp->b_first_retry_time)) > > + goto permanent_error; > > This is hard to read and the pattern is prone to future maintenance > problems. i.e. nested single line if statements are prone to people > making mistakes with indentation when adding more conditions to > them. > > So: > > if (cfg->max_retries != XFS_ERR_RETRY_FOREVER && > ++bp->b_retries > cfg->max_retries) > goto permanent_error; > if (cfg->retry_timeout && > time_after(jiffies, cfg->retry_timeout + bp->b_first_retry_time)) > goto permanent_error; > Fair enough, will change this in the next revision. Thanks for the review > > /* still a transient error, higher layers will retry */ > > xfs_buf_ioerror(bp, 0); > > @@ -1139,6 +1147,7 @@ xfs_buf_iodone_callbacks( > > * retry state here in preparation for the next error that may occur. > > */ > > bp->b_last_error = 0; > > + bp->b_retries = 0; > > > > xfs_buf_do_callbacks(bp); > > bp->b_fspriv = NULL; > > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h > > index 0c5a976..0382140 100644 > > --- a/fs/xfs/xfs_mount.h > > +++ b/fs/xfs/xfs_mount.h > > @@ -54,7 +54,8 @@ enum { > > > > struct xfs_error_cfg { > > struct xfs_kobj kobj; > > - int max_retries; > > + int max_retries; > > #define XFS_ERR_RETRY_FOREVER (-1) > > > + unsigned long retry_timeout; /* in jiffies, 0 = no timeout */ > > }; > .... > > +static ssize_t > > +max_retries_store( > > + struct kobject *kobject, > > + const char *buf, > > + size_t count) > > +{ > > + struct xfs_error_cfg *cfg = to_error_cfg(kobject); > > + int ret; > > + int val; > > + > > + ret = kstrtoint(buf, 0, &val); > > + if (ret) > > + return ret; > > + > > + if (val < -1 || val > INT_MAX) > > + return -EINVAL; > > Should special case XFS_ERR_RETRY_FOREVER here. i.e. range is > 0-INT_MAX || XFS_ERR_RETRY_FOREVER. > > if ((val < -1 || val > INT_MAX) && > val != XFS_ERR_RETRY_FOREVER) > return -EINVAL; > > -- > Dave Chinner > david@xxxxxxxxxxxxx -- Carlos _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs