On 12/07/2016 07:42 AM, peng.liang5@xxxxxxxxxx wrote:
> Hello, Ben
>
> Sorry for the late reply.
>
> It is just as you said below. If fast_io_fail_tmo is off, we have
> to cap dev_loss_tmo at 600. So this patch gives wrong guidance and
> will cause a kernel error.
>
Indeed. We've had _far_ too many fixes for the 'dev_loss_tmo defaults
to 600' issue, but it seems to be fixed by now.
So any patches in this area should be treated with utmost caution.

> And one more question. Should the system limit dev_loss_tmo to 600 if
> fast_io_fail_tmo is set to 0?
>
The kernel certainly does. And if there is no error in the current
algorithm I'm strongly in favour of just leaving it alone.

Cheers,

Hannes

> On Thu, Dec 01, 2016 at 09:06:14AM +0800, peng.liang5@xxxxxxxxxx wrote:
> > If fast_io_fail_tmo isn't set, DEFAULT_FAST_IO_FAIL will be used
> > in select_fast_io_fail.
> >
> > So multipath will not apply the 600-second limit to dev_loss_tmo.
>
> Yes, but the kernel will. With your patch installed, if I disable
> fast_io_fail_tmo and set no_path_retry to queue, I get these messages:
>
> Dec 01 04:19:02 | rport-11:0-0: failed to set dev_loss_tmo to
> 2147483647, error 22
>
> Because if fast_io_fail_tmo is not set, the kernel itself will bar
> dev_loss_tmo from being set above 600 seconds. Also, even if you could
> set dev_loss_tmo to its maximum without fast_io_fail_tmo set, you would
> never want to, because you would break multipath.
>
> With fast_io_fail_tmo disabled, the scsi device will never pass the
> failed IO back up until dev_loss_tmo triggers. This means that if you
> lose a path on your multipath device while doing IO, you won't be able
> to resend that IO down another path for 68 years (2147483647 seconds).
> Also, all the synchronous checker functions will not return for 68
> years, and during all this time these processes will be in
> uninterruptible sleep. At that point, there would be no point in even
> having multiple paths, because you couldn't ever actually use them if
> one went down.
>
> > And I think using MP_FAST_IO_FAIL_UNSET as the condition is
> > meaningless after multipath runs select_fast_io_fail even if it's
> > not set.
>
> This is true in the default case, but we can't rely on the default
> case. Since we allow users to turn it off, we need to correctly
> configure multipath when it is off.
>
> -Ben
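[For context on the kernel behaviour Ben describes: the check sits in
fc_rport_set_dev_loss_tmo() in drivers/scsi/scsi_transport_fc.c. The
self-contained model below is a condensed sketch of that logic, not the
kernel source itself; the 600-second cap is real, but surrounding
details vary by kernel version.]

```c
/*
 * Model of the kernel's dev_loss_tmo validation, condensed from
 * fc_rport_set_dev_loss_tmo() in drivers/scsi/scsi_transport_fc.c
 * (simplified sketch; details vary by kernel version).
 */
#include <errno.h>
#include <limits.h>
#include <stdio.h>

#define SCSI_DEVICE_BLOCK_MAX_TIMEOUT 600	/* the 600s cap */

/* fast_io_fail == -1 models "fast_io_fail_tmo = off" */
static int set_dev_loss_tmo(int fast_io_fail, unsigned long val)
{
	if (val > UINT_MAX)
		return -EINVAL;	/* dev_loss_tmo is a u32 */
	/* if fast_io_fail is off, cap dev_loss_tmo at 600 seconds */
	if (fast_io_fail == -1 && val > SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
		return -EINVAL;
	return 0;		/* value would be accepted */
}

int main(void)
{
	/* multipath's MAX_DEV_LOSS_TMO with fast_io_fail off: -22 */
	printf("%d\n", set_dev_loss_tmo(-1, 2147483647UL));
	/* the same value with fast_io_fail set to, say, 5s: 0 */
	printf("%d\n", set_dev_loss_tmo(5, 2147483647UL));
	return 0;
}
```

[This is where the "error 22" (EINVAL) in the log comes from: with
fast_io_fail_tmo off, any write above 600 is rejected outright.]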
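[The failure can also be reproduced from userspace. A minimal sketch,
assuming an FC rport named rport-11:0-0 (the name from Ben's log; yours
will differ, pick one from /sys/class/fc_remote_ports) whose
fast_io_fail_tmo is off:]

```c
/* Write a huge dev_loss_tmo to an rport whose fast_io_fail_tmo is
 * off, and watch the write fail with EINVAL (errno 22). */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* example rport path; adjust to your system */
	const char *attr =
	    "/sys/class/fc_remote_ports/rport-11:0-0/dev_loss_tmo";
	const char *val = "2147483647";
	int fd = open(attr, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, val, strlen(val)) < 0)
		/* expected: "... Invalid argument" when
		 * fast_io_fail_tmo is off */
		fprintf(stderr, "failed to set dev_loss_tmo to %s: %s\n",
			val, strerror(errno));
	close(fd);
	return 0;
}
```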
> > Original Mail
> > From: BenjaminMarzinski
> > To: Peng Liang 10137102;
> > Cc: <dm-devel@xxxxxxxxxx>; Zhang Kai 10072500
> > Date: 2016-11-29 08:30
> > Subject: Re: [dm-devel] [PATCH] libmultipath: ensure dev_loss_tmo will be
> > update to MAX_DEV_LOSS_TMO if no_path_retry set to queue
> >
> > On Fri, Nov 25, 2016 at 02:36:04PM +0800, peng.liang5@xxxxxxxxxx wrote:
> > > From: PengLiang <peng.liang5@xxxxxxxxxx>
> > >
> > > If no_path_retry is set to queue, we should make sure dev_loss_tmo is
> > > updated to MAX_DEV_LOSS_TMO. But it will be limited to 600 if
> > > fast_io_fail_tmo is set to off or 0.
> >
> > Doesn't the system still limit dev_loss_tmo to 600 if fast_io_fail_tmo
> > isn't set? Multipath was using this limit, since the underlying system
> > uses it.
> >
> > -Ben
> >
> > > Signed-off-by: PengLiang <peng.liang5@xxxxxxxxxx>
> > > ---
> > >  libmultipath/discovery.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> > > index aaa915c..05b0842 100644
> > > --- a/libmultipath/discovery.c
> > > +++ b/libmultipath/discovery.c
> > > @@ -608,7 +608,8 @@ sysfs_set_rport_tmo(struct multipath *mpp, struct path *pp)
> > >  			goto out;
> > >  		}
> > >  	}
> > > -	} else if (mpp->dev_loss > DEFAULT_DEV_LOSS_TMO) {
> > > +	} else if (mpp->dev_loss > DEFAULT_DEV_LOSS_TMO &&
> > > +		   mpp->no_path_retry != NO_PATH_RETRY_QUEUE) {
> > >  		condlog(3, "%s: limiting dev_loss_tmo to %d, since "
> > >  			"fast_io_fail is not set",
> > >  			rport_id, DEFAULT_DEV_LOSS_TMO);
> > > --
> > > 2.8.1.windows.1

--
Dr. Hannes Reinecke                   Teamlead Storage & Networking
hare@xxxxxxx                                      +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel