Hello, Hannes
The kernel didn't limit dev_loss_tmo when fast_io_fail_tmo was 0,
but multipath did. Should I leave it alone and just revert this patch?
Thanks.
Original Message
From: HannesReinecke
To: <dm-devel@xxxxxxxxxx>
Date: 2016-12-07 15:04
Subject: Re: [dm-devel] [PATCH] libmultipath: ensure dev_loss_tmo will be update to MAX_DEV_LOSS_TMO if no_path_retry set to queue
On 12/07/2016 07:42 AM, peng.liang5@xxxxxxxxxx wrote:
> Hello, Ben
>
> Sorry for the late reply.
>
> It is as you said below. If fast_io_fail_tmo is off, we have
> to cap dev_loss_tmo at 600. So this patch is misguided and will
> cause a kernel error.
>
Indeed.
We've had _far_ too many fixes for the 'dev_loss_tmo defaults to 600'
issue, but we seem to have it fixed by now.
So any patches in this area should be treated with utmost caution.
> And one more question. Should the system limit dev_loss_tmo to 600 if
> fast_io_fail_tmo is set to 0?
>
The kernel surely does. And if there is no error in the current
algorithm, I'm strongly in favour of just leaving it alone.
Cheers,
Hannes
> On Thu, Dec 01, 2016 at 09:06:14AM +0800, peng.liang5@xxxxxxxxxx wrote:
> > If fast_io_fail_tmo isn't set, select_fast_io_fail will use
> > DEFAULT_FAST_IO_FAIL.
> >
> > So multipath will not apply the dev_loss_tmo limit of 600.
>
> Yes, but the kernel will. With your patch installed, if I disable
> fast_io_fail_tmo and set no_path_retry to queue, I get these messages
>
> Dec 01 04:19:02 | rport-11:0-0: failed to set dev_loss_tmo to
> 2147483647, error 22
>
> Because if fast_io_fail_tmo is not set, the kernel itself will bar
> dev_loss_tmo from being above 600 seconds. Also, even if you could set
> dev_loss_tmo to its maximum without fast_io_fail_tmo set, you would
> never want to, because you would break multipath.
>
> With fast_io_fail_tmo disabled, the scsi device will never pass the
> failed IO back up until dev_loss_tmo triggers. This means that if you
> lose a path on your multipath device while doing IO, you won't be able
> to resend that IO down another path for 68 years (2147483647 seconds).
> Also, all the synchronous checker functions will not return for 68
> years. And during all this time these processes will be in uninterruptible
> sleep. At that point, there would be no point to even having multiple
> paths, because you couldn't ever actually use them if one went down.
>
> >
> > And I think using MP_FAST_IO_FAIL_UNSET as the condition is meaningless,
> > because multipath runs select_fast_io_fail even if it's not set.
>
> This is true in the default case, but we can't rely on the default case.
> Since we allow users to turn it off, we need to correctly configure
> multipath when it is off.
>
> -Ben
>
> > Original Message
> > From: BenjaminMarzinski
> > To: 彭亮10137102;
> > Cc: <dm-devel@xxxxxxxxxx>张凯10072500;
> > Date: 2016-11-29 08:30
> > Subject: Re: [dm-devel] [PATCH] libmultipath: ensure dev_loss_tmo will be
> > update to MAX_DEV_LOSS_TMO if no_path_retry set to queue
> >
> > On Fri, Nov 25, 2016 at 02:36:04PM +0800, peng.liang5@xxxxxxxxxx wrote:
> > > From: PengLiang <peng.liang5@xxxxxxxxxx>
> > >
> > > If no_path_retry is set to queue, we should make sure dev_loss_tmo is updated to MAX_DEV_LOSS_TMO.
> > > But it will be limited to 600 if fast_io_fail_tmo is set to off or 0 at the same time.
> >
> > Doesn't the system still limit dev_loss_tmo to 600 if fast_io_fail_tmo isn't set? Multipath
> > was using this limit, since the underlying system uses it.
> >
> > -Ben
> >
> > >
> > > Signed-off-by: PengLiang <peng.liang5@xxxxxxxxxx>
> > > ---
> > > libmultipath/discovery.c | 3 ++-
> > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> > > index aaa915c..05b0842 100644
> > > --- a/libmultipath/discovery.c
> > > +++ b/libmultipath/discovery.c
> > > @@ -608,7 +608,8 @@ sysfs_set_rport_tmo(struct multipath *mpp, struct path *pp)
> > > goto out;
> > > }
> > > }
> > > - } else if (mpp->dev_loss > DEFAULT_DEV_LOSS_TMO) {
> > > + } else if (mpp->dev_loss > DEFAULT_DEV_LOSS_TMO &&
> > > + mpp->no_path_retry != NO_PATH_RETRY_QUEUE) {
> > > condlog(3, "%s: limiting dev_loss_tmo to %d, since "
> > > "fast_io_fail is not set",
> > > rport_id, DEFAULT_DEV_LOSS_TMO);
> > > --
> > > 2.8.1.windows.1
> >
> > --
> > dm-devel mailing list
> > dm-devel@xxxxxxxxxx
> > https://www.redhat.com/mailman/listinfo/dm-devel
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@xxxxxxx +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
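
For reference, the capping rule discussed in this thread can be sketched in a few lines of C. This is only a standalone illustration, not the libmultipath or kernel code: the function names and the SKETCH_ constants are hypothetical, and the values (a 600 second cap, a 2147483647 second maximum) are taken from the messages above. The second helper reproduces the "68 years" figure, assuming 365-day years.

```c
/* Hypothetical constants; values are the ones quoted in the thread. */
#define SKETCH_DEFAULT_DEV_LOSS_TMO 600u        /* cap without fast_io_fail_tmo */
#define SKETCH_MAX_DEV_LOSS_TMO     2147483647u /* maximum dev_loss_tmo */

/*
 * Return the dev_loss_tmo value that can actually be applied.
 * When fast_io_fail_tmo is not set (fast_io_fail_set == 0), any request
 * above the 600 second cap is reduced to the cap; otherwise the request
 * is passed through unchanged.
 */
static unsigned int effective_dev_loss(unsigned int requested,
                                       int fast_io_fail_set)
{
    if (!fast_io_fail_set && requested > SKETCH_DEFAULT_DEV_LOSS_TMO)
        return SKETCH_DEFAULT_DEV_LOSS_TMO;
    return requested;
}

/* Convert seconds to whole years, assuming 365-day years. */
static unsigned int seconds_to_years(unsigned int seconds)
{
    return seconds / (365u * 24u * 3600u);
}
```

With these helpers, effective_dev_loss(SKETCH_MAX_DEV_LOSS_TMO, 0) yields 600, while the same request with fast_io_fail_tmo set is left at 2147483647, which seconds_to_years() converts to 68.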