Re: deterministic io throughput in multipath

Hi Ben,
Following the discussion below, we came up with an approach that meets our requirement.
I have attached a patch that has worked well in our field tests.
Could you please review the attached patch and share your comments?
The following files have been changed.
 
libmultipath/config.c      |  3 +++
libmultipath/config.h      |  9 +++++++++
libmultipath/configure.c   |  3 +++
libmultipath/defaults.h    |  1 +
libmultipath/dict.c        | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
libmultipath/dict.h        |  1 +
libmultipath/propsel.c     | 44 ++++++++++++++++++++++++++++++++++++++++++++
libmultipath/propsel.h     |  6 ++++++
libmultipath/structs.h     | 12 +++++++++++-
libmultipath/structs_vec.c | 10 ++++++++++
multipath/multipath.conf.5 | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
multipathd/main.c          | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 
We have added three new config parameters, described below.
1. san_path_err_threshold:
        If set to a value greater than 0, multipathd will watch paths and track how many times each path has failed due to errors. If the number of failures on a particular path exceeds san_path_err_threshold, the path will not be reinstated until san_path_err_recovery_time has elapsed. The failures must occur within the san_path_err_threshold_window time frame; otherwise the path is considered good enough to reinstate.
 
2. san_path_err_threshold_window:
        If set to a value greater than 0, multipathd will check whether the number of path failures has exceeded san_path_err_threshold within this time frame. If so, the path will not be reinstated until san_path_err_recovery_time has elapsed.
 
3. san_path_err_recovery_time:
        If set to a value greater than 0, and the path failures have exceeded san_path_err_threshold within san_path_err_threshold_window, multipathd will keep the path in the failed state for the san_path_err_recovery_time duration. Once san_path_err_recovery_time has elapsed, the failed path will be reinstated.
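To illustrate how the three parameters interact, the intended behavior can be modeled as a small sliding-window tracker. This is only a simplified sketch, not code from the attached patch; the class and method names here are hypothetical:

```python
import time


class PathErrorTracker:
    """Illustrative model of the proposed window-based error tracking
    (hypothetical helper, not the actual multipathd implementation)."""

    def __init__(self, threshold, window, recovery_time):
        self.threshold = threshold          # san_path_err_threshold
        self.window = window                # san_path_err_threshold_window (seconds)
        self.recovery_time = recovery_time  # san_path_err_recovery_time (seconds)
        self.failures = []                  # timestamps of recent failures
        self.disabled_until = 0.0

    def record_failure(self, now=None):
        now = time.monotonic() if now is None else now
        self.failures.append(now)
        # Only failures inside the sliding window count toward the threshold.
        self.failures = [t for t in self.failures if now - t <= self.window]
        if len(self.failures) > self.threshold:
            # Too many failures inside the window: keep the path failed
            # for the full recovery time.
            self.disabled_until = now + self.recovery_time

    def may_reinstate(self, now=None):
        now = time.monotonic() if now is None else now
        return now >= self.disabled_until
```

For example, with a threshold of 2 and a 60-second window, a third failure inside the window keeps the path down for the entire recovery time, even if subsequent path checks succeed.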
 
Regards,
Muneendra.
 
-----Original Message-----
From: Muneendra Kumar M
Sent: Wednesday, January 04, 2017 6:56 PM
To: 'Benjamin Marzinski' <bmarzins@xxxxxxxxxx>
Cc: dm-devel@xxxxxxxxxx
Subject: RE: deterministic io throughput in multipath
 
Hi Ben,
Thanks for the information.
 
Regards,
Muneendra.
 
-----Original Message-----
From: Benjamin Marzinski [mailto:bmarzins@xxxxxxxxxx]
Sent: Tuesday, January 03, 2017 10:42 PM
To: Muneendra Kumar M <mmandala@xxxxxxxxxxx>
Cc: dm-devel@xxxxxxxxxx
Subject: Re: deterministic io throughput in multipath
 
On Mon, Dec 26, 2016 at 09:42:48AM +0000, Muneendra Kumar M wrote:
> Hi Ben,
>
> If there are two paths on a dm-1 say sda and sdb as below.
>
> #  multipath -ll
>        mpathd (3600110d001ee7f0102050001cc0b6751) dm-1 SANBlaze,VLUN MyLun
>        size=8.0M features='0' hwhandler='0' wp=rw
>        `-+- policy='round-robin 0' prio=50 status=active
>          |- 8:0:1:0  sda 8:48 active ready  running
>          `- 9:0:1:0  sdb 8:64 active ready  running         
>
> And on sda, I am seeing a lot of errors, due to which the sda path is fluctuating from the failed state to the active state and vice versa.
>
> My requirement is something like this: if sda fails more than 5
> times within an hour, then I want to keep sda in the failed state
> for a few hours (3 hrs),
>
> and the data should travel only through the sdb path.
> Will this be possible with the below parameters?
 
No. delay_watch_checks sets for how many path checks multipathd watches a path that has recently come back from the failed state. If the path fails again within this time, multipathd delays it.  This means that the delay is always triggered by two failures within the time limit.  It's possible to adapt this to count the number of failures and act after a certain number within a certain timeframe, but it would take a bit more work.
 
delay_wait_checks doesn't guarantee that it will delay for any set length of time.  Instead, it sets the number of consecutive successful path checks that must occur before the path is usable again. You could set this to cover 3 hours of path checks, but if a check failed during that time, the 3-hour count would start over again.
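(For reference, both options go in the defaults or devices section of multipath.conf, and checks are counted in units of polling_interval; the numbers below are only illustrative, and valid ranges may differ by multipath-tools version. With a 5-second polling interval, 2160 checks is roughly 3 hours:)

```
defaults {
        polling_interval    5
        delay_watch_checks  12
        delay_wait_checks   2160
}
```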
 
-Ben
 
> Can you just let me know what values I should use for delay_watch_checks and delay_wait_checks?
>
> Regards,
> Muneendra.
>
>
>
> -----Original Message-----
> From: Muneendra Kumar M
> Sent: Thursday, December 22, 2016 11:10 AM
> To: 'Benjamin Marzinski' <bmarzins@xxxxxxxxxx>
> Cc: dm-devel@xxxxxxxxxx
> Subject: RE: deterministic io throughput in multipath
>
> Hi Ben,
>
> Thanks for the reply.
> I will look into these parameters, do some internal testing, and let you know the results.
>
> Regards,
> Muneendra.
>
> -----Original Message-----
> From: Benjamin Marzinski [mailto:bmarzins@xxxxxxxxxx]
> Sent: Wednesday, December 21, 2016 9:40 PM
> To: Muneendra Kumar M <mmandala@xxxxxxxxxxx>
> Cc: dm-devel@xxxxxxxxxx
> Subject: Re: deterministic io throughput in multipath
>
> Have you looked into the delay_watch_checks and delay_wait_checks configuration parameters?  The idea behind them is to minimize the use of paths that are intermittently failing.
>
> -Ben
>
> On Mon, Dec 19, 2016 at 11:50:36AM +0000, Muneendra Kumar M wrote:
> >    Customers using Linux hosts (mostly RHEL hosts) with a SAN network for
> >    block storage complain that the Linux multipath stack is not resilient
> >    enough to handle non-deterministic storage network behaviors. This has
> >    caused many customers to move away to non-Linux-based servers. The
> >    intent of the patch below, and the prevailing issues, are described
> >    below. With this design we are seeing the Linux multipath stack become
> >    resilient to such network issues. We hope that getting this patch
> >    accepted will help drive more Linux server adoption on SAN networks.
> >
> >    I have already sent the design details to the community in a different
> >    mail chain; the details are available at the link below.
> >
> >    [1] https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
> >
> >    Could you please go through the design and send us your comments.
> >
> >     
> >
> >    Regards,
> >
> >    Muneendra.
> >
> >     
> >
> >     
> >
> > References
> >
> >    Visible links
> >    1. https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
>
> > --
> > dm-devel mailing list
> > dm-devel@xxxxxxxxxx
> > https://www.redhat.com/mailman/listinfo/dm-devel
 

Attachment: san_path_err.patch
Description: san_path_err.patch


--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
