Hi Martin,

I added the san_path_err_XX feature and pushed it upstream. It was driven by Brocade customer feedback. The link below gives the history, including a couple of discussions that took place before we started on the feature:
https://www.redhat.com/archives/dm-devel/2017-January/msg00025.html

Our requirement was simple. For example, suppose dm-1 has two paths, sda and sdb, as below.

# multipath -ll
mpathd (3600110d001ee7f0102050001cc0b6751) dm-1 SANBlaze,VLUN MyLun
size=8.0M features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 8:0:1:0 sda 8:48 active ready running
  `- 9:0:1:0 sdb 8:64 active ready running

Suppose I am seeing a lot of errors on sda, so that the sda path keeps fluctuating between the failed and active states.
The requirement was: if sda fails (moves from active to failed) more than X times within a duration Y, keep sda in the failed state for a duration Z, so that data travels only through the sdb path during Z.

From the configuration point of view:
san_path_err_threshold: the number of times sda has moved from active to failed (X in the example above).
san_path_err_forget_rate: the watch window. If the path failures (sda moving from active to failed) within this time frame exceed the error threshold, the path is not reinstated (Y in the example above).
san_path_err_recovery_time: how long the path is kept in the failed state (Z in the example above).

Each move from the active state to the failed state (good to bad) counts as 1.
That is, if a particular path has failed (moved from active to failed) san_path_err_threshold times within a window of san_path_err_forget_rate, the path is placed in the failed state and is not reinstated for san_path_err_recovery_time.
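The X/Y/Z requirement above can be sketched as follows. This is only a minimal model of the rule as stated in this mail, assuming time is measured in seconds; the class `PathFlapGuard` and its method names are hypothetical and are not taken from the multipath-tools code.

```python
from collections import deque


class PathFlapGuard:
    """Hypothetical model of the X/Y/Z rule; not multipath-tools code."""

    def __init__(self, err_threshold, forget_rate, recovery_time):
        self.err_threshold = err_threshold  # X: tolerated active->failed moves
        self.forget_rate = forget_rate      # Y: watch window, in seconds
        self.recovery_time = recovery_time  # Z: hold-down time, in seconds
        self.failures = deque()             # timestamps of active->failed moves
        self.held_until = None              # end of the current hold-down, if any

    def record_failure(self, now):
        """Count one active->failed transition at time `now` (seconds)."""
        self.failures.append(now)
        # Drop transitions that have fallen out of the watch window.
        while self.failures and now - self.failures[0] > self.forget_rate:
            self.failures.popleft()
        # More than X failures inside the window: hold the path down for Z.
        if len(self.failures) > self.err_threshold:
            self.held_until = now + self.recovery_time
            self.failures.clear()

    def may_reinstate(self, now):
        """True if the path may be reinstated at time `now`."""
        return self.held_until is None or now >= self.held_until


guard = PathFlapGuard(err_threshold=2, forget_rate=60, recovery_time=300)
for t in (0, 10, 20):
    guard.record_failure(t)
print(guard.may_reinstate(30))   # False: sda stays failed, I/O goes via sdb
print(guard.may_reinstate(320))  # True: the hold-down has expired
```

The sketch uses a sliding window purely for illustration; the way the merged code ages out old failures may differ.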

Coming back to the marginal path implementation: I have rechecked it, and I completely agree with you that it is difficult to derive a direct formula relating the two. The example I gave does not hold good. The two approaches are mutually exclusive in how they detect a marginal/shaky path. In the san_path_err_XX case we consider the overall number of failures (san_path_err_threshold), whereas in the marginal case, IMO, we consider the error rate (marginal_path_err_rate_threshold)?

You are also correct that merging the san_path_err_XX and marginal_path_XX configuration into one set of parameters would confuse the user further. Since the approaches are different, we need to come up with a way for the user to choose the algorithm in multipath.conf, similar to the multipaths configuration in the .conf file.

Regards,
Muneendra

-----Original Message-----
From: Martin Wilck [mailto:mwilck@xxxxxxxx]
Sent: Friday, December 21, 2018 2:56 AM
To: Muneendra Kumar M <muneendra.kumar@xxxxxxxxxxxx>; Christophe Varoqui <christophe.varoqui@xxxxxxxxxxx>; mwilck@xxxxxxxx
Cc: M Muneendra Kumar <mmandala@xxxxxxxxxxx>; Guan Junxiong <guanjunxiong@xxxxxxxxxx>; Benjamin Marzinski <bmarzins@xxxxxxxxxx>; dm-devel@xxxxxxxxxx; Hannes Reinecke <hare@xxxxxxx>
Subject: Re: [PATCH 04/19] Revert "multipath-tools: discard san_path_err_XXX feature"

Hello Muneedra,

On Thu, 2018-12-20 at 16:11 +0530, Muneendra Kumar M wrote:
> Hi Martin,
> I completely agree with you as we cannot derive a direct formula
> behind these two unless we know the IOPS on a particular path.
>
> As the IOPS in both the cases are different during the detection of
> Shaky path.
> In marginal_path_XX case the IOPS are fixed, i.e. 100 (at a sample
> rate of 10 Hz). Similarly, in san_path_xx case the IOPS are not fixed
> (as it depends on the application).
>
> But there are lot of ways to derive the IOPS on a particular path; if
> we can get that then we can derive the values like below IMO.
>
> And to calculate these we need to derive error threshold as the
> percentage of IOPS and the percentage should not be less than 1 (as
> most of the Brocade SAN customers are using this configuration).
> i.e. san_path_err_threshold and marginal_path_err_rate_threshold need
> to be computed as percentage of IOPS for a given number of secs
> (derived from san_path_err_forget_rate /
> marginal_path_err_sample_time).

You make me curious - are Brocade customers using our upstream multipath code? Do you have insights about if, and how, they apply marginal path checking in multipath-tools, and what parameter values they are applying? If yes, it would be very valuable for the community if you could share some of these insights. So far I'm gathering that you recommend considering paths as shaky if they have an error rate of more than 1%.

>
> For example if 1000 IOPS are happening on a particular path and
> making the percentage factor as 1 and sample time as 60 secs the
> configuration will be as below
>
> san_path_err_threshold = 600 (1 percent of 60*1000)
> san_path_err_forget_rate = 60
> san_path_err_recovery_time 100

Hm, I understand it differently. In the san_path_err model, if you have an error rate of 1% and the settings above, IMO you will *never* reach the threshold. The failure count will increase (on average) by one every 100 ticks, but it will decrease by one every 60 ticks, resulting in a negative first derivative (more precisely, a stochastic process where the overall trend goes towards 0, not upwards towards the threshold).

In the san_path_err model, the maximum tolerable failure rate is basically the reciprocal of the san_path_err_forget_rate parameter. The error threshold has a different effect, acting rather as a "delay" until the algorithm really considers the path shaky. The closer the failure rate is to the forget rate, the longer it takes.
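
The drift argument above can be sketched as a small calculation. This is only an illustration of the averaged model described in this mail (the count rises by the error rate per tick and drops by one every san_path_err_forget_rate ticks); the function name `ticks_to_threshold` is hypothetical, and the real algorithm is a stochastic process, not this average.

```python
from fractions import Fraction


def ticks_to_threshold(error_rate, threshold, forget_rate):
    """Estimate the ticks needed for the failure count to reach `threshold`.

    Assumes the count rises by `error_rate` per tick on average and is
    decremented by one every `forget_rate` ticks. Returns None when the net
    drift is not positive, i.e. the threshold is never reached on average.
    """
    drift = Fraction(error_rate) - Fraction(1, forget_rate)
    if drift <= 0:
        return None
    return Fraction(threshold) / drift


# 1% error rate with threshold 600 and forget rate 60: drift is negative.
print(ticks_to_threshold(Fraction(1, 100), 600, 60))  # None
```

Passing a higher error rate returns the estimated number of ticks, which grows as the error rate approaches 1/forget_rate.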

For example, if you have an error rate of 1/30 (3.3%), the failure count will increase by one every 60 ticks (1/30 - 1/60 = 1/60), and it will take 60*600 = 36000 (!) ticks, or 10h at best, until the path is considered shaky. OTOH, with an error rate of 10%, the threshold is reached in 7200 ticks, and at an error rate of 50%, in about 1200 ticks.

For your scenario, I'd use something like

san_path_err_threshold 4
san_path_err_forget_rate 100
san_path_err_recovery_time 100

At least that's how I understand the algorithm. Am I wrong?

Btw, are you aware that the san_path_err algorithm, at least in the form that was merged upstream, only counts good->bad transitions? Especially with high error rates, this is quite different from an overall error rate (failures / overall I/Os), because several subsequent failures are only counted as one.

>
> Now this user is supposed to migrate to marginal_path settings.
> (IOPS in this case is fixed to 100 during the shaky path detection)
> 60 (1 percentage of 60*100)
> marginal_path_err_sample_time 60
> marginal_path_err_recheck_gap_time 100
>
> And in this case san_path_err_forget_rate should be same as
> marginal_path_err_sample_time and san_path_err_recovery_time should
> be same as marginal_path_err_recheck_gap_time.
> The only variable factor is san_path_err_threshold and
> marginal_path_err_rate_threshold, which keeps changing based on the
> number of errors as a percentage of IOPS for a given number of secs.
>
> The only parameter that is extra in marginal case is
> marginal_path_double_failed_time which we need to configure for
> suspecting a marginal path.

I don't think these parameters will have the same behavior as the san_path_err parameters above; see my argument above. Note that marginal_path_err_sample_time 60 is invalid (the marginal path code requires at least 120s), and that the error threshold is always given as a "permillage" (it should be set to 10 for 1%).
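
The two constraints just mentioned can be written down as a small check. This is a hedged sketch only: the 120s lower bound and the per-mille unit are taken from this mail, while the helper names `to_permillage` and `check_marginal_settings` are hypothetical and do not exist in multipath-tools.

```python
MIN_SAMPLE_TIME = 120  # seconds; lower bound as quoted in this mail


def to_permillage(percent):
    """Convert a percentage to per-mille (1% -> 10)."""
    return percent * 10


def check_marginal_settings(sample_time, rate_threshold_percent):
    """Return (valid, rate_threshold) for hypothetical marginal_path settings.

    `valid` is False when the sample time is below the quoted minimum;
    `rate_threshold` is the threshold expressed in per-mille.
    """
    valid = sample_time >= MIN_SAMPLE_TIME
    return valid, to_permillage(rate_threshold_percent)


print(check_marginal_settings(60, 1))   # (False, 10)
print(check_marginal_settings(120, 1))  # (True, 10)
```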

>
> As we still see some merits in the san_path_XX approach as you
> mentioned earlier and we need both san_path_err_xx and
> marginal_path_err_xx I am thinking of the below approach so that the
> customers can have the common configuration for both.
> Functionality-wise, san_path_err_forget_rate and
> marginal_path_err_sample_time, san_path_err_recovery_time and
> marginal_path_err_recheck_gap_time, and san_path_err_threshold and
> marginal_path_err_rate_threshold are the same.
>
> So we can have the common configuration name as marginal_path_err_XX
> (parameters) for both approaches and the deriving factor should be
> marginal_path_double_failed_time.
> If marginal_path_double_failed_time is not defined go with
> san_path_err approach else go with marginal_path_err approach to
> detect the Shaky path.

I'm not sure about that. It's important that users are able to understand the effect that each parameter has. If we use the same parameter name for different parameters of different algorithms, even bigger confusion might arise than we have now. "san_path_err_recovery_time" and "marginal_path_err_recheck_gap_time" obviously have very similar effects, but for the other parameters I don't see a 1:1 equivalence.

Best regards,
Martin

--
Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel