Hi Martin,

I completely agree with you: we cannot derive a direct formula relating these two parameter sets unless we know the IOPS on a particular path, because the IOPS differ between the two cases during shaky-path detection. In the marginal_path_XX case the IOPS are fixed, i.e. 100 (at a sample rate of 10 Hz). In the san_path_XX case the IOPS are not fixed, as they depend on the application.

However, there are several ways to determine the IOPS on a particular path. If we can get that, IMO we can derive the values as shown below. To do so, we need to express the error threshold as a percentage of the IOPS, and the percentage should not be less than 1 (most Brocade SAN customers use this configuration). That is, san_path_err_threshold and marginal_path_err_rate_threshold need to be computed as a percentage of the I/Os issued over a given number of seconds (taken from san_path_err_forget_rate / marginal_path_err_sample_time).

For example, if a path sees 1000 IOPS, the percentage factor is 1 and the sample time is 60 seconds, the configuration will be:

    san_path_err_threshold      600   (1 percent of 60 * 1000)
    san_path_err_forget_rate    60
    san_path_err_recovery_time  100

Now this user is supposed to migrate to the marginal_path settings (where the IOPS are fixed at 100 during shaky-path detection):

    marginal_path_err_rate_threshold    60   (1 percent of 60 * 100)
    marginal_path_err_sample_time       60
    marginal_path_err_recheck_gap_time  100

In this case san_path_err_forget_rate should be the same as marginal_path_err_sample_time, and san_path_err_recovery_time should be the same as marginal_path_err_recheck_gap_time. The only values that differ are san_path_err_threshold and marginal_path_err_rate_threshold, which vary with the number of errors expressed as a percentage of the I/Os over the given number of seconds. The only extra parameter in the marginal case is marginal_path_double_failed_time, which we need to configure in order to suspect a marginal path.
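The derivation above can be sketched in a few lines. This is a minimal, hypothetical helper (the names derive_settings, error_threshold and PathErrSettings are illustrative, not part of multipath-tools), assuming the fixed 100 IOPS for marginal_path detection stated in this mail and the percentage-of-IOPS rule proposed here. It only reproduces the arithmetic of the proposal; it does not check the result against the units multipathd actually uses for these options.

```python
from dataclasses import dataclass


@dataclass
class PathErrSettings:
    """Values derived from the percentage-of-IOPS rule proposed above."""
    san_path_err_threshold: int
    san_path_err_forget_rate: int
    san_path_err_recovery_time: int
    marginal_path_err_rate_threshold: int
    marginal_path_err_sample_time: int
    marginal_path_err_recheck_gap_time: int


# Assumption taken from the mail: during shaky-path detection the
# marginal_path code issues a fixed 100 IOPS.
MARGINAL_PATH_FIXED_IOPS = 100


def error_threshold(iops: int, window_secs: int, percent: float) -> int:
    """Number of errors equal to `percent` % of the I/Os issued in the window."""
    if percent < 1:
        raise ValueError("the proposal uses a percentage of at least 1")
    return round(iops * window_secs * percent / 100)


def derive_settings(app_iops: int, window_secs: int, recovery_secs: int,
                    percent: float = 1.0) -> PathErrSettings:
    """Hypothetical helper: derive both parameter sets from one path's IOPS."""
    return PathErrSettings(
        # Only the thresholds depend on the IOPS ...
        san_path_err_threshold=error_threshold(app_iops, window_secs, percent),
        marginal_path_err_rate_threshold=error_threshold(
            MARGINAL_PATH_FIXED_IOPS, window_secs, percent),
        # ... while the window and the recovery time map 1:1.
        san_path_err_forget_rate=window_secs,
        marginal_path_err_sample_time=window_secs,
        san_path_err_recovery_time=recovery_secs,
        marginal_path_err_recheck_gap_time=recovery_secs,
    )


if __name__ == "__main__":
    s = derive_settings(app_iops=1000, window_secs=60, recovery_secs=100)
    print(s.san_path_err_threshold, s.marginal_path_err_rate_threshold)
    # prints: 600 60
```

With 1000 application IOPS, a 60-second window and a 1 percent factor this yields the same 600 / 60 pair as the example, and makes explicit that only the threshold depends on the measured IOPS.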
As we still see some merits in the san_path_XX approach, as you mentioned earlier, and we need both san_path_err_XX and marginal_path_err_XX, I am thinking of the approach below so that customers can use a common configuration for both.

From a functionality point of view, the following pairs are the same:

    san_path_err_forget_rate    <->  marginal_path_err_sample_time
    san_path_err_recovery_time  <->  marginal_path_err_recheck_gap_time
    san_path_err_threshold      <->  marginal_path_err_rate_threshold

So we can use the common configuration names marginal_path_err_XX for both approaches, with marginal_path_double_failed_time as the deciding factor: if marginal_path_double_failed_time is not defined, use the san_path_err approach to detect the shaky path; otherwise use the marginal_path_err approach.

Regards,
Muneendra.

-----Original Message-----
From: Martin Wilck [mailto:mwilck@xxxxxxxx]
Sent: Wednesday, December 19, 2018 5:32 PM
To: Muneendra Kumar M <muneendra.kumar@xxxxxxxxxxxx>; Christophe Varoqui <christophe.varoqui@xxxxxxxxxxx>; mwilck+gmail@xxxxxxx
Cc: M Muneendra Kumar <mmandala@xxxxxxxxxxx>; Guan Junxiong <guanjunxiong@xxxxxxxxxx>; Benjamin Marzinski <bmarzins@xxxxxxxxxx>; dm-devel@xxxxxxxxxx; Hannes Reinecke <hare@xxxxxxx>
Subject: Re: [PATCH 04/19] Revert "multipath-tools: discard san_path_err_XXX feature"

On Wed, 2018-12-19 at 17:02 +0530, Muneendra Kumar M wrote:
> Hi Martin,
> In one of the patch "[PATCH 00/19] san_path_err & multipath ANA
> support"
>
> you have mentioned that san_path_err_XXX has some merits over
> marginal_path_err_XXX.
>
> Is this understanding correct if so could you please explain the
> scenario in which use case this was better.
>
> I can say Marginal_path_err_xx is superset of san_path_err_xx.

If you think so, please explain how.

Imagine a user who has configured

san_path_err_threshold X
san_path_err_forget_rate Y
san_path_err_recovery_time Z

Now this user is supposed to migrate to marginal_path settings.
marginal_path_double_failed_time A
marginal_path_err_sample_time B
marginal_path_err_rate_threshold C
marginal_path_err_recheck_gap_time D

Can you provide a formula to calculate A, B, C, D such that the system behaves the same way as (or "better" than) it did previously with X, Y, Z? I have pondered this for a while and concluded that I can't.

> If we need both san_path_err_xx , Marginal_path_err_xx then so many
> configurations will really confuse the customers.

True, the many different options are confusing. However, I don't think it becomes much worse by offering both methods. Both methods aren't easy to understand by themselves. Once users understand that these two parameter sets are mutually exclusive, I think they can deal with that.

What we really need is easier set-up of either method (think of 2-3 sets of reasonable pre-set parameter values for different scenarios). I believe most admins are so intimidated by the complexity of the parameters and their interaction that they give up and use delay_xx_checks instead, or nothing at all.

Unfortunately this is all based on guessing; we at least have no data on whether users are trying these parameters and, if so, what they are using.

Martin

--
Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel