On Tue, 2024-02-13 at 00:42 +0100, Xose Vazquez Perez wrote:
> On 8/26/21 8:47 AM, Martin Wilck wrote:
>    ^^^^^^^
> It is never too late! :-)
> Some history:
>
> first commit 3eb8c380a :
>         {
>                 /* IBM SAN Volume Controller */
>                 .vendor        = "IBM",
>                 .product       = "2145",
>                 .getuid        = DEFAULT_GETUID,
>                 .getprio       = "mpath_prio_alua /dev/%n",
>                 .features      = "1 queue_if_no_path",
>                 .hwhandler     = DEFAULT_HWHANDLER,
>                 .selector      = DEFAULT_SELECTOR,
>                 .pgpolicy      = GROUP_BY_PRIO,
>                 .pgfailback    = -FAILBACK_IMMEDIATE,
>                 .rr_weight     = RR_WEIGHT_NONE,
>                 .no_path_retry = NO_PATH_RETRY_UNDEF,
>                 .minio         = DEFAULT_MINIO,
>                 .checker_name  = TUR,
>         },
>
> NO_PATH_RETRY_UNDEF was removed in b7c3cf014 because it was the
> default value,
> and later "1 queue_if_no_path" was replaced by NO_PATH_RETRY_QUEUE in
> 87ea76f99

... which shows that the default has been "queue" for almost 18 years.

> IBM docs recommends:
> no_path_retry 5 # or no_path_retry "fail" for some current linux
> distros
>
> IBM Storage FlashSystem 5200, 5000, 5100, Storwize V5100 and V5000E:
> https://www.ibm.com/docs/en/flashsystem-5x00/8.6.x?topic=system-settings-linux-hosts
>
> IBM Storage FlashSystem 7300, 7200 and Storwize V7000:
> https://www.ibm.com/docs/en/flashsystem-7x00/8.6.x?topic=system-settings-linux-hosts
>
> IBM FlashSystem V9000:
> https://www.ibm.com/docs/en/flashsystem-v9000/8.3.x?topic=system-settings-linux-hosts
>
> IBM Storage FlashSystem 9500, 9200 and 9100:
> https://www.ibm.com/docs/en/flashsystem-9x00/8.6.x?topic=system-settings-linux-hosts
>
> Therefore, we should change this value.

I tend to disagree. It's true that we usually follow vendor
recommendations. But in this case, I think the change would do more
harm than good, because we've defaulted to "queue" basically forever
for this product. Suddenly switching to a rather short no_path_retry
value might come as an unpleasant surprise for users.
Users who follow the IBM recommendations (using explicit multipath.conf
settings) won't notice the change anyway, but those who rely on our
defaults might even lose data, because I/O that is queued today would
start failing once the retries are exhausted.

In general, I believe vendors' recommendations about "no_path_retry"
don't mean much. This setting doesn't depend on the properties of the
hardware; it's rather the preference of the end customer [*]. IMHO
"fail" or low numeric values of no_path_retry mainly make sense in
cluster configurations. Unfortunately, IBM gives no rationale for this
recommendation in its manuals [+].

But I'm not religious on the matter; more opinions welcome.

Martin

[*] Vendors can recommend a lower limit for no_path_retry, in the sense
"with this product, it can happen that zero paths are available for N
seconds during a firmware update", but a fixed no_path_retry value acts
as an upper limit.

[+] I suspect that the recommendations in the current IBM manuals have
just been copy/pasted from earlier ones, without much consideration.
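
For illustration, the explicit override mentioned above could look
roughly like this in /etc/multipath.conf. This is only a sketch: the
vendor/product strings are copied from the hwtable entry quoted above,
the value 5 is the one from the IBM docs, and I'm assuming no other
device or overrides section in the local configuration takes
precedence. Check the result with "multipathd show config".

    devices {
            device {
                    # same vendor/product as the built-in hwtable entry
                    vendor        "IBM"
                    product       "2145"
                    # built-in default is "queue"; IBM docs suggest 5
                    # (or "fail" on some distros)
                    no_path_retry 5
            }
    }

As far as I understand, a numeric value counts retries at
polling_interval, so with the default polling_interval of 5 seconds,
"no_path_retry 5" would stop queueing roughly 25 seconds after the last
path has been lost. Users who want the current behavior regardless of
any future change in the defaults can set no_path_retry to "queue" in
the same place.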