On Wed, Jun 08, 2022 at 04:47:37PM +0000, Martin Wilck wrote: > On Wed, 2022-06-08 at 09:40 -0500, Benjamin Marzinski wrote: > > On Wed, Jun 08, 2022 at 07:56:27AM +0000, Martin Wilck wrote: > > > On Tue, 2022-06-07 at 17:45 -0500, Benjamin Marzinski wrote: > > > > Dell EMC would like to always use the emc_clariion checker. > > > > Currently > > > > detect_checker will switch the checker to TUR for Unity arrays. > > > > This can cause problems on some setups with replicated Unity > > > > LUNs, > > > > which are handled correctly the the emc_checker, but not the TUR > > > > checker. > > > > > > > > Cc: vincent.chen1@xxxxxxxx > > > > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx> > > > > > > This points us to a flaw in our logic. > > > > > > It was wrong in the first place to have autodetection take > > > precedence, > > > even over "overrides". The effect for users is that whenever > > > "path_checker" is set, "detect_checker no" must also be set, which > > > is highly surprising and adds no benefit. > > > > > > We should assume that if a device has an explicit checker > > > configured > > > either in the device configuration, overrides, or the hwtable, this > > > checker must be used, and fall back to autodetection only if this > > > is > > > not the case. > > > > > > So while this patch is alright, I'd prefer a patch that fixes the > > > logic. > > > > I'm not sure that this is a good idea. IIRC, the current method was > > an > > intentional choice. The idea was that if a checker was autodetected, > > we > > would use that. If not, we would fall back to the user defined > > checker. > > For the most part this is useful for arrays that could be run in ALUA > > or > > non-alua mode. Changing the priority of detect_checker will suddenly > > change how these arrays are configured. Users that configured a > > failback checker for cases where their arrays weren't in ALUA mode > > would > > suddenly find that their arrays were always using the fallback > > method. > > > Now I'm not saying that the original logic was the best option. But I > > am > > saying that it hasn't caused many issues over the years that it's > > been > > in existance. There are a number of arrays in the builtin hardware > > table > > that define checkers. I assume that they either don't support ALUA, > > or > > they are happy with only using their configured checker when > > their<Thinking about it, > > arrays aren't in ALUA mode. > > Most of the built-in hwtable entries set the RDAC checker. For these, > the result will be unchanged if we lower the precedence of > detect_checker. There are two entries setting DIRECTIO for non-SCSI > devices (DASD and Huawei NVMe): no regression risk there. Then there's > Clariion CX/AX/VNX, for which the change in behavior is intended. > Finally, there are two entries for ancient HPE devices using HP_SW. If > we change the precedence, these devices might switch from TUR to HP_SW > (if they support ALUA, dunno). As well as any user configurations that set this. I realize that it has been a while since the days when most of the default device configs explicitly set a checker, but I don't have any idea how many users are still running with a user config like that. > > I would rather fix the remaining cases where > > the existing priority gives the wrong answer than suddenly change how > > things work, in a way that could suddenly break things for an unknown > > (but quite possibly large) number of existing users. > > According to the above, only people who'd use very old HPE storage > arrays with latest upstream multipath-tools would be affected. > > Remain those users that you mentioned — people who operate in ALUA mode > but keep some explicit multipath.conf settings around as fallback "when > their arrays aren't in ALUA mode" (for whatever reason). These users > would now observe that their arrays operate in fallback mode, even if > ALUA was enabled. > > I'm not sure if this matters much. OTOH, it's rather common for people > to forget to set "detect_checker no" and wonder why their > multipath.conf settings don't take effect. But like my patch showed, it's a simple fix. If users wanted to define a fallback method for arrays that optionally support ALUA (and I admit that I have no idea how many arrays like this are still being used) it's a lot tricker if the devices section path_checker takes precedence. They can't set the fallback checker there. They would have to set it in the defaults section. This is problematic, since you are changing the checker on arrays you may not have intended. > > > > > (The same could be said for detect_prio, but I don't want to make > > > outrageous demands). > > > > The above arguments apply here, only moreso. > > Besides the devices already mentioned for detect_checker, there's the > Hitachi Vantara AMS setting PRIO_HDS, and ONTAP using PRIO_ONTAP. The > big difference is that in detect_prio(), RDAC arrays are configured to > use ALUA rather than RDAC. So if we switch the precedence, all those > arrays would switch from ALUA to RDAC. I agree this would be surprising > and undesirable. Are these arrays still configurable to not support > ALUA? If no, we could just remove the RDAC entries. Same for ONTAP. > > Anyway, _if_ we change the precedence rules, we should do it for both > detect_prio and detect_checker, otherwise our behavior might appear > more inconsistent than it's now. Agreed. the hardware_handler also works in a similar way (with retaining the kernel detected one by default, and using the device's section hardware_handler option as a backup). I don't think we want to change that one either. -Ben > For now, I'll ack your patch and let this discussion sink in a bit. > Comments from users welcome! > > Regards > Martin > -- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel