On Sat, 2023-11-11 at 20:51 +0800, Heming Zhao wrote:
> I remember we discussed about the mpath filter before. It looks lvm2
> developers didn't trust udev and wrote hard-coded scanning actions
> (see commit 3b0f9cec7e999, and below function
> dev_is_mpath_component()) to replace mpath+udev. But in SUSE env, we
> had tested/ran a long time and worked fine with setting up lvm2 under
> obtain_device_list_from_udev=1 & external_device_info_source =
> "udev".
>
> From SUSE env, below function at least should put line 702~705 to
> the beginning of this function. In the other word, consulting udev
> first, then back off to hard-coded checks.
> I don't know if the "udev+mpio+lvm2" combination in RedHat
> environments often encounters problems with abnormal startup. From
> SUSE env, it seems we do revert 3b0f9cec7e999 may got better result.

Adding Ben as RH's multipath maintainer, and Hannes.

TL;DR: I believe that 3b0f9ce ("filter-mpath: get wwids from sysfs
vpd_pg83") is wrong. With "external_device_info_source = udev", LVM
must fully rely on udev properties.

Long story: multipath-tools has complex logic for determining whether a
given device should be considered a multipath component. This logic
depends non-trivially on configuration settings in multipath.conf.
Other tools are ill-advised to try to re-implement multipath's logic.

We have a mechanism that works. multipath and multipathd work together
to set the udev property DM_MULTIPATH_DEVICE_PATH on potential
multipath component devices to indicate multipath's own decision about
the device. I can't stress enough that this is *the only mechanism*
that works correctly. udev serves as the central hub from which device
properties are retrieved, and this is how it ought to be.

I know that LVM maintainers have a low opinion of udev. But all issues
that I've been made aware of in the last couple of years have been
addressed.
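
To make the mechanism described above concrete, here is a minimal
sketch (Python, not LVM code) of a consumer that takes multipath's
decision from udev instead of re-deriving it. It assumes that the
properties are read with "udevadm info --query=property" and that a
value of "1" marks a multipath component. The function names are made
up for illustration.

```python
import subprocess


def parse_udev_properties(text):
    """Parse KEY=VALUE lines as printed by 'udevadm info --query=property'."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without '='
            props[key] = value
    return props


def is_mpath_component(props):
    """Return multipath's decision as recorded in udev.

    True / False if DM_MULTIPATH_DEVICE_PATH is set, None if udev has no
    answer (e.g. no udev database entry for the device).
    Assumption: "1" means "multipath component".
    """
    value = props.get("DM_MULTIPATH_DEVICE_PATH")
    if value is None:
        return None
    return value == "1"


def query_udev(devname):
    """Hypothetical helper: fetch the properties of a device node from udev."""
    out = subprocess.run(
        ["udevadm", "info", "--query=property", "--name=" + devname],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_udev_properties(out)
```

On a system with udev, is_mpath_component(query_udev("/dev/sdb"))
would give the answer for one device. The None case is left to the
caller on purpose: what to do when udev has no answer is a policy
decision, and this sketch does not try to settle it.
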
There have been problems with all tools involved (multipath, lvm, udev
and udev rules, systemd's device activation logic, dracut), but I
firmly believe that they have been overcome, and that LVM can rely on
DM_MULTIPATH_DEVICE_PATH safely on every real-world system. The only
exceptions I am aware of are environments where udev isn't available,
such as image build environments. If you know about any
counter-examples, please let me know. As multipath maintainers, we are
determined to fix them [1].

From the code Heming showed, _dev_is_mpath_component_sysfs() is ok-ish,
but redundant; DM_MULTIPATH_DEVICE_PATH implements the same logic.
_dev_in_wwid_file() is wrong though. There are various possible cases
in which a device should not be part of a multipath map even though
its WWID is listed in the WWIDs file: multipath might be disabled via
systemd or the kernel command line, the device might be blacklisted, or
marked as "failed_wwid" [2]. This list is incomplete.
DM_MULTIPATH_DEVICE_PATH takes all these possibilities into account;
LVM's new logic does not.

I am not suggesting that LVM improve its implementation of multipath
component detection. Rather, LVM must rely on DM_MULTIPATH_DEVICE_PATH
if "external_device_info_source = udev". Current LVM releases are lying
about external_device_info_source when it's set to udev, as they do
_not_ respect what udev tells them. If you really need a mode in which
udev properties are only partially respected, don't call it
"external_device_info_source = udev".

Regards
Martin

PS: Here's a related remark about 17a3585 ("pvscan: use alternate
device names from DEVLINKS to check filter"). I can see why this was
necessary, but I don't understand why this is found to be necessary
_now_; the same issue should have always existed if "pvscan" is running
during a "change" event for any given device. The solution of 17a3585
"worked" for us, but it looks only semi-ok to me.
Other udev rules may modify the DEVLINKS list after pvscan has run. A
correct solution must make sure that pvscan runs after all udev rules.
IOW, pvscan should be triggered in a udev RUN= statement rather than
IMPORT=. This would probably require a new systemd service, because
it's not just "pvscan" alone. But the result would be more robust than
what we currently have.

[1] I assume that commit 3b0f9ce has been created to work around some
problem. I'd appreciate it if multipath maintainers were involved in
issues like this. If I'd been involved, I would have told you that I
believe the approach of 3b0f9ce is wrong, and I'm pretty sure we would
have found a solution that respects the udev properties.

[2] meaning that previous attempts to set up a multipath map on the
device have failed.