Re: discuss about commit 3b0f9ce: filter-mpath: get wwids from sysfs vpd_pg83


 



On Sat, 2023-11-11 at 20:51 +0800, Heming Zhao wrote:

> I remember we discussed about the mpath filter before. It looks lvm2
> developers didn't trust udev and wrote hard-coded scanning actions
> (see commit 3b0f9cec7e999, and below function
> dev_is_mpath_component()) to replace mpath+udev. But in SUSE env, we
> had tested/ran a long time and worked fine with setting up lvm2 under
> obtain_device_list_from_udev=1 & external_device_info_source =
> "udev".
> 
>  From SUSE env, below function at least should put line 702~705 to
> the beginning of this function. In the other word, consulting udev
> first, then back off to hard-coded checks.
> I don't know if the "udev+mpio+lvm2" combination in RedHat
> environments often encounters problems with abnormal startup. From
> SUSE env, it seems we do revert 3b0f9cec7e999 may got better result.

Adding Ben as RH's multipath maintainer, and Hannes.

TL;DR: I believe that 3b0f9ce ("filter-mpath: get wwids from sysfs
vpd_pg83") is wrong. With "external_device_info_source = udev", LVM
must fully rely on udev properties.

Long story:

multipath-tools has complex logic for determining whether a given
device should be considered a multipath component. This logic depends
non-trivially on configuration settings in multipath.conf. Other tools
are ill-advised to try to re-implement multipath's logic.

We have a mechanism that works. multipath and multipathd work together
to set the udev property DM_MULTIPATH_DEVICE_PATH on potential
multipath component devices to indicate multipath's own decision about
the device.
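
To illustrate what "relying on the property" could look like for a
consumer, here is a minimal Python sketch. It is not LVM's or
multipath's code. It assumes the input is the KEY=VALUE text printed by
"udevadm info --query=property", and that the value "1" marks a path
device; the function names and the sample values are made up for this
example.

```python
def parse_udev_properties(text):
    """Parse KEY=VALUE lines (as printed by udevadm) into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that contain no '='
            props[key.strip()] = value.strip()
    return props


def is_mpath_component(props):
    """Trust multipath's own decision as published through udev.

    Assumption: "1" means the device is a multipath path device; any
    other value, or a missing property, means it is not.
    """
    return props.get("DM_MULTIPATH_DEVICE_PATH") == "1"


# Made-up sample output for a hypothetical device.
sample = """DEVNAME=/dev/sdb
ID_SERIAL=example-serial
DM_MULTIPATH_DEVICE_PATH=1
"""

if __name__ == "__main__":
    print(is_mpath_component(parse_udev_properties(sample)))
```

The point of the sketch is that the consumer makes no decision of its
own: it only reads what multipath has already decided.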

I can't stress enough that this is *the only mechanism* that works
correctly. udev serves as the central hub for retrieving device
properties, and this is how it ought to be. I know that LVM maintainers
have a low opinion of udev. But all issues that I've been made aware of
in the last couple of years have been addressed. There have been
problems with all tools involved — multipath, lvm, udev and udev rules,
systemd's device activation logic, dracut — but I firmly believe that
they have been overcome, and that LVM can rely on
DM_MULTIPATH_DEVICE_PATH safely on every real-world system. The only
exceptions I am aware of are environments where udev isn't available,
such as image build environments. If you know about any counter-
examples, please let me know. As multipath maintainers, we are
determined to fix them [1].

From the code Heming showed, _dev_is_mpath_component_sysfs() is ok-ish,
but redundant; DM_MULTIPATH_DEVICE_PATH implements the same logic.
_dev_in_wwid_file() is wrong though. There are various possible cases
in which a device should not be part of a multipath map even though
its WWID is listed in the WWIDs file. multipath might be disabled via
systemd or kernel command line, the device might be blacklisted, or
marked as "failed_wwid" [2]. This list is incomplete.
DM_MULTIPATH_DEVICE_PATH takes all these possibilities into account;
LVM's new logic does not.
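
A toy Python model of the difference, under loud assumptions: this is
not multipath's actual algorithm (which depends on multipath.conf and
more), only the three cases named above expressed as flags. All names
are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    """Hypothetical flags describing one device; illustration only."""
    wwid_in_file: bool
    multipath_disabled: bool = False  # e.g. via systemd or kernel command line
    blacklisted: bool = False
    marked_failed: bool = False       # the "failed_wwid" case


def wwid_file_only(dev):
    """The shortcut criticized above: look at the WWIDs file and nothing else."""
    return dev.wwid_in_file


def toy_multipath_decision(dev):
    """Simplified stand-in for multipath's decision (an incomplete list)."""
    if dev.multipath_disabled or dev.blacklisted or dev.marked_failed:
        return False
    return dev.wwid_in_file


if __name__ == "__main__":
    dev = DeviceState(wwid_in_file=True, blacklisted=True)
    # The two checks disagree for a blacklisted device whose WWID is listed.
    print(wwid_file_only(dev), toy_multipath_decision(dev))
```

Whenever one of the flags is set for a device whose WWID is listed, the
file-only check claims the device for multipath although multipath
itself would not use it.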

I am not suggesting that LVM improve its implementation of multipath
component detection.

Rather, LVM must rely on DM_MULTIPATH_DEVICE_PATH if
"external_device_info_source = udev". Current LVM releases are lying
about external_device_info_source when it's set to udev, as they do
_not_ respect what udev tells them. If you really need a mode in which
udev properties are only partially respected, don't call it
"external_device_info_source = udev".

Regards
Martin


PS: Here's a related remark about 17a3585 ("pvscan: use alternate
device names from DEVLINKS to check filter"). I can see why this was
necessary, but I don't understand why this is found to be necessary
_now_; the same issue should have always existed if "pvscan" is running
during a "change" event for any given device. The solution of 17a3585
"worked" for us, but it looks only semi-ok to me. Other udev rules may
modify the DEVLINKS list after pvscan has run. A correct
solution must make sure that pvscan runs after all udev rules. IOW,
pvscan should be triggered in a udev RUN= statement rather than
IMPORT=. This would probably require a new systemd service, because
it's not just "pvscan" alone. But the result would be more robust than
what we currently have.

[1] I assume that commit 3b0f9ce has been created to work around some
problem. I'd appreciate it if multipath maintainers were involved in
issues like this. If I'd been involved, I would have told you that I
believe the approach of 3b0f9ce is wrong, and I'm pretty sure we would
have found a solution that respects the udev properties.

[2] meaning that previous attempts to set up a multipath map on the
device have failed.




