On Tue, Feb 06, 2024 at 11:50:46AM +0100, Martin Wilck wrote: > On Mon, 2024-02-05 at 15:44 -0500, Benjamin Marzinski wrote: > > On Mon, Feb 05, 2024 at 01:46:33PM +0100, Martin Wilck wrote: > > > DM_UDEV_DISABLE_OTHER_RULES_FLAG and DM_NOSCAN may be already set > > > from previous rules, e.g. if the device is suspended. Make sure > > > we don't overwrite them. > > > > > > DM_DISABLE_OTHER_RULES_FLAG_OLD and MPATH_DEVICE_READY are only > > > used in this file, and not used in the scan_import code path. > > > > > > Signed-off-by: Martin Wilck <mwilck@xxxxxxxx> > > > --- > > > multipath/11-dm-mpath.rules | 15 +++++++++++---- > > > 1 file changed, 11 insertions(+), 4 deletions(-) > > > > > > diff --git a/multipath/11-dm-mpath.rules b/multipath/11-dm- > > > mpath.rules > > > index c339f52..2c4d006 100644 > > > --- a/multipath/11-dm-mpath.rules > > > +++ b/multipath/11-dm-mpath.rules > > > @@ -2,12 +2,19 @@ ACTION!="add|change", GOTO="mpath_end" > > > ENV{DM_UDEV_RULES_VSN}!="?*", GOTO="mpath_end" > > > ENV{DM_UUID}!="mpath-?*", GOTO="mpath_end" > > > > > > -IMPORT{db}="DM_DISABLE_OTHER_RULES_FLAG_OLD" > > > -IMPORT{db}="MPATH_DEVICE_READY" > > > - > > > # If this uevent didn't come from dm, don't try to update the > > > # device state > > > -ENV{DM_COOKIE}!="?*", ENV{DM_ACTION}!="PATH_*", > > > IMPORT{db}="DM_UDEV_DISABLE_OTHER_RULES_FLAG", > > > IMPORT{db}="DM_NOSCAN", GOTO="scan_import" > > > +ENV{DM_COOKIE}=="?*", GOTO="check_ready" > > > +ENV{DM_ACTION}=="PATH_*", GOTO="check_ready" > > > + > > > +ENV{DM_UDEV_DISABLE_OTHER_RULES_FLAG}!="?*", > > > IMPORT{db}="DM_UDEV_DISABLE_OTHER_RULES_FLAG" > > > +ENV{DM_NOSCAN}!="?*", IMPORT{db}="DM_NOSCAN" > > > +GOTO="scan_import" > > > + > > > > If we do this, don't we forget the values for > > DM_DISABLE_OTHER_RULES_FLAG_OLD and MPATH_DEVICE_READY whenever we > > get a > > non-dm uevent? If we skip importing them for a uevent, they're > > dropped > > from the database, which means that on the next dm-originated uevent > > we > > won't be able to get the old values. right? > > Right, I didn't consider that. It makes sense for MPATH_DEVICE_READY. > > However, while at it, I wonder about our rationale for saving > DM_UDEV_DISABLE_OTHER_RULES_FLAG in DM_DISABLE_OTHER_RULES_FLAG_OLD. > > In 10-dm.rules, DM_DISABLE_OTHER_RULES_FLAG changes its value only > - in DM_UDEV_PRIMARY_SOURCE_FLAG=1 events, according to the value of > DM_SUSPENDED > - for DISK_RO=1 events (switches the flag on) > > (11-dm-lvm.rules has some additional logic that doesn't matter for > multipath). > > For all other events, the flag is restored from the udev db in 10- > dm.rules. Ignoring DISK_RO, it always has the value that DM_SUSPENDED > had in the last DM_UDEV_PRIMARY_SOURCE_FLAG=1 (aka map reload) event. > > The logic in 11-dm-mpath.rules adds a check for unusable arrays > on top, setting DM_DISABLE_OTHER_RULES_FLAG if MPATH_DEVICE_READY > switches from 1 to 0. In this case we save the previous value in > DM_DISABLE_OTHER_RULES_FLAG_OLD, and restore it from there in a later > event if MPATH_DEVICE_READY switches back from 0 to 1. > > IMO the following can happen: > > 1. an event arrives while DM_SUSPENDED=1, causing > DM_DISABLE_OTHER_RULES_FLAG=1 to be set > 2. 11-dm-mpath.rules sees no paths and saves > DM_DISABLE_OTHER_RULES_FLAG=1 to DM_DISABLE_OTHER_RULES_FLAG_OLD > 3. an event arrives after the device has resumed, DM_SUSPENDED and > DM_DISABLE_OTHER_RULES_FLAG are cleared in 10-dm.rules > 4. 11-dm-mpath.rules sees MPATH_DEVICE_READY=1 and restores > DM_DISABLE_OTHER_RULES_FLAG, setting it to 1 > > ... and this would be wrong, no? > > It seems to me that we should not save DM_DISABLE_OTHER_RULES_FLAG_OLD > between uevents. We must set DM_DISABLE_OTHER_RULES_FLAG=1 to avoid > upper layer probing if there are no paths, but I suppose we should > restore the previous value after udev rule execution, e.g. in 99-dm- > mpath.rules: > > ENV{DM_DISABLE_OTHER_RULES_FLAG_OLD}=="?*", \ > ENV{DM_DISABLE_OTHER_RULES_FLAG}=$env{DM_DISABLE_OTHER_RULES_FLAG_OLD}, \ > ENV{DM_DISABLE_OTHER_RULES_FLAG_OLD}="" > > Am I confusing stuff here? Nope. That makes a lot of sense. > Unfortunately, testing of my patch series has shown that it's an > improvement, but it isn't sufficient for completely avoiding races > between multipathd and udev. DM_SUSPENDED=1 can be avoided, but it's > still possible that the device gets suspended after the udev rules test > for supended state and before they run kpartx, blkid, pvscan, or other > similar commands. > > I am quite clueless about this right now, and would be grateful for > ideas. Re-implementing LOCK_EX locking would be a possibility for > systemd >= 250, as noted in the cover letter of the series. But for > systemd <= 249, I don't see good options. We can't use LOCK_EX, because > when we release the lock, we have no means to know whether or not a > race with udev occurred (iow, whether udev tried to access the device > while we held the lock, failed, and dropped the event). Thus we'd need > to assume that this was the case, and force a reload after each reload, > which is obvious nonsense. We also have no means to know whether the > full set of rules has ever been run by udev for the device at hand, > because we don't know the set of rules that follow after 11-dm- > mpath.rules. multipathd already refuses to reload newly created devices before it sees the creation uevents to try to avoid this. I assume that the problem you are seeing is on the coldplug after the pivot root, where the devices already exist, or am I confused here. I wonder if we can do something similar where we would ideally delay all device reloads until after the coldplug. The problem is figuring out when it's finished. -Ben > Thanks, > Martin