Re: [PATCH 1/6] 11-dm-mpath.rules: don't import properties that are already set

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2024-02-08 at 19:30 -0500, Benjamin Marzinski wrote:
> On Tue, Feb 06, 2024 at 11:50:46AM +0100, Martin Wilck wrote:
> > 
> > Unfortunately, testing of my patch series has shown that it's an
> > improvement, but it isn't sufficient for completely avoiding races
> > between multipathd and udev. DM_SUSPENDED=1 can be avoided, but
> > it's
> > still possible that the device gets suspended after the udev rules
> > test
> > for supended state and before they run kpartx, blkid, pvscan, or
> > other
> > similar commands.
> > 
> > I am quite clueless about this right now, and would be grateful for
> > ideas. Re-implementing LOCK_EX locking would be a possibility for
> > systemd >= 250, as noted in the cover letter of the series. But for
> > systemd <= 249, I don't see good options. We can't use LOCK_EX,
> > because
> > when we release the lock, we have no means to know whether or not a
> > race with udev occurred (iow, whether udev tried to access the
> > device
> > while we held the lock, failed, and dropped the event). Thus we'd
> > need
> > to assume that this was the case, and force a reload after each
> > reload,
> > which is obvious nonsense. We also have no means to know whether
> > the
> > full set of rules has ever been run by udev for the device at hand,
> > because we don't know the set of rules that follow after 11-dm-
> > mpath.rules.
> 
> multipathd already refuses to reload newly created devices before it
> sees the creation uevents to try to avoid this.

Right, thanks for reminding me about that.

>   I assume that the
> problem you are seeing is on the coldplug after the pivot root, where
> the devices already exist, or am I confused here.  

No, that's exactly the situation in which the problem is observed.

> I wonder if we can do
> something similar where we would ideally delay all device reloads
> until
> after the coldplug.  The problem is figuring out when it's finished.

The current state of affairs is that patch 4 in my series has avoided
the issue on coldplug events, greatly reducing the frequency of errors
on the test sysstem. There's a variant that needs an additional fix:

 1 map set up in initrd
 2 creation uevent event is observed by multipathd
 3 switch root
 4 path addition, map reload
 5 change event
 6 path addition, another map reload (5-6 can repeat)
 7 DM_SUSPENDED is observed by udev while handling the change event,
   and saved as DM_UDEV_DISABLE_OTHER_RULES_FLAG
 8 Coldplug event, DM_UDEV_DISABLE_OTHER_RULES_FLAG is restored from
   udev db => pvscan is skipped

I've created another patch to fix this situation, and am now waiting
for for test feedback. I'll post about this in more detail later.
In short, I believe 10-dm.rules should differentiate
DM_UDEV_DISABLE_OTHER_RULES_FLAG originating from lvm itself and
DM_SUSPENDED.

Regards,
Martin






[Index of Archives]     [Gluster Users]     [Kernel Development]     [Linux Clusters]     [Device Mapper]     [Security]     [Bugtraq]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]

  Powered by Linux