On Fri, Feb 09, 2024 at 04:53:50PM +0100, Martin Wilck wrote: > On Thu, 2024-02-08 at 20:03 -0500, Benjamin Marzinski wrote: > > On Mon, Feb 05, 2024 at 01:46:36PM +0100, Martin Wilck wrote: > > > If a map reload happens while udev is processing rules for a > > > coldplug > > > event, DM_SUSPENDED may be set if the respective test in 10- > > > dm.rules > > > happens while the device is suspened. This will cause the rules for > > > all > > > higher block device layers to be skipped. Record this situation in > > > an udev > > > property. The reload operation will trigger another "change" uevent > > > later, > > > which would normally be treated as a reload, and be ignored without > > > rescanning the device. If a previous "coldplug while suspended" > > > situation is > > > detected, perform a full device rescan instead. > > > > For what it's worth, this seems reasonable. > > If that's so, I'd appreciate a Reviewed-by: :-) > See below. > > > But I do see the issue you > > pointed out where we can't trust DM_SUSPENDED. > > That was my suspicion originally, but the actual problem that was > observed was different, see my recent response to 1/6. > > Yes, it can happen any time that a device gets suspended between the > DM_SUSPENDED check and an attempt to do actual I/O in later udev rules, > but that is much less likely than the other case. So far I haven't been > able to actually observe it. > > Btw am I understanding correctly that you don't like the idea of going > back to exclusive locking? Oh, and yeah, I'd rather avoid exclusive locking if possible, just out of a hunch that it could lead to other unexpected problems. -Ben > > > Perhaps we could track > > in multipathd if an add event occured for a path between when we > > issued > > a reload, and when we got the change event (unfortunately, change > > events > > can occur for things other than reloads that multipathd triggers, and > > the only way I know to accurately associate a uevent with a reload is > > by > > using dm udev cookies. We have those turned off in multipathd and I > > think it would be a good bit of work to turn them back on without > > causing lots of waiting with locks held in multipathd). > > IMO doing this right in multipathd will be hard. We must be careful not > to trigger too many additional uevents just because something *might* > have gone wrong during udev handling (most of the time all is well, > even if a reload happens). I believe to do this properly we must listen > for *kernel* uevents, too, and that's something we've been trying to > avoid for good reason. > > I had a different idea: to track a "reload count" for a map somehow > (e.g. in a file under /run/multipath) and checking that at the > beginning and end of uevent handling in order to see if a reload > occurred while the uevent was being processed (which would be detected > by seeing a different the reload count change during the uevent). > > For now, I hope that this change plus my latest addition will make the > practical issue go away. All else can be discussed later. > We haven't got any reports about this sort of race for years, so it's > safe to assume that happens very rarely. > > Regards > Martin