Re: [PATCH 4/6] 11-dm-mpath.rules: handle reloads during coldplug events

Benjamin Marzinski <bmarzins@xxxxxxxxxx> · Fri, 9 Feb 2024 11:13:42 -0500

On Fri, Feb 09, 2024 at 04:53:50PM +0100, Martin Wilck wrote:
> On Thu, 2024-02-08 at 20:03 -0500, Benjamin Marzinski wrote:
> > On Mon, Feb 05, 2024 at 01:46:36PM +0100, Martin Wilck wrote:
> > > If a map reload happens while udev is processing rules for a
> > > coldplug
> > > event, DM_SUSPENDED may be set if the respective test in 10-
> > > dm.rules
> > > happens while the device is suspened. This will cause the rules for
> > > all
> > > higher block device layers to be skipped. Record this situation in
> > > an udev
> > > property. The reload operation will trigger another "change" uevent
> > > later,
> > > which would normally be treated as a reload, and be ignored without
> > > rescanning the device. If a previous "coldplug while suspended"
> > > situation is
> > > detected, perform a full device rescan instead.
> > 
> > For what it's worth, this seems reasonable. 
> 
> If that's so, I'd appreciate a Reviewed-by: :-)
> See below.

Reviewed-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx>

> 
> > But I do see the issue you
> > pointed out where we can't trust DM_SUSPENDED.
> 
> That was my suspicion originally, but the actual problem that was
> observed was different, see my recent response to 1/6.
> 
> Yes, it can happen any time that a device gets suspended between the
> DM_SUSPENDED check and an attempt to do actual I/O in later udev rules,
> but that is much less likely than the other case. So far I haven't been
> able to actually observe it.
> 
> Btw am I understanding correctly that you don't like the idea of going
> back to exclusive locking?
> 
> >   Perhaps we could track
> > in multipathd if an add event occured for a path between when we
> > issued
> > a reload, and when we got the change event (unfortunately, change
> > events
> > can occur for things other than reloads that multipathd triggers, and
> > the only way I know to accurately associate a uevent with a reload is
> > by
> > using dm udev cookies. We have those turned off in multipathd and I
> > think it would be a good bit of work to turn them back on without
> > causing lots of waiting with locks held in multipathd). 
> 
> IMO doing this right in multipathd will be hard. We must be careful not
> to trigger too many additional uevents just because something *might*
> have gone wrong during udev handling (most of the time all is well,
> even if a reload happens). I believe to do this properly we must listen
> for *kernel* uevents, too, and that's something we've been trying to
> avoid for good reason.
> 
> I had a different idea: to track a "reload count" for a map somehow
> (e.g. in a file under /run/multipath) and checking that at the
> beginning and end of uevent handling in order to see if a reload
> occurred while the uevent was being processed (which would be detected
> by seeing a different the reload count change during the uevent).

That sounds much easier than messing with udev cookies.

-Ben

> For now, I hope that this change plus my latest addition will make the
> practical issue go away. All else can be discussed later.
> We haven't got any reports about this sort of race for years, so it's
> safe to assume that happens very rarely.
> 
> Regards
> Martin