On Mon, 2022-03-21 at 19:34 -0500, Benjamin Marzinski wrote: > On Fri, Mar 18, 2022 at 5:33 PM <mwilck@xxxxxxxx> wrote: > > > > From: Martin Wilck <mwilck@xxxxxxxx> > > > > If multipathd needs to delay reconfigure() because it's waiting for > > a map > > creation uevent, it can happen that child() isn't woken up if the > > map > > being waited for is removed before the uevent arrives. Fix this by > > unblocking reconfigure() with the remove_map_callback() function. > > > > Signed-off-by: Martin Wilck <mwilck@xxxxxxxx> > > --- > > multipathd/main.c | 11 +++++++++++ > > 1 file changed, 11 insertions(+) > > > > diff --git a/multipathd/main.c b/multipathd/main.c > > index f3b8eb4..4721cd8 100644 > > --- a/multipathd/main.c > > +++ b/multipathd/main.c > > @@ -319,6 +319,17 @@ static bool unblock_reconfigure(void) > > return was_delayed; > > } > > > > +/* > > + * Make sure child() is woken up when a map is removed that > > multipathd > > + * is currently waiting for. > > + * Overrides libmultipath's weak symbol by the same name > > + */ > > +void remove_map_callback(struct multipath *mpp) > > +{ > > + if (mpp->wait_for_udev > 0) > > Is there a reason why you don't decrement wait_for_udev, and check > need_to_delay_reconfig() like in ev_add_map()? I realize that it > doesn't hurt anything to unblock the reconfigure even if there are > other devices that need a delay. The main thread will notice that it > still needs to delay. I'm just wondering why we do it differently > here? The main reason was to keep it simple. need_to_delay_reconfig() needs to be passed a "vecs" pointer, and requires the vecs lock to be held. remove_map() is deep down in the call stack and has lots of direct and indirect callers. It can be called with mpvec == NULL and (in theory) also with pathvec == NULL, in which case it simply frees the memory obtained by the map, without unlinking itself or its members from any vectors. In that case it *could* be called without the vecs lock held (AFAICS, that's not the case today, but the function could be used this way, e.g. in an error handling code path). I thought it was easier and safer not to have to assert that every current and future caller holds the vecs lock, in particular because this is called indirectly from libmultipath, the function that grabs the lock is usually high up in the call stack. I had indeed pondered whether I should remove the call to need_to_delay_reconfig() from the ev_add_map(), too. I decided against it, because I realized that waking up child() for nothing is not cheap,as child() needs to grab the vecs lock just for calling need_to_delay_reconfig(). We should avoid this for the common case of an uevent which we are waiting for. The remove_map() code path, OTOH, is a corner case (map being removed while in the process of being added). We need to cover it, but we know that this code path will be rarely executed in practice, and thus unlikely to cause vecs lock contention. I hope this makes sense. Regards Martin -- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel