On Wed, 2022-03-16 at 15:13 -0500, Benjamin Marzinski wrote: > On Mon, Mar 14, 2022 at 10:30:35PM +0100, mwilck@xxxxxxxx wrote: > > From: Martin Wilck <mwilck@xxxxxxxx> > > > > With the previous patch, one race condition between child() and > > the uevent handler (ev_add_map()) remains: 1. child() sets > > __delayed_reconfig, 2. ev_add_map() calls schedule_reconfigure() > > and > > thus DAEMON_CONFIGURE, 3. child() sets DAEMON_IDLE. This would > > cause > > the pending reconfigure request to be missed. > > > > To fix this, reset __delayed_reconfig immediately from ev_add_map() > > (and likewise, missing_uev_wait_tick()). This way the wait loop in > > child() terminates on the reconfigure_pending condition. > > > > Also, these schedule_reconfigure() callers don't really request a > > reconfigure() call; they just unblock processing previously > > requested > > reconfigure() calls. Add a new helper, unblock_reconfigure(), that > > does just that. > > > > Doesn't unblock_reconfigure() allow us to set the state to > DAEMON_CONFIGURE, right after the checkerloop() has set it to > DAEMON_RUNNING. Right. I added the __post_config_state() there in order to be sure that child() would be woken up. But unblock_reconfigure() should only call __post_config_state if running_state is DAEMON_IDLE, like schedule_reconfigure(). Otherwise, it can be sure that some other process will wake up child(). > schedule_reconfigure() didn't update the state if it was > DAEMON_RUNNING. Instead, it would wait till checkerloop() set the > state > back to DAEMON_IDLE for the reconfigure to start. > I have also come up with a different bug that effects these fixes, > including mine. But it will effect this fix worse. This fix assumes > that > when a reconfigure is delayed, it should remain delayed until either > a > change event happens on the multipath device (ev_add_map) or > multipathd > times out waiting for one (missing_uev_wait_tick). However the map > could > be removed before either of those things happen. It's possible that > multipathd could get a remove uevent after the add event but before > the > change event. Either that or a multipathd client command could remove > it, or a dm event could happen either removing the device, or > updating > it, but with __setup_multipath() removing it. Those are just the > examples I came up with off the top of my head. > So there's a specific problem where we don't handle clearing > __delayed_reconfig if a map was removed. That can be handled in a > different patch. But this, or any other oversight we might have here > can be mitigated by being more proactive in removing > __delayed_reconfig. > For instance, when then main thread tries to reconfigure, it checks > need_to_delay_reconfig(), and only does the reconfigure if this > returns > that it's ok. __delayed_reconfig mostly exists so that the main > thread > doesn't need to grab the vecs lock and loop through the the multipath > devices to know if it need to delay, but need_to_delay_reconfig() is > the definitive answer. When the main thread checks this, we should be > updating __delayed_reconfig to match the result. This patch removes > the > "__delayed_reconfig = false;" line, which I think is a mistake. Agreed. My thinking was that this is done by the uevent handler when the state of need_to_delay_reconfig() changes. But you're right with the "remove" special case. We should re-add this, it definitely doesn't hurt. > Imagine if, because a map got removed before ev_add_map() was called, > __delayed_reconfig was true, even though there were no maps with > mpp->wait_for_udev set. Any existing config request would still be > delayed, but if another reconfigure was requested not only would it > happen, but if the "__delayed_reconfig = false;" line was back, > __delayed_reconfig would go back to the correct value. So we should call unblock_reconfigure() from remove_map(). Unfortunately that requires another callback from libmultipath into multipathd. Trying to figure it out... Thanks, Martin -- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel