Re: [PATCH 4/5] multipathd: reset __delayed_reconfig from ev_add_map()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2022-03-16 at 15:13 -0500, Benjamin Marzinski wrote:
> On Mon, Mar 14, 2022 at 10:30:35PM +0100, mwilck@xxxxxxxx wrote:
> > From: Martin Wilck <mwilck@xxxxxxxx>
> > 
> > With the previous patch, one race condition between child() and
> > the uevent handler (ev_add_map()) remains: 1. child() sets
> > __delayed_reconfig, 2. ev_add_map() calls schedule_reconfigure()
> > and
> > thus DAEMON_CONFIGURE, 3. child() sets DAEMON_IDLE. This would
> > cause
> > the pending reconfigure request to be missed.
> > 
> > To fix this, reset __delayed_reconfig immediately from ev_add_map()
> > (and likewise, missing_uev_wait_tick()). This way the wait loop in
> > child() terminates on the reconfigure_pending condition.
> > 
> > Also, these schedule_reconfigure() callers don't really request a
> > reconfigure() call; they just unblock processing previously
> > requested
> > reconfigure() calls. Add a new helper, unblock_reconfigure(), that
> > does just that.
> > 
> 
> Doesn't unblock_reconfigure() allow us to set the state to
> DAEMON_CONFIGURE, right after the checkerloop() has set it to
> DAEMON_RUNNING.

Right. I added the __post_config_state() there in order to be sure that
child() would be woken up. But unblock_reconfigure() should only call
__post_config_state if running_state is DAEMON_IDLE, like
schedule_reconfigure(). Otherwise, it can be sure that some other
process will wake up child().

>  schedule_reconfigure() didn't update the state if it was
> DAEMON_RUNNING. Instead, it would wait till checkerloop() set the
> state
> back to DAEMON_IDLE for the reconfigure to start.

> I have also come up with a different bug that effects these fixes,
> including mine. But it will effect this fix worse. This fix assumes
> that
> when a reconfigure is delayed, it should remain delayed until either
> a
> change event happens on the multipath device (ev_add_map) or
> multipathd
> times out waiting for one (missing_uev_wait_tick). However the map
> could
> be removed before either of those things happen.  It's possible that
> multipathd could get a remove uevent after the add event but before
> the
> change event. Either that or a multipathd client command could remove
> it, or a dm event could happen either removing the device, or
> updating
> it, but with __setup_multipath() removing it. Those are just the
> examples I came up with off the top of my head.


> So there's a specific problem where we don't handle clearing
> __delayed_reconfig if a map was removed. That can be handled in a
> different patch.  But this, or any other oversight we might have here
> can be mitigated by being more proactive in removing
> __delayed_reconfig.
> For instance, when then main thread tries to reconfigure, it checks
> need_to_delay_reconfig(), and only does the reconfigure if this
> returns
> that it's ok.  __delayed_reconfig mostly exists so that the main
> thread
> doesn't need to grab the vecs lock and loop through the the multipath
> devices to know if it need to delay, but need_to_delay_reconfig() is
> the definitive answer. When the main thread checks this, we should be
> updating __delayed_reconfig to match the result. This patch removes
> the
> "__delayed_reconfig = false;" line, which I think is a mistake.

Agreed. My thinking was that this is done by the uevent handler when
the state of need_to_delay_reconfig() changes. But you're right with
the "remove" special case. We should re-add this, it definitely doesn't
hurt.

> Imagine if, because a map got removed before ev_add_map() was called,
> __delayed_reconfig was true, even though there were no maps with
> mpp->wait_for_udev set. Any existing config request would still be
> delayed, but if another reconfigure was requested not only would it
> happen, but if the "__delayed_reconfig = false;" line was back,
> __delayed_reconfig would go back to the correct value.

So we should call unblock_reconfigure() from remove_map().
Unfortunately that requires another callback from libmultipath into
multipathd. Trying to figure it out... 

Thanks,
Martin

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel





[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux