On Wed, 2024-12-11 at 12:09 -0500, Benjamin Marzinski wrote:
> On Wed, Dec 11, 2024 at 01:06:46PM +0100, Martin Wilck wrote:
> > On Tue, 2024-12-10 at 18:30 -0500, Benjamin Marzinski wrote:
> > > On Sat, Dec 07, 2024 at 12:36:07AM +0100, Martin Wilck wrote:
> > > > We previously didn't allow map removal inside the checker loop. But with the late updates to the checkerloop code, it should be safe to orphan paths and delete maps even in this situation. We already remove such maps everywhere else in the code, whenever refresh_multipath() or setup_multipath() is called.
> > >
> > > Actually, thinking about this more, what do we get by proactively deleting the multipath device if something goes wrong in the checker? If we successfully reload a device but can't sync it with the kernel, that's one thing. But that was triggered by a change in the device, and we know that device-mapper was working when we reloaded the device. I'm leery of possibly deleting the map because of a transient device-mapper issue. I'm not sure that, on a check we run repeatedly, we should delete the device on an error. We haven't in the past, and as far as I know, it doesn't cause problems.
> >
> > I don't disagree. But the same can be said for basically all call chains where setup_multipath() is called for an existing map. I was just following the pattern that we use e.g. in ev_add_path() or in update_mpp_prio(). Why would we treat the checker and path addition differently in this respect?
>
> I'm confused here.

Well, I was writing confused things. My thinking was going in circles about the removal of paths and maps, and I didn't properly distinguish between map reloading and updating the state from the kernel. Sorry.

> ev_add_path() doesn't remove the device if the reload fails. If a reload fails, the table should stay the same. That's why I said that in the other cases where we delete the device, we know that device-mapper was working when we just reloaded the device. Looking at the code, that isn't really true. After failed reloads, we still call setup_multipath() to update our state, and we will delete the device if that fails. This is why we call setup_multipath() after failed reloads: to make sure that multipathd's view of the multipath device resyncs with the kernel's, which hasn't changed from what it was before the reload failed.

Right.
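(To spell the pattern out in code: a failed reload leaves the kernel table alone, so we merely re-read the kernel's state, and we only give up on the map if even that re-read fails. Below is a condensed, purely illustrative sketch; the names are made-up stand-ins, not the actual multipathd functions, which pass struct vectors and struct multipath around rather than a plain map name.)

#include <stdio.h>

/* Hypothetical stand-ins for the real multipathd steps. */
static int dm_reload(const char *map)
{
	(void)map;
	return -1;	/* pretend the table reload failed */
}

static int sync_state_with_kernel(const char *map)
{
	(void)map;
	return 0;	/* pretend re-reading the kernel state worked */
}

static void drop_map(const char *map)
{
	printf("%s: orphaning paths and dropping the map\n", map);
}

/*
 * The pattern under discussion: a failed reload leaves the kernel
 * table unchanged, so resync our state from the kernel; only if
 * that also fails do we delete the map.
 */
static void reload_and_resync(const char *map)
{
	if (dm_reload(map) != 0) {
		if (sync_state_with_kernel(map) != 0)
			drop_map(map);
	}
}

int main(void)
{
	reload_and_resync("mpatha");
	return 0;
}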
> > In the checker, this can't happen. Obviously, no other process can grab a path device while the device mapper is holding it, so -EBUSY won't occur if we reload an existing map. Even device deletion doesn't cause failure on reload. It is possible to delete a SCSI device while it's mapped, and to execute a table reload / suspend / resume cycle on the map while referencing the deleted device. The kernel keeps holding the reference to the deleted device, and will simply mark it as failed. This holds even if the mapped paths are re-grouped or re-ordered in the table. Failure occurs only if we temporarily remove the device from the map and re-add it, because as soon as the device is removed from the map's dm table, its refcount drops to zero, and it's gone for good.
> >
> > IOW, reloading a map with a table containing only already-mapped devices will never fail, except in extreme situations like kernel OOM.
>
> Maybe I should clarify my position a bit. I am fine with reloading the device in the checkerloop if something has changed. This obviously runs a very small risk of something going wrong and a device getting removed unnecessarily, but we know that we need to reload the device, so we should.
>
> What I would rather avoid is reloading the device because we failed to get its state in do_sync_mpp().

FTR, in my v4 patchset, I won't try to do that any more.

> I'm not actually worried about the kernel so much as libdevmapper. It is not designed for multi-threaded processes, and that has bitten us in the past. For instance, it's why we don't delete devices in dmevent_loop() on libdevmapper errors. dm_get_events() just waits and retries if getting the device list fails, and for each device, it calls dm_is_mpath() and will only delete a device on DM_IS_MPATH_NO, which is what I suggested for the cleanup function.
>
> I'm pretty sure we've handled all of the known issues here, with fixes like:
> 02d4bf07 ("libmultipath: protect racy libdevmapper calls with a mutex")
> 34e01d2f ("multipath-tools: don't call dm_lib_release() any more")
>
> I'd rather not risk having missed some issue that could cause a temporary error in a function that we call every couple of seconds (almost always unnecessarily).

OK, I get it now. I thought that an error in DM_TABLE_STATUS must almost necessarily mean -ENXIO (from the kernel's point of view), which would mean that some external entity had removed the device, and that we should act as if someone had used the "remove map" CLI command. But I didn't think about libdevmapper.

Martin
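P.S.: For the archives, the dmevent_loop() policy you describe boils down to the control flow sketched below. Everything here is an illustrative stand-in rather than the actual libmultipath code; only the DM_IS_MPATH_NO case is taken from your description, and I'm assuming matching YES/ERR values for the sake of the example.

#include <stdio.h>

/* Assumed result values for the dm_is_mpath() query. */
enum dm_is_mpath_result {
	DM_IS_MPATH_NO,
	DM_IS_MPATH_YES,
	DM_IS_MPATH_ERR,
};

/* Hypothetical stand-in for the real dm_is_mpath(). */
static enum dm_is_mpath_result dm_is_mpath(const char *name)
{
	(void)name;
	return DM_IS_MPATH_ERR;	/* pretend libdevmapper hiccuped */
}

/*
 * Only drop our state for a map when device-mapper positively says
 * it is not a multipath device; on an error (possibly a transient
 * libdevmapper problem), keep the map and retry on the next pass.
 */
static void cleanup_map(const char *name)
{
	switch (dm_is_mpath(name)) {
	case DM_IS_MPATH_NO:
		printf("%s: not a multipath map, cleaning up\n", name);
		break;
	case DM_IS_MPATH_ERR:
		printf("%s: transient error, keeping the map for now\n", name);
		break;
	case DM_IS_MPATH_YES:
		break;	/* still a live multipath map, nothing to do */
	}
}

int main(void)
{
	cleanup_map("mpatha");
	return 0;
}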