Re: [PATCH 03/13] multipathd: allow map removal in do_sync_mpp()

Martin Wilck <mwilck@xxxxxxxx> · Wed, 11 Dec 2024 13:06:46 +0100

On Tue, 2024-12-10 at 18:30 -0500, Benjamin Marzinski wrote:
> On Sat, Dec 07, 2024 at 12:36:07AM +0100, Martin Wilck wrote:
> > We previously didn't allow map removal inside the checker loop. But
> > with the late updates to the checkerloop code, it should be safe to
> > orphan
> > paths and delete maps even in this situation. We remove such maps
> > everywhere
> > else in the code already, whenever refresh_multipath() or
> > setup_multipath()
> > is called.
> 
> Actually, thinking about this more, what do we get by proactively
> deleting the multipath device if something goes wrong in the checker?
> If
> we successfully reload a device, but can't sync it with the kernel,
> that's one thing, But that was triggered by a change in the device,
> and
> we know that when we reloaded the device, device-mapper was working.
> I'm
> leery of possibly deleting the map because of a transient device-
> mapper
> issue.  I'm not sure if on a check that we do repeatedly, we should
> delete the device on an error.  We haven't in the past, and as far as
> I
> know, it doesn't cause problems.  

I don't disagree. But the same can be said for basically all call
chains where setup_multipath() is called for an existing map. I was
just following the pattern that we use e.g. in ev_add_path(), or in
update_mpp_prio(). Why would we treat the checker and path addition
differently in this respect?

If we look at this pragmatically (assuming that multipathd gets the
parameters right), the most probable reason for a map reload failure is
failure to open a path device in bdev_open(), either because the device
doesn't exist, or because it's busy or otherwise unavailable. If this
happens in ev_add_path(), the likely reason is that the path just added
was busy, and the smartest action upon such a failure would probably be
to just undo that addition. We currently don't do that; we remove the
entire map, which is questionable, as you state correctly.

In the checker, this can't happen. Obviously, no other process can grab
a path device while the device mapper is holding it, so -EBUSY won't
occur if we reload an existing map. Even device deletion doesn't cause
failure on reload. It is possible to delete a SCSI device while it's
mapped, and execute a table reload / suspend / resume cycle on the map
while referencing the deleted device. The kernel keeps holding the
reference to the deleted device, and will simply mark it as
failed. This holds also if the mapped paths are re-grouped or re-
ordered in the table. Failure occurs only if we temporarily remove the
device from the map and re-add it, because as soon as the device is
removed from the map's dm table, its refcount drops to zero, and it's
gone for good.

IOW, reloading a map with a table containing only already-mapped
devices will never fail, except in extreme situations like kernel OOM.

Thus, AFAICS, the only relevant scenario where a reloading would fail
is trying to add a path device that was not previously mapped, and
that's either busy (perhaps in another map) or has been deleted, IOW
only when we reload after calling adopt_paths(). This is where we could
improve. If we fail to reload after adopting new paths, we could fall
back to the existing table, and perhaps try to add paths one by one.
Again, this is post-0.11 material.

OTOH, practially impossible is not totally impossible, so we need to be
prepared to map reload failure either way. IMO the best thing we can do
in this case is to keep using the kernel's map, and retry reloading
later. 

The only critical situation is WWID change of path devices. We must try
to fix this situation ASAP when we detect it. I'm unsure what the best
action is if a reload fails in that situation, though (other than
failing the path, as we already do).

Martin