On Tue, 2024-12-10 at 18:30 -0500, Benjamin Marzinski wrote: > On Sat, Dec 07, 2024 at 12:36:07AM +0100, Martin Wilck wrote: > > We previously didn't allow map removal inside the checker loop. But > > with the late updates to the checkerloop code, it should be safe to > > orphan > > paths and delete maps even in this situation. We remove such maps > > everywhere > > else in the code already, whenever refresh_multipath() or > > setup_multipath() > > is called. > > Actually, thinking about this more, what do we get by proactively > deleting the multipath device if something goes wrong in the checker? > If > we successfully reload a device, but can't sync it with the kernel, > that's one thing, But that was triggered by a change in the device, > and > we know that when we reloaded the device, device-mapper was working. > I'm > leery of possibly deleting the map because of a transient device- > mapper > issue. I'm not sure if on a check that we do repeatedly, we should > delete the device on an error. We haven't in the past, and as far as > I > know, it doesn't cause problems. I don't disagree. But the same can be said for basically all call chains where setup_multipath() is called for an existing map. I was just following the pattern that we use e.g. in ev_add_path(), or in update_mpp_prio(). Why would we treat the checker and path addition differently in this respect? If we look at this pragmatically (assuming that multipathd gets the parameters right), the most probable reason for a map reload failure is failure to open a path device in bdev_open(), either because the device doesn't exist, or because it's busy or otherwise unavailable. If this happens in ev_add_path(), the likely reason is that the path just added was busy, and the smartest action upon such a failure would probably be to just undo that addition. We currently don't do that; we remove the entire map, which is questionable, as you state correctly. In the checker, this can't happen. Obviously, no other process can grab a path device while the device mapper is holding it, so -EBUSY won't occur if we reload an existing map. Even device deletion doesn't cause failure on reload. It is possible to delete a SCSI device while it's mapped, and execute a table reload / suspend / resume cycle on the map while referencing the deleted device. The kernel keeps holding the reference to the deleted device, and will simply mark it as failed. This holds also if the mapped paths are re-grouped or re- ordered in the table. Failure occurs only if we temporarily remove the device from the map and re-add it, because as soon as the device is removed from the map's dm table, its refcount drops to zero, and it's gone for good. IOW, reloading a map with a table containing only already-mapped devices will never fail, except in extreme situations like kernel OOM. Thus, AFAICS, the only relevant scenario where a reloading would fail is trying to add a path device that was not previously mapped, and that's either busy (perhaps in another map) or has been deleted, IOW only when we reload after calling adopt_paths(). This is where we could improve. If we fail to reload after adopting new paths, we could fall back to the existing table, and perhaps try to add paths one by one. Again, this is post-0.11 material. OTOH, practially impossible is not totally impossible, so we need to be prepared to map reload failure either way. IMO the best thing we can do in this case is to keep using the kernel's map, and retry reloading later. The only critical situation is WWID change of path devices. We must try to fix this situation ASAP when we detect it. I'm unsure what the best action is if a reload fails in that situation, though (other than failing the path, as we already do). Martin