On Wed, Dec 11, 2024 at 01:06:46PM +0100, Martin Wilck wrote: > On Tue, 2024-12-10 at 18:30 -0500, Benjamin Marzinski wrote: > > On Sat, Dec 07, 2024 at 12:36:07AM +0100, Martin Wilck wrote: > > > We previously didn't allow map removal inside the checker loop. But > > > with the late updates to the checkerloop code, it should be safe to > > > orphan > > > paths and delete maps even in this situation. We remove such maps > > > everywhere > > > else in the code already, whenever refresh_multipath() or > > > setup_multipath() > > > is called. > > > > Actually, thinking about this more, what do we get by proactively > > deleting the multipath device if something goes wrong in the checker? > > If > > we successfully reload a device, but can't sync it with the kernel, > > that's one thing, But that was triggered by a change in the device, > > and > > we know that when we reloaded the device, device-mapper was working. > > I'm > > leery of possibly deleting the map because of a transient device- > > mapper > > issue. I'm not sure if on a check that we do repeatedly, we should > > delete the device on an error. We haven't in the past, and as far as > > I > > know, it doesn't cause problems. > > I don't disagree. But the same can be said for basically all call > chains where setup_multipath() is called for an existing map. I was > just following the pattern that we use e.g. in ev_add_path(), or in > update_mpp_prio(). Why would we treat the checker and path addition > differently in this respect? I'm confused here. ev_add_path() doesn't remove the device if the reload fails. If a reload fails, the table should stay the same. That's why I said that in other cases where we delete the device, we know that when we just reloaded the device, device-mapper was working. Looking at the code, that isn't really true. After failed reloads, we still call setup_multipath to update our state, and we will delete the device if that fails. > If we look at this pragmatically (assuming that multipathd gets the > parameters right), the most probable reason for a map reload failure is > failure to open a path device in bdev_open(), either because the device > doesn't exist, or because it's busy or otherwise unavailable. If this > happens in ev_add_path(), the likely reason is that the path just added > was busy, and the smartest action upon such a failure would probably be > to just undo that addition. We currently don't do that; we remove the > entire map, which is questionable, as you state correctly. This is why we call setup_multipath after failed reloads, to make sure multipathd's view of the multipath device resyncs with the kernel's, which hasn't changed from what it was before the reload failed. > In the checker, this can't happen. Obviously, no other process can grab > a path device while the device mapper is holding it, so -EBUSY won't > occur if we reload an existing map. Even device deletion doesn't cause > failure on reload. It is possible to delete a SCSI device while it's > mapped, and execute a table reload / suspend / resume cycle on the map > while referencing the deleted device. The kernel keeps holding the > reference to the deleted device, and will simply mark it as > failed. This holds also if the mapped paths are re-grouped or re- > ordered in the table. Failure occurs only if we temporarily remove the > device from the map and re-add it, because as soon as the device is > removed from the map's dm table, its refcount drops to zero, and it's > gone for good. > > IOW, reloading a map with a table containing only already-mapped > devices will never fail, except in extreme situations like kernel OOM. Maybe I should clarify my position a bit. I am fine with reloading the device in the checkerloop if something has changed. This obviously does run a very small risk of something going wrong and a device getting removed unnecessarily, but we know that we need to reload the device, so we should. What I would rather avoid is reloading the device because we failed to get it's state in do_sync_mpp(). We don't do this because we know that something has changed. We do this as a safety measure to deal with corner cases where our state doesn't match the kernel's and we didn't get an event. Double checking this each time we check a path in a device saves having to catch all these corner cases elsewhere. But it's almost always completely unnecessary, and we're doing it on every multipath device every couple of seconds, unlike reloading a device, which is rare. > Thus, AFAICS, the only relevant scenario where a reloading would fail > is trying to add a path device that was not previously mapped, and > that's either busy (perhaps in another map) or has been deleted, IOW > only when we reload after calling adopt_paths(). This is where we could > improve. If we fail to reload after adopting new paths, we could fall > back to the existing table, and perhaps try to add paths one by one. > Again, this is post-0.11 material. > > OTOH, practially impossible is not totally impossible, so we need to be > prepared to map reload failure either way. IMO the best thing we can do > in this case is to keep using the kernel's map, and retry reloading > later. I'm not actually worried about the kernel so much as libdevmapper. It is not designed for multi-threaded processes, and that has bitten us in the past. For intance, it's why we don't delete devices in dmevent_loop() on libdevmapper errors. dm_get_events() just waits and retries if getting the device list fails, and for each device, it calls dm_is_mpath and will only delete a device on DM_IS_MPATH_NO, which is what I suggested for the cleanup function. I'm pretty sure we've handled all of the known issues here, with fixes like: 02d4bf07 ("libmultipath: protect racy libdevmapper calls with a mutex") 34e01d2f ("multipath-tools: don't call dm_lib_release() any more") I'd rather not risk having missed some issue that could cause a temporary error in a function that we call every couple of seconds (almost always unnecessarily). -Ben > The only critical situation is WWID change of path devices. We must try > to fix this situation ASAP when we detect it. I'm unsure what the best > action is if a reload fails in that situation, though (other than > failing the path, as we already do). > > Martin