On Wed, Feb 08, 2023 at 07:35:36AM +0000, Martin Wilck wrote: > Hello Ben, > > On Tue, 2023-02-07 at 18:32 -0600, Benjamin Marzinski wrote: > > If "multipath -u" fails, udev doesn't import any values from the > > program. This means that multipath.rules will continue to use the > > values > > for DM_MULTIPATH_DEVICE_PATH and FIND_MULTIPATHS_WAIT_UNTIL that it > > has > > already imported from the database. This is the correct thing to do > > for > > every case except the MAYBE case for "find_multipaths smart". In that > > case, DM_MULTIPATH_DEVICE_PATH will be set to 1, and the rules will > > assume that the device has been definitively claimed. > > > > In this case, we know that the device shouldn't have been claimed > > before, but we don't know if it should be claimed now, or if we have > > hit > > the timeout and it should be released, since we didn't get any > > information from multipath. The safest thing to do is assume that > > this > > was the timeout, and the device shouldn't be claimed. The only time > > when > > this could be the wrong answer is when we first see a new multipath > > device, and it could only cause problems if there is metadata on the > > device that will cause it to get autoassembled by something else, > > before > > multipathd can autoassemble it. If we assume that it is a multipath > > device, or we assume that this wasn't actually the timeout uevent, we > > can keep a necessary device from getting released to the reset of the > > system. > > > > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx> > > We have this code in check_path_valid(): > > /* > * multipath -u must exit with status 0, otherwise udev won't > * import its output. > */ > if (!is_uevent && r == PATH_IS_NOT_VALID) > r = RTVL_FAIL; > else > r = RTVL_OK; > > > AFAICS about the only condition I can imagine in which "multipath -u" > would return an exit status other than 0 would be failure of > init_config(). If that's the case you're concerned about, the issue > might be easier to fix in main.c directly. I'm tracking down a bug where it appears that either "multpath -u" failed and caused this issue or incorrectly returned DM_MULTIPATH_DEVICE_PATH=1. I'm still not sure what exactly happened, but when going over the code I noticed this issue was possible and it doesn't seem like it should be. I don't think it's very likely that this is the issue that occurred in the bug I'm looking into, but I do think that this is possible for errors other than a config problem. First, any memory issue would cause this. Obviously, there's a good chance that the whole system has bigger issues in that case. Second, we do rely on the udev library to not fail in a number of places, notably path_discovery(). Third, we rely on filesystem code in places like is_failed_wwid(). None of these issue individually bothers me, but since we can protect the system from an unexpected failure here, it seems reasonable to do so. Obviously we can solve all of them except a memory failure or an init_config() issue by making sure check_path_valid() never returns failure in the uevent case, but it seems more reasonable to me to actually fail, and have udev handle that. But like I said, I don't think I've actually seen this issue in practice. I don't think the udev changes are that intrusive, they only happen in case where you are running with "find_multipaths smart" and getting a uevent for a MAYBE path, and aside from changing ENV{FIND_MULTIPATHS_WAIT_CANCELLED}!="?*", \ ENV{FIND_MULTIPATHS_WAIT_UNTIL}=="?*", \ ENV{FIND_MULTIPATHS_WAIT_UNTIL}!="0", \ ENV{FIND_MULTIPATHS_WAIT_CANCELLED}="1", \ RUN+="/usr/bin/systemctl stop cancel-multipath-wait-$kernel.timer" Into: ENV{FIND_MULTIPATHS_WAIT_CANCELLED}=="?*", GOTO="end_mpath" ENV{FIND_MULTIPATHS_WAIT_UNTIL}!="?*", GOTO="end_mpath" ENV{FIND_MULTIPATHS_WAIT_UNTIL}=="0", GOTO="end_mpath" ENV{FIND_MULTIPATHS_WAIT_CANCELLED}="1" RUN+="/usr/bin/systemctl stop cancel-multipath-wait-$kernel.timer" which is functionally the same, the only change is to check if "multipath -u" failed, and if so, set DM_MULTIPATH_DEVICE_PATH=0, instead of what we are currently doing implicitly with the IMPORT{db}="DM_MULTIPATH_DEVICE_PATH" which is to set DM_MULTIPATH_DEVICE_PATH=1 if "multipath -u" failed. Unless it turns out that my bug was acutally caused by this, I'm not super invested in these changes, but I do think they are a reasonable way to catch an unlikely issue, without having to design the multipath code around the possibility. -Ben > > Regards, > Martin -- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel