On Tue, 2022-12-06 at 18:37 -0600, Benjamin Marzinski wrote: > On Sat, Dec 03, 2022 at 12:43:38AM +0100, mwilck@xxxxxxxx wrote: > > From: Martin Wilck <mwilck@xxxxxxxx> > > > > To check whether we will be able to add a given device can be part > > of a multipath map, we have two tests in check_path_valid(): > > released_to_systemd() and the O_EXCL test. The former isn't helpful > > if "multipath -u" is called for the first time for a given device, > > and the latter is only used in the "find_multipaths smart" case, > > because > > actively opening the device with O_EXCL, even for a very short > > time, is prone > > to races with other processes. > > > > It turns out that this may cause issues in some scenarios. We saw > > problems in > > once case where "find_multipaths greedy" was used with a single > > non-multipahted root disk and a very large number of multipath > > LUNs. > > The root disk would first be classified as multipath device. > > multipathd > > would try to create a map, fail (because the disk was mounted) and > > trigger another uevent. But because of the very large number of > > multipath > > devices, this event was queued up behind thousands of other events, > > and > > the root device timed out eventually. > > > > While a simple workaround for the given problem would be proper > > blacklisting > > or using a different find_multipaths mode, I am proposing a > > different > > solution here. An additional test is added in is_path_valid() which > > checks whether the given device is currently in use by 1. sysfs > > holders, > > 2. mounts (from /proc/self/mountinfo) or 3. swaps (from > > /proc/swaps). 2. > > and 3. are similar to systemd's device detection after switching > > root. > > This must not only be done for the device itself, but also for all > > its > > partitions. For mountinfo and swaps, libmount is utilized. > > > > With this patch, "multipath -u" will make devices with mounted or > > otherwise > > used partitions available to systemd early, without waiting for > > multipathd > > to fail setting up the map and re-triggering an uevent. This should > > avoid > > the issue described above even without blacklisting. The downside > > of it > > is a longer runtime of "multipath -u" for almost all devices, in > > particular > > for real multipath devices. The runtime required for the new checks > > was in the > > order of 0.1ms-1ms in my tests. Moreover, there is a certain risk > > that devices may > > wrongly classified as non-multipath because of transient mounts or > > holders > > created by other processes. > > > > To make this code compile on older distributions, we need some > > additional > > checks in create-config.mk. > > > Two small issues below. > > > Signed-off-by: Martin Wilck <mwilck@xxxxxxxx> > > --- > > .github/workflows/build-and-unittest.yaml | 2 +- > > create-config.mk | 11 +- > > libmpathutil/libmpathutil.version | 6 + > > libmpathutil/util.c | 12 + > > libmpathutil/util.h | 2 + > > libmultipath/Makefile | 2 +- > > libmultipath/alias.c | 11 - > > libmultipath/valid.c | 267 > > ++++++++++++++++++++++ > > tests/Makefile | 2 +- > > tests/valid.c | 48 ++++ > > 10 files changed, 348 insertions(+), 15 deletions(-) > > > > > > +static int read_partitions(const char *syspath, vector parts) > > +{ > > + struct scandir_result sr = { .n = 0 }; > > + char path[PATH_MAX], *last; > > + char *prop; > > + int i; > > + > > + strlcpy(path, syspath, sizeof(path)); > > + sr.n = scandir(path, &sr.di, subdir_filter, NULL); > > + if (sr.n == -1) > > + return -errno; > > + > > + pthread_cleanup_push_cast(free_scandir_result, &sr); > > + > > + /* parts[0] is the whole disk */ > > + if (vector_alloc_slot(parts) && > > + (prop = strdup(strrchr(path, '/') + 1)) != NULL) > > + vector_set_slot(parts, prop); > > We should add the whole disk like we do for the partitions below, > where > we strdup() the name first, and free it if we can't allocate a slot. > Otherwise, if the strdup failed, we could leave an empty slot in the > vector, which would cause vector_foreach_slot() to stop early Sure. > > int > > is_path_valid(const char *name, struct config *conf, struct path > > *pp, > > bool check_multipathd) > > { > > int r; > > int fd; > > + const char *prop; > > > > if (!pp || !name || !conf) > > return PATH_IS_ERROR; > > @@ -80,6 +338,10 @@ is_path_valid(const char *name, struct config > > *conf, struct path *pp, > > if (!pp->udev) > > return PATH_IS_ERROR; > > > > + prop = udev_device_get_property_value(pp->udev, "DEVTYPE"); > > + if (prop == NULL || strcmp(prop, "disk")) > > + return PATH_IS_NOT_VALID; > > + > > r = pathinfo(pp, conf, DI_SYSFS | DI_WWID | DI_BLACKLIST); > > if (r == PATHINFO_SKIPPED) > > return PATH_IS_NOT_VALID; > > @@ -96,6 +358,11 @@ is_path_valid(const char *name, struct config > > *conf, struct path *pp, > > return PATH_IS_ERROR; > > } > > > > + if ((conf->find_multipaths == FIND_MULTIPATHS_GREEDY || > > + conf->find_multipaths == FIND_MULTIPATHS_SMART) && > > + is_device_in_use(pp->udev) > 0) > > + return PATH_IS_NOT_VALID; > > + > > Even if multipath is set to "smart", the common case will still be > that > the device is in the wwids file, so we know that it's valid. I don't > see > a reason to do this check in that case. I was thinking that for the > "smart" case, this check should go at the end of the function, before > we > return PATH_IS_MAYBE_VALID. Or do you have a reason why we should do > it > here that I'm missing? I'd argue the other way around: if the device is in use, it doesn't matter any more whether or not it's listed in the WWIDs file. This situation can only occur if something went wrong beforehand (most probably, an inconsistent WWIDs file in the initrd, or some other component misbehaving). If we tag such a device VALID although it's mounted, we'll almost certainly fall into emergency mode, regardless of the find_multipaths mode we're in. The reason is that multipathd will fail to map the mounted device, and systemd will consider the path device unusable, leading to the device being inaccessible either way. I agree that the device being in use while listed in the WWIDs file is an error and should probably be logged more prominently. Martin -- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel