On Thu, 2022-11-17 at 12:53 -0600, Benjamin Marzinski wrote: > On Wed, Nov 09, 2022 at 10:10:07PM +0100, mwilck@xxxxxxxx wrote: > > From: Martin Wilck <mwilck@xxxxxxxx> > > > > To check whether we will be able to add a given device can be part > > of a multipath map, we have two tests in check_path_valid(): > > released_to_systemd() and the O_EXCL test. The former isn't helpful > > if "multipath -u" is called for the first time for a given device, > > and the latter is only used in the "find_multipaths smart" case, > > because > > actively opening the device with O_EXCL, even for a very short > > time, is prone > > to races with other processes. > > > > It turns out that this may cause issues in some scenarios. We saw > > problems in > > once case where "find_multipaths greedy" was used with a single > > non-multipahted root disk and a very large number of multipath > > LUNs. > > The root disk would first be classified as multipath device. > > multipathd > > would try to create a map, fail (because the disk was mounted) and > > trigger another uevent. But because of the very large number of > > multipath > > devices, this event was queued up behind thousands of other events, > > and > > the root device timed out eventually. > > > > While a simple workaround for the given problem would be proper > > blacklisting > > or using a different find_multipaths mode, I am proposing a > > different > > solution here. An additional test is added in is_path_valid() which > > checks whether the given device is currently in use by 1. sysfs > > holders, > > 2. mounts (from /proc/self/mountinfo) or 3. swaps (from > > /proc/swaps). 2. > > and 3. are similar to systemd's device detection after switching > > root. > > This must not only be done for the device itself, but also for all > > its > > partitions. For mountinfo and swaps, libmount is utilized. > > > > With this patch, "multipath -u" will make devices with mounted or > > otherwise > > used partitions available to systemd early, without waiting for > > multipathd > > to fail setting up the map and re-triggering an uevent. This should > > avoid > > the issue described above even without blacklisting. The downside > > of it > > is a longer runtime of "multipath -u" for almost all devices, in > > particular > > for real multipath devices. The runtime required for the new checks > > was in the > > order of 0.1ms-1ms in my tests. Moreover, there is a certain risk > > that devices may > > wrongly classified as non-multipath because of transient mounts or > > holders > > created by other processes. > > > > With greedy, we expect that the blacklists must be correctly set up, > so > we're just slowing things down to deal with people not configuring > multipath correctly. Only in theory. Because of the failed-wwids logic, "greedy" works quite well actually, even if the blacklist is not correctly set up. With this as a special exception. > But since I rarely see greedy configurations, I > don't really have strong feelings about this trade-off. I've been wondering whether we could make this depend on a config option (yes I know, I've said often that we have too many of them). We could also have it depend on "greedy". But it might also be useful with "smart" if we have a lot of LUNs. > > More suggestions below. > > [...] > > > + pthread_cleanup_push_cast(free_scandir_result, &sr); > > + > > + /* parts[0] is the whole disk */ > > + if (vector_alloc_slot(parts) && > > + (prop = strdup(strrchr(path, '/') + 1)) != NULL) > > Since we always add 1, prop can never be NULL. Oops. I should rather check for prop != ONE then :-) > > > + vector_set_slot(parts, prop); > > + > > + last = path + strlen(path); > > + for (i = 0; i < sr.n; i++) { > > + struct stat st; > > + > > + /* only add dirs that have the "partition" > > attribute */ > > + snprintf(last, sizeof(path) - (last - path), > > "/%s/partition", > > + sr.di[i]->d_name); > > + > > + if (stat(path, &st) == 0) { > > + prop = strdup(sr.di[i]->d_name); > > + > > + if (vector_alloc_slot(parts) && prop != > > NULL) > > We should probably check "prop != NULL" first, so that we don't > allocate > a slot if we aren't going to use it. Sure. > > > + pthread_cleanup_push(cleanup_table, tbl); > > + cache = mnt_new_cache(); > > + if (cache) { > > + pthread_cleanup_push(cleanup_cache, cache); > > + if (mnt_table_set_cache(tbl, cache) == 0) { > > + stream = fopen(mountinfo, "r"); > > + if (stream != NULL) { > > + pthread_cleanup_push(cleanup_fclose > > , stream); > > + ret = mnt_table_parse_stream(tbl, > > stream, mountinfo); > > + pthread_cleanup_pop(1); > > + > > + if (ret == 0 && > > + (used = check_mnt_table(parts, > > tbl, "mountinfo"))) > > + break; > > instead of having a break here, shouldn't be just check ret and call > check_mkt_table if it's 0? Sure. I guess this "break" translates into a noop. Strange that the compiler didn't complain. Thanks for the review, Martin > > > + } > > + } > > + pthread_cleanup_pop(1); > > + } > > + pthread_cleanup_pop(1); > > + return used; > > +} > > + > > +static int check_swaps(const struct _vector *parts) > > +{ > > + struct libmnt_table *tbl; > > + struct libmnt_cache *cache; > > + int used = 0, ret; > > + > > + tbl = mnt_new_table(); > > + if (!tbl ) > > + return -errno; > > + > > + pthread_cleanup_push(cleanup_table, tbl); > > + cache = mnt_new_cache(); > > + if (cache) { > > + pthread_cleanup_push(cleanup_cache, cache); > > + if (mnt_table_set_cache(tbl, cache) == 0) { > > + ret = mnt_table_parse_swaps(tbl, NULL); > > + if (ret == 0 && > > + (used = check_mnt_table(parts, tbl, > > "swaps"))) > > + break; > > Same break issue. > > > -Ben > > > + } > > + pthread_cleanup_pop(1); > > + } > > + pthread_cleanup_pop(1); > > + return used; > > +} > > + > > +/* > > + * Given a block device, check if the device itself or any of its > > + * partitions is in use > > + * - by sysfs holders (e.g. LVM) > > + * - mounted according to /proc/self/mountinfo > > + * - used as swap > > + */ > > +static int is_device_in_use(struct udev_device *udevice) > > +{ > > + const char *syspath; > > + vector parts; > > + int used = 0, ret; > > + > > + syspath = udev_device_get_syspath(udevice); > > + if (!syspath) > > + return -ENOMEM; > > + > > + parts = vector_alloc(); > > + if (!parts) > > + return -ENOMEM; > > + > > + pthread_cleanup_push_cast(free_strvec, parts); > > + if ((ret = read_partitions(syspath, parts)) == 0) > > + used = check_all_holders(parts) > 0 || > > + check_mountinfo(parts) > 0 || > > + check_swaps(parts) > 0; > > + pthread_cleanup_pop(1); > > + > > + if (ret < 0) > > + return ret; > > + > > + condlog(3, "%s: %s is %sin use", __func__, syspath, used ? > > "" : "not "); > > + return used; > > +} > > + > > int > > is_path_valid(const char *name, struct config *conf, struct path > > *pp, > > bool check_multipathd) > > { > > int r; > > int fd; > > + const char *prop; > > > > if (!pp || !name || !conf) > > return PATH_IS_ERROR; > > @@ -80,6 +326,10 @@ is_path_valid(const char *name, struct config > > *conf, struct path *pp, > > if (!pp->udev) > > return PATH_IS_ERROR; > > > > + prop = udev_device_get_property_value(pp->udev, "DEVTYPE"); > > + if (prop == NULL || strcmp(prop, "disk")) > > + return PATH_IS_NOT_VALID; > > + > > r = pathinfo(pp, conf, DI_SYSFS | DI_WWID | DI_BLACKLIST); > > if (r == PATHINFO_SKIPPED) > > return PATH_IS_NOT_VALID; > > @@ -96,6 +346,9 @@ is_path_valid(const char *name, struct config > > *conf, struct path *pp, > > return PATH_IS_ERROR; > > } > > > > + if (is_device_in_use(pp->udev) > 0) > > + return PATH_IS_NOT_VALID; > > + > > Can we make this only apply to "greedy"? For "strict", "no" and "yes" > this makes the common case slower (you are running multipath on a > machine with multipath devices that you've seen before) with no real > benefit. > > It might also be useful to run this check before we return "maybe" > for > find_multipaths "smart", perhaps as an alternative to the O_EXCL test > we > currently use. > > > if (conf->find_multipaths == FIND_MULTIPATHS_GREEDY) > > return PATH_IS_VALID; > > > > diff --git a/tests/Makefile b/tests/Makefile > > index 3a5b161..a0d3e1b 100644 > > --- a/tests/Makefile > > +++ b/tests/Makefile > > @@ -64,7 +64,7 @@ vpd-test_LIBDEPS := -ludev -lpthread -ldl > > alias-test_TESTDEPS := test-log.o > > alias-test_LIBDEPS := -lpthread -ldl > > valid-test_OBJDEPS := $(multipathdir)/valid.o > > $(multipathdir)/discovery.o > > -valid-test_LIBDEPS := -ludev -lpthread -ldl > > +valid-test_LIBDEPS := -lmount -ludev -lpthread -ldl > > devt-test_LIBDEPS := -ludev > > mpathvalid-test_LIBDEPS := -ludev -lpthread -ldl > > mpathvalid-test_OBJDEPS := $(mpathvaliddir)/mpath_valid.o > > diff --git a/tests/valid.c b/tests/valid.c > > index 398b771..9e7f719 100644 > > --- a/tests/valid.c > > +++ b/tests/valid.c > > @@ -83,6 +83,13 @@ struct udev_device > > *__wrap_udev_device_new_from_subsystem_sysname(struct udev *u > > return NULL; > > } > > > > +/* For devtype check */ > > +const char *__wrap_udev_device_get_property_value(struct > > udev_device *udev_device, const char *property) > > +{ > > + check_expected(property); > > + return mock_ptr_type(char *); > > +} > > + > > /* For the "hidden" check in pathinfo() */ > > const char *__wrap_udev_device_get_sysattr_value(struct > > udev_device *udev_device, > > const char *sysattr) > > @@ -97,6 +104,12 @@ int __wrap_add_foreign(struct udev_device > > *udev_device) > > return mock_type(int); > > } > > > > +/* For is_device_used() */ > > +const char *__wrap_udev_device_get_sysname(struct udev_device > > *udev_device) > > +{ > > + return mock_ptr_type(char *); > > +} > > + > > /* called from pathinfo() */ > > int __wrap_filter_devnode(struct config *conf, const struct > > _vector *elist, > > const char *vendor, const char * product, > > const char *dev) > > @@ -165,6 +178,11 @@ int __wrap_is_failed_wwid(const char *wwid) > > return ret; > > } > > > > +const char *__wrap_udev_device_get_syspath(struct udev_device > > *udevice) > > +{ > > + return mock_ptr_type(char *); > > +} > > + > > int __wrap_check_wwids_file(char *wwid, int write_wwid) > > { > > bool passed = mock_type(bool); > > @@ -225,6 +243,8 @@ static void setup_passing(char *name, char > > *wwid, unsigned int check_multipathd, > > will_return(__wrap_udev_device_new_from_subsystem_sysname, > > true); > > will_return(__wrap_udev_device_new_from_subsystem_sysname, > > name); > > + expect_string(__wrap_udev_device_get_property_value, > > property, "DEVTYPE"); > > + will_return(__wrap_udev_device_get_property_value, "disk"); > > if (stage == STAGE_GET_UDEV_DEVICE) > > return; > > if (stage == STAGE_PATHINFO_REAL) { > > @@ -250,6 +270,8 @@ static void setup_passing(char *name, char > > *wwid, unsigned int check_multipathd, > > return; > > will_return(__wrap_is_failed_wwid, WWID_IS_NOT_FAILED); > > will_return(__wrap_is_failed_wwid, wwid); > > + /* avoid real is_device_in_use() check */ > > + will_return(__wrap_udev_device_get_syspath, NULL); > > if (stage == STAGE_IS_FAILED) > > return; > > will_return(__wrap_check_wwids_file, false); > > @@ -347,6 +369,30 @@ static void test_check_multipathd(void > > **state) > > assert_int_equal(is_path_valid(name, &conf, &pp, true), > > PATH_IS_ERROR); > > assert_string_equal(pp.dev, name); > > + > > + /* test pass because connect succeeded. succeed getting > > udev. Wrong DEVTYPE */ > > + memset(&pp, 0, sizeof(pp)); > > + setup_passing(name, NULL, CHECK_MPATHD_RUNNING, > > STAGE_CHECK_MULTIPATHD); > > + will_return(__wrap_udev_device_new_from_subsystem_sysname, > > true); > > + will_return(__wrap_udev_device_new_from_subsystem_sysname, > > + name); > > + expect_string(__wrap_udev_device_get_property_value, > > property, "DEVTYPE"); > > + will_return(__wrap_udev_device_get_property_value, > > "partition"); > > + assert_int_equal(is_path_valid(name, &conf, &pp, true), > > + PATH_IS_NOT_VALID); > > + assert_string_equal(pp.dev, name); > > + > > + /* test pass because connect succeeded. succeed getting > > udev. Bad DEVTYPE */ > > + memset(&pp, 0, sizeof(pp)); > > + setup_passing(name, NULL, CHECK_MPATHD_RUNNING, > > STAGE_CHECK_MULTIPATHD); > > + will_return(__wrap_udev_device_new_from_subsystem_sysname, > > true); > > + will_return(__wrap_udev_device_new_from_subsystem_sysname, > > + name); > > + expect_string(__wrap_udev_device_get_property_value, > > property, "DEVTYPE"); > > + will_return(__wrap_udev_device_get_property_value, NULL); > > + assert_int_equal(is_path_valid(name, &conf, &pp, true), > > + PATH_IS_NOT_VALID); > > + assert_string_equal(pp.dev, name); > > } > > > > static void test_pathinfo(void **state) > > -- > > 2.38.0 > -- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel