Re: [RFC PATCH] libmultipath: is_path_valid(): check if device is in use

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2022-11-17 at 12:53 -0600, Benjamin Marzinski wrote:
> On Wed, Nov 09, 2022 at 10:10:07PM +0100, mwilck@xxxxxxxx wrote:
> > From: Martin Wilck <mwilck@xxxxxxxx>
> > 
> > To check whether we will be able to add a given device can be part
> > of a multipath map, we have two tests in check_path_valid():
> > released_to_systemd() and the O_EXCL test. The former isn't helpful
> > if "multipath -u" is called for the first time for a given device,
> > and the latter is only used in the "find_multipaths smart" case,
> > because
> > actively opening the device with O_EXCL, even for a very short
> > time, is prone
> > to races with other processes.
> > 
> > It turns out that this may cause issues in some scenarios. We saw
> > problems in
> > once case where "find_multipaths greedy" was used with a single
> > non-multipahted root disk and a very large number of multipath
> > LUNs.
> > The root disk would first be classified as multipath device.
> > multipathd
> > would try to create a map, fail (because the disk was mounted) and
> > trigger another uevent. But because of the very large number of
> > multipath
> > devices, this event was queued up behind thousands of other events,
> > and
> > the root device timed out eventually.
> > 
> > While a simple workaround for the given problem would be proper
> > blacklisting
> > or using a different find_multipaths mode, I am proposing a
> > different
> > solution here. An additional test is added in is_path_valid() which
> > checks whether the given device is currently in use by 1. sysfs
> > holders,
> > 2. mounts (from /proc/self/mountinfo) or 3. swaps (from
> > /proc/swaps). 2.
> > and 3. are similar to systemd's device detection after switching
> > root.
> > This must not only be done for the device itself, but also for all
> > its
> > partitions. For mountinfo and swaps, libmount is utilized.
> > 
> > With this patch, "multipath -u" will make devices with mounted or
> > otherwise
> > used partitions available to systemd early, without waiting for
> > multipathd
> > to fail setting up the map and re-triggering an uevent. This should
> > avoid
> > the issue described above even without blacklisting. The downside
> > of it
> > is a longer runtime of "multipath -u" for almost all devices, in
> > particular
> > for real multipath devices. The runtime required for the new checks
> > was in the
> > order of 0.1ms-1ms in my tests. Moreover, there is a certain risk
> > that devices may
> > wrongly classified as non-multipath because of transient mounts or
> > holders
> > created by other processes.
> > 
> 
> With greedy, we expect that the blacklists must be correctly set up,
> so
> we're just slowing things down to deal with people not configuring
> multipath correctly. 

Only in theory. Because of the failed-wwids logic, "greedy" works quite
well actually, even if the blacklist is not correctly set up.
With this as a special exception.

> But since I rarely see greedy configurations, I
> don't really have strong feelings about this trade-off.

I've been wondering whether we could make this depend on a config
option (yes I know, I've said often that we have too many of them).
We could also have it depend on "greedy". But it might also be useful
with "smart" if we have a lot of LUNs.

> 
> More suggestions below.
> 
> [...]
> 
> > +       pthread_cleanup_push_cast(free_scandir_result, &sr);
> > +
> > +       /* parts[0] is the whole disk */
> > +       if (vector_alloc_slot(parts) &&
> > +           (prop = strdup(strrchr(path, '/') + 1)) != NULL)
> 
> Since we always add 1, prop can never be NULL.

Oops. I should rather check for prop != ONE then :-)

> 
> > +               vector_set_slot(parts, prop);
> > +
> > +       last = path + strlen(path);
> > +       for (i = 0; i < sr.n; i++) {
> > +               struct stat st;
> > +
> > +               /* only add dirs that have the "partition"
> > attribute */
> > +               snprintf(last, sizeof(path) - (last - path),
> > "/%s/partition",
> > +                        sr.di[i]->d_name);
> > +
> > +               if (stat(path, &st) == 0) {
> > +                       prop = strdup(sr.di[i]->d_name);
> > +
> > +                       if (vector_alloc_slot(parts) && prop !=
> > NULL)
> 
> We should probably check "prop != NULL" first, so that we don't
> allocate
> a slot if we aren't going to use it.

Sure.

> 
> > +       pthread_cleanup_push(cleanup_table, tbl);
> > +       cache = mnt_new_cache();
> > +       if (cache) {
> > +               pthread_cleanup_push(cleanup_cache, cache);
> > +               if (mnt_table_set_cache(tbl, cache) == 0) {
> > +                       stream = fopen(mountinfo, "r");
> > +                       if (stream != NULL) {
> > +                               pthread_cleanup_push(cleanup_fclose
> > , stream);
> > +                               ret = mnt_table_parse_stream(tbl,
> > stream, mountinfo);
> > +                               pthread_cleanup_pop(1);
> > +
> > +                               if (ret == 0 &&
> > +                                   (used = check_mnt_table(parts,
> > tbl, "mountinfo")))
> > +                                       break;
> 
> instead of having a break here, shouldn't be just check ret and call
> check_mkt_table if it's 0?

Sure. I guess this "break" translates into a noop. Strange that the
compiler didn't complain.

Thanks for the review,
Martin


> 
> > +                       }
> > +               }
> > +               pthread_cleanup_pop(1);
> > +       }
> > +       pthread_cleanup_pop(1);
> > +       return used;
> > +}
> > +
> > +static int check_swaps(const struct _vector *parts)
> > +{
> > +       struct libmnt_table *tbl;
> > +       struct libmnt_cache *cache;
> > +       int used = 0, ret;
> > +
> > +       tbl = mnt_new_table();
> > +       if (!tbl )
> > +               return -errno;
> > +
> > +       pthread_cleanup_push(cleanup_table, tbl);
> > +       cache = mnt_new_cache();
> > +       if (cache) {
> > +               pthread_cleanup_push(cleanup_cache, cache);
> > +               if (mnt_table_set_cache(tbl, cache) == 0) {
> > +                       ret = mnt_table_parse_swaps(tbl, NULL);
> > +                       if (ret == 0 &&
> > +                           (used = check_mnt_table(parts, tbl,
> > "swaps")))
> > +                               break;
> 
> Same break issue.
> 
> 
> -Ben
> 
> > +               }
> > +               pthread_cleanup_pop(1);
> > +       }
> > +       pthread_cleanup_pop(1);
> > +       return used;
> > +}
> > +
> > +/*
> > + * Given a block device, check if the device itself or any of its
> > + * partitions is in use
> > + * - by sysfs holders (e.g. LVM)
> > + * - mounted according to /proc/self/mountinfo
> > + * - used as swap
> > + */
> > +static int is_device_in_use(struct udev_device *udevice)
> > +{
> > +       const char *syspath;
> > +       vector parts;
> > +       int used = 0, ret;
> > +
> > +       syspath = udev_device_get_syspath(udevice);
> > +       if (!syspath)
> > +               return -ENOMEM;
> > +
> > +       parts = vector_alloc();
> > +       if (!parts)
> > +               return -ENOMEM;
> > +
> > +       pthread_cleanup_push_cast(free_strvec, parts);
> > +       if ((ret = read_partitions(syspath, parts)) == 0)
> > +               used =  check_all_holders(parts) > 0 ||
> > +                       check_mountinfo(parts) > 0 ||
> > +                       check_swaps(parts) > 0;
> > +       pthread_cleanup_pop(1);
> > +
> > +       if (ret < 0)
> > +               return ret;
> > +
> > +       condlog(3, "%s: %s is %sin use", __func__, syspath, used ?
> > "" : "not ");
> > +       return used;
> > +}
> > +
> >  int
> >  is_path_valid(const char *name, struct config *conf, struct path
> > *pp,
> >               bool check_multipathd)
> >  {
> >         int r;
> >         int fd;
> > +       const char *prop;
> >  
> >         if (!pp || !name || !conf)
> >                 return PATH_IS_ERROR;
> > @@ -80,6 +326,10 @@ is_path_valid(const char *name, struct config
> > *conf, struct path *pp,
> >         if (!pp->udev)
> >                 return PATH_IS_ERROR;
> >  
> > +       prop = udev_device_get_property_value(pp->udev, "DEVTYPE");
> > +       if (prop == NULL || strcmp(prop, "disk"))
> > +               return PATH_IS_NOT_VALID;
> > +
> >         r = pathinfo(pp, conf, DI_SYSFS | DI_WWID | DI_BLACKLIST);
> >         if (r == PATHINFO_SKIPPED)
> >                 return PATH_IS_NOT_VALID;
> > @@ -96,6 +346,9 @@ is_path_valid(const char *name, struct config
> > *conf, struct path *pp,
> >                 return PATH_IS_ERROR;
> >         }
> >  
> > +       if (is_device_in_use(pp->udev) > 0)
> > +               return PATH_IS_NOT_VALID;
> > +
> 
> Can we make this only apply to "greedy"? For "strict", "no" and "yes"
> this makes the common case slower (you are running multipath on a
> machine with multipath devices that you've seen before) with no real
> benefit.
> 
> It might also be useful to run this check before we return "maybe"
> for
> find_multipaths "smart", perhaps as an alternative to the O_EXCL test
> we
> currently use.
> 
> >         if (conf->find_multipaths == FIND_MULTIPATHS_GREEDY)
> >                 return PATH_IS_VALID;
> >  
> > diff --git a/tests/Makefile b/tests/Makefile
> > index 3a5b161..a0d3e1b 100644
> > --- a/tests/Makefile
> > +++ b/tests/Makefile
> > @@ -64,7 +64,7 @@ vpd-test_LIBDEPS := -ludev -lpthread -ldl
> >  alias-test_TESTDEPS := test-log.o
> >  alias-test_LIBDEPS := -lpthread -ldl
> >  valid-test_OBJDEPS := $(multipathdir)/valid.o
> > $(multipathdir)/discovery.o
> > -valid-test_LIBDEPS := -ludev -lpthread -ldl
> > +valid-test_LIBDEPS := -lmount -ludev -lpthread -ldl
> >  devt-test_LIBDEPS := -ludev
> >  mpathvalid-test_LIBDEPS := -ludev -lpthread -ldl
> >  mpathvalid-test_OBJDEPS := $(mpathvaliddir)/mpath_valid.o
> > diff --git a/tests/valid.c b/tests/valid.c
> > index 398b771..9e7f719 100644
> > --- a/tests/valid.c
> > +++ b/tests/valid.c
> > @@ -83,6 +83,13 @@ struct udev_device
> > *__wrap_udev_device_new_from_subsystem_sysname(struct udev *u
> >         return NULL;
> >  }
> >  
> > +/* For devtype check */
> > +const char *__wrap_udev_device_get_property_value(struct
> > udev_device *udev_device, const char *property)
> > +{
> > +       check_expected(property);
> > +       return mock_ptr_type(char *);
> > +}
> > +
> >  /* For the "hidden" check in pathinfo() */
> >  const char *__wrap_udev_device_get_sysattr_value(struct
> > udev_device *udev_device,
> >                                          const char *sysattr)
> > @@ -97,6 +104,12 @@ int __wrap_add_foreign(struct udev_device
> > *udev_device)
> >         return mock_type(int);
> >  }
> >  
> > +/* For is_device_used() */
> > +const char *__wrap_udev_device_get_sysname(struct udev_device
> > *udev_device)
> > +{
> > +       return mock_ptr_type(char *);
> > +}
> > +
> >  /* called from pathinfo() */
> >  int __wrap_filter_devnode(struct config *conf, const struct
> > _vector *elist,
> >                           const char *vendor, const char * product,
> > const char *dev)
> > @@ -165,6 +178,11 @@ int __wrap_is_failed_wwid(const char *wwid)
> >         return ret;
> >  }
> >  
> > +const char *__wrap_udev_device_get_syspath(struct udev_device
> > *udevice)
> > +{
> > +       return mock_ptr_type(char *);
> > +}
> > +
> >  int __wrap_check_wwids_file(char *wwid, int write_wwid)
> >  {
> >         bool passed = mock_type(bool);
> > @@ -225,6 +243,8 @@ static void setup_passing(char *name, char
> > *wwid, unsigned int check_multipathd,
> >         will_return(__wrap_udev_device_new_from_subsystem_sysname,
> > true);
> >         will_return(__wrap_udev_device_new_from_subsystem_sysname,
> >                     name);
> > +       expect_string(__wrap_udev_device_get_property_value,
> > property, "DEVTYPE");
> > +       will_return(__wrap_udev_device_get_property_value, "disk");
> >         if (stage == STAGE_GET_UDEV_DEVICE)
> >                 return;
> >         if  (stage == STAGE_PATHINFO_REAL) {
> > @@ -250,6 +270,8 @@ static void setup_passing(char *name, char
> > *wwid, unsigned int check_multipathd,
> >                 return;
> >         will_return(__wrap_is_failed_wwid, WWID_IS_NOT_FAILED);
> >         will_return(__wrap_is_failed_wwid, wwid);
> > +       /* avoid real is_device_in_use() check */
> > +       will_return(__wrap_udev_device_get_syspath, NULL);
> >         if (stage == STAGE_IS_FAILED)
> >                 return;
> >         will_return(__wrap_check_wwids_file, false);
> > @@ -347,6 +369,30 @@ static void test_check_multipathd(void
> > **state)
> >         assert_int_equal(is_path_valid(name, &conf, &pp, true),
> >                          PATH_IS_ERROR);
> >         assert_string_equal(pp.dev, name);
> > +
> > +       /* test pass because connect succeeded. succeed getting
> > udev. Wrong DEVTYPE  */
> > +       memset(&pp, 0, sizeof(pp));
> > +       setup_passing(name, NULL, CHECK_MPATHD_RUNNING,
> > STAGE_CHECK_MULTIPATHD);
> > +       will_return(__wrap_udev_device_new_from_subsystem_sysname,
> > true);
> > +       will_return(__wrap_udev_device_new_from_subsystem_sysname,
> > +                   name);
> > +       expect_string(__wrap_udev_device_get_property_value,
> > property, "DEVTYPE");
> > +       will_return(__wrap_udev_device_get_property_value,
> > "partition");
> > +       assert_int_equal(is_path_valid(name, &conf, &pp, true),
> > +                        PATH_IS_NOT_VALID);
> > +       assert_string_equal(pp.dev, name);
> > +
> > +       /* test pass because connect succeeded. succeed getting
> > udev. Bad DEVTYPE  */
> > +       memset(&pp, 0, sizeof(pp));
> > +       setup_passing(name, NULL, CHECK_MPATHD_RUNNING,
> > STAGE_CHECK_MULTIPATHD);
> > +       will_return(__wrap_udev_device_new_from_subsystem_sysname,
> > true);
> > +       will_return(__wrap_udev_device_new_from_subsystem_sysname,
> > +                   name);
> > +       expect_string(__wrap_udev_device_get_property_value,
> > property, "DEVTYPE");
> > +       will_return(__wrap_udev_device_get_property_value, NULL);
> > +       assert_int_equal(is_path_valid(name, &conf, &pp, true),
> > +                        PATH_IS_NOT_VALID);
> > +       assert_string_equal(pp.dev, name);
> >  }
> >  
> >  static void test_pathinfo(void **state)
> > -- 
> > 2.38.0
> 

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel





[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux