Hi Lixiaokeng,

On Mon, 2020-07-13 at 10:15 +0800, lixiaokeng wrote:
> Hi
> 
> Now the number of fds pointing to /dev/mapper/control in the
> multipathd process increases when an iSCSI device logs in. The
> reason is that the wait_dmevents thread and the uevqloop thread call
> _open_and_assign_control_fd concurrently. If a lock is added to the
> _open_and_assign_control_fd function in lvm2/libdm/libdm-iface.c,
> the trouble is solved easily, but Zdenek Kabelac said that libdm is
> not a pthread-aware/safe library, so the lock cannot be added to
> libdm. If the lock is added to multipath-tools instead, there are a
> lot of places where dm_task_run is called and the lock would have to
> be added, which may degrade multipath-tools' performance. I don't
> have any other good idea about this trouble. Do you have one? There
> is another question for me: multipathd is a multi-threaded process
> and libdm is not a pthread-aware/safe library, so why does
> multipathd use libdm without protection? Thanks.
> 
> The call chains when the fd leak occurs are as follows:
> 1. wait_dmevents thread
>        wait_dmevents
>        ->dmevent_loop
>        ->dm_get_events (->dm_is_mpath)
>        ->dm_task_run
>        ->_open_control
>        ->_open_and_assign_control_fd
> 
> 2. uevqloop thread
>        uevqloop
>        ->uevent_dispatch
>        ->service_uevq
>        ->ev_add_path
>        ->__setup_multipath
>        ->dm_get_info
>        ->dm_task_run
>        ->_open_control
>        ->_open_and_assign_control_fd
> 
> Lixiaokeng

Thanks for the report and the excellent analysis.

The general problem may perhaps not be so bad currently, as multipathd
uses the "vecs lock" to protect its own data structures, and most
libdm calls are made under this lock. dmevent_loop() is one example
where the vecs lock is not taken.

I'm attaching a tentative patch for the particular race you reported,
which resorts to simply taking the vecs lock in dmevent_loop(). Please
review/test. This is tentative; I haven't tested it myself beyond
making sure that it passes the unit test.

To be sure we aren't missing anything, we'd need to assess whether
there are other libdm calls that are not made under the vecs lock.
Short term, I have no time for a complete assessment. My guess would
be that there are a few, but not many, and most of them not prone to
races.

In the long run, we need to think about the general problem of libdm
not being thread-safe. So far we've had very few reports like this
(actually, I'm aware of none), so perhaps the vecs lock protects us
quite well already. OTOH, we've discussed repeatedly that the locking
in multipathd is too monolithic and should be more fine-grained; as
soon as we drop the monolithic lock, this might hit us hard. If we
introduce an additional lock for libdm, we'll have to think about
potential deadlock situations (probably easy, as the new lock would
just wrap libdm calls and would thus, by definition, be at the bottom
of any lock hierarchy in libmultipath).

@Zdenek, do we have to protect every libdm call, or is it sufficient
to protect only dm_task_run(), as lixiaokeng suggested?

@Ben, please take a look as well, as you're the main author of the
dmevents code and know libdm better than me.
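To make the race concrete, here is a minimal stand-alone sketch of the
pattern that leaks the fd. This is not libdm source code, just the
same unlocked check-then-open on a shared fd; /dev/null stands in for
/dev/mapper/control so it runs anywhere (build with gcc -pthread):

#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>

static int control_fd = -1;

static void *opener(void *unused)
{
	(void)unused;
	/* Both threads can pass this check before either assigns
	 * control_fd; then both open(), and one descriptor leaks. */
	if (control_fd == -1)
		control_fd = open("/dev/null", O_RDWR);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, opener, NULL);
	pthread_create(&t2, NULL, opener, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	/* When the race hits, two fds were opened but only one is
	 * reachable through control_fd. */
	printf("control_fd = %d\n", control_fd);
	return 0;
}

This is why the fd count in /proc/<multipathd-pid>/fd grows by one
each time the two threads race.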
Regards,
Martin

> 
> On 2020/7/9 16:54, Zdenek Kabelac wrote:
> > On 09.07.2020 at 9:02, lixiaokeng wrote:
> > > When one iSCSI device logs in and out several times, the number
> > > of fds pointing to '/dev/mapper/control' increases in
> > > /proc/<multipathd-pid>/fd, as shown by:
> > > 
> > > [root@localhost fd]# ll | grep control
> > > 
> > > diff --git a/libdm/ioctl/libdm-iface.c b/libdm/ioctl/libdm-iface.c
> > > index 7ad549c..7168369 100644
> > > --- a/libdm/ioctl/libdm-iface.c
> > > +++ b/libdm/ioctl/libdm-iface.c
> > > @@ -23,6 +23,7 @@
> > >  #include <sys/ioctl.h>
> > >  #include <sys/utsname.h>
> > >  #include <limits.h>
> > > +#include <pthread.h>
> > >  
> > >  #ifdef __linux__
> > >  # include "libdm/misc/kdev_t.h"
> > > @@ -81,6 +82,7 @@ static dm_bitset_t _dm_bitset = NULL;
> > >  static uint32_t _dm_device_major = 0;
> > >  
> > >  static int _control_fd = -1;
> > > +static pthread_mutex_t _control_fd_mutex = PTHREAD_MUTEX_INITIALIZER;
> > >  static int _hold_control_fd_open = 0;
> > >  static int _version_checked = 0;
> > >  static int _version_ok = 1;
> > > @@ -404,10 +406,19 @@ static void _close_control_fd(void)
> > >  #ifdef DM_IOCTLS
> > >  static int _open_and_assign_control_fd(const char *control)
> > >  {
> > > +	pthread_mutex_lock(&_control_fd_mutex);
> > > +
> > > +	if (_control_fd != -1) {
> > 
> > Hi
> > 
> > libdm is not a pthread-aware/safe library.
> > 
> > So the fix needs to happen on the libdm user side, to prevent
> > racing calls of this function.
> > 
> > Zdenek
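As an illustration of the user-side serialization discussed above,
here is a minimal sketch. It is not part of the patch below; the
wrapper name libmp_dm_task_run() is made up for this example, and
whether wrapping dm_task_run() alone is sufficient is exactly the open
question raised to Zdenek:

#include <pthread.h>
#include <libdevmapper.h>

/* One process-wide mutex serializing all dm_task_run() calls, so that
 * the lazy open of /dev/mapper/control inside libdm cannot race with
 * itself. */
static pthread_mutex_t libdm_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Callers would use this instead of calling dm_task_run() directly.
 * No other lock is taken while libdm_mutex is held, so it sits at the
 * bottom of the lock hierarchy and cannot deadlock against the vecs
 * lock. */
static int libmp_dm_task_run(struct dm_task *dmt)
{
	int r;

	pthread_mutex_lock(&libdm_mutex);
	r = dm_task_run(dmt);
	pthread_mutex_unlock(&libdm_mutex);
	return r;
}

With every caller funneled through such a wrapper, two threads can no
longer both observe the control fd unset and open /dev/mapper/control
twice.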
From 05115217286dbbd1679e71c4aaceabe42afb5f7f Mon Sep 17 00:00:00 2001
From: Martin Wilck <mwilck@xxxxxxxx>
Date: Mon, 13 Jul 2020 11:03:43 +0200
Subject: [PATCH] libmultipath: dmevent_loop: hold vecs lock

The libdm calls in the event waiter, in particular dm_get_events(),
may race with libdm calls from other threads, leading to an fd leak
for the libdm control fd. The call chains when the fd leak occurs are
as follows:

1. wait_dmevents thread
       wait_dmevents
       ->dmevent_loop
       ->dm_get_events (->dm_is_mpath)
       ->dm_task_run
       ->_open_control
       ->_open_and_assign_control_fd

2. uevqloop thread
       uevqloop
       ->uevent_dispatch
       ->service_uevq
       ->ev_add_path
       ->__setup_multipath
       ->dm_get_info
       ->dm_task_run
       ->_open_control
       ->_open_and_assign_control_fd

In the long run, we may need to protect libdm access with a separate
lock. For now, work around this particular race by grabbing the vecs
lock for an entire round of event processing. This also implies
changes in the dmevents test code, because poll() is no longer called
from dmevent_loop().

Reported-by: lixiaokeng <lixiaokeng@xxxxxxxxxx>
---
 multipathd/dmevents.c | 32 ++++++++++++++++++--------------
 tests/dmevents.c      | 35 ++++-------------------------------
 2 files changed, 22 insertions(+), 45 deletions(-)

diff --git a/multipathd/dmevents.c b/multipathd/dmevents.c
index b235dd4..be7e122 100644
--- a/multipathd/dmevents.c
+++ b/multipathd/dmevents.c
@@ -289,24 +289,15 @@ static void unwatch_dmevents(char *name)
 static int dmevent_loop (void)
 {
 	int r, i = 0;
-	struct pollfd pfd;
 	struct dev_event *dev_evt;
 
-	pfd.fd = waiter->fd;
-	pfd.events = POLLIN;
-	r = poll(&pfd, 1, -1);
-	if (r <= 0) {
-		condlog(0, "failed polling for dm events: %s", strerror(errno));
-		/* sleep 1s and hope things get better */
-		return 1;
-	}
-
 	if (arm_dm_event_poll(waiter->fd) != 0) {
 		condlog(0, "Cannot re-arm event polling: %s", strerror(errno));
 		/* sleep 1s and hope things get better */
 		return 1;
 	}
+
 	if (dm_get_events() != 0) {
 		condlog(0, "failed getting dm events: %s", strerror(errno));
 		/* sleep 1s and hope things get better */
@@ -352,15 +343,12 @@ static int dmevent_loop (void)
 	 * 4) a path reinstate : nothing to do
 	 * 5) a switch group : nothing to do
 	 */
-	pthread_cleanup_push(cleanup_lock, &waiter->vecs->lock);
-	lock(&waiter->vecs->lock);
 	pthread_testcancel();
 	r = 0;
 	if (curr_dev.action == EVENT_REMOVE)
 		remove_map_by_alias(curr_dev.name, waiter->vecs, 1);
 	else
 		r = update_multipath(waiter->vecs, curr_dev.name, 1);
-	pthread_cleanup_pop(1);
 
 	if (r) {
 		condlog(2, "%s: stopped watching dmevents",
@@ -392,7 +380,23 @@ void *wait_dmevents (__attribute__((unused)) void *unused)
 	mlockall(MCL_CURRENT | MCL_FUTURE);
 	while (1) {
-		r = dmevent_loop();
+		struct pollfd pfd;
+		pfd.fd = waiter->fd;
+
+		pfd.events = POLLIN;
+		r = poll(&pfd, 1, -1);
+
+		if (r <= 0) {
+			condlog(0, "failed polling for dm events: %s", strerror(errno));
+			/* sleep 1s and hope things get better */
+			r = 1;
+		} else {
+			pthread_cleanup_push(cleanup_lock, &waiter->vecs->lock);
+			lock(&waiter->vecs->lock);
+			pthread_testcancel();
+			r = dmevent_loop();
+			pthread_cleanup_pop(1);
+		}
 
 		if (r < 0)
 			break;
diff --git a/tests/dmevents.c b/tests/dmevents.c
index bee117a..6857963 100644
--- a/tests/dmevents.c
+++ b/tests/dmevents.c
@@ -627,9 +627,8 @@ static void test_get_events_good1(void **state)
 	assert_int_equal(VECTOR_SIZE(waiter->events), 3);
 }
 
-/* poll does not return an event. nothing happens. The
- * devices remain after this test */
-static void test_dmevent_loop_bad0(void **state)
+/* arm_dm_event_poll's ioctl fails. Nothing happens */
+static void test_dmevent_loop_bad1(void **state)
 {
 	struct dm_device *dev;
 	struct dev_event *dev_evt;
@@ -637,34 +636,13 @@
 	struct test_data *datap = (struct test_data *)(*state);
 	if (datap == NULL)
 		skip();
+	// will_return(__wrap_poll, 1);
 	remove_all_dm_device_events();
 	unwatch_all_dmevents();
 	assert_int_equal(add_dm_device_event("foo", 1, 5), 0);
 	will_return(__wrap_dm_geteventnr, 0);
 	assert_int_equal(watch_dmevents("foo"), 0);
 	assert_int_equal(add_dm_device_event("foo", 1, 6), 0);
-	will_return(__wrap_poll, 0);
-	assert_int_equal(dmevent_loop(), 1);
-	dev_evt = find_dmevents("foo");
-	assert_ptr_not_equal(dev_evt, NULL);
-	assert_int_equal(dev_evt->evt_nr, 5);
-	assert_int_equal(dev_evt->action, EVENT_NOTHING);
-	dev = find_dm_device("foo");
-	assert_ptr_not_equal(dev, NULL);
-	assert_int_equal(dev->evt_nr, 6);
-	assert_int_equal(dev->update_nr, 5);
-}
-
-/* arm_dm_event_poll's ioctl fails. Nothing happens */
-static void test_dmevent_loop_bad1(void **state)
-{
-	struct dm_device *dev;
-	struct dev_event *dev_evt;
-	struct test_data *datap = (struct test_data *)(*state);
-	if (datap == NULL)
-		skip();
-
-	will_return(__wrap_poll, 1);
 	will_return(__wrap_ioctl, -1);
 	assert_int_equal(dmevent_loop(), 1);
 	dev_evt = find_dmevents("foo");
@@ -686,7 +664,7 @@ static void test_dmevent_loop_bad2(void **state)
 	if (datap == NULL)
 		skip();
 
-	will_return(__wrap_poll, 1);
+	// will_return(__wrap_poll, 1);
 	will_return(__wrap_ioctl, 0);
 	will_return(__wrap_libmp_dm_task_create, NULL);
 	assert_int_equal(dmevent_loop(), 1);
@@ -710,7 +688,6 @@ static void test_dmevent_loop_good0(void **state)
 
 	remove_all_dm_device_events();
 	unwatch_all_dmevents();
-	will_return(__wrap_poll, 1);
 	will_return(__wrap_ioctl, 0);
 	will_return(__wrap_libmp_dm_task_create, &data);
 	will_return(__wrap_dm_task_no_open_count, 1);
@@ -746,7 +723,6 @@ static void test_dmevent_loop_good1(void **state)
 	assert_int_equal(watch_dmevents("xyzzy"), 0);
 	assert_int_equal(add_dm_device_event("foo", 1, 6), 0);
 	assert_int_equal(remove_dm_device_event("xyzzy"), 0);
-	will_return(__wrap_poll, 1);
 	will_return(__wrap_ioctl, 0);
 	will_return(__wrap_libmp_dm_task_create, &data);
 	will_return(__wrap_dm_task_no_open_count, 1);
@@ -794,7 +770,6 @@ static void test_dmevent_loop_good2(void **state)
 	will_return(__wrap_dm_geteventnr, 0);
 	assert_int_equal(watch_dmevents("baz"), 0);
 	assert_int_equal(add_dm_device_event("baz", 1, 14), 0);
-	will_return(__wrap_poll, 1);
 	will_return(__wrap_ioctl, 0);
 	will_return(__wrap_libmp_dm_task_create, &data);
 	will_return(__wrap_dm_task_no_open_count, 1);
@@ -838,7 +813,6 @@ static void test_dmevent_loop_good3(void **state)
 	assert_int_equal(remove_dm_device_event("foo"), 0);
 	unwatch_dmevents("bar");
 
-	will_return(__wrap_poll, 1);
 	will_return(__wrap_ioctl, 0);
 	will_return(__wrap_libmp_dm_task_create, &data);
 	will_return(__wrap_dm_task_no_open_count, 1);
@@ -896,7 +870,6 @@ int test_dmevents(void)
 		cmocka_unit_test(test_get_events_good0),
 		cmocka_unit_test(test_get_events_good1),
 		cmocka_unit_test(test_arm_poll),
-		cmocka_unit_test(test_dmevent_loop_bad0),
 		cmocka_unit_test(test_dmevent_loop_bad1),
 		cmocka_unit_test(test_dmevent_loop_bad2),
 		cmocka_unit_test(test_dmevent_loop_good0),
-- 
2.26.2
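A note on the pthread_cleanup_push()/lock() idiom that the patch moves
into wait_dmevents(): the cleanup handler guarantees that the vecs
lock is dropped even if the thread is cancelled at the
pthread_testcancel() call inside the critical section. Below is a
self-contained demo of the same idiom; a plain pthread mutex stands in
for multipathd's vecs lock, and cleanup_lock() is simplified
accordingly (build with gcc -pthread):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

/* Cancellation cleanup handler: release the mutex passed as arg. */
static void cleanup_lock(void *arg)
{
	pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *worker(void *unused)
{
	(void)unused;
	for (;;) {
		pthread_cleanup_push(cleanup_lock, &mtx);
		pthread_mutex_lock(&mtx);
		/* Deliberate cancellation point while holding the lock;
		 * if cancelled here, cleanup_lock() unlocks for us. */
		pthread_testcancel();
		/* ... event processing under the lock would go here ... */
		pthread_cleanup_pop(1);	/* 1: pop and run cleanup_lock() */
		usleep(1000);
	}
	return NULL;
}

int main(void)
{
	pthread_t tid;

	pthread_create(&tid, NULL, worker, NULL);
	sleep(1);
	pthread_cancel(tid);
	pthread_join(tid, NULL);
	/* The mutex is free even if the worker was cancelled inside
	 * its critical section. */
	pthread_mutex_lock(&mtx);
	puts("mutex acquired after cancellation: cleanup handler ran");
	pthread_mutex_unlock(&mtx);
	return 0;
}

pthread_cleanup_pop(1) pops and runs the handler, so the lock is also
released on the normal path; that is why the patch needs no explicit
unlock() next to lock().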