Re: [lvm-devel] master - multipathd: fix fd leak when iscsi device logs in

Hi Lixiaokeng,

On Mon, 2020-07-13 at 10:15 +0800, lixiaokeng wrote:
> Hi
> 
> Now the number of fds pointing to /dev/mapper/control in the
> multipathd process increases when an iSCSI device logs in. The reason
> is that the wait_dmevents thread and the uevqloop thread call
> _open_and_assign_control_fd concurrently.
> If a lock is added to the _open_and_assign_control_fd function in
> lvm2/libdm/libdm-iface.c, the problem is solved easily, but Zdenek
> Kabelac said that libdm is not a pthread-aware/safe library, so the
> lock cannot be added to libdm. If the lock is added to multipath-tools
> instead, there are a lot of places where dm_task_run is called and the
> lock would have to be taken, which may degrade multipath-tools'
> performance. I don't have any other good idea for this problem. Do you
> have one? There is another question: multipathd is a multi-threaded
> process and libdm is not a pthread-aware/safe library, so why does
> multipathd use libdm without any protection? Thanks.
> 
> The call paths leading to the fd leak are as follows:
> 1. wait_dmevents thread
> wait_dmevents
> ->dmevent_loop
>     ->dm_get_events (->dm_is_mpath)
>         ->dm_task_run
>             ->_open_control
>                 ->_open_and_assign_control_fd
> 
> 2. uevqloop thread
> uevqloop
> ->uevent_dispatch
>     ->service_uevq
>         ->ev_add_path
>             ->__setup_multipath
>                 ->dm_get_info
>                     ->dm_task_run
>                         ->_open_control
>                             ->_open_and_assign_control_fd
> 
> Lixiaokeng
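
For illustration, a minimal sketch of why this race leaks an fd,
assuming _open_and_assign_control_fd() looks roughly like the code
below (this is not the verbatim libdm source; error handling, the
existing-fd check and the non-DM_IOCTLS path are omitted):

/* Simplified sketch of the racy open, not verbatim libdm code. */
#include <fcntl.h>

static int _control_fd = -1;

static int _open_and_assign_control_fd(const char *control)
{
	/*
	 * Two threads that race past this point both call open();
	 * the second assignment overwrites the first fd, which is
	 * then never closed -- one leaked /dev/mapper/control fd
	 * per race.
	 */
	if ((_control_fd = open(control, O_RDWR)) < 0)
		return 0;

	return 1;
}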

Thanks for the report and the excellent analysis. The general problem
may perhaps not be so bad currently, as multipathd uses the "vecs lock"
to protect its own data structures, and most libdm calls are made under
this lock. dmevent_loop() is one example where the vecs lock is not
taken.

I'm attaching a tentative patch for the particular race you reported,
which resorts to simply taking the vecs lock in dmevent_loop(). Please
review/test; it is tentative, and I haven't tested it myself beyond
making sure that it passes the unit tests.

To be sure we aren't missing anything, we'd need to assess whether there
are other libdm calls that are not made under the vecs lock. Short-term,
I have no time for a complete assessment. My guess would be that there
are a few, but not many, and most of them not prone to races.

In the long run, we need to think about the general problem of libdm
not being thread-safe. So far we've had very few reports like this
(actually, I'm aware of none), so perhaps the vecs lock protects us
quite well already. OTOH, we've discussed repeatedly that the locking
in multipathd is too monolithic and should be more fine-grained; as
soon as we drop the monolithic lock, this might hit us hard. If we
introduce an additional lock for libdm, we'll have to think about
potential deadlock situations (probably easy, as the new lock would
just wrap libdm calls and would thus, by definition, be at the bottom
of any lock hierarchy in libmultipath).
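
To illustrate the idea (a purely hypothetical sketch with an invented
name, not the attached patch): a thin wrapper on the multipath-tools
side that serializes dm_task_run() behind a single mutex would, by
construction, never hold the new lock while taking any other lock:

/* Hypothetical sketch only; mp_dm_task_run() is an invented name. */
#include <pthread.h>
#include <libdevmapper.h>

static pthread_mutex_t libdm_mutex = PTHREAD_MUTEX_INITIALIZER;

int mp_dm_task_run(struct dm_task *dmt)
{
	int r;

	/*
	 * No other lock is acquired while libdm_mutex is held, so the
	 * new lock sits at the bottom of the lock hierarchy and cannot
	 * create a deadlock cycle.
	 */
	pthread_mutex_lock(&libdm_mutex);
	r = dm_task_run(dmt);
	pthread_mutex_unlock(&libdm_mutex);

	return r;
}

Whether serializing only dm_task_run() would be enough is exactly the
question below.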

@Zdenek, do we have to protect every libdm call, or is it sufficient
to protect only dm_task_run(), as lixiaokeng suggested?

@Ben, please take a look as well, as you're the main author of the
dmevents code and know libdm better than me.

Regards,
Martin


> 
> On 2020/7/9 16:54, Zdenek Kabelac wrote:
> > Dne 09. 07. 20 v 9:02 lixiaokeng napsal(a):
> > > When one iSCSI device logs in and logs out several times, the
> > > number of fds pointing to '/dev/mapper/control' increases in
> > > /proc/<multipathd-pid>/fd as follows,
> > > [root@localhost fd]# ll | grep control
> > > diff --git a/libdm/ioctl/libdm-iface.c b/libdm/ioctl/libdm-
> > > iface.c
> > > index 7ad549c..7168369 100644
> > > --- a/libdm/ioctl/libdm-iface.c
> > > +++ b/libdm/ioctl/libdm-iface.c
> > > @@ -23,6 +23,7 @@
> > >   #include <sys/ioctl.h>
> > >   #include <sys/utsname.h>
> > >   #include <limits.h>
> > > +#include <pthread.h>
> > > 
> > >   #ifdef __linux__
> > >   #  include "libdm/misc/kdev_t.h"
> > > @@ -81,6 +82,7 @@ static dm_bitset_t _dm_bitset = NULL;
> > >   static uint32_t _dm_device_major = 0;
> > > 
> > >   static int _control_fd = -1;
> > > +static pthread_mutex_t _control_fd_mutex =
> > > PTHREAD_MUTEX_INITIALIZER;
> > >   static int _hold_control_fd_open = 0;
> > >   static int _version_checked = 0;
> > >   static int _version_ok = 1;
> > > @@ -404,10 +406,19 @@ static void _close_control_fd(void)
> > >   #ifdef DM_IOCTLS
> > >   static int _open_and_assign_control_fd(const char *control)
> > >   {
> > > +    pthread_mutex_lock(&_control_fd_mutex);
> > > +
> > > +    if (_control_fd != -1) {
> > 
> > 
> > Hi
> > 
> > libdm is not pthread aware/safe library.
> > 
> > So the fix needs to happen on libdm user-side to prevent race call
> > of this function.
> > 
> > 
> > Zdenek
> > 
> > 

From 05115217286dbbd1679e71c4aaceabe42afb5f7f Mon Sep 17 00:00:00 2001
From: Martin Wilck <mwilck@xxxxxxxx>
Date: Mon, 13 Jul 2020 11:03:43 +0200
Subject: [PATCH] libmultipath: dmevent_loop: hold vecs lock

The libdm calls in the event waiter, in particular dm_get_events(),
may race with libdm calls from other threads, leading to a leak of
the libdm control fd.

The call paths leading to the fd leak are as follows:

1. wait_dmevents thread
wait_dmevents
->dmevent_loop
    ->dm_get_events (->dm_is_mpath)
        ->dm_task_run
            ->_open_control
                ->_open_and_assign_control_fd

2. uevqloop thread
uevqloop
->uevent_dispatch
    ->service_uevq
        ->ev_add_path
            ->__setup_multipath
                ->dm_get_info
                    ->dm_task_run
                        ->_open_control
                            ->_open_and_assign_control_fd

In the long run, we may need to protect libdm access with a separate
lock. For now, work around this particular race by grabbing the
vecs lock for an entire round of events processing.

This also implies changes in the dmevents test code, because poll()
is not called from dmevent_loop() any more.

Reported-by: lixiaokeng <lixiaokeng@xxxxxxxxxx>
---
 multipathd/dmevents.c | 32 ++++++++++++++++++--------------
 tests/dmevents.c      | 35 ++++-------------------------------
 2 files changed, 22 insertions(+), 45 deletions(-)

diff --git a/multipathd/dmevents.c b/multipathd/dmevents.c
index b235dd4..be7e122 100644
--- a/multipathd/dmevents.c
+++ b/multipathd/dmevents.c
@@ -289,24 +289,15 @@ static void unwatch_dmevents(char *name)
 static int dmevent_loop (void)
 {
 	int r, i = 0;
-	struct pollfd pfd;
 	struct dev_event *dev_evt;
 
-	pfd.fd = waiter->fd;
-	pfd.events = POLLIN;
-	r = poll(&pfd, 1, -1);
-	if (r <= 0) {
-		condlog(0, "failed polling for dm events: %s", strerror(errno));
-		/* sleep 1s and hope things get better */
-		return 1;
-	}
-
 	if (arm_dm_event_poll(waiter->fd) != 0) {
 		condlog(0, "Cannot re-arm event polling: %s", strerror(errno));
 		/* sleep 1s and hope things get better */
 		return 1;
 	}
 
+
 	if (dm_get_events() != 0) {
 		condlog(0, "failed getting dm events: %s", strerror(errno));
 		/* sleep 1s and hope things get better */
@@ -352,15 +343,12 @@ static int dmevent_loop (void)
 		 * 4) a path reinstate : nothing to do
 		 * 5) a switch group : nothing to do
 		 */
-		pthread_cleanup_push(cleanup_lock, &waiter->vecs->lock);
-		lock(&waiter->vecs->lock);
 		pthread_testcancel();
 		r = 0;
 		if (curr_dev.action == EVENT_REMOVE)
 			remove_map_by_alias(curr_dev.name, waiter->vecs, 1);
 		else
 			r = update_multipath(waiter->vecs, curr_dev.name, 1);
-		pthread_cleanup_pop(1);
 
 		if (r) {
 			condlog(2, "%s: stopped watching dmevents",
@@ -392,7 +380,23 @@ void *wait_dmevents (__attribute__((unused)) void *unused)
 	mlockall(MCL_CURRENT | MCL_FUTURE);
 
 	while (1) {
-		r = dmevent_loop();
+		struct pollfd pfd;
+		pfd.fd = waiter->fd;
+
+		pfd.events = POLLIN;
+		r = poll(&pfd, 1, -1);
+
+		if (r <= 0) {
+			condlog(0, "failed polling for dm events: %s", strerror(errno));
+			/* sleep 1s and hope things get better */
+			r = 1;
+		} else {
+			pthread_cleanup_push(cleanup_lock, &waiter->vecs->lock);
+			lock(&waiter->vecs->lock);
+			pthread_testcancel();
+			r = dmevent_loop();
+			pthread_cleanup_pop(1);
+		}
 
 		if (r < 0)
 			break;
diff --git a/tests/dmevents.c b/tests/dmevents.c
index bee117a..6857963 100644
--- a/tests/dmevents.c
+++ b/tests/dmevents.c
@@ -627,9 +627,8 @@ static void test_get_events_good1(void **state)
 	assert_int_equal(VECTOR_SIZE(waiter->events), 3);
 }
 
-/* poll does not return an event. nothing happens. The
- * devices remain after this test */
-static void test_dmevent_loop_bad0(void **state)
+/* arm_dm_event_poll's ioctl fails. Nothing happens */
+static void test_dmevent_loop_bad1(void **state)
 {
 	struct dm_device *dev;
 	struct dev_event *dev_evt;
@@ -637,34 +636,13 @@ static void test_dmevent_loop_bad0(void **state)
 	if (datap == NULL)
 		skip();
 
+	// will_return(__wrap_poll, 1);
 	remove_all_dm_device_events();
 	unwatch_all_dmevents();
 	assert_int_equal(add_dm_device_event("foo", 1, 5), 0);
 	will_return(__wrap_dm_geteventnr, 0);
 	assert_int_equal(watch_dmevents("foo"), 0);
 	assert_int_equal(add_dm_device_event("foo", 1, 6), 0);
-	will_return(__wrap_poll, 0);
-	assert_int_equal(dmevent_loop(), 1);
-	dev_evt = find_dmevents("foo");
-	assert_ptr_not_equal(dev_evt, NULL);
-	assert_int_equal(dev_evt->evt_nr, 5);
-	assert_int_equal(dev_evt->action, EVENT_NOTHING);
-	dev = find_dm_device("foo");
-	assert_ptr_not_equal(dev, NULL);
-	assert_int_equal(dev->evt_nr, 6);
-	assert_int_equal(dev->update_nr, 5);
-}
-
-/* arm_dm_event_poll's ioctl fails. Nothing happens */
-static void test_dmevent_loop_bad1(void **state)
-{
-	struct dm_device *dev;
-	struct dev_event *dev_evt;
-	struct test_data *datap = (struct test_data *)(*state);
-	if (datap == NULL)
-		skip();
-
-	will_return(__wrap_poll, 1);
 	will_return(__wrap_ioctl, -1);
 	assert_int_equal(dmevent_loop(), 1);
 	dev_evt = find_dmevents("foo");
@@ -686,7 +664,7 @@ static void test_dmevent_loop_bad2(void **state)
 	if (datap == NULL)
 		skip();
 
-	will_return(__wrap_poll, 1);
+	// will_return(__wrap_poll, 1);
 	will_return(__wrap_ioctl, 0);
 	will_return(__wrap_libmp_dm_task_create, NULL);
 	assert_int_equal(dmevent_loop(), 1);
@@ -710,7 +688,6 @@ static void test_dmevent_loop_good0(void **state)
 
 	remove_all_dm_device_events();
 	unwatch_all_dmevents();
-	will_return(__wrap_poll, 1);
 	will_return(__wrap_ioctl, 0);
 	will_return(__wrap_libmp_dm_task_create, &data);
 	will_return(__wrap_dm_task_no_open_count, 1);
@@ -746,7 +723,6 @@ static void test_dmevent_loop_good1(void **state)
 	assert_int_equal(watch_dmevents("xyzzy"), 0);
 	assert_int_equal(add_dm_device_event("foo", 1, 6), 0);
 	assert_int_equal(remove_dm_device_event("xyzzy"), 0);
-	will_return(__wrap_poll, 1);
 	will_return(__wrap_ioctl, 0);
 	will_return(__wrap_libmp_dm_task_create, &data);
 	will_return(__wrap_dm_task_no_open_count, 1);
@@ -794,7 +770,6 @@ static void test_dmevent_loop_good2(void **state)
 	will_return(__wrap_dm_geteventnr, 0);
 	assert_int_equal(watch_dmevents("baz"), 0);
 	assert_int_equal(add_dm_device_event("baz", 1, 14), 0);
-	will_return(__wrap_poll, 1);
 	will_return(__wrap_ioctl, 0);
 	will_return(__wrap_libmp_dm_task_create, &data);
 	will_return(__wrap_dm_task_no_open_count, 1);
@@ -838,7 +813,6 @@ static void test_dmevent_loop_good3(void **state)
 
 	assert_int_equal(remove_dm_device_event("foo"), 0);
 	unwatch_dmevents("bar");
-	will_return(__wrap_poll, 1);
 	will_return(__wrap_ioctl, 0);
 	will_return(__wrap_libmp_dm_task_create, &data);
 	will_return(__wrap_dm_task_no_open_count, 1);
@@ -896,7 +870,6 @@ int test_dmevents(void)
 		cmocka_unit_test(test_get_events_good0),
 		cmocka_unit_test(test_get_events_good1),
 		cmocka_unit_test(test_arm_poll),
-		cmocka_unit_test(test_dmevent_loop_bad0),
 		cmocka_unit_test(test_dmevent_loop_bad1),
 		cmocka_unit_test(test_dmevent_loop_bad2),
 		cmocka_unit_test(test_dmevent_loop_good0),
-- 
2.26.2

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
