Sorry for the late feedback. The version we are currently testing is 0.8.4, so we only merged the first 3 patches in this series. In actual testing, we found that the improvement was not very significant.

Test environment: 1000 multipath devices, 16 paths per device
Test command: aggregate multipath devices using "multipathd add path"
Elapsed time (time for the 16 paths of one multipath device to aggregate):

1. Original:
   < 4s: 76.9%;  4s~10s: 18.4%;  > 10s: 4.7%
2. After applying the patches:
   < 4s: 79.1%;  4s~10s: 18.2%;  > 10s: 2.7%

From these results, the time improvement from the patches is not obvious, so we made a few changes to the patch, after which it worked as expected. The modifications are as follows:

1. paths_checked is changed to be configurable; our current setting is n = 2.
2. Removed the check on diff_time.
3. Changed the sleep from 10 us to 5 ms.

        if (++paths_checked % n == 0 &&
            lock_has_waiters(&vecs->lock)) {
                get_monotonic_time(&end_time);
                timespecsub(&end_time, &chk_start_time, &diff_time);
                if (diff_time.tv_sec > 0)       // deleted this check: goto unlock directly
                        goto unlock;
        }
        if (checker_state != CHECKER_FINISHED) {
                /* Yield to waiters */
                //struct timespec wait = { .tv_nsec = 10000, };
                //nanosleep(&wait, NULL);
                usleep(5000);                   // sleep changed from 10 us to 5 ms
        }

Elapsed time (after making the above changes to the patch):
   < 4s: 99.5%;  = 4s: 0.5%;  > 4s: 0%

On 2022/8/18 4:48, Benjamin Marzinski wrote:
> When there are a huge number of paths (> 10000), the amount of time that
> the checkerloop can hold the vecs lock while checking the paths can
> become large enough that it starves other vecs lock users. If path
> checking takes long enough, it's possible that uxlsnr threads will never
> run. To deal with this, this patchset makes it possible to drop the vecs
> lock while checking the paths, and then reacquire it and continue with
> the next path to check.
>
> My choice of only checking if there are waiters every 128 paths checked,
> and only interrupting if path checking has taken more than a second, is
> arbitrary. I didn't want to slow down path checking in the common case
> where this isn't an issue, and I wanted to avoid path checking getting
> starved by other vecs->lock users. Having the checkerloop wait for 10000
> nsec was based on my own testing with a setup using 4K multipath devices
> with 4 paths each. This was almost always long enough for the uevent or
> uxlsnr client to grab the vecs lock, but I'm not sure how dependent this
> is on the details of the system. For instance, with my setup it never took
> more than 20 seconds to check the paths, and usually looping through
> all the paths took well under 10 seconds, most often under 5. I would
> only occasionally run into situations where a uxlsnr client would time
> out.
>
> V2 Changes:
> 0003: Switched to a simpler method of determining the path to continue
>       checking at, as suggested by Martin Wilck. Also fixed a bug when
>       the previous index was larger than the current vector size.
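For reference, here is a minimal standalone sketch of the waiter-tracking / yield pattern we tested, using the n = 2 interval and 5 ms sleep from the modification above. This is not the actual multipathd code: struct tracked_lock and its helpers only imitate what the "track waiters for mutex_lock" patch conceptually adds, and NPATHS, YIELD_INTERVAL, and the thread setup are illustrative.

        /*
         * Standalone illustration of "yield the lock to waiters every n
         * iterations".  Build with: cc -pthread sketch.c
         */
        #include <pthread.h>
        #include <stdatomic.h>
        #include <stdio.h>
        #include <unistd.h>

        /* Stand-in for the patchset's waiter-tracking lock wrapper. */
        struct tracked_lock {
                pthread_mutex_t mutex;
                atomic_int waiters;     /* threads currently blocked in lock_tracked() */
        };

        static void lock_tracked(struct tracked_lock *l)
        {
                atomic_fetch_add(&l->waiters, 1);
                pthread_mutex_lock(&l->mutex);
                atomic_fetch_sub(&l->waiters, 1);
        }

        static void unlock_tracked(struct tracked_lock *l)
        {
                pthread_mutex_unlock(&l->mutex);
        }

        static int lock_has_waiters(struct tracked_lock *l)
        {
                return atomic_load(&l->waiters) > 0;
        }

        static struct tracked_lock vecs_lock = {
                .mutex = PTHREAD_MUTEX_INITIALIZER,
        };

        #define NPATHS         1000
        #define YIELD_INTERVAL 2        /* "n": look for waiters every n paths */

        /* Stand-in for the checkerloop: walk all paths, yielding to waiters. */
        static void *checker_thread(void *arg)
        {
                int paths_checked = 0;

                (void)arg;
                lock_tracked(&vecs_lock);
                for (int i = 0; i < NPATHS; i++) {
                        /* check_path(i) would go here */
                        if (++paths_checked % YIELD_INTERVAL == 0 &&
                            lock_has_waiters(&vecs_lock)) {
                                unlock_tracked(&vecs_lock);
                                usleep(5000);   /* 5 ms: give the waiter time to run */
                                lock_tracked(&vecs_lock);
                        }
                }
                unlock_tracked(&vecs_lock);
                return NULL;
        }

        /* Stand-in for a uxlsnr client or uevent handler that needs the lock. */
        static void *client_thread(void *arg)
        {
                (void)arg;
                lock_tracked(&vecs_lock);
                printf("client got the lock\n");
                unlock_tracked(&vecs_lock);
                return NULL;
        }

        int main(void)
        {
                pthread_t checker, client;

                pthread_create(&checker, NULL, checker_thread, NULL);
                usleep(1000);           /* let the checker grab the lock first */
                pthread_create(&client, NULL, client_thread, NULL);
                pthread_join(client, NULL);
                pthread_join(checker, NULL);
                return 0;
        }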
>
> Benjamin Marzinski (6):
>   multipathd: Use regular pthread_mutex_t for waiter_lock
>   multipathd: track waiters for mutex_lock
>   multipathd: Occasionally allow waiters to interrupt checking paths
>   multipathd: allow uxlsnr clients to interrupt checking paths
>   multipathd: fix uxlsnr timeout
>   multipathd: Don't check if timespec.tv_sec is zero
>
>  libmultipath/lock.h    |  16 +++++
>  libmultipath/structs.h |   1 +
>  multipathd/main.c      | 147 ++++++++++++++++++++++++++---------------
>  multipathd/uxlsnr.c    |  23 +++++--
>  multipathd/uxlsnr.h    |   1 +
>  multipathd/waiter.c    |  14 ++--
>  6 files changed, 136 insertions(+), 66 deletions(-)
> --

dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel