Sorry for the late feedback. The version we are currently testing is 0.8.4, so we only merged the first 3 patches in this series. In actual testing, we found that the improvement was not very significant.

Test environment: 1000 multipath devices, 16 paths per device
Test command: aggregate multipath devices using "multipathd add path"
Elapsed time (time for the 16 paths of one multipath device to aggregate):

1. Original:
   < 4s: 76.9%;  4s~10s: 18.4%;  > 10s: 4.7%
2. After applying the patches:
   < 4s: 79.1%;  4s~10s: 18.2%;  > 10s: 2.7%

From these results, the time improvement from the patches is not obvious, so we made a few changes to the patch, after which it worked as expected. The modifications are as follows:

1. paths_checked is changed to be configurable; our current setting is n = 2.
2. Removed the check on diff_time.
3. Changed the sleep from 10 us to 5 ms.

        if (++paths_checked % n == 0 &&
            lock_has_waiters(&vecs->lock)) {
                get_monotonic_time(&end_time);
                timespecsub(&end_time, &chk_start_time, &diff_time);
                if (diff_time.tv_sec > 0)       // deleted this check: goto unlock directly
                        goto unlock;
        }
        if (checker_state != CHECKER_FINISHED) {
                /* Yield to waiters */
                //struct timespec wait = { .tv_nsec = 10000, };
                //nanosleep(&wait, NULL);
                usleep(5000);                   // sleep changed from 10 us to 5 ms
        }

Elapsed time (after making the above changes to the patch):
   < 4s: 99.5%;  = 4s: 0.5%;  > 4s: 0%

On 2022/8/18 4:48, Benjamin Marzinski wrote:
> When there are a huge number of paths (> 10000), the amount of time that
> the checkerloop can hold the vecs lock while checking the paths can
> become large enough that it starves other vecs lock users. If path
> checking takes long enough, it's possible that uxlsnr threads will never
> run. To deal with this, this patchset makes it possible to drop the vecs
> lock while checking the paths, and then reacquire it and continue with
> the next path to check.
>
> My choice of only checking if there are waiters every 128 paths checked,
> and only interrupting if path checking has taken more than a second, is
> arbitrary. I didn't want to slow down path checking in the common case
> where this isn't an issue, and I wanted to avoid path checking getting
> starved by other vecs->lock users. Having the checkerloop wait for 10000
> nsec was based on my own testing with a setup using 4K multipath devices
> with 4 paths each. This was almost always long enough for the uevent or
> uxlsnr client to grab the vecs lock, but I'm not sure how dependent this
> is on the details of the system. For instance, with my setup it never took
> more than 20 seconds to check the paths, and usually looping through
> all the paths took well under 10 seconds, most often under 5. I would
> only occasionally run into situations where a uxlsnr client would time
> out.
>
> V2 Changes:
> 0003: Switched to a simpler method of determining the path to continue
>       checking at, as suggested by Martin Wilck. Also fixed a bug when
>       the previous index was larger than the current vector size.
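For reference, here is a minimal standalone sketch of the waiter-tracking / yield pattern we tested, using the n = 2 interval and 5 ms sleep from the modification above. This is not the actual multipathd code: struct tracked_lock and its helpers only imitate what the "track waiters for mutex_lock" patch conceptually adds, and NPATHS, YIELD_INTERVAL, and the thread setup are illustrative.

        /*
         * Standalone illustration of "yield the lock to waiters every n
         * iterations".  Build with: cc -pthread sketch.c
         */
        #include <pthread.h>
        #include <stdatomic.h>
        #include <stdio.h>
        #include <unistd.h>

        /* Stand-in for the patchset's waiter-tracking lock wrapper. */
        struct tracked_lock {
                pthread_mutex_t mutex;
                atomic_int waiters;     /* threads currently blocked in lock_tracked() */
        };

        static void lock_tracked(struct tracked_lock *l)
        {
                atomic_fetch_add(&l->waiters, 1);
                pthread_mutex_lock(&l->mutex);
                atomic_fetch_sub(&l->waiters, 1);
        }

        static void unlock_tracked(struct tracked_lock *l)
        {
                pthread_mutex_unlock(&l->mutex);
        }

        static int lock_has_waiters(struct tracked_lock *l)
        {
                return atomic_load(&l->waiters) > 0;
        }

        static struct tracked_lock vecs_lock = {
                .mutex = PTHREAD_MUTEX_INITIALIZER,
        };

        #define NPATHS         1000
        #define YIELD_INTERVAL 2        /* "n": look for waiters every n paths */

        /* Stand-in for the checkerloop: walk all paths, yielding to waiters. */
        static void *checker_thread(void *arg)
        {
                int paths_checked = 0;

                (void)arg;
                lock_tracked(&vecs_lock);
                for (int i = 0; i < NPATHS; i++) {
                        /* check_path(i) would go here */
                        if (++paths_checked % YIELD_INTERVAL == 0 &&
                            lock_has_waiters(&vecs_lock)) {
                                unlock_tracked(&vecs_lock);
                                usleep(5000);   /* 5 ms: give the waiter time to run */
                                lock_tracked(&vecs_lock);
                        }
                }
                unlock_tracked(&vecs_lock);
                return NULL;
        }

        /* Stand-in for a uxlsnr client or uevent handler that needs the lock. */
        static void *client_thread(void *arg)
        {
                (void)arg;
                lock_tracked(&vecs_lock);
                printf("client got the lock\n");
                unlock_tracked(&vecs_lock);
                return NULL;
        }

        int main(void)
        {
                pthread_t checker, client;

                pthread_create(&checker, NULL, checker_thread, NULL);
                usleep(1000);           /* let the checker grab the lock first */
                pthread_create(&client, NULL, client_thread, NULL);
                pthread_join(client, NULL);
                pthread_join(checker, NULL);
                return 0;
        }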
>
> Benjamin Marzinski (6):
>   multipathd: Use regular pthread_mutex_t for waiter_lock
>   multipathd: track waiters for mutex_lock
>   multipathd: Occasionally allow waiters to interrupt checking paths
>   multipathd: allow uxlsnr clients to interrupt checking paths
>   multipathd: fix uxlsnr timeout
>   multipathd: Don't check if timespec.tv_sec is zero
>
>  libmultipath/lock.h    |  16 +++++
>  libmultipath/structs.h |   1 +
>  multipathd/main.c      | 147 ++++++++++++++++++++++++++---------------
>  multipathd/uxlsnr.c    |  23 +++++--
>  multipathd/uxlsnr.h    |   1 +
>  multipathd/waiter.c    |  14 ++--
>  6 files changed, 136 insertions(+), 66 deletions(-)
> --

dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel