On Thu, Sep 15, 2022 at 02:56:36PM +0800, Wu Guanghao wrote:
> Sorry for the late feedback.
>
> The version we are currently testing is 0.8.4, so we only merged the
> first 3 patches in this series. After actual testing, though, we found
> that the improvement was not very obvious.
>
> Test environment: 1000 multipath devices, 16 paths per device
> Test command: aggregate multipath devices using multipathd add path
> Time consumed (time for 16 paths to aggregate into 1 multipath device):
> 1. Original:
>    < 4s:   76.9%;
>    4s~10s: 18.4%;
>    > 10s:   4.7%;
> 2. After using the patches:
>    < 4s:   79.1%;
>    4s~10s: 18.2%;
>    > 10s:   2.7%;
>
> From the results, the improvement from the patches is not obvious, so
> we made a few changes to them, after which they worked as expected.
> The modifications are as follows:
>
> paths_checked was made configurable; we currently set n = 2.
> Removed the check on diff_time.
> Changed the sleep from 10us to 5ms.

I worry that this is giving too much deference to the uevents. If there
is a uevent storm, checking will stop for 5ms every 2 paths checked. I'm
worried that this will make it take significantly longer for the path
checker to complete a cycle.

Making paths_checked configurable, so that this doesn't trigger too
often, does help to avoid the issue that checking the time from
chk_start_time was dealing with, and it makes the mechanics of this
simpler. But 5ms seems like a long time to have to wait just to make
sure that another process had the time to grab the vecs lock. Perhaps it
would be better to sleep for a shorter length of time, but in a loop
where we can check whether progress has been made, perhaps by checking
some counter of events and user commands serviced. This way, we aren't
sleeping too long for no good reason.
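
Roughly, I'm picturing something like the sketch below. This is
untested, and serviced_count is hypothetical; the uevent and uxlsnr
code would have to bump it whenever they handle something while holding
the vecs lock:

#include <time.h>
#include <stdatomic.h>

/* Hypothetical counter, incremented by the uevent and uxlsnr code each
 * time they service something under the vecs lock. */
extern atomic_uint serviced_count;

/* Instead of a single 5ms sleep, poll in short steps and stop as soon
 * as some other vecs->lock user has made progress. */
static void yield_to_waiters(void)
{
	struct timespec wait = { .tv_nsec = 500 * 1000, };	/* 0.5 ms */
	unsigned int start = atomic_load(&serviced_count);
	int i;

	for (i = 0; i < 10; i++) {	/* at most ~5 ms total */
		nanosleep(&wait, NULL);
		if (atomic_load(&serviced_count) != start)
			break;		/* progress was made, stop waiting */
	}
}

With something like that, the worst case is no worse than your 5ms
sleep, but in the common case the checker only gives up half a
millisecond or so before going back to work.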

> 	if (++paths_checked % n == 0 &&
> 	    lock_has_waiters(&vecs->lock)) {
> 		get_monotonic_time(&end_time);
> 		timespecsub(&end_time, &chk_start_time,
> 			    &diff_time);
> 		if (diff_time.tv_sec > 0) // check removed, goto directly
> 			goto unlock;
> 	}
>
> 	if (checker_state != CHECKER_FINISHED) {
> 		/* Yield to waiters */
> 		//struct timespec wait = { .tv_nsec = 10000, };
> 		//nanosleep(&wait, NULL);
> 		usleep(5000); // sleep changed from 10us to 5ms
> 	}
>
> Time consumed (after making the above changes to the patch):
>    < 4s: 99.5%;
>    = 4s:  0.5%;
>    > 4s:    0%;

Since I believe that this is likely causing a real impact on the checker
speed, I'm not sure that results like this are what we're looking for.
Are you seeing that "path checkers took longer than ..." message? How
long does it say the checker is taking (and what do you have
max_polling_interval set to)?

-Ben

> On 2022/8/18 4:48, Benjamin Marzinski wrote:
> > When there are a huge number of paths (> 10000), the amount of time
> > that the checkerloop can hold the vecs lock while checking the paths
> > can get large enough that it starves other vecs lock users. If path
> > checking takes long enough, it's possible that uxlsnr threads will
> > never run. To deal with this, this patchset makes it possible to drop
> > the vecs lock while checking the paths, and then reacquire it and
> > continue with the next path to check.
> >
> > My choice of only checking if there are waiters every 128 paths
> > checked, and of only interrupting if path checking has taken more
> > than a second, is arbitrary. I didn't want to slow down path checking
> > in the common case where this isn't an issue, and I wanted to avoid
> > path checking getting starved by other vecs->lock users. Having the
> > checkerloop wait for 10000 nsec was based on my own testing with a
> > setup using 4K multipath devices with 4 paths each. This was almost
> > always long enough for the uevent or uxlsnr client to grab the vecs
> > lock, but I'm not sure how dependent this is on the details of the
> > system. For instance, with my setup it never took more than 20
> > seconds to check the paths, and usually looping through all the paths
> > took well under 10 seconds, most often under 5. I would only
> > occasionally run into situations where a uxlsnr client would time
> > out.
> >
> > V2 Changes:
> > 0003: Switched to a simpler method of determining the path to continue
> >       checking at, as suggested by Martin Wilck. Also fixed a bug when
> >       the previous index was larger than the current vector size.
> >
> > Benjamin Marzinski (6):
> >   multipathd: Use regular pthread_mutex_t for waiter_lock
> >   multipathd: track waiters for mutex_lock
> >   multipathd: Occasionally allow waiters to interrupt checking paths
> >   multipathd: allow uxlsnr clients to interrupt checking paths
> >   multipathd: fix uxlsnr timeout
> >   multipathd: Don't check if timespec.tv_sec is zero
> >
> >  libmultipath/lock.h    |  16 +++++
> >  libmultipath/structs.h |   1 +
> >  multipathd/main.c      | 147 ++++++++++++++++++++++++++---------------
> >  multipathd/uxlsnr.c    |  23 +++++--
> >  multipathd/uxlsnr.h    |   1 +
> >  multipathd/waiter.c    |  14 ++--
> >  6 files changed, 136 insertions(+), 66 deletions(-)
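
As a side note, the waiter tracking that lock_has_waiters() relies on
doesn't have to be anything elaborate. A rough illustration (not the
actual code from this series) is just a counter wrapped around the lock
calls:

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative only: count the threads currently blocked trying to
 * take the lock, so the checkerloop can ask whether anyone is waiting
 * for the vecs lock before deciding whether to drop it. */
struct mutex_lock {
	pthread_mutex_t mutex;
	atomic_int waiters;
};

static void lock(struct mutex_lock *a)
{
	atomic_fetch_add(&a->waiters, 1);
	pthread_mutex_lock(&a->mutex);
	atomic_fetch_sub(&a->waiters, 1);
}

static void unlock(struct mutex_lock *a)
{
	pthread_mutex_unlock(&a->mutex);
}

static bool lock_has_waiters(struct mutex_lock *a)
{
	return atomic_load(&a->waiters) > 0;
}

The count briefly includes a thread that takes the lock uncontended,
but for deciding whether to give up the lock early, erring in that
direction is harmless.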