On 2022/9/16 2:17, Benjamin Marzinski wrote:
> On Thu, Sep 15, 2022 at 02:56:36PM +0800, Wu Guanghao wrote:
>> Sorry for the late feedback.
>>
>> The version we are currently testing is 0.8.4, so we only merged the
>> first 3 patches in this series. After actual testing, we found that
>> the improvement was not very significant.
>>
>> Test environment: 1000 multipath devices, 16 paths per device
>> Test command: aggregate multipath devices using multipathd add path
>> Time taken (time for 16 paths to aggregate into 1 multipath device):
>> 1. Original:
>>    < 4s: 76.9%;
>>    4s~10s: 18.4%;
>>    >10s: 4.7%;
>> 2. After applying the patches:
>>    < 4s: 79.1%;
>>    4s~10s: 18.2%;
>>    >10s: 2.7%;
>>
>> From the results, the improvement from the patches is not obvious, so
>> we made a few changes to the patch and it then worked as expected. The
>> modifications are as follows:
>>
>> paths_checked was made configurable; we currently set n = 2.
>> Removed the check on diff_time.
>> Changed the sleep from 10us to 5ms.
>
> I worry that this is giving too much deference to the uevents. If there
> is a uevent storm, checking will stop for 5ms after every 2 paths
> checked. I'm worried that this will make it take significantly longer
> for the path checker to complete a cycle. Making paths_checked
> configurable, so that this doesn't trigger too often, does help to
> avoid the issue that checking the time from chk_start_time was dealing
> with, and makes the mechanics of this simpler. But 5ms seems like a
> long time to have to wait just to make sure that another process has
> had time to grab the vecs lock. Perhaps it would be better to sleep for
> a shorter length of time, but in a loop, where we can check to see if
> progress has been made, perhaps by checking some counter of events and
> user commands serviced. This way, we aren't sleeping too long for no
> good reason.
I agree with this; we are also going to adjust the sleep time and then
continue testing.

>> if (++paths_checked % n == 0 &&
>>     lock_has_waiters(&vecs->lock)) {
>>         get_monotonic_time(&end_time);
>>         timespecsub(&end_time, &chk_start_time,
>>                     &diff_time);
>>         if (diff_time.tv_sec > 0) /* deleted this check: goto unlock directly */
>>                 goto unlock;
>> }
>>
>> if (checker_state != CHECKER_FINISHED) {
>>         /* Yield to waiters */
>>         /* struct timespec wait = { .tv_nsec = 10000, }; */
>>         /* nanosleep(&wait, NULL); */
>>         usleep(5000); /* sleep changed from 10us to 5ms */
>> }
>>
>> Time taken (after making the above changes to the patch):
>> < 4s: 99.5%;
>> = 4s: 0.5%;
>> > 4s: 0%;
>
> Since I believe that this is likely causing a real impact on the
> checker speed, I'm not sure that we're looking for results like this.
> Are you seeing that "path checkers took longer than ..." message? How
> long does it say the checker is taking (and what do you have
> max_polling_interval set to)?

Yes, I saw this message. In the current test environment, using the
original code, all path checks take at least 50s to complete. We
currently set max_polling_interval=20s, so this message is always
printed. Adding the sleep just makes it print more often.

> -Ben

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel