Re: [PATCH] blk-mq: modify hybrid sleep time to aggressive

Pavel Begunkov <asml.silence@xxxxxxxxx> · Wed, 18 Nov 2020 09:26:10 +0000

On 18/11/2020 07:16, Damien Le Moal wrote:
> On 2020/11/18 16:07, Christoph Hellwig wrote:
>> Adding Damien who wrote this code.
> 
> Nope. It wasn't me. I think it was Stephen Bates:
> 
> commit 720b8ccc4500 ("blk-mq: Add a polling specific stats function")
> 
> So +Stephen.
>>
>> On Wed, Nov 18, 2020 at 09:47:46AM +0900, Dongjoo Seo wrote:
>>> The current sleep time for hybrid polling is half of the mean time.
>>> The 'half' sleep time is good for reducing CPU utilization,
>>> but the problem is that CPU utilization is still high.
>>> This patch helps reduce CPU utilization further.

This won't work well. When I was experimenting, I saw that half the mean
is actually too much for fast enough requests, like <20us 4K writes;
it oversleeps them. Even more, I'm afraid of getting into a vicious
cycle, where oversleeping increases the statistical mean, which increases
the sleep time, which again increases the mean, and so on. That's what
happened for me when the scheme was too aggressive.
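
To make the feedback loop concrete, here is a toy userspace model (a
sketch, not kernel code). It assumes a fixed device latency, a fixed
wake-up overhead after the sleep, and a simple 1/8-weight running
average. None of these come from blk-mq, and simulate_mean_ns() and its
parameters are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy userspace model of the feedback loop, NOT kernel code.
 * All names and constants here are hypothetical.
 *
 * Each round:
 *   sleep    = mean * num / den              (hybrid-poll sleep time)
 *   observed = max(device_ns, sleep + wake_ns)
 *   mean     = (7 * mean + observed) / 8     (assumed running average)
 */
static uint64_t simulate_mean_ns(uint64_t num, uint64_t den,
				 uint64_t device_ns, uint64_t wake_ns,
				 unsigned int rounds)
{
	uint64_t mean = device_ns;	/* start from the true device latency */
	unsigned int i;

	assert(den != 0 && num <= den);

	for (i = 0; i < rounds; i++) {
		uint64_t sleep = mean * num / den;
		uint64_t woke = sleep + wake_ns;
		/* An overslept request completes late, inflating the sample. */
		uint64_t observed = woke > device_ns ? woke : device_ns;

		mean = (7 * mean + observed) / 8;
	}
	return mean;
}
```

With an assumed 10us device latency and 5us wake-up overhead,
simulate_mean_ns(1, 2, 10000, 5000, 1000) stays at 10000, while
simulate_mean_ns(3, 4, 10000, 5000, 1000) climbs to just under 20000.
In this model the mean settles near wake_ns / (1 - num/den) whenever
that exceeds the device latency, so a larger sleep fraction inflates
the very statistic it is derived from.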

I actually once sent patches [1] for automatic dynamic sleep time
adjustment, but nobody cared.

[1] https://lkml.org/lkml/2019/4/30/117

>>>
>>> Below 1,2 is my test hardware sets.
>>>
>>> 1. Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz + Samsung 970 pro 1Tb
>>> 2. Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz + INTEL SSDPED1D480GA 480G
>>>
>>>         |  Classic Polling | Hybrid Polling  | this Patch
>>> -----------------------------------------------------------------
>>>         cpu util | IOPS(k) | cpu util | IOPS | cpu util | IOPS  |
>>> -----------------------------------------------------------------
>>> 1.       99.96   |   491   |  56.98   | 467  | 35.98    | 442   |
>>> -----------------------------------------------------------------
>>> 2.       99.94   |   582   |  56.3    | 582  | 35.28    | 582   |
>>>
>>> cpu util means the sum of sys and user util.
>>>
>>> I used 4k random reads for this test,
>>> because that is the worst case for I/O performance.
>>> My fio setup is below.
>>>
>>> name=pollTest
>>> ioengine=pvsync2
>>> hipri
>>> direct=1
>>> size=100%
>>> randrepeat=0
>>> time_based
>>> ramp_time=0
>>> norandommap
>>> refill_buffers
>>> log_avg_msec=1000
>>> log_max_value=1
>>> group_reporting
>>> filename=/dev/nvme0n1
>>> [rd_rnd_qd_1_4k_1w]
>>> bs=4k
>>> iodepth=32
>>> numjobs=[num of cpus]
>>> rw=randread
>>> runtime=60
>>> write_bw_log=bw_rd_rnd_qd_1_4k_1w
>>> write_iops_log=iops_rd_rnd_qd_1_4k_1w
>>> write_lat_log=lat_rd_rnd_qd_1_4k_1w
>>>
>>> Thanks
>>>
>>> Signed-off-by: Dongjoo Seo <commisori28@xxxxxxxxx>
>>> ---
>>>  block/blk-mq.c | 3 +--
>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 1b25ec2fe9be..c3d578416899 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -3749,8 +3749,7 @@ static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
>>>  		return ret;
>>>  
>>>  	if (q->poll_stat[bucket].nr_samples)
>>> -		ret = (q->poll_stat[bucket].mean + 1) / 2;
>>> -
>>> +		ret = (q->poll_stat[bucket].mean + 1) * 3 / 4;
>>>  	return ret;
>>>  }
>>>  
>>> -- 
>>> 2.17.1
>>>
>> ---end quoted text---
>>
> 
> 

-- 
Pavel Begunkov