Re: [PATCH] blk-mq: modify hybrid sleep time to aggressive

On 18/11/2020 10:35, dongjoo seo wrote:
> I agree with your opinion, and your patch is also a good approach.
> How about combining them? An adaptive solution with the 3/4 factor.

I couldn't disclose numbers back then, but thanks to the steeply
skewed latency distribution of NAND SSDs, it was automatically
adjusting the time to ~3/4 of the mean for QD1 and long enough
requests (~75+ us). Also, with "max(sleep_ns, half_mean)" removed,
it kept the time below 1/2 for fast requests (less than ~30us),
which is a good thing because the fixed half was constantly
oversleeping them. Though new ultra-low-latency SSDs have appeared
since then.
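
For reference, the shape of it, sketched with made-up names (this is
not the actual code from the patch, just the feedback idea plus the
half-mean floor mentioned above):

/*
 * Sketch only, names invented for illustration. Nudge the sleep time
 * based on whether the last sleep overshot the completion, with the
 * "max(sleep_ns, half_mean)" floor applied at the end.
 */
static u64 adjust_poll_nsecs(u64 mean_ns, u64 sleep_ns, bool overslept)
{
	if (overslept)
		sleep_ns -= sleep_ns / 8;	/* overshot: back off */
	else
		sleep_ns += sleep_ns / 8;	/* woke early: creep up */

	/* dropping this floor is what let fast requests go below 1/2 */
	return max(sleep_ns, mean_ns / 2);
}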

The real problem is finding anyone who actually uses it; otherwise
it's just a chunk of dead code. Do you? Anyone? I remember it was
once completely broken for months, and that was barely noticed.
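
(For reference, hybrid polling is what you get by writing 0 to
/sys/block/<dev>/queue/io_poll_delay on a queue with polling enabled;
-1 selects classic polling and, IIRC, a positive value sets a fixed
sleep time in microseconds.)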


> Because if we get intensive workloads, then we need to decrease
> the overall cpu utilization even with [1].
> 
> [1] https://lkml.org/lkml/2019/4/30/117
> 
>> On Nov 18, 2020, at 6:26 PM, Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
>>
>> On 18/11/2020 07:16, Damien Le Moal wrote:
>>> On 2020/11/18 16:07, Christoph Hellwig wrote:
>>>> Adding Damien who wrote this code.
>>>
>>> Nope. It wasn't me. I think it was Stephen Bates:
>>>
>>> commit 720b8ccc4500 ("blk-mq: Add a polling specific stats function")
>>>
>>> So +Stephen.
>>>>
>>>> On Wed, Nov 18, 2020 at 09:47:46AM +0900, Dongjoo Seo wrote:
>>>>> The current sleep time for hybrid polling is half of the mean
>>>>> completion time. Sleeping for half is good for reducing cpu
>>>>> utilization, but the problem is that cpu utilization is still
>>>>> high. This patch helps minimize it further.
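>>>>>
>>>>> For example, with a 100us mean the sleep time grows from ~50us to
>>>>> ~75us, so the cpu busy-polls for roughly half as long per request.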
>>
>> This won't work well. When I was experimenting, I saw that half the
>> mean is actually too much for fast enough requests, like <20us 4K
>> writes; it oversleeps them. Even more, I'm afraid of getting into a
>> vicious cycle, where oversleeping increases the statistical mean,
>> which increases the sleep time, which again increases the stat mean,
>> and so on. That's what happened for me when the scheme was too
>> aggressive.
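>>
>> (Toy numbers: if the factor were 1 and every oversleep added ~5us to
>> the observed latency, a 20us request's mean would walk 20 -> 25 ->
>> 30 -> ..., with the sleep time chasing it up.)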
>>
>> I actually once sent patches [1] for automatic, dynamic sleep time
>> adjustment, but nobody cared.
>>
>> [1] https://lkml.org/lkml/2019/4/30/117
>>
>>>>>
>>>>> Below 1,2 is my test hardware sets.
>>>>>
>>>>> 1. Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz + Samsung 970 pro 1Tb
>>>>> 2. Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz + INTEL SSDPED1D480GA 480G
>>>>>
>>>>>      |  Classic Polling   |  Hybrid Polling    |  This Patch
>>>>> ----------------------------------------------------------------------
>>>>>      | cpu util | IOPS(k) | cpu util | IOPS(k) | cpu util | IOPS(k)
>>>>> ----------------------------------------------------------------------
>>>>>  1.  |  99.96   |  491    |  56.98   |  467    |  35.98   |  442
>>>>> ----------------------------------------------------------------------
>>>>>  2.  |  99.94   |  582    |  56.3    |  582    |  35.28   |  582
>>>>>
>>>>> cpu util means the sum of sys and user utilization (%).
>>>>>
>>>>> I used 4k random read for this test because it is the worst case
>>>>> on the I/O performance side. Below is my fio setup.
>>>>>
>>>>> name=pollTest
>>>>> ioengine=pvsync2
>>>>> hipri
>>>>> direct=1
>>>>> size=100%
>>>>> randrepeat=0
>>>>> time_based
>>>>> ramp_time=0
>>>>> norandommap
>>>>> refill_buffers
>>>>> log_avg_msec=1000
>>>>> log_max_value=1
>>>>> group_reporting
>>>>> filename=/dev/nvme0n1
>>>>> [rd_rnd_qd_1_4k_1w]
>>>>> bs=4k
>>>>> iodepth=32
>>>>> numjobs=[num of cpus]
>>>>> rw=randread
>>>>> runtime=60
>>>>> write_bw_log=bw_rd_rnd_qd_1_4k_1w
>>>>> write_iops_log=iops_rd_rnd_qd_1_4k_1w
>>>>> write_lat_log=lat_rd_rnd_qd_1_4k_1w
>>>>>
>>>>> Thanks
>>>>>
>>>>> Signed-off-by: Dongjoo Seo <commisori28@xxxxxxxxx>
>>>>> ---
>>>>> block/blk-mq.c | 3 +--
>>>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>>> index 1b25ec2fe9be..c3d578416899 100644
>>>>> --- a/block/blk-mq.c
>>>>> +++ b/block/blk-mq.c
>>>>> @@ -3749,8 +3749,7 @@ static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
>>>>> 		return ret;
>>>>>
>>>>> 	if (q->poll_stat[bucket].nr_samples)
>>>>> -		ret = (q->poll_stat[bucket].mean + 1) / 2;
>>>>> -
>>>>> +		ret = (q->poll_stat[bucket].mean + 1) * 3 / 4;
>>>>> 	return ret;
>>>>> }
>>>>>
>>>>> -- 
>>>>> 2.17.1
>>>>>
>>>> ---end quoted text---
>>>>
>>>
>>>
>>
>> -- 
>> Pavel Begunkov
> 
> 

-- 
Pavel Begunkov