Re: [PATCH] swap: add block io poll in swapin path

On 05/04/2017 05:23 PM, Chen, Tim C wrote:
>> -----Original Message-----
>> From: Jens Axboe [mailto:axboe@xxxxxx]
>> Sent: Thursday, May 04, 2017 2:29 PM
>> To: Shaohua Li
>> Cc: linux-mm@xxxxxxxxx; Andrew Morton; Kernel-team@xxxxxx; Chen, Tim C;
>> Huang, Ying
>> Subject: Re: [PATCH] swap: add block io poll in swapin path
>>
>> On 05/04/2017 03:27 PM, Shaohua Li wrote:
>>> On Thu, May 04, 2017 at 02:53:59PM -0600, Jens Axboe wrote:
>>>> On 05/04/2017 02:42 PM, Shaohua Li wrote:
>>>>> For fast flash disks, async IO can introduce overhead because of
>>>>> context switches. block-mq now supports IO polling, which improves
>>>>> performance and latency a lot. Swapin is a good place to use this
>>>>> technique, because the task is waiting for the swapin page before it
>>>>> can continue execution.
>>>>
>>>> Nifty!
>>>>
>>>>> In my virtual machine, directly reading 4k of data from an NVMe device
>>>>> with iopoll is about 60% faster than without polling. With iopoll support
>>>>> in the swapin path, my microbenchmark (a task doing random memory writes)
>>>>> is about 10% ~ 25% faster. CPU utilization increases a lot though, 2x
>>>>> and even 3x. This will depend on disk speed though.
>>>>> While iopoll in swapin isn't intended for all use cases, it's a
>>>>> win for latency sensitive workloads with a high speed swap disk.
>>>>> The block layer has a knob to control polling at runtime. If polling
>>>>> isn't enabled in the block layer, there should be no noticeable change
>>>>> in swapin.
>>>>
>>>> Did you try with hybrid polling enabled? We should be able to achieve
>>>> most of the latency win at much less CPU cost with that.
>>>
>>> Hybrid polling is much slower than classic polling in my tests; I tried
>>> different settings. Maybe that's because this is a VM, though.
>>
>> It's probably a VM issue; I bet the timed sleeps are just too slow to be
>> useful in a VM.
>>
> 
> The speedup is quite nice.  The high CPU utilization is somewhat of a
> concern, but it is directly proportional to the poll time, i.e. the
> latency of the drive's response.  The latest generation of SSD drives
> has latency lower by a factor of 7 or more compared to the previous
> generation, so the poll time could go down quite a bit, depending on
> which drive you were using in your test.

That was my point with the hybrid comment. In hybrid mode, there's no
reason why we can't get the same latencies as pure polling, at a
drastically reduced overhead. The latencies of the drive should not
matter, as we use the actual completion times to decide how long to
sleep and spin.

There's room for a bit of improvement, though. We should be tracking the
time it takes to do sleep+wakeup, and factor that into our wait cycle.
Currently we just blindly use half the average completion time. But even
with that, testing by others has shown basically identical latencies
with hybrid polling, while burning only half a core instead of a full one.
Compared to strict sync IRQ-driven mode, that's still a bit higher in
terms of CPU, but not really that much.
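
(For reference, the current heuristic is roughly the following; this is an
illustrative sketch only, and the helper names below are made up, not the
actual block layer functions.)

	/*
	 * Rough sketch of hybrid polling: sleep for about half the
	 * observed mean completion time, then busy-poll for the rest.
	 * Assumes 'q' and 'rq' exist in the surrounding context;
	 * mean_completion_ns(), sleep_ns(), request_done() and
	 * spin_poll() are hypothetical stand-ins.
	 */
	u64 mean = mean_completion_ns(q);	/* tracked from past completions */
	u64 sleep = mean / 2;			/* "blindly use half the average" */

	if (sleep)
		sleep_ns(sleep);		/* give the CPU up for the first half;
						 * ideally the sleep+wakeup cost would
						 * be subtracted from this */

	while (!request_done(rq))		/* hypothetical completion check */
		spin_poll(q);			/* busy-poll for the remainder */
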

-- 
Jens Axboe



