On 4/19/24 13:27, Pavel Begunkov wrote:
>On 4/18/24 10:31, hexue wrote:
>> This patch is intended to release the CPU resources of io_uring in
>> polling mode. When IO is issued, the program immediately polls for
>> check completion, which is a waste of CPU resources when IO commands
>> are executed on the disk.
>>
>> I add the hybrid polling feature in io_uring, enables polling to
>> release a portion of CPU resources without affecting block layer.
>
>So that's basically the block layer hybrid polling, which, to
>remind, was removed not that long ago, but moved into io_uring.

The idea is based on the previous block layer hybrid poll, but it's
not just for a single IO. I think hybrid poll is still an effective
CPU-saving solution, and I've tested it with good results on both
PCIe Gen4 and Gen5 NVMe devices.

>> - Record the running time and context switching time of each
>>   IO, and use these time to determine whether a process continue
>>   to schedule.
>>
>> - Adaptive adjustment to different devices. Due to the real-time
>>   nature of time recording, each device's IO processing speed is
>>   different, so the CPU optimization effect will vary.
>>
>> - Set a interface (ctx->flag) enables application to choose whether
>>   or not to use this feature.
>>
>> The CPU optimization in peak workload of patch is tested as follows:
>>   all CPU utilization of original polling is 100% for per CPU, after
>>   optimization, the CPU utilization drop a lot (per CPU);
>
>The first version was about cases that don't have iopoll queues.
>How many IO poll queues did you have to get these numbers?

The test environment has 8 CPUs and 16G of memory, and I set 8 poll
queues in this case. The data above are from a Gen4 disk.

>>   read(128k, QD64, 1Job)     37%    write(128k, QD64, 1Job)     40%
>>   randread(4k, QD64, 16Job)  52%    randwrite(4k, QD64, 16Job)  12%
>>
>> Compared to original polling, the optimised performance reduction
>> with peak workload within 1%.
>>
>>   read 0.29%    write 0.51%    randread 0.09%    randwrite 0%
>>
>> Reviewed-by: KANCHAN JOSHI <joshi.k@xxxxxxxxxxx>
>
>Kanchan, did you _really_ take a look at the patch?

Sorry, I misunderstood the meaning of "reviewed". I've had some
discussions with Kanchan based on the test results; he gave some
suggestions and possible approaches for changes, but hasn't reviewed
the implementation yet. This is my mistake, please ignore this
"Reviewed-by" tag.

>> Signed-off-by: hexue <xue01.he@xxxxxxxxxxx>
>> ---
>>   include/linux/io_uring_types.h | 10 +++++
>>   include/uapi/linux/io_uring.h  |  1 +
>>   io_uring/io_uring.c            | 28 +++++++++++++-
>>   io_uring/io_uring.h            |  2 +
>>   io_uring/rw.c                  | 69 ++++++++++++++++++++++++++++++++++
>>   5 files changed, 109 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
>> index 854ad67a5f70..7607fd8de91c 100644
>> --- a/include/linux/io_uring_types.h
>> +++ b/include/linux/io_uring_types.h
>> @@ -224,6 +224,11 @@ struct io_alloc_cache {
>>   	size_t elem_size;
>>   };
>
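
To make the mechanism described above a bit more concrete, below is a
rough userspace sketch of the general hybrid-poll idea: sleep for about
half of the expected completion time before falling back to busy
polling, and keep a per-device running estimate of completion latency.
This is only an illustration with hypothetical names and the classic
"sleep half the mean" heuristic from the old block layer code; the
patch itself records running time and context-switch time per IO and
uses those to decide when to schedule out.

/* illustrative sketch only, not the code in this patch */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

/* running estimate of how long one IO takes to complete, in ns */
static uint64_t avg_io_ns;

/* stand-in for the real completion check (e.g. iopoll on the nvme queue) */
static int io_completed(uint64_t issued_at, uint64_t io_duration_ns)
{
	return now_ns() - issued_at >= io_duration_ns;
}

static void hybrid_poll_one(uint64_t io_duration_ns)
{
	uint64_t start = now_ns();

	/* sleep for about half the expected completion time ... */
	if (avg_io_ns) {
		struct timespec ts = {
			.tv_sec = 0,
			.tv_nsec = (long)(avg_io_ns / 2),
		};
		nanosleep(&ts, NULL);
	}

	/* ... then busy-poll for the remainder */
	while (!io_completed(start, io_duration_ns))
		;

	/* fold the measured completion time into the per-device estimate */
	avg_io_ns = (avg_io_ns + (now_ns() - start)) / (avg_io_ns ? 2 : 1);
}

int main(void)
{
	/* pretend three IOs each take ~100us to complete on the device */
	for (int i = 0; i < 3; i++) {
		hybrid_poll_one(100 * 1000);
		printf("avg completion estimate: %llu ns\n",
		       (unsigned long long)avg_io_ns);
	}
	return 0;
}

The point of sleeping first is that the CPU is released for most of the
device's service time, while the short busy-poll tail keeps completion
latency close to pure polling.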