On 3/6/21 1:56 AM, Heinz Mauelshagen wrote:
>
> On 3/5/21 6:46 PM, Heinz Mauelshagen wrote:
>> On 3/5/21 10:52 AM, JeffleXu wrote:
>>>
>>> On 3/3/21 6:09 PM, Mikulas Patocka wrote:
>>>>
>>>> On Wed, 3 Mar 2021, JeffleXu wrote:
>>>>
>>>>>
>>>>> On 3/3/21 3:05 AM, Mikulas Patocka wrote:
>>>>>
>>>>>> Support I/O polling if submit_bio_noacct_mq_direct returned non-empty
>>>>>> cookie.
>>>>>>
>>>>>> Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
>>>>>>
>>>>>> ---
>>>>>>  drivers/md/dm.c | 5 +++++
>>>>>>  1 file changed, 5 insertions(+)
>>>>>>
>>>>>> Index: linux-2.6/drivers/md/dm.c
>>>>>> ===================================================================
>>>>>> --- linux-2.6.orig/drivers/md/dm.c	2021-03-02 19:26:34.000000000 +0100
>>>>>> +++ linux-2.6/drivers/md/dm.c	2021-03-02 19:26:34.000000000 +0100
>>>>>> @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru
>>>>>>  		}
>>>>>>  	}
>>>>>>
>>>>>> +	if (ci.poll_cookie != BLK_QC_T_NONE) {
>>>>>> +		while (atomic_read(&ci.io->io_count) > 1 &&
>>>>>> +		       blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
>>>>>> +	}
>>>>>> +
>>>>>>  	/* drop the extra reference count */
>>>>>>  	dec_pending(ci.io, errno_to_blk_status(error));
>>>>>>  }
>>>>>
>>>>> It seems that the general idea of your design is to
>>>>> 1) submit *one* split bio
>>>>> 2) blk_poll(), waiting for the previously submitted split bio to complete
>>>>
>>>> No, I submit all the bios and poll for the last one.
>>>>
>>>>> and then submit the next split bio, repeating the above process. I'm
>>>>> afraid the performance may be an issue here, since the batch that
>>>>> blk_poll() reaps each time may shrink.
>>>>
>>>> Could you benchmark it?
>>>
>>> I only tested dm-linear.
>>>
>>> The configuration (dm table) of dm-linear is:
>>> 0 1048576 linear /dev/nvme0n1 0
>>> 1048576 1048576 linear /dev/nvme2n1 0
>>> 2097152 1048576 linear /dev/nvme5n1 0
>>>
>>> The fio script used is:
>>> ```
>>> $ cat fio.conf
>>> [global]
>>> name=iouring-sqpoll-iopoll-1
>>> ioengine=io_uring
>>> iodepth=128
>>> numjobs=1
>>> thread
>>> rw=randread
>>> direct=1
>>> registerfiles=1
>>> hipri=1
>>> runtime=10
>>> time_based
>>> group_reporting
>>> randrepeat=0
>>> filename=/dev/mapper/testdev
>>> bs=4k
>>>
>>> [job-1]
>>> cpus_allowed=14
>>> ```
>>>
>>> IOPS (IRQ mode) | IOPS (iopoll mode (hipri=1))
>>> --------------- | ----------------------------
>>> 213k            | 19k
>>>
>>> At least it doesn't work well with the io_uring interface.
>>>
>>
>> Jeffle,
>>
>> I ran your fio test above on a linear LV split across 3 NVMes to mirror
>> your split mapping (system: 32-core Intel, 256 GiB RAM), comparing the
>> io engines sync, libaio and io_uring, the latter with and without hipri
>> (sync and libaio obviously without registerfiles and hipri), which gave
>> the following results:
>>
>> sync  | libaio | IRQ mode (hipri=0) | iopoll (hipri=1)
>> ------|--------|--------------------|-----------------
>> 56.3K | 290K   | 329K               | 351K
>>
>> I can't second your drastic hipri=1 drop here...
>
> Sorry, email mess.
>
> sync  | libaio | IRQ mode (hipri=0) | iopoll (hipri=1)
> ------|--------|--------------------|-----------------
> 56.3K | 290K   | 329K               | 351K
>
> I can't second your drastic hipri=1 drop here...
>

Hmmm, that's indeed somewhat strange...

My test environment:
- CPU: 128 cores, though only one core is used since 'cpus_allowed=14' is
  set in the fio configuration
- memory: 983G free
- NVMe: Huawei ES3510P (HWE52P434T0L005N), with 'nvme.poll_queues=3'

Maybe you didn't specify 'nvme.poll_queues=XXX'? In that case IO still
completes in IRQ mode, even if you have specified 'hipri=1'.
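For reference, this is roughly how I sanity-check on my side that the poll
queues are actually set up (paths assume the in-tree nvme PCIe driver;
nvme0n1 is just one example device from the dm table above):

```
# Check that the module parameter took effect; this should print the
# value passed on the kernel command line (3 in my setup).
cat /sys/module/nvme/parameters/poll_queues

# Check that the block layer exposes poll support on the underlying
# device; a non-zero value means polled completions are available.
cat /sys/block/nvme0n1/queue/io_poll
```

If either of these reads 0, blk_poll() has nothing to poll and completions
stay IRQ driven regardless of 'hipri=1', which would also keep the hipri=0
and hipri=1 numbers close together.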
--
Thanks,
Jeffle