On 3/7/21 8:54 PM, JeffleXu wrote:
> 
> 
> On 3/6/21 1:56 AM, Heinz Mauelshagen wrote:
>> 
>> On 3/5/21 6:46 PM, Heinz Mauelshagen wrote:
>>> On 3/5/21 10:52 AM, JeffleXu wrote:
>>>> 
>>>> On 3/3/21 6:09 PM, Mikulas Patocka wrote:
>>>>> 
>>>>> On Wed, 3 Mar 2021, JeffleXu wrote:
>>>>> 
>>>>>> 
>>>>>> On 3/3/21 3:05 AM, Mikulas Patocka wrote:
>>>>>> 
>>>>>>> Support I/O polling if submit_bio_noacct_mq_direct returned a non-empty
>>>>>>> cookie.
>>>>>>> 
>>>>>>> Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
>>>>>>> 
>>>>>>> ---
>>>>>>>  drivers/md/dm.c | 5 +++++
>>>>>>>  1 file changed, 5 insertions(+)
>>>>>>> 
>>>>>>> Index: linux-2.6/drivers/md/dm.c
>>>>>>> ===================================================================
>>>>>>> --- linux-2.6.orig/drivers/md/dm.c	2021-03-02 19:26:34.000000000 +0100
>>>>>>> +++ linux-2.6/drivers/md/dm.c	2021-03-02 19:26:34.000000000 +0100
>>>>>>> @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru
>>>>>>>  		}
>>>>>>>  	}
>>>>>>>  
>>>>>>> +	if (ci.poll_cookie != BLK_QC_T_NONE) {
>>>>>>> +		while (atomic_read(&ci.io->io_count) > 1 &&
>>>>>>> +		       blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
>>>>>>> +	}
>>>>>>> +
>>>>>>>  	/* drop the extra reference count */
>>>>>>>  	dec_pending(ci.io, errno_to_blk_status(error));
>>>>>>>  }
>>>>>> 
>>>>>> It seems that the general idea of your design is to
>>>>>> 1) submit *one* split bio
>>>>>> 2) blk_poll(), waiting for the previously submitted split bio to complete
>>>>> 
>>>>> No, I submit all the bios and poll for the last one.
>>>>> 
>>>>>> and then submit the next split bio, repeating the above process. I'm
>>>>>> afraid the performance may be an issue here, since the batch reaped
>>>>>> by each blk_poll() call may decrease.
>>>>> 
>>>>> Could you benchmark it?
>>>> 
>>>> I only tested dm-linear.
>>>> 
>>>> The configuration (dm table) of dm-linear is:
>>>> 0 1048576 linear /dev/nvme0n1 0
>>>> 1048576 1048576 linear /dev/nvme2n1 0
>>>> 2097152 1048576 linear /dev/nvme5n1 0
>>>> 
>>>> The fio script used is:
>>>> ```
>>>> $ cat fio.conf
>>>> [global]
>>>> name=iouring-sqpoll-iopoll-1
>>>> ioengine=io_uring
>>>> iodepth=128
>>>> numjobs=1
>>>> thread
>>>> rw=randread
>>>> direct=1
>>>> registerfiles=1
>>>> hipri=1
>>>> runtime=10
>>>> time_based
>>>> group_reporting
>>>> randrepeat=0
>>>> filename=/dev/mapper/testdev
>>>> bs=4k
>>>> 
>>>> [job-1]
>>>> cpus_allowed=14
>>>> ```
>>>> 
>>>> IOPS (IRQ mode) | IOPS (iopoll mode (hipri=1))
>>>> --------------- | ----------------------------
>>>> 213k            | 19k
>>>> 
>>>> At least, it doesn't work well with the io_uring interface.
>>>> 
>>> 
>>> Jeffle,
>>> 
>>> I ran your above fio test on a linear LV split across 3 NVMes to
>>> second your split mapping (system: 32 core Intel, 256GiB RAM),
>>> comparing io engines sync, libaio and io_uring, the latter w/ and
>>> w/o hipri (sync+libaio obviously w/o registerfiles and hipri),
>>> which resulted ok:
>>> 
>>> sync  | libaio | IRQ mode (hipri=0) | iopoll (hipri=1)
>>> ------|--------|--------------------|-----------------
>>> 56.3K | 290K   | 329K               | 351K
>>> 
>>> I can't second your drastic hipri=1 drop here...
>> 
>> Sorry, email mess.
>> 
>> sync  | libaio | IRQ mode (hipri=0) | iopoll (hipri=1)
>> ------|--------|--------------------|-----------------
>> 56.3K | 290K   | 329K               | 351K
>> 
>> I can't second your drastic hipri=1 drop here...
>> 
> 
> Hummm, that's indeed somewhat strange...
> 
> My test environment:
> - CPU: 128 cores, though only one CPU core is used since
>   'cpus_allowed=14' in the fio configuration
> - memory: 983G memory free
> - NVMe: Huawei ES3510P (HWE52P434T0L005N), with 'nvme.poll_queues=3'
> 
> Maybe you didn't specify 'nvme.poll_queues=XXX'? In that case, IO still
> goes into IRQ mode, even though you have specified 'hipri=1'?

That would be my guess too, and the patches also have a very suspicious
clear of HIPRI which shouldn't be there (which would let that fly
through).

-- 
Jens Axboe
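For anyone reproducing Jeffle's setup, the three-line dm table quoted above maps directly onto a dmsetup invocation; a minimal sketch, using the device names and the 'testdev' name from the thread (the table lines are exactly those posted, everything else is illustrative):

```
# Instantiate the dm-linear table from the thread.
# Each line is: <start_sector> <length_sectors> linear <backing_dev> <offset>
dmsetup create testdev <<'EOF'
0 1048576 linear /dev/nvme0n1 0
1048576 1048576 linear /dev/nvme2n1 0
2097152 1048576 linear /dev/nvme5n1 0
EOF

# Confirm the mapping; fio.conf then targets it via filename=/dev/mapper/testdev.
dmsetup table testdev
```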
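Since the disagreement may come down to whether the NVMe driver actually has dedicated poll queues, a quick sysfs check along these lines can help; a sketch only, assuming a reasonably recent kernel (nvme.poll_queues is a boot/module parameter, so a value of 0 means hipri=1 I/O quietly completes through interrupts):

```
# How many dedicated poll queues the nvme driver was loaded with
# (set via nvme.poll_queues= on the kernel command line or at modprobe time).
cat /sys/module/nvme/parameters/poll_queues

# Per-namespace view: io_poll reads 1 only when the request queue
# supports polled I/O (device names taken from the thread).
for dev in nvme0n1 nvme2n1 nvme5n1; do
        printf '%s io_poll: ' "$dev"
        cat "/sys/block/$dev/queue/io_poll"
done
```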