On 3/7/21 8:54 PM, JeffleXu wrote:
> 
> 
> On 3/6/21 1:56 AM, Heinz Mauelshagen wrote:
>> 
>> On 3/5/21 6:46 PM, Heinz Mauelshagen wrote:
>>> On 3/5/21 10:52 AM, JeffleXu wrote:
>>>> 
>>>> On 3/3/21 6:09 PM, Mikulas Patocka wrote:
>>>>> 
>>>>> On Wed, 3 Mar 2021, JeffleXu wrote:
>>>>> 
>>>>>> 
>>>>>> On 3/3/21 3:05 AM, Mikulas Patocka wrote:
>>>>>> 
>>>>>>> Support I/O polling if submit_bio_noacct_mq_direct returned a non-empty
>>>>>>> cookie.
>>>>>>> 
>>>>>>> Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
>>>>>>> 
>>>>>>> ---
>>>>>>>  drivers/md/dm.c | 5 +++++
>>>>>>>  1 file changed, 5 insertions(+)
>>>>>>> 
>>>>>>> Index: linux-2.6/drivers/md/dm.c
>>>>>>> ===================================================================
>>>>>>> --- linux-2.6.orig/drivers/md/dm.c	2021-03-02 19:26:34.000000000 +0100
>>>>>>> +++ linux-2.6/drivers/md/dm.c	2021-03-02 19:26:34.000000000 +0100
>>>>>>> @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru
>>>>>>>  		}
>>>>>>>  	}
>>>>>>>  
>>>>>>> +	if (ci.poll_cookie != BLK_QC_T_NONE) {
>>>>>>> +		while (atomic_read(&ci.io->io_count) > 1 &&
>>>>>>> +		       blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
>>>>>>> +	}
>>>>>>> +
>>>>>>>  	/* drop the extra reference count */
>>>>>>>  	dec_pending(ci.io, errno_to_blk_status(error));
>>>>>>>  }
>>>>>> 
>>>>>> It seems that the general idea of your design is to
>>>>>> 1) submit *one* split bio
>>>>>> 2) blk_poll(), waiting for the previously submitted split bio to complete
>>>>> 
>>>>> No, I submit all the bios and poll for the last one.
>>>>> 
>>>>>> and then submit the next split bio, repeating the above process. I'm
>>>>>> afraid the performance may be an issue here, since the batch reaped
>>>>>> by each blk_poll() call may decrease.
>>>>> 
>>>>> Could you benchmark it?
>>>> 
>>>> I only tested dm-linear.
>>>> 
>>>> The configuration (dm table) of dm-linear is:
>>>> 0 1048576 linear /dev/nvme0n1 0
>>>> 1048576 1048576 linear /dev/nvme2n1 0
>>>> 2097152 1048576 linear /dev/nvme5n1 0
>>>> 
>>>> The fio script used is:
>>>> ```
>>>> $ cat fio.conf
>>>> [global]
>>>> name=iouring-sqpoll-iopoll-1
>>>> ioengine=io_uring
>>>> iodepth=128
>>>> numjobs=1
>>>> thread
>>>> rw=randread
>>>> direct=1
>>>> registerfiles=1
>>>> hipri=1
>>>> runtime=10
>>>> time_based
>>>> group_reporting
>>>> randrepeat=0
>>>> filename=/dev/mapper/testdev
>>>> bs=4k
>>>> 
>>>> [job-1]
>>>> cpus_allowed=14
>>>> ```
>>>> 
>>>> IOPS (IRQ mode) | IOPS (iopoll mode (hipri=1))
>>>> --------------- | ----------------------------
>>>> 213k            | 19k
>>>> 
>>>> At least, it doesn't work well with the io_uring interface.
>>>> 
>>> 
>>> Jeffle,
>>> 
>>> I ran your above fio test on a linear LV split across 3 NVMes to
>>> second your split mapping (system: 32 core Intel, 256GiB RAM),
>>> comparing io engines sync, libaio and io_uring, the latter w/ and
>>> w/o hipri (sync+libaio obviously w/o registerfiles and hipri),
>>> which resulted ok:
>>> 
>>> sync  | libaio | IRQ mode (hipri=0) | iopoll (hipri=1)
>>> ------|--------|--------------------|-----------------
>>> 56.3K | 290K   | 329K               | 351K
>>> 
>>> I can't second your drastic hipri=1 drop here...
>> 
>> Sorry, email mess.
>> 
>> sync  | libaio | IRQ mode (hipri=0) | iopoll (hipri=1)
>> ------|--------|--------------------|-----------------
>> 56.3K | 290K   | 329K               | 351K
>> 
>> I can't second your drastic hipri=1 drop here...
>> 
> 
> Hummm, that's indeed somewhat strange...
> 
> My test environment:
> - CPU: 128 cores, though only one CPU core is used since
>   'cpus_allowed=14' in the fio configuration
> - memory: 983G memory free
> - NVMe: Huawei ES3510P (HWE52P434T0L005N), with 'nvme.poll_queues=3'
> 
> Maybe you didn't specify 'nvme.poll_queues=XXX'? In that case, IO still
> goes into IRQ mode, even though you have specified 'hipri=1'?

That would be my guess too, and the patches also have a very suspicious
clear of HIPRI which shouldn't be there (which would let that fly
through).

-- 
Jens Axboe
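For anyone reproducing Jeffle's setup, the three-line dm table quoted above maps directly onto a dmsetup invocation; a minimal sketch, using the device names and the 'testdev' name from the thread (the table lines are exactly those posted, everything else is illustrative):

```
# Instantiate the dm-linear table from the thread.
# Each line is: <start_sector> <length_sectors> linear <backing_dev> <offset>
dmsetup create testdev <<'EOF'
0 1048576 linear /dev/nvme0n1 0
1048576 1048576 linear /dev/nvme2n1 0
2097152 1048576 linear /dev/nvme5n1 0
EOF

# Confirm the mapping; fio.conf then targets it via filename=/dev/mapper/testdev.
dmsetup table testdev
```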
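Since the disagreement may come down to whether the NVMe driver actually has dedicated poll queues, a quick sysfs check along these lines can help; a sketch only, assuming a reasonably recent kernel (nvme.poll_queues is a boot/module parameter, so a value of 0 means hipri=1 I/O quietly completes through interrupts):

```
# How many dedicated poll queues the nvme driver was loaded with
# (set via nvme.poll_queues= on the kernel command line or at modprobe time).
cat /sys/module/nvme/parameters/poll_queues

# Per-namespace view: io_poll reads 1 only when the request queue
# supports polled I/O (device names taken from the thread).
for dev in nvme0n1 nvme2n1 nvme5n1; do
        printf '%s io_poll: ' "$dev"
        cat "/sys/block/$dev/queue/io_poll"
done
```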