On 3/6/21 1:56 AM, Heinz Mauelshagen wrote:
>
> On 3/5/21 6:46 PM, Heinz Mauelshagen wrote:
>> On 3/5/21 10:52 AM, JeffleXu wrote:
>>>
>>> On 3/3/21 6:09 PM, Mikulas Patocka wrote:
>>>>
>>>> On Wed, 3 Mar 2021, JeffleXu wrote:
>>>>
>>>>>
>>>>> On 3/3/21 3:05 AM, Mikulas Patocka wrote:
>>>>>
>>>>>> Support I/O polling if submit_bio_noacct_mq_direct returned non-empty
>>>>>> cookie.
>>>>>>
>>>>>> Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
>>>>>>
>>>>>> ---
>>>>>>  drivers/md/dm.c | 5 +++++
>>>>>>  1 file changed, 5 insertions(+)
>>>>>>
>>>>>> Index: linux-2.6/drivers/md/dm.c
>>>>>> ===================================================================
>>>>>> --- linux-2.6.orig/drivers/md/dm.c	2021-03-02 19:26:34.000000000 +0100
>>>>>> +++ linux-2.6/drivers/md/dm.c	2021-03-02 19:26:34.000000000 +0100
>>>>>> @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru
>>>>>>  		}
>>>>>>  	}
>>>>>>
>>>>>> +	if (ci.poll_cookie != BLK_QC_T_NONE) {
>>>>>> +		while (atomic_read(&ci.io->io_count) > 1 &&
>>>>>> +		       blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
>>>>>> +	}
>>>>>> +
>>>>>>  	/* drop the extra reference count */
>>>>>>  	dec_pending(ci.io, errno_to_blk_status(error));
>>>>>>  }
>>>>>
>>>>> It seems that the general idea of your design is to
>>>>> 1) submit *one* split bio
>>>>> 2) blk_poll(), waiting for the previously submitted split bio to complete
>>>>
>>>> No, I submit all the bios and poll for the last one.
>>>>
>>>>> and then submit the next split bio, repeating the above process. I'm
>>>>> afraid the performance may be an issue here, since the batch that
>>>>> blk_poll() reaps each time may shrink.
>>>>
>>>> Could you benchmark it?
>>>
>>> I only tested dm-linear.
>>>
>>> The configuration (dm table) of dm-linear is:
>>> 0 1048576 linear /dev/nvme0n1 0
>>> 1048576 1048576 linear /dev/nvme2n1 0
>>> 2097152 1048576 linear /dev/nvme5n1 0
>>>
>>> The fio script used is:
>>> ```
>>> $ cat fio.conf
>>> [global]
>>> name=iouring-sqpoll-iopoll-1
>>> ioengine=io_uring
>>> iodepth=128
>>> numjobs=1
>>> thread
>>> rw=randread
>>> direct=1
>>> registerfiles=1
>>> hipri=1
>>> runtime=10
>>> time_based
>>> group_reporting
>>> randrepeat=0
>>> filename=/dev/mapper/testdev
>>> bs=4k
>>>
>>> [job-1]
>>> cpus_allowed=14
>>> ```
>>>
>>> IOPS (IRQ mode) | IOPS (iopoll mode (hipri=1))
>>> --------------- | ----------------------------
>>> 213k            | 19k
>>>
>>> At least it doesn't work well with the io_uring interface.
>>>
>>
>> Jeffle,
>>
>> I ran your fio test above on a linear LV split across 3 NVMes to mirror
>> your split mapping (system: 32-core Intel, 256 GiB RAM), comparing the
>> io engines sync, libaio and io_uring, the latter with and without hipri
>> (sync and libaio obviously without registerfiles and hipri), which gave
>> the following results:
>>
>> sync  | libaio | IRQ mode (hipri=0) | iopoll (hipri=1)
>> ------|--------|--------------------|-----------------
>> 56.3K | 290K   | 329K               | 351K
>>
>> I can't second your drastic hipri=1 drop here...
>
> Sorry, email mess.
>
> sync  | libaio | IRQ mode (hipri=0) | iopoll (hipri=1)
> ------|--------|--------------------|-----------------
> 56.3K | 290K   | 329K               | 351K
>
> I can't second your drastic hipri=1 drop here...
>

Hmmm, that's indeed somewhat strange...

My test environment:
- CPU: 128 cores, though only one core is used since 'cpus_allowed=14' is
  set in the fio configuration
- memory: 983G free
- NVMe: Huawei ES3510P (HWE52P434T0L005N), with 'nvme.poll_queues=3'

Maybe you didn't specify 'nvme.poll_queues=XXX'? In that case IO still
completes in IRQ mode, even if you have specified 'hipri=1'.
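For reference, this is roughly how I sanity-check on my side that the poll
queues are actually set up (paths assume the in-tree nvme PCIe driver;
nvme0n1 is just one example device from the dm table above):

```
# Check that the module parameter took effect; this should print the
# value passed on the kernel command line (3 in my setup).
cat /sys/module/nvme/parameters/poll_queues

# Check that the block layer exposes poll support on the underlying
# device; a non-zero value means polled completions are available.
cat /sys/block/nvme0n1/queue/io_poll
```

If either of these reads 0, blk_poll() has nothing to poll and completions
stay IRQ driven regardless of 'hipri=1', which would also keep the hipri=0
and hipri=1 numbers close together.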
--
Thanks,
Jeffle