On Fri, Mar 05 2021 at 12:56pm -0500, Heinz Mauelshagen <heinzm@xxxxxxxxxx> wrote: > > On 3/5/21 6:46 PM, Heinz Mauelshagen wrote: > >On 3/5/21 10:52 AM, JeffleXu wrote: > >> > >>On 3/3/21 6:09 PM, Mikulas Patocka wrote: > >>> > >>>On Wed, 3 Mar 2021, JeffleXu wrote: > >>> > >>>> > >>>>On 3/3/21 3:05 AM, Mikulas Patocka wrote: > >>>> > >>>>>Support I/O polling if submit_bio_noacct_mq_direct returned non-empty > >>>>>cookie. > >>>>> > >>>>>Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx> > >>>>> > >>>>>--- > >>>>> drivers/md/dm.c | 5 +++++ > >>>>> 1 file changed, 5 insertions(+) > >>>>> > >>>>>Index: linux-2.6/drivers/md/dm.c > >>>>>=================================================================== > >>>>>--- linux-2.6.orig/drivers/md/dm.c 2021-03-02 > >>>>>19:26:34.000000000 +0100 > >>>>>+++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100 > >>>>>@@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru > >>>>> } > >>>>> } > >>>>> + if (ci.poll_cookie != BLK_QC_T_NONE) { > >>>>>+ while (atomic_read(&ci.io->io_count) > 1 && > >>>>>+ blk_poll(ci.poll_queue, ci.poll_cookie, true)) ; > >>>>>+ } > >>>>>+ > >>>>> /* drop the extra reference count */ > >>>>> dec_pending(ci.io, errno_to_blk_status(error)); > >>>>> } > >>>>It seems that the general idea of your design is to > >>>>1) submit *one* split bio > >>>>2) blk_poll(), waiting the previously submitted split bio complets > >>>No, I submit all the bios and poll for the last one. > >>> > >>>>and then submit next split bio, repeating the above process. > >>>>I'm afraid > >>>>the performance may be an issue here, since the batch every time > >>>>blk_poll() reaps may decrease. > >>>Could you benchmark it? > >>I only tested dm-linear. > >> > >>The configuration (dm table) of dm-linear is: > >>0 1048576 linear /dev/nvme0n1 0 > >>1048576 1048576 linear /dev/nvme2n1 0 > >>2097152 1048576 linear /dev/nvme5n1 0 > >> > >> > >>fio script used is: > >>``` > >>$cat fio.conf > >>[global] > >>name=iouring-sqpoll-iopoll-1 > >>ioengine=io_uring > >>iodepth=128 > >>numjobs=1 > >>thread > >>rw=randread > >>direct=1 > >>registerfiles=1 > >>hipri=1 > >>runtime=10 > >>time_based > >>group_reporting > >>randrepeat=0 > >>filename=/dev/mapper/testdev > >>bs=4k > >> > >>[job-1] > >>cpus_allowed=14 > >>``` > >> > >>IOPS (IRQ mode) | IOPS (iopoll mode (hipri=1)) > >>--------------- | -------------------- > >> 213k | 19k > >> > >>At least, it doesn't work well with io_uring interface. > >> > >> > > > > > >Jeffle, > > > >I ran your above fio test on a linear LV split across 3 NVMes to > >second your split mapping > >(system: 32 core Intel, 256GiB RAM) comparing io engines sync, > >libaio and io_uring, > >the latter w/ and w/o hipri (sync+libaio obviously w/o > >registerfiles and hipri) which resulted ok: > > > > > > > >sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1) > >------|----------|---------------------|----------------- 56.3K > >| 290K | 329K | 351K I can't second > >your drastic hipri=1 drop here... > > > Sorry, email mess. > > > sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1) > -------|----------|---------------------|----------------- > 56.3K | 290K | 329K | 351K > > > > I can't second your drastic hipri=1 drop here... I think your result is just showcasing your powerful system's ability to poll every related HW queue.. whereas Jeffle's system is likely somehow more constrained (on a cpu level, memory, whatever). My basis for this is that Mikulas' changes simply always return an invalid cookie (BLK_QC_T_NONE) for purposes of intelligent IO polling. Such an implementation is completely invalid. I discussed with Jens and he said: "it needs to return something that f_op->iopoll() can make sense of. otherwise you have no option but to try everything." Mike