Polled IO is always reaped in the context of the process itself, so it
does not need to be punted to a workqueue for the completion. This is
different than IRQ driven IO, where iomap_dio_bio_end_io() will be
invoked from hard/soft IRQ context. For those cases we currently need
to punt to a workqueue for further processing. For the polled case,
since it's the task itself reaping completions, we're already in task
context. That makes it identical to the sync completion case.

Testing a basic QD 1..8 dio random write with polled IO with the
following fio job:

fio --name=polled-dio-write --filename=/data1/file --time_based=1 \
--runtime=10 --bs=4096 --rw=randwrite --norandommap --buffered=0 \
--cpus_allowed=4 --ioengine=io_uring --iodepth=$depth --hipri=1

yields:

	Stock	Patched		Diff
=======================================
QD1	180K	201K		+11%
QD2	356K	394K		+10%
QD4	608K	650K		+7%
QD8	827K	831K		+0.5%

which shows a nice win, particularly for lower queue depth writes.
This is expected, as higher queue depths will be busy polling
completions while the offloaded workqueue completions can happen in
parallel.

Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
---
 fs/iomap/direct-io.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index ea3b868c8355..343bde5d50d3 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -161,15 +161,16 @@ void iomap_dio_bio_end_io(struct bio *bio)
 			struct task_struct *waiter = dio->submit.waiter;
 			WRITE_ONCE(dio->submit.waiter, NULL);
 			blk_wake_io_task(waiter);
-		} else if (dio->flags & IOMAP_DIO_WRITE) {
+		} else if ((bio->bi_opf & REQ_POLLED) ||
+			   !(dio->flags & IOMAP_DIO_WRITE)) {
+			WRITE_ONCE(dio->iocb->private, NULL);
+			iomap_dio_complete_work(&dio->aio.work);
+		} else {
 			struct inode *inode = file_inode(dio->iocb->ki_filp);
 
 			WRITE_ONCE(dio->iocb->private, NULL);
 			INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
 			queue_work(inode->i_sb->s_dio_done_wq, &dio->aio.work);
-		} else {
-			WRITE_ONCE(dio->iocb->private, NULL);
-			iomap_dio_complete_work(&dio->aio.work);
 		}
 	}
 
-- 
2.40.1
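
For readers who want to reason about the branch order without a kernel tree at hand, the completion decision in the patched iomap_dio_bio_end_io() can be modeled in userspace roughly as below. This is only a sketch: MODEL_REQ_POLLED, MODEL_DIO_WRITE, enum completion_path and pick_completion_path() are hypothetical names invented for illustration, the flag values are arbitrary, and none of this is kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for REQ_POLLED and IOMAP_DIO_WRITE.
 * The bit values are arbitrary and do not match the kernel's. */
#define MODEL_REQ_POLLED	(1u << 0)
#define MODEL_DIO_WRITE		(1u << 1)

/* The three ways the last bio completion can finish the dio. */
enum completion_path {
	COMPLETE_WAKE_WAITER,	/* sync IO: wake the submitting task */
	COMPLETE_INLINE,	/* call the completion work function directly */
	COMPLETE_WORKQUEUE,	/* punt to the per-superblock dio workqueue */
};

/*
 * Mirror the branch order of the patched code: a synchronous waiter
 * wins first, then polled IO or reads complete inline, and only
 * IRQ driven async writes are punted to the workqueue.
 */
static enum completion_path
pick_completion_path(bool wait_for_completion, unsigned int bio_opf,
		     unsigned int dio_flags)
{
	if (wait_for_completion)
		return COMPLETE_WAKE_WAITER;
	if ((bio_opf & MODEL_REQ_POLLED) || !(dio_flags & MODEL_DIO_WRITE))
		return COMPLETE_INLINE;
	return COMPLETE_WORKQUEUE;
}
```

In this model, the only behavioral change relative to the stock code is that a polled async write now returns COMPLETE_INLINE instead of COMPLETE_WORKQUEUE; reads and sync IO take the same path as before.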