On top of the perf-wip branch,

./io_uring -d32 -s32 -c32 -b512 -p1 /dev/nullb0

~3.3 MIOPS vs 3.5 MIOPS, so the series gives roughly an extra 4-5%.
The main part is caching struct block_device, plus some inlining.

Pavel Begunkov (6):
  block: cache bdev in struct file for raw bdev IO
  block: inline BDEV_I and friends
  blk-mq: optimise *end_request non-stat path
  block: inline hot paths of blk_account_io_*()
  blk-mq: inline hot part of __blk_mq_sched_restart
  block: convert ->bd_inode to container_of()

 block/bdev.c           | 16 ----------------
 block/blk-core.c       | 30 +++++++++---------------------
 block/blk-mq-sched.c   |  4 +---
 block/blk-mq-sched.h   |  8 +++++++-
 block/blk-mq.c         |  9 ++++-----
 block/blk.h            | 24 +++++++++++++++++++++---
 block/fops.c           | 40 ++++++++++++++++++++++------------------
 include/linux/blkdev.h | 31 +++++++++++++++++++++++++------
 8 files changed, 89 insertions(+), 73 deletions(-)

-- 
2.33.0