On Tue, Mar 11, 2025 at 09:33:01AM +0800, Ming Lei wrote: > On Mon, Mar 10, 2025 at 12:14:44PM +0100, Christoph Hellwig wrote: > > On Sun, Mar 09, 2025 at 12:23:08AM +0800, Ming Lei wrote: > > > Try to handle loop aio command via NOWAIT IO first, then we can avoid to > > > queue the aio command into workqueue. > > > > > > Fallback to workqueue in case of -EAGAIN. > > > > > > BLK_MQ_F_BLOCKING has to be set for calling into .read_iter() or > > > .write_iter() which might sleep even though it is NOWAIT. > > > > This needs performance numbers (or other reasons) justifying the > > change, especially as BLK_MQ_F_BLOCKING is a bit of an overhead. > > The difference is just rcu_read_lock() vs. srcu_read_lock(), and not Not, it also includes offloading to kblockd in more cases. > > > see any difference in typical fio workload on loop device, and the gain > is pretty obvious, bandwidth is increased by > 4X in aio workloads: > > https://lore.kernel.org/linux-block/f7c9d956-2b9b-8bb4-aa49-d57323fc8eb0@xxxxxxxxxx/T/#md3a6154218cb6619d8af5432cf2dd3a4a7a3dcc6 Please document that in the commit log. > > > + if (cmd->ret == -EAGAIN) { > > > + struct loop_device *lo = rq->q->queuedata; > > > + > > > + loop_queue_work(lo, cmd); > > > + return; > > > + } > > > > This looks like the wrong place for the rety, as -EAGAIN can only come from > > the submissions path. i.e. we should never make it to the full completion > > path for that case. > > That is not true, at least for XFS: Your trace sees lo_rw_aio_complete called from the block layer splitting called from loop, I see nothing about XFS there. But yes, this shows the issue discussed last week in the iomap IOCB_NOWAIT thread.