Hello Jens,

This patchset improves loop aio performance by trying IOCB_NOWAIT first,
which avoids queueing the aio command to workqueue context; it also
refactors lo_rw_aio() a bit.

In my test VM, loop disk performance becomes very close to that of the
backing block device (nvme/mq virtio-scsi).

Mikulas also verified that this approach improves 12-job sequential rw
I/O by ~5X and, together with the loop MQ change, basically solves the
reported problem:

https://lore.kernel.org/linux-block/a8e5c76a-231f-07d1-a394-847de930f638@xxxxxxxxxx/

The loop MQ change will be posted as a standalone patch because it needs
a losetup change.

Thanks,
Ming

V2:
	- patch style fix & cleanup (Christoph)
	- fix randwrite perf regression on sparse backing file
	- drop MQ change

Ming Lei (5):
  loop: simplify do_req_filebacked()
  loop: cleanup lo_rw_aio()
  loop: move command blkcg/memcg initialization into loop_queue_work
  loop: try to handle loop aio command via NOWAIT IO first
  loop: add hint for handling aio via IOCB_NOWAIT

 drivers/block/loop.c | 232 ++++++++++++++++++++++++++++++++++---------
 1 file changed, 186 insertions(+), 46 deletions(-)

-- 
2.47.0
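The cover letter doesn't include code, so here is a userspace sketch of the same "try NOWAIT first, fall back to a blocking path" pattern. It is not the loop driver code. It assumes Linux preadv2() with RWF_NOWAIT, the userspace counterpart of the kernel's IOCB_NOWAIT. read_nowait_first() is a hypothetical helper, and the blocking pread() fallback stands in for punting the command to workqueue context.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not from the patchset):
 * read from a backing file, first trying a non-blocking attempt with
 * RWF_NOWAIT. If the kernel reports that the read would block (EAGAIN)
 * or NOWAIT is unsupported for this file (EOPNOTSUPP), fall back to an
 * ordinary blocking pread(). In the loop driver, the fallback is
 * queueing the command to workqueue context instead.
 *
 * Returns the byte count or -errno. *used_nowait reports which path ran.
 * RWF_NOWAIT may return a short count if only part of the range is
 * cached; callers must handle short reads as with any pread().
 */
static ssize_t read_nowait_first(int fd, void *buf, size_t len, off_t off,
                                 int *used_nowait)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    ssize_t ret;

    /* Fast path: succeeds only if it can complete without blocking. */
    ret = preadv2(fd, &iov, 1, off, RWF_NOWAIT);
    if (ret >= 0) {
        *used_nowait = 1;
        return ret;
    }
    if (errno != EAGAIN && errno != EOPNOTSUPP)
        return -errno;

    /* Slow path: the read would block (or NOWAIT is unsupported). */
    *used_nowait = 0;
    ret = pread(fd, buf, len, off);
    return ret < 0 ? -errno : ret;
}
```

The payoff mirrors the patchset's reasoning: when data is already cached, the fast path completes inline and skips the hand-off cost. Only requests that would actually block pay for the slower path.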