On Wed, 12 Mar 2025, Ming Lei wrote:

> > > It isn't perfect; sometimes it may be slower than running on io-wq
> > > directly.
> > >
> > > But is there any better way of covering everything?
> >
> > Yes - fix the loop queue workers.
>
> What you suggested is threaded AIO: submitting I/O concurrently from
> different task contexts. That is not the most efficient approach,
> otherwise modern languages wouldn't have invented async/.await.
>
> In my test VM, running Mikulas's fio script on loop/nvme with the
> attached threaded_aio patch gives:
>
> NOWAIT with MQ 4       : 70K iops (read), 70K iops (write), cpu util: 40%
> threaded_aio with MQ 4 : 64K iops (read), 64K iops (write), cpu util: 52%
> in-tree loop (SQ)      : 58K iops (read), 58K iops (write)
>
> Mikulas, please feel free to run your tests with threaded_aio:
>
>     modprobe loop nr_hw_queues=4 threaded_aio=1
>
> after applying the attached patch on top of the loop patchset.
>
> The performance gap could be more obvious on fast hardware.

With "threaded_aio=1":

Sync I/O:
fio --direct=1 --bs=4k --runtime=10 --time_based --numjobs=12 --ioengine=psync --iodepth=1 --group_reporting=1 --filename=/mnt/test2/l -name=job --rw=rw

xfs/loop/xfs:
   READ: bw=300MiB/s (315MB/s), 300MiB/s-300MiB/s (315MB/s-315MB/s), io=3001MiB (3147MB), run=10001-10001msec
  WRITE: bw=300MiB/s (315MB/s), 300MiB/s-300MiB/s (315MB/s-315MB/s), io=3004MiB (3149MB), run=10001-10001msec

Async I/O:
fio --direct=1 --bs=4k --runtime=10 --time_based --numjobs=12 --ioengine=libaio --iodepth=16 --group_reporting=1 --filename=/mnt/test2/l -name=job --rw=rw

xfs/loop/xfs:
   READ: bw=869MiB/s (911MB/s), 869MiB/s-869MiB/s (911MB/s-911MB/s), io=8694MiB (9116MB), run=10002-10002msec
  WRITE: bw=870MiB/s (913MB/s), 870MiB/s-870MiB/s (913MB/s-913MB/s), io=8706MiB (9129MB), run=10002-10002msec

Without "threaded_aio=1":

Sync I/O:
fio --direct=1 --bs=4k --runtime=10 --time_based --numjobs=12 --ioengine=psync --iodepth=1 --group_reporting=1 --filename=/mnt/test2/l -name=job --rw=rw

xfs/loop/xfs:
   READ: bw=348MiB/s (365MB/s), 348MiB/s-348MiB/s (365MB/s-365MB/s), io=3481MiB (3650MB), run=10001-10001msec
  WRITE: bw=348MiB/s (365MB/s), 348MiB/s-348MiB/s (365MB/s-365MB/s), io=3484MiB (3653MB), run=10001-10001msec

Async I/O:
fio --direct=1 --bs=4k --runtime=10 --time_based --numjobs=12 --ioengine=libaio --iodepth=16 --group_reporting=1 --filename=/mnt/test2/l -name=job --rw=rw

xfs/loop/xfs:
   READ: bw=1186MiB/s (1244MB/s), 1186MiB/s-1186MiB/s (1244MB/s-1244MB/s), io=11.6GiB (12.4GB), run=10001-10001msec
  WRITE: bw=1187MiB/s (1245MB/s), 1187MiB/s-1187MiB/s (1245MB/s-1245MB/s), io=11.6GiB (12.5GB), run=10001-10001msec

Mikulas
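
PS: for anyone wanting to rerun these numbers, a rough recipe for building an xfs-over-loop-over-xfs stack is below. The device names, paths, sizes and the --direct-io setting are only examples, not necessarily the exact setup behind the results above; the modprobe parameters are the ones Ming suggested.

    # load the (patched) loop driver with the suggested parameters
    modprobe loop nr_hw_queues=4 threaded_aio=1

    # backing file on the lower xfs filesystem (e.g. on nvme),
    # attached with direct I/O to the backing file (optional, util-linux >= 2.30)
    truncate -s 20G /mnt/test/loop-backing.img
    LOOPDEV=$(losetup --direct-io=on -f --show /mnt/test/loop-backing.img)

    # upper xfs filesystem on top of the loop device
    mkfs.xfs "$LOOPDEV"
    mkdir -p /mnt/test2
    mount "$LOOPDEV" /mnt/test2

    # test file for fio (or add --size= to the fio command lines instead)
    fallocate -l 16G /mnt/test2/l

Then run the fio command lines above against /mnt/test2/l.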