On Wed, Mar 12, 2025 at 04:27:12PM +0800, Ming Lei wrote:
> On Wed, Mar 12, 2025 at 01:34:02PM +1100, Dave Chinner wrote:
...
> 
> The block layer/storage stack has many optimizations for batch handling.
> If IOs are submitted from many contexts:
> 
> - this batch handling optimization is gone
> 
> - IO is re-ordered from the underlying hardware's viewpoint
> 
> - there is more contention on the FS write lock, because loop has a
>   single backing file.
> 
> That is why a single task context has been used from the beginning of
> loop aio, and it performs pretty well for sequential IO workloads, as I
> showed in the zloop example.
> 
> > > It isn't perfect; sometimes it may be slower than running on io-wq
> > > directly.
> > > 
> > > But is there any better way to cover everything?
> > 
> > Yes - fix the loop queue workers.
> 
> What you suggested is threaded aio, submitting IO concurrently from
> different task contexts. That is not the most efficient way, otherwise
> modern languages wouldn't have invented async/.await.
> 
> In my test VM, running Mikulas's fio script on loop over nvme with the
> attached threaded_aio patch:
> 
> NOWAIT with MQ 4       : 70K iops(read), 70K iops(write), cpu util: 40%
> threaded_aio with MQ 4 : 64K iops(read), 64K iops(write), cpu util: 52%
> in tree loop(SQ)       : 58K iops(read), 58K iops(write)
> 
> Mikulas, please feel free to run your tests with threaded_aio:
> 
> modprobe loop nr_hw_queues=4 threaded_aio=1
> 
> by applying the attached patch on top of the loop patchset.
> 
> The performance gap could be more obvious on fast hardware.

For the normal single-job sequential WRITE workload, on the same test VM,
still with loop over /dev/nvme0n1, and running fio against the loop device
directly:

fio --direct=1 --bs=4k --runtime=40 --time_based --numjobs=1 --ioengine=libaio \
    --iodepth=16 --group_reporting=1 --filename=/dev/loop0 -name=job --rw=write

threaded_aio(SQ) : 81K iops(write), cpu util: 20%
in tree loop(SQ) : 100K iops(write), cpu util: 7%

Thanks,
Ming
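
(Editorial illustration, not part of the patch: the batching argument quoted
above can be seen from userspace with the same libaio interface the fio job
uses. The sketch below prepares 16 sequential 4k writes in one task context
and hands the whole batch to the kernel with a single io_submit() call, so
the block layer sees one ordered batch; doing one io_submit() per worker
thread instead is the "many contexts" case being argued against. The
/dev/loop0 target and the 4k/QD16 parameters are simply taken from the fio
command above; running this will overwrite the start of that device. Build
with: gcc -O2 -o batch_submit batch_submit.c -laio)

#define _GNU_SOURCE             /* for O_DIRECT */
#include <libaio.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define QD 16                   /* matches --iodepth=16 in the fio run above */
#define BS 4096                 /* matches --bs=4k */

int main(void)
{
	io_context_t ctx = 0;
	struct iocb iocbs[QD], *iocbps[QD];
	struct io_event events[QD];
	void *buf;
	int fd, i, ret;

	/* example target only; writing to it clobbers its first 64KB */
	fd = open("/dev/loop0", O_RDWR | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (posix_memalign(&buf, 4096, BS))
		return 1;
	memset(buf, 0, BS);

	ret = io_setup(QD, &ctx);
	if (ret < 0) {
		fprintf(stderr, "io_setup: %s\n", strerror(-ret));
		return 1;
	}

	/* prepare a sequential batch in one task context ... */
	for (i = 0; i < QD; i++) {
		io_prep_pwrite(&iocbs[i], fd, buf, BS, (long long)i * BS);
		iocbps[i] = &iocbs[i];
	}

	/*
	 * ... and submit it with a single syscall: one submission context,
	 * offsets still in order when they reach the device.  Spreading
	 * these iocbs across worker threads, one io_submit() each, is the
	 * "many contexts" case that loses the batching and re-orders IO.
	 */
	ret = io_submit(ctx, QD, iocbps);
	if (ret != QD) {
		fprintf(stderr, "io_submit: %d\n", ret);
		return 1;
	}
	ret = io_getevents(ctx, QD, QD, events, NULL);
	if (ret != QD) {
		fprintf(stderr, "io_getevents: %d\n", ret);
		return 1;
	}

	io_destroy(ctx);
	close(fd);
	free(buf);
	return 0;
}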