Re: [PATCH V3 00/16] io_uring/ublk: add IORING_OP_FUSED_CMD

Jens Axboe <axboe@xxxxxxxxx> · Sat, 18 Mar 2023 10:09:52 -0600

On 3/14/23 6:57 AM, Ming Lei wrote:
> Basically userspace can specify any sub-buffer of the ublk block request
> buffer from the fused command just by setting 'offset/len'
> in the slave SQE for running slave OP. This way is flexible to implement
> io mapping: mirror, striped, ...
> 
> The 3rd & 4th patches enable fused slave support for the following OPs:
> 
> 	OP_READ/OP_WRITE
> 	OP_SEND/OP_RECV/OP_SEND_ZC
> 
> The other ublk patches clean up the ublk driver and implement the fused
> command for supporting zero copy.
> 
> The userspace code follows:
> 
> https://github.com/ming1/ubdsrv/tree/fused-cmd-zc-v2

Ran some quick testing here with qcow2. This was just done on my laptop
in kvm, so take the numbers with a grain of salt; results may be better
elsewhere.

Baseline:

64k reads       98-100K IOPS    6-6.1GB/sec     (ublk 100%, io_uring 9%)
4k reads        670-680K IOPS   2.6GB/sec       (ublk 65%, io_uring 44%)

and with zerocopy enabled:

64k reads       184K IOPS       11.5GB/sec      (ublk 91%, io_uring 12%)
4k reads        730K IOPS       2.8GB/sec       (ublk 73%, io_uring 48%)

and with zerocopy and using SINGLE_ISSUER|COOP_TASKRUN for the ring:

64k reads       205K IOPS       12.8GB/sec      (ublk 91%, io_uring 12%)
4k reads        730K IOPS       2.8GB/sec       (ublk 66%, io_uring 42%)

Don't read too much into the CPU utilization numbers, they are just
indicative and not super accurate. But overall a nice win for larger
block sizes with zero copy. We seem to be IOPS limited on this
particular setup, which is most likely why 4k isn't showing any major
wins here. E.g. running 8k with zero copy, I get the same IOPS limit,
just obviously doubling the bandwidth of the 4k run:

IOPS=732.26K, BW=5.72GiB/s, IOS/call=32/32
IOPS=733.38K, BW=5.73GiB/s, IOS/call=32/32
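
As a rough sanity check on the "same IOPS, double the bandwidth" point,
the relationship bandwidth = IOPS * block size can be sketched as below.
This is only an illustration: the helper name is made up, the IOPS
values are copied from the runs above, and reading "K" as 1000 is an
assumption.

```python
# Rough check: bandwidth = IOPS * block size.
MIB = 1024 * 1024

def bandwidth_mib_per_sec(iops: float, block_size_bytes: int) -> float:
    """Return throughput in MiB/s for a given IOPS rate and block size."""
    return iops * block_size_bytes / MIB

# IOPS values taken from the zero copy runs above; "K" assumed to be 1000.
runs = {
    "4k zero copy": (730_000, 4 * 1024),
    "8k zero copy": (732_260, 8 * 1024),
}

for name, (iops, block_size) in runs.items():
    print(f"{name}: {bandwidth_mib_per_sec(iops, block_size):.0f} MiB/s")

# With IOPS nearly identical, the ratio is set almost entirely by block size.
ratio = bandwidth_mib_per_sec(732_260, 8 * 1024) / bandwidth_mib_per_sec(730_000, 4 * 1024)
print(f"8k/4k bandwidth ratio: {ratio:.2f}")
```

The ratio comes out at roughly 2, matching the doubling described above.
The absolute values (about 2850 and 5720 MiB/s) line up with the
reported 2.8 and 5.72 figures only if the benchmark tool prints MiB/s
scaled down by 1000; that is an inference from the numbers, not
something confirmed in this thread.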

I also tried using DEFER_TASKRUN, but it stalls on setup. Most likely
something trivial, didn't poke any further at that.

-- 
Jens Axboe