On 2023/3/19 00:23, Pavel Begunkov wrote:
> On 3/16/23 03:13, Xiaoguang Wang wrote:
>>> Add IORING_OP_FUSED_CMD, one special URING_CMD, which has to
>>> be SQE128. The 1st SQE (master) is one 64-byte URING_CMD, and the 2nd
>>> 64-byte SQE (slave) is another normal 64-byte OP. For any OP which needs
>>> to support slave OPs, io_issue_defs[op].fused_slave needs to be set to 1,
>>> and its ->issue() can retrieve/import the buffer from the master request's
>>> fused_cmd_kbuf. The slave OP is actually submitted from the kernel; part of
>>> this idea is from Xiaoguang's ublk ebpf patchset, but this patchset
>>> submits the slave OP just like a normal OP issued from userspace. That is,
>>> SQE order is kept, and batched handling is done too.
>>
>> Thanks for this great work; it seems that we're now in the right direction
>> to support ublk zero copy. I believe this feature will improve IO throughput
>> greatly and reduce ublk's CPU resource usage.
>>
>> I have gone through your 2nd patch, and have some small concerns here:
>> say we have one ublk loop target device, but it has 4 backend files,
>> every file carrying 25% of the device capacity, implemented in a striped
>> way; then for every IO request, the current implementation will need to
>> issue 4 fused_cmds, right? 4 slave SQEs are necessary, but it would be
>> better to have just one master SQE, so I wonder whether we can have
>> another method. The key point is to let io_uring support registering
>> various kernel memory objects which come from the kernel, such as
>> ITER_BVEC or ITER_KVEC. So how about the actions below:
>> 1. Add a new infrastructure in io_uring which supports registering
>> various kernel memory objects in it. This new infrastructure could be
>> maintained in an xarray structure, and every memory object in it will
>> have a unique id. This registration could be done in a ublk uring cmd,
>> with io_uring offering the registration interface.
>> 2. Then any SQE can use these memory objects freely, so long as it
>> passes the above unique id in the SQE properly.
>> The above are just rough ideas, for your reference.
>
> It hints precisely at what I proposed a bit earlier, which makes
> me not alone in thinking that it's a good idea to have a design allowing
> 1) multiple ops using a buffer and 2) not limiting it to one single
> submission, because the userspace might want to preprocess a part
> of the data, multiplex it or, on the contrary, divide it. I was mostly
> coming from non-ublk cases, and one example would be such a zc recv:
> parsing the app-level headers and redirecting the rest of the data
> somewhere.
>
> I haven't got a chance to work on it but will return to it in
> a week. The discussion was here:
>
> https://lore.kernel.org/all/ce96f7e7-1315-7154-f540-1a3ff0215674@xxxxxxxxx/
>

Hi Pavel and all,

I think it is a good idea to register some kernel objects (such as a
bvec) in io_uring and return a cookie (such as a buf_idx) for
READ/WRITE/SEND/RECV SQEs. There are ways to register a user's buffers,
such as IORING_OP_PROVIDE_BUFFERS and IORING_REGISTER_PBUF_RING, but
there is no way to register a kernel buffer (bvec). I do not think
reusing splice is a good idea, because splice has to run in io-wq. If we
have a big SQ depth, there may be lots of io-wq workers, and then the
many context switches may lower IO performance, especially for small IO
sizes.

Here are some rough ideas:
(1) Design a new opcode such as IORING_REGISTER_KOBJ to register kernel
objects in io_uring; or
(2) Reuse uring-cmd. We can send a uring-cmd to drivers (the opcode may
be CMD_REGISTER_KBUF) and let drivers call io_uring_provide_kbuf() to
register the kbuf. io_uring_provide_kbuf() would be a new function
provided by io_uring for drivers; or
(3) Let the driver call io_uring_provide_kbuf() directly. For ublk, this
function would be called before io_uring_cmd_done().

Regards,
Zhang
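To make the registration idea a bit more concrete, below is a minimal
userspace sketch of the table Xiaoguang describes: kernel memory objects
registered under a unique id (the cookie), which later SQEs would pass
to look the buffer up. A real kernel implementation would key an xarray
and store a bvec/iter; a fixed array and a plain pointer stand in here.
All names (kbuf_desc, io_provide_kbuf, io_lookup_kbuf, io_remove_kbuf)
are hypothetical, not existing io_uring API:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model only: a per-ring registry of buffer descriptors.
 * The kernel side would use an xarray keyed by id and hold a bvec;
 * a small fixed table models the unique-id allocation and lookup. */

#define KBUF_TABLE_SIZE 16

struct kbuf_desc {
    void   *addr;    /* would be a bvec/iov_iter in the kernel */
    size_t  len;
    int     in_use;
};

static struct kbuf_desc kbuf_table[KBUF_TABLE_SIZE];

/* Register a buffer; return its unique id (the cookie), or -1 if full. */
static int io_provide_kbuf(void *addr, size_t len)
{
    for (int id = 0; id < KBUF_TABLE_SIZE; id++) {
        if (!kbuf_table[id].in_use) {
            kbuf_table[id] = (struct kbuf_desc){ addr, len, 1 };
            return id;
        }
    }
    return -1;
}

/* Resolve an id back to its buffer, as a READ/WRITE SQE would do. */
static struct kbuf_desc *io_lookup_kbuf(int id)
{
    if (id < 0 || id >= KBUF_TABLE_SIZE || !kbuf_table[id].in_use)
        return NULL;
    return &kbuf_table[id];
}

/* Drop the registration once the backing request completes. */
static void io_remove_kbuf(int id)
{
    if (id >= 0 && id < KBUF_TABLE_SIZE)
        kbuf_table[id].in_use = 0;
}
```

The point of the sketch is the lifetime question the thread raises:
lookups between io_provide_kbuf() and io_remove_kbuf() are valid, so the
driver must keep the backing request alive until the id is removed
(for ublk, removal would pair with io_uring_cmd_done()).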