Re: [PATCH V3 00/16] io_uring/ublk: add IORING_OP_FUSED_CMD

On Mon, Mar 27, 2023 at 05:04:01PM +0100, Pavel Begunkov wrote:
> On 3/21/23 09:17, Ziyang Zhang wrote:
> > On 2023/3/19 00:23, Pavel Begunkov wrote:
> > > On 3/16/23 03:13, Xiaoguang Wang wrote:
> > > > > Add IORING_OP_FUSED_CMD, it is one special URING_CMD, which has to
> > > > > be SQE128. The 1st SQE(master) is one 64byte URING_CMD, and the 2nd
> > > > > 64byte SQE(slave) is another normal 64byte OP. For any OP which needs
> > > > > to support slave OP, io_issue_defs[op].fused_slave needs to be set as 1,
> > > > > and its ->issue() can retrieve/import buffer from master request's
> > > > > fused_cmd_kbuf. The slave OP is actually submitted from kernel, part of
> > > > > this idea is from Xiaoguang's ublk ebpf patchset, but this patchset
> > > > > submits slave OP just like normal OP issued from userspace, that said,
> > > > > SQE order is kept, and batching handling is done too.
> > > > Thanks for this great work, seems that we're now in the right direction
> > > > to support ublk zero copy, I believe this feature will improve io throughput
> > > > greatly and reduce ublk's cpu resource usage.
> > > > 
> > > > I have gone through your 2nd patch, and have a few small concerns here:
> > > > Say we have one ublk loop target device, but it has 4 backend files,
> > > > every file will carry 25% of device capacity and it's implemented in a striped
> > > > way, then for every io request, the current implementation will need to issue 4
> > > > fused_cmd, right? 4 slave sqes are necessary, but it would be better to
> > > > have just one master sqe, so I wonder whether we can have another
> > > > method. The key point is to let io_uring support registering various kernel
> > > > memory objects, which come from the kernel, such as ITER_BVEC or
> > > > ITER_KVEC. So how about the actions below:
> > > > 1. add a new infrastructure in io_uring, which will support registering
> > > > various kernel memory objects in it. This new infrastructure could be
> > > > maintained in an xarray structure, and every memory object in it will have
> > > > a unique id. This registration could be done in a ublk uring cmd, with io_uring
> > > > offering the registration interface.
> > > > 2. then any sqe can use these memory objects freely, so long as it
> > > > passes the above unique id in the sqe properly.
> > > > Above are just rough ideas, just for your reference.
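
To make the proposal quoted above concrete, here is a rough userspace model
in Python, not kernel code. It only illustrates the bookkeeping: one
registration yields an id, and several sub-requests then reference that id
with different offsets, as in the 4-backend striped example. Every name here
(KernelBufRegistry, SubRequest, split_striped) and the chunk-based striping
layout are assumptions made for illustration; nothing like this exists in
io_uring or ublk as written.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class SubRequest:
    buf_id: int       # id returned by the hypothetical registration step
    buf_offset: int   # byte offset into the registered buffer
    length: int       # bytes covered by this sub-request
    backend: int      # index of the backing file
    file_offset: int  # byte offset inside that backing file


class KernelBufRegistry:
    """Toy stand-in for the proposed id -> kernel buffer table (an xarray in the proposal)."""

    def __init__(self):
        self._next_id = count(1)
        self._bufs = {}  # buf_id -> buffer length in bytes

    def register(self, length):
        buf_id = next(self._next_id)
        self._bufs[buf_id] = length
        return buf_id

    def unregister(self, buf_id):
        del self._bufs[buf_id]

    def length(self, buf_id):
        return self._bufs[buf_id]


def split_striped(registry, buf_id, dev_offset, nr_backends, chunk):
    """Split one request into sub-requests that all share a single registered buffer.

    Assumes a simple layout: chunk k of the device lives on backend
    (k % nr_backends) at chunk index (k // nr_backends) within that file.
    """
    remaining = registry.length(buf_id)
    buf_offset = 0
    subs = []
    while remaining > 0:
        stripe_no, in_chunk = divmod(dev_offset, chunk)
        backend = stripe_no % nr_backends
        file_offset = (stripe_no // nr_backends) * chunk + in_chunk
        length = min(chunk - in_chunk, remaining)
        subs.append(SubRequest(buf_id, buf_offset, length, backend, file_offset))
        dev_offset += length
        buf_offset += length
        remaining -= length
    return subs


if __name__ == "__main__":
    registry = KernelBufRegistry()
    buf_id = registry.register(16384)  # one registration for the whole request
    for sub in split_striped(registry, buf_id, dev_offset=0, nr_backends=4, chunk=4096):
        print(sub)
    registry.unregister(buf_id)
```

In this model a 16 KiB request over 4 backends with 4 KiB chunks is registered
once and then split into 4 sub-requests carrying the same buf_id, which is the
"one master, several slaves" shape the proposal asks for.
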
> > > 
> > > It precisely hints at what I proposed a bit earlier, which means
> > > I'm not alone in thinking that it's a good idea to have a design allowing
> > > 1) multiple ops using a buffer and 2) not limiting it to a single
> > > submission, because the userspace might want to preprocess a part
> > > of the data, multiplex it or, on the contrary, divide it. I was mostly
> > > coming from non-ublk cases, and one example would be zc recv,
> > > parsing the app level headers and redirecting the rest of the data
> > > somewhere.
> > > 
> > > I haven't got a chance to work on it but will return to it in
> > > a week. The discussion was here:
> > > 
> > > https://lore.kernel.org/all/ce96f7e7-1315-7154-f540-1a3ff0215674@xxxxxxxxx/
> > > 
> > 
> > Hi Pavel and all,
> > 
> > I think it is a good idea to register some kernel objects(such as bvec)
> > in io_uring and return a cookie(such as buf_idx) for READ/WRITE/SEND/RECV sqes.
> > There are some ways to register user's buffer such as IORING_OP_PROVIDE_BUFFERS
> > and IORING_REGISTER_PBUF_RING but there is not a way to register kernel buffer(bvec).
> > 
> > I do not think reusing splice is a good idea because splice should run in io-wq.
> 
> The reason why I disabled inline splice execution is because do_splice()
> and the stack below it don't support nowait well enough, which is not a
> problem when we hook directly under the ->splice_read() callback and
> operate only with one file at a time at the io_uring level.

I believe I have explained several times[1][2] why it isn't a good solution
for ublk zero copy.

But if you insist on reusing splice for this feature, please share your code and
I'm happy to review it.

[1] https://lore.kernel.org/linux-block/ZB8B8cr1%2FqIcPdRM@xxxxxxxxxxxxxxxxxxxxxxxxx/T/#m1bfa358524b6af94731bcd5be28056f9f4408ecf
[2] https://github.com/ming1/linux/blob/my_v6.3-io_uring_fuse_cmd_v4/Documentation/block/ublk.rst#zero-copy

Thanks,
Ming



