Re: [PATCH V6 7/8] io_uring/uring_cmd: support provide group kernel buffer

Ming Lei <ming.lei@xxxxxxxxxx> · Fri, 11 Oct 2024 11:07:32 +0800

On Thu, Oct 10, 2024 at 08:39:12PM -0600, Jens Axboe wrote:
> On 10/10/24 8:30 PM, Ming Lei wrote:
> > Hi Jens,
> > 
> > On Thu, Oct 10, 2024 at 01:31:21PM -0600, Jens Axboe wrote:
> >> Hi,
> >>
> >> Discussed this with Pavel, and on his suggestion, I tried prototyping a
> >> "buffer update" opcode. Basically it works like
> >> IORING_REGISTER_BUFFERS_UPDATE in that it can update an existing buffer
> >> registration. But it works as an sqe rather than being a sync opcode.
> >>
> >> The idea here is that you could do that upfront, or as part of a chain,
> >> and have it be generically available, just like any other buffer that
> >> was registered upfront. You do need an empty table registered first,
> >> which can just be sparse. And since you can pick the slot it goes into,
> >> you can rely on that slot afterwards (either as a link, or just the
> >> following sqe).
> >>
> >> Quick'n dirty obviously, but I did write a quick test case too to verify
> >> that:
> >>
> >> 1) It actually works (it seems to)
> > 
> > It doesn't work for ublk zc since ublk needs to provide one kernel buffer
> > for fs rw & net send/recv to consume, and the kernel buffer is invisible
> > to userspace. But  __io_register_rsrc_update() only can register userspace
> > buffer.
> 
> I'd be surprised if this simple one was enough! In terms of user vs
> kernel buffer, you could certainly use the same mechanism, and just
> ensure that buffers are tagged appropriately. I need to think about that
> a little bit.

It is actually the same as IORING_OP_PROVIDE_BUFFERS, so the following
consumer OPs have to wait until this OP_BUF_UPDATE has completed.

Suppose we have N consumer OPs which depend on OP_BUF_UPDATE.

1) all N OPs are linked with OP_BUF_UPDATE

Or

2) submit OP_BUF_UPDATE first, wait for its completion, then submit N
OPs concurrently.

But both 1) and 2) may slow down IO handling: in 1) all N OPs are
serialized, and in 2) one extra syscall is introduced.
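
To make the trade-off concrete, here is a rough cost model. It is only a
sketch, not kernel or liburing code: struct cost_model and both helpers
are hypothetical names, and it assumes every consumer OP takes the same
fixed time and that concurrently submitted OPs overlap completely.

```c
#include <stdint.h>

/* Hypothetical cost model for illustration only; times in microseconds. */
struct cost_model {
	uint64_t t_update;	/* time for OP_BUF_UPDATE to complete */
	uint64_t t_op;		/* time for one consumer OP */
	uint64_t t_syscall;	/* cost of one extra submit/wait round trip */
};

/* 1) all N OPs linked behind OP_BUF_UPDATE: the chain runs one by one */
static uint64_t latency_linked(const struct cost_model *m, unsigned int n)
{
	return m->t_update + (uint64_t)n * m->t_op;
}

/*
 * 2) submit OP_BUF_UPDATE, wait for it, then submit N OPs concurrently.
 * Assumes the N OPs overlap fully, so they cost one t_op in total.
 */
static uint64_t latency_two_phase(const struct cost_model *m, unsigned int n)
{
	return m->t_update + m->t_syscall + (n ? m->t_op : 0);
}
```

With t_update = 5, t_op = 100, t_syscall = 2 and N = 4, the linked chain
takes 405 while the two-phase variant takes 107, but it pays the extra
syscall on every buffer update.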

The same problem applies to the next OP_BUF_UPDATE, which has to wait
until all the previous buffer consumers are done, so the slowdown is
doubled. Not to mention that the application becomes more complicated.

Here the provided buffer only needs to be visible to the N OPs; making
it global isn't necessary and slows things down. It also has a kbuf
lifetime issue.

It also makes error handling more complicated: io_uring has to remove
the kernel buffer when the current task exits, which introduces a
dependency or ordering requirement with the buffer provider.

There could be more problems; I will try to recall everything related
that I thought about before.

> 
> There are certainly many different ways that can get propagated which
> would not entail a complicated mechanism. I really like the aspect of
> having the identifier being the same thing that we already use, and
> hence not needing to be something new on the side.
> 
> > Also multiple OPs may consume the buffer concurrently, which can't be
> > supported by buffer select.
> 
> Why not? You can certainly have multiple ops using the same registered
> buffer concurrently right now.

Please see the above problem.

Also, as I remember, the selected buffer is removed from the buffer
list, see io_provided_buffer_select(), but maybe I am wrong.
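
A toy model of the behaviour I have in mind, in case it helps. This is
not the kernel code: toy_buf, toy_buf_list, toy_provide() and
toy_select() are made-up names, and it only assumes that selecting a
buffer unlinks it from the list, so a second consumer cannot pick the
same buffer.

```c
#include <stddef.h>

/* Toy stand-ins, not the kernel's struct io_buffer / io_buffer_list. */
struct toy_buf {
	struct toy_buf *next;
	void *addr;
	size_t len;
};

struct toy_buf_list {
	struct toy_buf *head;
};

/* Provide one buffer: push it onto the list. */
static void toy_provide(struct toy_buf_list *bl, struct toy_buf *b)
{
	b->next = bl->head;
	bl->head = b;
}

/*
 * Select one buffer: the buffer is unlinked from the list, so it is
 * owned by exactly one consumer until it is provided again.
 */
static struct toy_buf *toy_select(struct toy_buf_list *bl)
{
	struct toy_buf *b = bl->head;

	if (!b)
		return NULL;
	bl->head = b->next;
	b->next = NULL;
	return b;
}
```

With a single provided buffer, the first toy_select() returns it and the
second returns NULL, which is why concurrent consumers of one buffer
don't fit this model.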

Thanks,
Ming