On Mon, Jun 10, 2024 at 03:00:23AM +0100, Pavel Begunkov wrote:
> On 5/11/24 01:12, Ming Lei wrote:
> > SQE group with REQ_F_SQE_GROUP_DEP introduces a new mechanism for
> > sharing a resource among a group of requests: all member requests
> > can consume the resource provided by the group lead efficiently and
> > in parallel.
> >
> > This patch uses the added SQE group feature REQ_F_SQE_GROUP_DEP to
> > share a kernel buffer within an SQE group:
> >
> > - the group lead provides a kernel buffer to the member requests
> >
> > - the member requests use the provided buffer for FS or network IO,
> >   or other operations in the future
> >
> > - the kernel buffer is returned after the member requests have
> >   consumed it
> >
> > This looks a bit similar to the kernel's pipe/splice, but there are
> > some important differences:
> >
> > - splice transfers data between two FDs via a pipe, and fd_out can
> >   only read data from the pipe; this feature lends the group lead's
> >   buffer to the members, so a member request may also write to the
> >   buffer if writing to it is allowed.
> >
> > - splice implements the data transfer by moving pages between a
> >   subsystem and the pipe, which transfers page ownership and is one
> >   of the most complicated parts of splice; this patch supports
> >   scenarios in which the buffer can't be transferred: the buffer is
> >   only lent to the member requests and is returned after they have
> >   consumed it, which simplifies the buffer lifetime a lot. In
> >   particular, the buffer is guaranteed to be returned.
> >
> > - splice basically can't run asynchronously
> >
> > This can help implement generic zero copy between a device and
> > related operations, e.g. for ublk, fuse, vdpa, and even network
> > receive.
> >
> > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> > ---
> >  include/linux/io_uring_types.h | 33 +++++++++++++++++++
> >  io_uring/io_uring.c            | 10 +++++-
> >  io_uring/io_uring.h            |  5 +++
> >  io_uring/kbuf.c                | 60 ++++++++++++++++++++++++++++++++++
> >  io_uring/kbuf.h                | 13 ++++++++
> >  io_uring/net.c                 | 31 +++++++++++++++++-
> >  io_uring/opdef.c               |  5 +++
> >  io_uring/opdef.h               |  2 ++
> >  io_uring/rw.c                  | 20 +++++++++++-
> >  9 files changed, 176 insertions(+), 3 deletions(-)
> >
> ...
> > diff --git a/io_uring/net.c b/io_uring/net.c
> > index 070dea9a4eda..83fd5879082e 100644
> > --- a/io_uring/net.c
> > +++ b/io_uring/net.c
> > @@ -79,6 +79,13 @@ struct io_sr_msg {
> ...
> >  retry_bundle:
> >  	if (io_do_buffer_select(req)) {
> >  		struct buf_sel_arg arg = {
> > @@ -1132,6 +1148,11 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
> >  		if (unlikely(ret))
> >  			goto out_free;
> >  		sr->buf = NULL;
> > +	} else if (req->flags & REQ_F_GROUP_KBUF) {
> > +		ret = io_import_group_kbuf(req, user_ptr_to_u64(sr->buf),
> > +				sr->len, ITER_DEST, &kmsg->msg.msg_iter);
> > +		if (unlikely(ret))
> > +			goto out_free;
> >  	}
> >
> >  	kmsg->msg.msg_inq = -1;
> > @@ -1334,6 +1355,14 @@ static int io_send_zc_import(struct io_kiocb *req, struct io_async_msghdr *kmsg)
> >  		if (unlikely(ret))
> >  			return ret;
> >  		kmsg->msg.sg_from_iter = io_sg_from_iter;
> > +	} else if (req->flags & REQ_F_GROUP_KBUF) {
> > +		struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
> > +
> > +		ret = io_import_group_kbuf(req, user_ptr_to_u64(sr->buf),
> > +				sr->len, ITER_SOURCE, &kmsg->msg.msg_iter);
> > +		if (unlikely(ret))
> > +			return ret;
> > +		kmsg->msg.sg_from_iter = io_sg_from_iter;
>
> Not looking here too deeply, but I'm pretty sure it's buggy.
> The buffer can only be reused once the notification CQE
> completes, and there is nothing here in regard to that.
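OK. With SEND_ZC the first CQE only means the data was accepted; the
pages may still be referenced by the network stack until the
notification CQE (the one with IORING_CQE_F_NOTIF set) arrives, so the
group buffer must not be returned before then. From userspace the
lifetime looks roughly like this (liburing-style sketch; error handling
trimmed, and it assumes the send succeeded so that the first CQE has
IORING_CQE_F_MORE set and a notification will follow):

#include <liburing.h>

/*
 * Zero-copy send produces two CQEs: the completion (IORING_CQE_F_MORE
 * set) and, later, the notification (IORING_CQE_F_NOTIF set).  Only
 * the second one means the kernel has dropped its references to the
 * pages, so only then may @buf be reused or returned.
 */
static int send_zc_and_wait(struct io_uring *ring, int sock,
			    const void *buf, size_t len)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
	struct io_uring_cqe *cqe;
	int res;

	io_uring_prep_send_zc(sqe, sock, buf, len, 0, 0);
	io_uring_submit(ring);

	/* first CQE: send result; @buf may still be in flight */
	io_uring_wait_cqe(ring, &cqe);
	res = cqe->res;
	io_uring_cqe_seen(ring, cqe);

	/* second CQE: notification; @buf is free to reuse after this */
	io_uring_wait_cqe(ring, &cqe);
	io_uring_cqe_seen(ring, cqe);

	return res;
}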
The problem isn't triggered in ublk-nbd because the buffer remains
valid until the peer's reply is received, by which point the
notification has definitely completed. I will remove send-zc support
from the enablement series; it can be added back in the future without
much difficulty.

Thanks,
Ming
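P.S. For readers catching up on the series, here is the borrow/return
lifecycle from the commit message above, condensed into pseudo-code.
Only io_import_group_kbuf() and REQ_F_GROUP_KBUF appear in the quoted
hunks; the lead-side helper name and the exact signatures are made up
for illustration:

/*
 * Pseudo-code sketch of the group kernel buffer lifecycle; not a
 * buildable unit, and the lead-side helper is hypothetical.
 */

/* 1) the group lead (e.g. a ublk uring_cmd) publishes its kernel
 *    buffer to the group before the members are issued
 */
static int lead_issue(struct io_kiocb *lead)
{
	return driver_provide_group_kbuf(lead);	/* hypothetical */
}

/* 2) each member marked REQ_F_GROUP_KBUF borrows the buffer for its
 *    own transfer; no page ownership moves, unlike splice
 */
static int member_import(struct io_kiocb *member, u64 off, unsigned len,
			 int dir, struct iov_iter *iter)
{
	/* dir: ITER_SOURCE for send/write, ITER_DEST for recv/read */
	return io_import_group_kbuf(member, off, len, dir, iter);
}

/* 3) once the last member completes, the buffer is returned to the
 *    lead's provider; the return is guaranteed, which is what keeps
 *    the buffer lifetime simple
 */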