Pavel Begunkov wrote:
> On 4/4/24 23:17, Oliver Crumrine wrote:
> > In his patch to enable zerocopy networking for io_uring, Pavel Begunkov
> > specifically disabled REQ_F_CQE_SKIP, as (at least from my
> > understanding) the userspace program wouldn't receive the
> > IORING_CQE_F_MORE flag in the result value.
>
> No. IORING_CQE_F_MORE means there will be another CQE from this
> request, so a single CQE without IORING_CQE_F_MORE is trivially
> fine.
>
> The problem is the semantics, because by suppressing the first
> CQE you're losing the result value. You might rely on WAITALL

That's already happening with io_send.

> as other sends and "fail" (in terms of io_uring) the request
> in case of a partial send posting 2 CQEs, but that's not a great
> way and it's getting userspace complicated pretty easily.
>
> In short, it was left out for later because there is a
> better way to implement it, but it should be done carefully

Maybe we could put the return values in the notifs? That would be a
discrepancy between io_send and io_send_zc, though. (Rough userspace
sketches of both patterns are at the bottom of this mail.)

> > To fix this, instead of keeping track of how many CQEs have been
> > received, and subtracting notifs from that, programs can keep track of
>
> That's a benchmark way of doing it, more realistically
> it'd be more like
>
> event_loop() {
>         cqe = wait_cqe();
>         struct req *r = (struct req *)cqe->user_data;
>         r->callback(r, cqe);
> }
>
> send_zc_callback(req, cqe) {
>         if (cqe->flags & F_MORE) {
>                 // don't free the req
>                 // we should wait for another CQE
>                 ...
>         }
> }
>
> > how many SQEs they have issued, and if a CQE is returned with an error,
> > they can simply subtract from how many notifs they expect to receive.
>
> The design specifically untangles those two notions, i.e. there can
> be a notification even when the main CQE fails (ret<0). It's safer
> this way, even though AFAIK relying on errors would be fine with
> current users (TCP/UDP).
>
> > Signed-off-by: Oliver Crumrine <ozlinuxc@xxxxxxxxx>
> > ---
> >  io_uring/net.c | 6 ++----
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/io_uring/net.c b/io_uring/net.c
> > index 1e7665ff6ef7..822f49809b68 100644
> > --- a/io_uring/net.c
> > +++ b/io_uring/net.c
> > @@ -1044,9 +1044,6 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> >
> >  	if (unlikely(READ_ONCE(sqe->__pad2[0]) || READ_ONCE(sqe->addr3)))
> >  		return -EINVAL;
> > -	/* we don't support IOSQE_CQE_SKIP_SUCCESS just yet */
> > -	if (req->flags & REQ_F_CQE_SKIP)
> > -		return -EINVAL;
> >
> >  	notif = zc->notif = io_alloc_notif(ctx);
> >  	if (!notif)
> > @@ -1342,7 +1339,8 @@ void io_sendrecv_fail(struct io_kiocb *req)
> >  		req->cqe.res = sr->done_io;
> >
> >  	if ((req->flags & REQ_F_NEED_CLEANUP) &&
> > -	    (req->opcode == IORING_OP_SEND_ZC || req->opcode == IORING_OP_SENDMSG_ZC))
> > +	    (req->opcode == IORING_OP_SEND_ZC || req->opcode == IORING_OP_SENDMSG_ZC) &&
> > +	    !(req->flags & REQ_F_CQE_SKIP))
> >  		req->cqe.flags |= IORING_CQE_F_MORE;
> >  }
>
> --
> Pavel Begunkov
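
To make sure I follow the callback pattern you're describing, here is
roughly how I picture that event loop with liburing. Sketch only,
untested; struct req, send_zc_callback and the rest of the wiring are
invented for illustration:

#include <liburing.h>
#include <stdlib.h>

struct req {
        void (*callback)(struct req *r, struct io_uring_cqe *cqe);
        /* plus whatever per-request state is needed, e.g. the buffer
         * that has to stay alive until the notification arrives */
};

static void send_zc_callback(struct req *r, struct io_uring_cqe *cqe)
{
        if (cqe->flags & IORING_CQE_F_MORE) {
                /* first CQE, carries the send result; another CQE
                 * (the IORING_CQE_F_NOTIF notification) will follow,
                 * so keep the req and its buffer around */
                return;
        }
        /* no further CQEs for this request: for send_zc this is the
         * notification, so the kernel is done with the buffer */
        free(r);
}

static void event_loop(struct io_uring *ring)
{
        struct io_uring_cqe *cqe;

        for (;;) {
                if (io_uring_wait_cqe(ring, &cqe))
                        break;
                struct req *r = io_uring_cqe_get_data(cqe);

                r->callback(r, cqe);
                io_uring_cqe_seen(ring, cqe);
        }
}

The submission side (io_uring_prep_send_zc() plus
io_uring_sqe_set_data(sqe, r) with r->callback = send_zc_callback) is
omitted to keep the sketch short.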
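
And to be concrete about the accounting described in the patch text:
below is a minimal sketch of the submit/reap side with
IOSQE_CQE_SKIP_SUCCESS set. It assumes, as the patch text does (and as
you're objecting to), that a failed send with CQE_SKIP posts no
notification, and that only these send_zc requests are on the ring;
all names are invented:

#include <liburing.h>

static int queue_send_zc(struct io_uring *ring, int sockfd,
                         const void *buf, size_t len,
                         unsigned *notifs_expected)
{
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        if (!sqe)
                return -1;
        /* buf must stay valid until the notification arrives */
        io_uring_prep_send_zc(sqe, sockfd, buf, len, 0, 0);
        /* suppress the completion CQE on success; only the
         * IORING_CQE_F_NOTIF CQE is expected for a successful send */
        io_uring_sqe_set_flags(sqe, IOSQE_CQE_SKIP_SUCCESS);
        (*notifs_expected)++;
        return 0;
}

static void reap(struct io_uring *ring, unsigned *notifs_expected)
{
        struct io_uring_cqe *cqe;

        while (*notifs_expected && !io_uring_wait_cqe(ring, &cqe)) {
                if (cqe->flags & IORING_CQE_F_NOTIF) {
                        /* notification: the buffer can be reused */
                        (*notifs_expected)--;
                } else if (cqe->res < 0 &&
                           !(cqe->flags & IORING_CQE_F_MORE)) {
                        /* failed send with no notification to follow,
                         * which is what the io_sendrecv_fail() hunk
                         * above signals for CQE_SKIP requests */
                        (*notifs_expected)--;
                }
                io_uring_cqe_seen(ring, cqe);
        }
}

(This reap() side is the part that would simplify if the result were
carried in the notif instead.)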