On 3/13/24 20:25, Jens Axboe wrote:
On 3/12/24 3:44 PM, David Wei wrote:
Add an io_uring opcode OP_RECV_ZC for doing ZC reads from a socket that
is set up for ZC Rx. The request reads skbs from a socket. Completions
are posted into the main CQ for each page frag read.
Big CQEs (CQE32) is required as the OP_RECV_ZC specific metadata (ZC
region, offset, len) are stored in the extended 16 bytes as a
struct io_uring_rbuf_cqe.
For now there is no limit as to how much work each OP_RECV_ZC request
does. It will attempt to drain a socket of all available data.
Multishot requests are also supported. The first time an io_recvzc
request completes, EAGAIN is returned which arms an async poll. Then, in
subsequent runs in task work, IOU_ISSUE_SKIP_COMPLETE is returned to
continue async polling.
I'd probably drop that last paragraph, this is how all multishot
requests work and is implementation detail that need not go in the
commit message. Probably suffices just to say it supports multishot.
@@ -695,7 +701,7 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
unsigned int cflags;
cflags = io_put_kbuf(req, issue_flags);
- if (msg->msg_inq && msg->msg_inq != -1)
+ if (msg && msg->msg_inq && msg->msg_inq != -1)
cflags |= IORING_CQE_F_SOCK_NONEMPTY;
if (!(req->flags & REQ_F_APOLL_MULTISHOT)) {
@@ -723,7 +729,7 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
goto enobufs;
/* Known not-empty or unknown state, retry */
- if (cflags & IORING_CQE_F_SOCK_NONEMPTY || msg->msg_inq == -1) {
+ if (cflags & IORING_CQE_F_SOCK_NONEMPTY || (msg && msg->msg_inq == -1)) {
if (sr->nr_multishot_loops++ < MULTISHOT_MAX_RETRY)
return false;
/* mshot retries exceeded, force a requeue */
Maybe refactor this a bit so that you don't need to add these NULL
checks? That seems pretty fragile, hard to read, and should be doable
without extra checks.
That chunk can be completely thrown away, we're not using
io_recv_finish() here anymore
@@ -1053,6 +1058,85 @@ struct io_zc_rx_ifq *io_zc_verify_sock(struct io_kiocb *req,
return ifq;
}
+int io_recvzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ struct io_recvzc *zc = io_kiocb_to_cmd(req, struct io_recvzc);
+
+ /* non-iopoll defer_taskrun only */
+ if (!req->ctx->task_complete)
+ return -EINVAL;
What's the reasoning behind this?
CQ locking, see the comment a couple lines below
+ struct io_recvzc *zc = io_kiocb_to_cmd(req, struct io_recvzc);
+ struct io_zc_rx_ifq *ifq;
+ struct socket *sock;
+ int ret;
+
+ /*
+ * We're posting CQEs deeper in the stack, and to avoid taking CQ locks
+ * we serialise by having only the master thread modifying the CQ with
+ * DEFER_TASkRUN checked earlier and forbidding executing it from io-wq.
+ * That's similar to io_check_multishot() for multishot CQEs.
+ */
This one ^^, though it doesn't read well, I should reword it for
next time.
+ if (issue_flags & IO_URING_F_IOWQ)
+ return -EAGAIN;
+ if (WARN_ON_ONCE(!(issue_flags & IO_URING_F_NONBLOCK)))
+ return -EAGAIN;
If rebased on the current tree, does this go away?
It's just a little behind not to have that change
--
Pavel Begunkov