On Wed, Dec 04, 2024 at 09:21:48AM -0800, David Wei wrote:
> From: David Wei <davidhwei@xxxxxxxx>
>
> Add a new object called an interface queue (ifq) that represents a net
> rx queue that has been configured for zero copy. Each ifq is registered
> using a new registration opcode IORING_REGISTER_ZCRX_IFQ.
>
> The refill queue is allocated by the kernel and mapped by userspace
> using a new offset IORING_OFF_RQ_RING, in a similar fashion to the main
> SQ/CQ. It is used by userspace to return buffers that it is done with,
> which will then be re-used by the netdev again.
>
> The main CQ ring is used to notify userspace of received data by using
> the upper 16 bytes of a big CQE as a new struct io_uring_zcrx_cqe. Each
> entry contains the offset + len to the data.
>
> For now, each io_uring instance only has a single ifq.
>
> Signed-off-by: David Wei <dw@xxxxxxxxxxx>

...

> diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c

...

> +int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
> +			  struct io_uring_zcrx_ifq_reg __user *arg)
> +{
> +	struct io_uring_zcrx_ifq_reg reg;
> +	struct io_uring_region_desc rd;
> +	struct io_zcrx_ifq *ifq;
> +	size_t ring_sz, rqes_sz;
> +	int ret;
> +
> +	/*
> +	 * 1. Interface queue allocation.
> +	 * 2. It can observe data destined for sockets of other tasks.
> +	 */
> +	if (!capable(CAP_NET_ADMIN))
> +		return -EPERM;
> +
> +	/* mandatory io_uring features for zc rx */
> +	if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN &&
> +	      ctx->flags & IORING_SETUP_CQE32))
> +		return -EINVAL;
> +	if (ctx->ifq)
> +		return -EBUSY;
> +	if (copy_from_user(&reg, arg, sizeof(reg)))
> +		return -EFAULT;
> +	if (copy_from_user(&rd, u64_to_user_ptr(reg.region_ptr), sizeof(rd)))
> +		return -EFAULT;
> +	if (memchr_inv(&reg.__resv, 0, sizeof(reg.__resv)))
> +		return -EINVAL;
> +	if (reg.if_rxq == -1 || !reg.rq_entries || reg.flags)
> +		return -EINVAL;
> +	if (reg.rq_entries > IO_RQ_MAX_ENTRIES) {
> +		if (!(ctx->flags & IORING_SETUP_CLAMP))
> +			return -EINVAL;
> +		reg.rq_entries = IO_RQ_MAX_ENTRIES;
> +	}
> +	reg.rq_entries = roundup_pow_of_two(reg.rq_entries);
> +
> +	if (!reg.area_ptr)
> +		return -EFAULT;
> +
> +	ifq = io_zcrx_ifq_alloc(ctx);
> +	if (!ifq)
> +		return -ENOMEM;
> +
> +	ret = io_allocate_rbuf_ring(ifq, &reg, &rd);
> +	if (ret)
> +		goto err;
> +
> +	ifq->rq_entries = reg.rq_entries;
> +	ifq->if_rxq = reg.if_rxq;
> +
> +	ring_sz = sizeof(struct io_uring);
> +	rqes_sz = sizeof(struct io_uring_zcrx_rqe) * ifq->rq_entries;

Hi David,

A minor nit from my side: rqes_sz is set but otherwise unused in this
function. Perhaps it can be removed?

Flagged by W=1 builds.

> +	reg.offsets.rqes = ring_sz;
> +	reg.offsets.head = offsetof(struct io_uring, head);
> +	reg.offsets.tail = offsetof(struct io_uring, tail);
> +
> +	if (copy_to_user(arg, &reg, sizeof(reg)) ||
> +	    copy_to_user(u64_to_user_ptr(reg.region_ptr), &rd, sizeof(rd))) {
> +		ret = -EFAULT;
> +		goto err;
> +	}
> +
> +	ctx->ifq = ifq;
> +	return 0;
> +err:
> +	io_zcrx_ifq_free(ifq);
> +	return ret;
> +}

...
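
For reference, the rq_entries handling in the quoted hunk (reject zero,
clamp or reject oversized requests, then round up to a power of two) can
be sketched in user space roughly as below. This is only an illustration,
not the patch's code: the sketch_* names are hypothetical, the limit of
32768 is an assumed value standing in for IO_RQ_MAX_ENTRIES, and the
"clamp" flag stands in for the IORING_SETUP_CLAMP check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed limit for illustration only; the patch defines IO_RQ_MAX_ENTRIES. */
#define SKETCH_RQ_MAX_ENTRIES 32768u

/* User-space stand-in for the kernel's roundup_pow_of_two(). */
static uint32_t sketch_roundup_pow_of_two(uint32_t n)
{
	uint32_t p = 1;

	/* Zero has no meaningful result; values above 2^31 would overflow. */
	assert(n != 0 && n <= (1u << 31));
	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Follows the order of checks in the quoted hunk. Returns 0 and stores
 * the final entry count in *out, or returns -EINVAL.
 */
static int sketch_validate_rq_entries(uint32_t requested, bool clamp,
				      uint32_t *out)
{
	if (!requested)
		return -EINVAL;
	if (requested > SKETCH_RQ_MAX_ENTRIES) {
		if (!clamp)
			return -EINVAL;
		requested = SKETCH_RQ_MAX_ENTRIES;
	}
	*out = sketch_roundup_pow_of_two(requested);
	return 0;
}
```

With these assumptions, a request for 100 entries ends up as 128, and an
oversized request only succeeds when clamping is enabled.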