On 10/7/24 4:15 PM, David Wei wrote: > From: David Wei <davidhwei@xxxxxxxx> > > Add a new object called an interface queue (ifq) that represents a net rx queue > that has been configured for zero copy. Each ifq is registered using a new > registration opcode IORING_REGISTER_ZCRX_IFQ. > > The refill queue is allocated by the kernel and mapped by userspace using a new > offset IORING_OFF_RQ_RING, in a similar fashion to the main SQ/CQ. It is used > by userspace to return buffers that it is done with, which will then be re-used > by the netdev again. > > The main CQ ring is used to notify userspace of received data by using the > upper 16 bytes of a big CQE as a new struct io_uring_zcrx_cqe. Each entry > contains the offset + len to the data. > > For now, each io_uring instance only has a single ifq. Looks pretty straight forward to me, but please wrap your commit messages at ~72 chars or it doesn't read so well in the git log. A few minor comments... > diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h > index adc2524fd8e3..567cdb89711e 100644 > --- a/include/uapi/linux/io_uring.h > +++ b/include/uapi/linux/io_uring.h > @@ -595,6 +597,9 @@ enum io_uring_register_op { > IORING_REGISTER_NAPI = 27, > IORING_UNREGISTER_NAPI = 28, > > + /* register a netdev hw rx queue for zerocopy */ > + IORING_REGISTER_ZCRX_IFQ = 29, > + Will need to change as the current tree has moved a bit beyond this. Not a huge deal, just an FYI as it obviously impacts userspace too. > +struct io_uring_zcrx_rqe { > + __u64 off; > + __u32 len; > + __u32 __pad; > +}; > + > +struct io_uring_zcrx_cqe { > + __u64 off; > + __u64 __pad; > +}; Would be nice to avoid padding for this one as it doubles its size. But at the same time, always nice to have padding for future proofing... Always a good idea to add padding, but > diff --git a/io_uring/Makefile b/io_uring/Makefile > index 61923e11c767..1a1184f3946a 100644 > --- a/io_uring/Makefile > +++ b/io_uring/Makefile > @@ -10,6 +10,7 @@ obj-$(CONFIG_IO_URING) += io_uring.o opdef.o kbuf.o rsrc.o notif.o \ > epoll.o statx.o timeout.o fdinfo.o \ > cancel.o waitid.o register.o \ > truncate.o memmap.o > +obj-$(CONFIG_PAGE_POOL) += zcrx.o > obj-$(CONFIG_IO_WQ) += io-wq.o > obj-$(CONFIG_FUTEX) += futex.o > obj-$(CONFIG_NET_RX_BUSY_POLL) += napi.o I wonder if this should be expressed a bit differently. Probably have a CONFIG_IO_URING_ZCRX which depends on CONFIG_INET and CONFIG_PAGE_POOL. And then you can also use that rather than doing: #if defined(CONFIG_PAGE_POOL) && defined(CONFIG_INET) in some spots. Not a big deal, it'll work as-is. And honestly should probably cleanup the existing IO_WQ symbol while at it, so perhaps better left for after the fact. > +static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq, > + struct io_uring_zcrx_ifq_reg *reg) > +{ > + size_t off, size; > + void *ptr; > + > + off = sizeof(struct io_uring); > + size = off + sizeof(struct io_uring_zcrx_rqe) * reg->rq_entries; > + > + ptr = io_pages_map(&ifq->rqe_pages, &ifq->n_rqe_pages, size); > + if (IS_ERR(ptr)) > + return PTR_ERR(ptr); > + > + ifq->rq_ring = (struct io_uring *)ptr; > + ifq->rqes = (struct io_uring_zcrx_rqe *)((char *)ptr + off); > + return 0; > +} No need to cast that ptr to char *. Rest looks fine to me. -- Jens Axboe