On Wed, Oct 9, 2024 at 9:57 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 10/9/24 10:55 AM, Mina Almasry wrote:
> > On Mon, Oct 7, 2024 at 3:16 PM David Wei <dw@xxxxxxxxxxx> wrote:
> >>
> >> This patchset adds support for zero copy rx into userspace pages using
> >> io_uring, eliminating a kernel to user copy.
> >>
> >> We configure a page pool that a driver uses to fill a hw rx queue to
> >> hand out user pages instead of kernel pages. Any data that ends up
> >> hitting this hw rx queue will thus be dma'd into userspace memory
> >> directly, without needing to be bounced through kernel memory. 'Reading'
> >> data out of a socket instead becomes a _notification_ mechanism, where
> >> the kernel tells userspace where the data is. The overall approach is
> >> similar to the devmem TCP proposal.
> >>
> >> This relies on hw header/data split, flow steering and RSS to ensure
> >> packet headers remain in kernel memory and only desired flows hit a hw
> >> rx queue configured for zero copy. Configuring this is outside of the
> >> scope of this patchset.
> >>
> >> We share netdev core infra with devmem TCP. The main difference is that
> >> io_uring is used for the uAPI and the lifetime of all objects are bound
> >> to an io_uring instance.
> >
> > I've been thinking about this a bit, and I hope this feedback isn't
> > too late, but I think your work may be useful for users not using
> > io_uring. I.e. zero copy to host memory that is not dependent on page
> > aligned MSS sizing. I.e. AF_XDP zerocopy but using the TCP stack.
>
> Not David, but come on, let's please get this moving forward. It's been
> stuck behind dependencies for seemingly forever, which are finally
> resolved.

Part of the reason this has been stuck behind dependencies for so long
is because the dependency took the time to implement things very
generically (memory providers, net_iovs) and provided you with the
primitives that enable your work.
And dealt with nacks in this area you now don't have to deal with.

> I don't think this is a reasonable ask at all for this
> patchset. If you want to work on that after the fact, then that's
> certainly an option.

I think this work is extensible to sockets and the implementation need
not be heavily tied to io_uring; yes at least leaving things open for
a socket extension to be done easier in the future would be good, IMO.
I'll look at the series more closely to see if I actually have any
concrete feedback along these lines. I hope you're open to some of it
:-)

--
Thanks,
Mina