On Tue, May 7, 2024 at 9:55 AM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
>
> On 5/7/24 17:23, Christoph Hellwig wrote:
> > On Tue, May 07, 2024 at 01:18:57PM -0300, Jason Gunthorpe wrote:
> >> On Tue, May 07, 2024 at 05:05:12PM +0100, Pavel Begunkov wrote:
> >>>> even in tree if you give them enough rope, and they should not have
> >>>> that rope when the only sensible options are page/folio based kernel
> >>>> memory (incuding large/huge folios) and dmabuf.
> >>>
> >>> I believe there is at least one deep confusion here, considering you
> >>> previously mentioned Keith's pre-mapping patches. The "hooks" are not
> >>> that about in what format you pass memory, it's arguably the least
> >>> interesting part for page pool, more or less it'd circulate whatever
> >>> is given. It's more of how to have a better control over buffer lifetime
> >>> and implement a buffer pool passing data to users and empty buffers
> >>> back.
> >>
> >> Isn't that more or less exactly what dmabuf is? Why do you need
> >> another almost dma-buf thing for another project?
> >
> > That's the exact point I've been making since the last round of
> > the series. We don't need to reinvent dmabuf poorly in every
> > subsystem, but instead fix the odd parts in it and make it suitable
> > for everyone.
>
> Someone would need to elaborate how dma-buf is like that addition
> to page pool infra.

I think I understand what Jason is requesting here, and I'll take a
shot at elaborating. AFAICT what he's saying is technically feasible
and addresses the nack while giving you the uapi you want. It just
requires a bit (a lot?) of work on your end, unfortunately.

The udmabuf driver (CONFIG_UDMABUF) takes in a memfd, converts it to a
dmabuf, and returns it to userspace. See udmabuf_create(). I think what
Jason is saying here is that you can write code similar to
udmabuf_create() that takes in an io_uring memory region and converts
it to a dmabuf inside the kernel.
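
For reference, here is a rough userspace sketch of the existing memfd
to dmabuf path, assuming the uapi in <linux/udmabuf.h> and a
/dev/udmabuf node on the system. memfd_to_dmabuf() is a made-up helper
name for illustration, not something from the series, and the io_uring
variant would do the equivalent conversion inside the kernel, not
through this ioctl.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/udmabuf.h>

/*
 * Hypothetical helper: create a memfd of `size` bytes and ask the
 * udmabuf driver to wrap it in a dma-buf.
 * Returns the dma-buf fd on success, or -1 with errno set.
 */
int memfd_to_dmabuf(size_t size)
{
	long page = sysconf(_SC_PAGESIZE);
	struct udmabuf_create create = { 0 };
	int memfd, devfd, buffd, saved;

	/* udmabuf only accepts page-aligned sizes, so check up front. */
	if (page <= 0 || size == 0 || size % (size_t)page != 0) {
		errno = EINVAL;
		return -1;
	}

	memfd = memfd_create("region", MFD_ALLOW_SEALING);
	if (memfd < 0)
		return -1;

	/* Size the memfd, then seal it against shrinking as udmabuf requires. */
	if (ftruncate(memfd, (off_t)size) < 0 ||
	    fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK) < 0)
		goto err_memfd;

	devfd = open("/dev/udmabuf", O_RDWR);
	if (devfd < 0)
		goto err_memfd;

	create.memfd = memfd;
	create.flags = UDMABUF_FLAGS_CLOEXEC;
	create.offset = 0;
	create.size = size;

	/* On success the ioctl returns a new fd referring to the dma-buf. */
	buffd = ioctl(devfd, UDMABUF_CREATE, &create);

	/* The caller only keeps the dma-buf fd; preserve errno across close(). */
	saved = errno;
	close(devfd);
	close(memfd);
	errno = saved;
	return buffd;

err_memfd:
	saved = errno;
	close(memfd);
	errno = saved;
	return -1;
}
```

Calling memfd_to_dmabuf(4096) on a system with 4 KiB pages and
CONFIG_UDMABUF enabled should hand back a dma-buf fd; without the
device node it fails with the errno from open().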

I haven't looked at your series too closely yet (sorry!), but I assume
you currently have a netlink API that binds an io_uring memory region
to the NIC rx-queue page_pool, right? That netlink API would need to be
changed to:

1. Take in the io_uring memory.
2. Convert it to a dmabuf like udmabuf_create() does.
3. Bind the resulting dmabuf to the rx-queue page_pool.

There would be more changes needed vis-a-vis the clean up path and
lifetime management, but I think this is the general idea.

This would give you the uapi you want, while the page_pool never sees
non-dmabuf memory (addresses the nack, I think).

--
Thanks,
Mina