On 5/7/24 18:15, Mina Almasry wrote:
On Tue, May 7, 2024 at 9:55 AM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
On 5/7/24 17:23, Christoph Hellwig wrote:
On Tue, May 07, 2024 at 01:18:57PM -0300, Jason Gunthorpe wrote:
On Tue, May 07, 2024 at 05:05:12PM +0100, Pavel Begunkov wrote:
even in tree if you give them enough rope, and they should not have
that rope when the only sensible options are page/folio based kernel
memory (incuding large/huge folios) and dmabuf.
I believe there is at least one deep confusion here, considering you
previously mentioned Keith's pre-mapping patches. The "hooks" are not
that about in what format you pass memory, it's arguably the least
interesting part for page pool, more or less it'd circulate whatever
is given. It's more of how to have a better control over buffer lifetime
and implement a buffer pool passing data to users and empty buffers
back.
Isn't that more or less exactly what dmabuf is? Why do you need
another almost dma-buf thing for another project?
That's the exact point I've been making since the last round of
the series. We don't need to reinvent dmabuf poorly in every
subsystem, but instead fix the odd parts in it and make it suitable
for everyone.
Someone would need to elaborate how dma-buf is like that addition
to page pool infra.
I think I understand what Jason is requesting here, and I'll take a
shot at elaborating. AFAICT what he's saying is technically feasible
and addresses the nack while giving you the uapi you want. It just
requires a bit (a lot?) of work on your end unfortunately.
CONFIG_UDMABUF takes in a memfd, converts it to a dmabuf, and returns
it to userspace. See udmabuf_create().
I think what Jason is saying here, is that you can write similar code
to udmabuf_creat() that takes in a io_uring memory region, and
converts it to a dmabuf inside the kernel.
I haven't looked at your series yet too closely (sorry!), but I assume
you currently have a netlink API that binds an io_uring memory region
to the NIC rx-queue page_pool, right? That netlink API would need to
be changed to:
No, it's different, I'll skip details, but the main problem is
that those callbacks are used to implement the user api returning
buffers via a ring, where the callback grabs them (in napi context)
and feeds into page pool. That replaces SO_DEVMEM_DONTNEED and the
need for ioctl/setsockopt.
1. Take in the io_uring memory.
2. Convert it to a dmabuf like udmabuf_create() does.
3. Bind the resulting dmabuf to the rx-queue page_pool.
There would be more changes needed vis-a-vis the clean up path and
lifetime management, but I think this is the general idea.
This would give you the uapi you want, while the page_pool never seen
non-dmabuf memory (addresses the nack, I think).
--
Pavel Begunkov