Re: [RFC PATCH net-next v8 02/14] net: page_pool: create hooks for custom page providers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/7/24 17:42, Mina Almasry wrote:
On Tue, May 7, 2024 at 9:24 AM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:

On Tue, May 07, 2024 at 01:18:57PM -0300, Jason Gunthorpe wrote:
On Tue, May 07, 2024 at 05:05:12PM +0100, Pavel Begunkov wrote:
even in tree if you give them enough rope, and they should not have
that rope when the only sensible options are page/folio based kernel
memory (incuding large/huge folios) and dmabuf.

I believe there is at least one deep confusion here, considering you
previously mentioned Keith's pre-mapping patches. The "hooks" are not
that about in what format you pass memory, it's arguably the least
interesting part for page pool, more or less it'd circulate whatever
is given. It's more of how to have a better control over buffer lifetime
and implement a buffer pool passing data to users and empty buffers
back.

Isn't that more or less exactly what dmabuf is? Why do you need
another almost dma-buf thing for another project?

That's the exact point I've been making since the last round of
the series.  We don't need to reinvent dmabuf poorly in every
subsystem, but instead fix the odd parts in it and make it suitable
for everyone.



FWIW the change Christoph is requesting is straight forward from my
POV and doesn't really hurt the devmem use case. I'd basically remove
the ops and add an if statement in the slow path where the ops are
being used to alloc/free from dmabuf instead of alloc_pages().
Something like (very rough, doesn't compile):

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 92be1aaf18ccc..2cc986455bce6 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -557,8 +557,8 @@ netmem_ref page_pool_alloc_netmem(struct page_pool
*pool, gfp_t gfp)
                 return netmem;

         /* Slow-path: cache empty, do real allocation */
-       if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
-               netmem = pool->mp_ops->alloc_pages(pool, gfp);
+       if (page_pool_is_dmabuf(pool))
+               netmem = mp_dmabuf_devmem_alloc_pages():
         else
                 netmem = __page_pool_alloc_pages_slow(pool, gfp);
         return netmem;


The folks that will be negatively impacted by this are
Jakub/Pavel/David. I think all were planning to extend the hooks for
io_uring or other memory types.

Pavel/David, AFAICT you have these options here (but maybe you can
think of more):

1. Align with devmem TCP to use udmabuf for your io_uring memory. I
think in the past you said it's a uapi you don't link but in the face
of this pushback you may want to reconsider.

If the argument would be that we have to switch to a less efficient
and less consistent api for io_uring (fast path handling used buffers
back to kernel) just because it has to has dmabuf and without direct
relation to dmabuf, then no, it's not the way anything can be sanely
developed.

2. Follow the example of devmem TCP and add another if statement to
alloc from io_uring, so something like:

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 92be1aaf18ccc..3545bb82c7d05 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -557,8 +557,10 @@ netmem_ref page_pool_alloc_netmem(struct
page_pool *pool, gfp_t gfp)
                 return netmem;

         /* Slow-path: cache empty, do real allocation */
-       if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
-               netmem = pool->mp_ops->alloc_pages(pool, gfp);
+       if (page_pool_is_dmabuf(pool))
+               netmem = mp_dmabuf_devmem_alloc_pages():
+       else if (page_pool_is_io_uring(pool))
+               netmem = mp_io_uring_alloc_pages():
         else
                 netmem = __page_pool_alloc_pages_slow(pool, gfp);

I don't see why we'd do that instead instead of having a
well made function table, which is equivalent.

         return netmem;

Note that Christoph/Jason may not like you adding non-dmabuf io_uring
backing memory in the first place, so there may be pushback against
this approach.

Christoph mentioned pages, we're using pages, I don't think it's
too fancy. I don't believe that's it, which would be equivalent to
"let's remove user pointers from the kernel and mandate passing
dmabuf only".


3. Pushback on the nack on this thread. It seems you're already
discussing this. I'll see what happens.

To be honest the GVE queue-API has just been merged I think, so I'm
now unblocked on sending non-RFCs of this work and I'm hoping to send
the next version soon. I may apply these changes on the next version
for more discussion or leave as is and carry the nack until the
conversation converges.


--
Pavel Begunkov




[Index of Archives]     [LKML Archive]     [Linux ARM Kernel]     [Linux ARM]     [Git]     [Yosemite News]     [Linux SCSI]     [Linux Hams]

  Powered by Linux