Re: [PATCH v1 11/15] io_uring/zcrx: implement zerocopy receive pp memory provider

Pavel Begunkov <asml.silence@xxxxxxxxx> · Fri, 11 Oct 2024 02:49:17 +0100

On 10/11/24 01:32, Mina Almasry wrote:
> On Thu, Oct 10, 2024 at 2:22 PM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:

>>> page_pool. To make matters worse, the bypass is only there if the
>>> netmems are returned from io_uring, and not when the netmems
>>> are returned from the driver/TCP stack. I'm guessing that if you reused
>>> the page_pool recycling in the io_uring return path, it would remove
>>> the need for your provider to implement its own recycling for the
>>> io_uring return case.
>>>
>>> Is letting providers bypass and override the page_pool's recycling in
>>> some code paths OK? IMO, no. A maintainer will make the judgement call

>> Mina, frankly, that's nonsense. If we extend the same logic,
>> devmem overrides page allocation rules with callbacks, devmem
>> overrides and violates page pool buffer lifetimes by extending
>> them to user space, and devmem violates and overrides the page pool
>> object lifetime by binding buffers to sockets. I'd rather describe
>> all of it as extending and enhancing the page pool to fit the
>> devmem use case.

>>> and speak authoritatively here and I will follow, but I do think it's
>>> a (much) worse design.

>> Sure, I hold the completely opposite opinion: I think it's a much
>> better approach than returning buffers through a syscall. But I
>> agree with you that ultimately the maintainers will decide whether
>> it's acceptable for networking or not.

> Right, I'm not suggesting that you return the pages through a syscall.
> That would add syscall overhead when it's better not to have it,
> especially in the io_uring context. Devmem TCP needed a syscall because
> I couldn't figure out a non-syscall way for userspace using sockets
> to tell the kernel that it's done with some netmems. You do not need
> to follow that at all. Sorry if I made it seem that way.
>
> However, I'm suggesting that when io_uring figures out that
> userspace is done with a netmem, you feed that netmem back to the
> pp and use the pp's recycling, rather than adding your own
> recycling in the provider.

I should spell it out somewhere in the commit messages: the difference is
that we let the page pool pull buffers instead of having a syscall push
them, like devmem TCP does. With pushing, you'll be doing it from some
task context, and it'll need to find a way back into the page pool, via a
ptr ring or with the opportunistic optimisations napi_pp_put_page()
provides. And if you do it that way, the function is very useful.

With pulling, though, returning already happens from within the page
pool's allocation path, in exactly the right context, one that doesn't
need any additional locking/sync to access the page pool's NAPI/BH
protected caches, etc. That's why it has the potential to be faster, and
why, optimisation-wise, napi_pp_put_page() doesn't make sense for this
case: there's no need to jump through hoops to transfer a buffer to the
page pool's context, because we're already there.
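To make the pull model concrete, here is a minimal user-space sketch in C. Everything in it is hypothetical: refill_queue, toy_pool, provider_refill(), rq_return() and pool_alloc() are illustrative names, not kernel APIs, and the real page_pool/io_uring code is far more involved. User space only appends buffer ids to a refill queue; nothing touches the pool until its own allocation path finds the cache empty and asks the provider to pull from that queue. Because that pull runs in the pool's context, it writes into the cache directly. The sketch is deliberately single-threaded and omits the memory ordering a ring shared with user space would need.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical toy model of the "pull" scheme; none of these names exist in
 * the kernel. Single-threaded on purpose: a real shared ring would also
 * need acquire/release ordering between user space and the consumer.
 */
#define RQ_ENTRIES 8u   /* refill queue size, must be a power of two */
#define CACHE_SIZE 4u   /* pool's context-local allocation cache */

struct refill_queue {
	uint32_t head;              /* next slot the pool side will read */
	uint32_t tail;              /* next slot user space will write */
	uint32_t ids[RQ_ENTRIES];   /* ids of buffers user space is done with */
};

struct provider {
	struct refill_queue *rq;
};

struct toy_pool {
	uint32_t cache[CACHE_SIZE]; /* stand-in for the pool's BH-protected cache */
	size_t count;
	struct provider *mp;
};

/* User-space side: returning a buffer is just an append; the pool is not touched. */
static int rq_return(struct refill_queue *rq, uint32_t id)
{
	if (rq->tail - rq->head == RQ_ENTRIES)
		return -1;                              /* queue full */
	rq->ids[rq->tail & (RQ_ENTRIES - 1)] = id;
	rq->tail++;
	return 0;
}

/*
 * Provider callback, invoked only from the pool's allocation path. Since it
 * already runs in the pool's context, it fills the cache directly instead of
 * handing buffers over through a ptr ring.
 */
static size_t provider_refill(struct toy_pool *pp)
{
	struct refill_queue *rq = pp->mp->rq;
	size_t pulled = 0;

	while (pp->count < CACHE_SIZE && rq->head != rq->tail) {
		pp->cache[pp->count++] = rq->ids[rq->head & (RQ_ENTRIES - 1)];
		rq->head++;
		pulled++;
	}
	return pulled;
}

/* Fast path hits the cache; only an empty cache pulls from the provider. */
static int64_t pool_alloc(struct toy_pool *pp)
{
	if (pp->count == 0 && provider_refill(pp) == 0)
		return -1;                              /* nothing available */
	return pp->cache[--pp->count];
}
```

In this model a buffer passed to rq_return() sits in the queue until the next pool_alloc() that finds the cache empty. With push, rq_return() would instead have to move each buffer into the pool from the returning task's context, which is where a ptr ring or napi_pp_put_page()'s opportunistic path comes in.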

> From your commit message:
>
> "we extend the lifetime by recycling buffers only after the user space
> acknowledges that it's done processing the data via the refill queue"
>
> It seems to me that you get some signal from the userspace that data

You don't even need to signal it; the page pool will take buffers
when it needs to allocate memory.

> is ready to be reused via that refill queue (whatever it is, very
> sorry, I'm not that familiar with io_uring). My suggestion here is that
> when userspace tells you that a netmem is ready for reuse (however
> it does that), you feed that page back to the pp via something
> like napi_pp_put_page() or page_pool_put_page_bulk(), if that makes
> sense to you. FWIW, I'm trying to look through your code to understand
> what that refill queue is and where, if anywhere, it may be possible
> to feed pages back to the pp rather than directly to the provider.
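For context on what a refill queue can look like: io_uring's own SQ/CQ rings are single-producer/single-consumer rings with head and tail indices in memory shared between user space and the kernel. Assuming the refill queue follows the same pattern, a minimal C11 sketch might look like the following. The names rq_ring, rq_entry, rq_produce() and rq_consume(), and the off/len entry layout, are all hypothetical and are not the zcrx uAPI. The point is that returning a buffer needs no syscall: user space writes an entry and publishes it with a release store to tail, and the consumer, which only runs when the page pool needs memory, pairs that with an acquire load.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout for illustration only; not the actual zcrx uAPI. */
#define RQ_SIZE 16u   /* must be a power of two */

struct rq_entry {
	uint64_t off;   /* hypothetical: buffer offset within a registered area */
	uint32_t len;
};

struct rq_ring {
	_Atomic uint32_t head;   /* advanced only by the consumer */
	_Atomic uint32_t tail;   /* advanced only by the user-space producer */
	struct rq_entry entries[RQ_SIZE];
};

/* User space: hands back a buffer it has finished with. No syscall involved. */
static bool rq_produce(struct rq_ring *r, uint64_t off, uint32_t len)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	/* Acquire pairs with the consumer's release of head: the slot is free. */
	uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (tail - head == RQ_SIZE)
		return false;                   /* ring full */
	r->entries[tail & (RQ_SIZE - 1)] = (struct rq_entry){ .off = off, .len = len };
	/* Publish the entry before the new tail becomes visible. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}

/* Consumer: meant to be called only when the pool needs buffers. */
static bool rq_consume(struct rq_ring *r, struct rq_entry *out)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	/* Acquire pairs with the producer's release of tail: the entry is written. */
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head == tail)
		return false;                   /* nothing returned yet */
	*out = r->entries[head & (RQ_SIZE - 1)];
	/* Let the producer reuse the slot only after the entry has been copied. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}
```

Under this model, there is no single point where "user space tells the kernel" a buffer is free; entries simply accumulate until the consumer side pulls them.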

--
Pavel Begunkov