Re: [PATCH v1 06/15] net: page_pool: add ->scrub mem provider callback

Pavel Begunkov <asml.silence@xxxxxxxxx> · Mon, 14 Oct 2024 14:37:26 +0100

On 10/13/24 18:25, David Wei wrote:
On 2024-10-10 10:54, Mina Almasry wrote:
On Wed, Oct 9, 2024 at 2:58 PM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:

On 10/9/24 22:00, Mina Almasry wrote:
On Mon, Oct 7, 2024 at 3:16 PM David Wei <dw@xxxxxxxxxxx> wrote:

From: Pavel Begunkov <asml.silence@xxxxxxxxx>

page pool is now waiting for all ppiovs to return before destroying
itself, and for that to happen the memory provider might need to push
some buffers, flush caches and so on.

todo: we'll try to get by without it before the final release

Is the intention to drop this todo and stick with this patch, or to
move ahead with this patch?

Heh, I overlooked this todo. The plan is to actually leave it
as is, it's by far the simplest way and doesn't really gets
into anyone's way as it's a slow path.

To be honest, I think I read in a follow up patch that you want to
unref all the memory on page_pool_destory, which is not how the
page_pool is used today. Tdoay page_pool_destroy does not reclaim
memory. Changing that may be OK.

It doesn't because it can't (not breaking anything), which is a
problem as the page pool might never get destroyed. io_uring
doesn't change that, a buffer can't be reclaimed while anything
in the kernel stack holds it. It's only when it's given to the
user we can force it back out of there.

The page pool will definitely be destroyed, the call to
netdev_rx_queue_restart() with mp_ops/mp_priv set to null and netdev
core will ensure that.

And it has to happen one way or another, we can't trust the
user to put buffers back, it's just devmem does that by temporarily
attaching the lifetime of such buffers to a socket.

(noob question) does io_uring not have a socket equivalent that you
can tie the lifetime of the buffers to? I'm thinking there must be
one, because in your patches IIRC you have the fill queues and the
memory you bind from the userspace, there should be something that
tells you that the userspace has exited/crashed and it's time to now
destroy the fill queue and unbind the memory, right?

I'm thinking you may want to bind the lifetime of the buffers to that,
instead of the lifetime of the pool. The pool will not be destroyed
until the next driver/reset reconfiguration happens, right? That could
be long long after the userspace has stopped using the memory.

Yes, there are io_uring objects e.g. interface queue that hold
everything together. IIRC page pool destroy doesn't unref but it waits
for all pages that are handed out to skbs to be returned. So for us,
below might work:

1. Call netdev_rx_queue_restart() which allocates a new pp for the rx
    queue and tries to free the old pp
2. At this point we're guaranteed that any packets hitting this rx queue
    will not go to user pages from our memory provider
3. Assume userspace is gone (either crash or gracefully terminating),
    unref the uref for all pages, same as what scrub() is doing today
4. Any pages that are still in skb frags will get freed when the sockets
    etc are closed
5. Rely on the pp delay release to eventually terminate and clean up

Let me know what you think Pavel.

I'll get to this comment a bit later when I get some time to
remember what races we have to deal with without the callback.

--
Pavel Begunkov