On Sun, Oct 13, 2024 at 8:25 PM David Wei <dw@xxxxxxxxxxx> wrote:
>
> On 2024-10-10 10:54, Mina Almasry wrote:
> > On Wed, Oct 9, 2024 at 2:58 PM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
> >>
> >> On 10/9/24 22:00, Mina Almasry wrote:
> >>> On Mon, Oct 7, 2024 at 3:16 PM David Wei <dw@xxxxxxxxxxx> wrote:
> >>>>
> >>>> From: Pavel Begunkov <asml.silence@xxxxxxxxx>
> >>>>
> >>>> page pool is now waiting for all ppiovs to return before destroying
> >>>> itself, and for that to happen the memory provider might need to push
> >>>> some buffers, flush caches and so on.
> >>>>
> >>>> todo: we'll try to get by without it before the final release
> >>>>
> >>>
> >>> Is the intention to drop this todo and stick with this patch, or to
> >>> move ahead with this patch?
> >>
> >> Heh, I overlooked this todo. The plan is to actually leave it
> >> as is, it's by far the simplest way and doesn't really get in
> >> anyone's way as it's a slow path.
> >>
> >>> To be honest, I think I read in a follow-up patch that you want to
> >>> unref all the memory on page_pool_destroy, which is not how the
> >>> page_pool is used today. Today page_pool_destroy does not reclaim
> >>> memory. Changing that may be OK.
> >>
> >> It doesn't because it can't (without breaking anything), which is a
> >> problem as the page pool might never get destroyed. io_uring
> >> doesn't change that, a buffer can't be reclaimed while anything
> >> in the kernel stack holds it. It's only when it's given to the
> >> user that we can force it back out of there.
>
> The page pool will definitely be destroyed, the call to
> netdev_rx_queue_restart() with mp_ops/mp_priv set to null and netdev
> core will ensure that.
>
> >>
> >> And it has to happen one way or another, we can't trust the
> >> user to put buffers back, it's just that devmem does that by temporarily
> >> attaching the lifetime of such buffers to a socket.
> >>
> >
> > (noob question) does io_uring not have a socket equivalent that you
> > can tie the lifetime of the buffers to? I'm thinking there must be
> > one, because in your patches IIRC you have the fill queues and the
> > memory you bind from userspace, so there should be something that
> > tells you that userspace has exited/crashed and it's time to
> > destroy the fill queue and unbind the memory, right?
> >
> > I'm thinking you may want to bind the lifetime of the buffers to that,
> > instead of the lifetime of the pool. The pool will not be destroyed
> > until the next driver reset/reconfiguration happens, right? That could
> > be long, long after userspace has stopped using the memory.
> >
>
> Yes, there are io_uring objects, e.g. the interface queue, that hold
> everything together. IIRC page pool destroy doesn't unref but it waits
> for all pages that are handed out to skbs to be returned. So for us,
> the below might work:
>
> 1. Call netdev_rx_queue_restart(), which allocates a new pp for the rx
>    queue and tries to free the old pp
> 2. At this point we're guaranteed that any packets hitting this rx queue
>    will not go to user pages from our memory provider
> 3. Assume userspace is gone (either crashed or gracefully terminated),
>    unref the uref for all pages, same as what scrub() is doing today
> 4. Any pages that are still in skb frags will get freed when the sockets
>    etc. are closed
> 5. Rely on the pp delayed release to eventually terminate and clean up
>
> Let me know what you think Pavel.

Something roughly along those lines sounds more reasonable to me.
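
Just to make sure we're picturing the same ordering, here is a rough
sketch of how I read steps 1-5 on the io_uring side. To be clear, this
is not code from your patches: the struct layout, the io_zcrx_* names
and the uref helpers are all made up for illustration, and I'm guessing
at exactly where mp_ops/mp_priv live.

#include <net/netdev_rx_queue.h>
#include <net/netmem.h>

/*
 * Hypothetical sketch only -- the struct layout, io_zcrx_* names and
 * the uref helpers are invented for illustration, they are not the
 * names used in the actual patches.
 */
struct io_zcrx_ifq {
	struct net_device	*dev;
	struct netdev_rx_queue	*rxq;
	unsigned int		rxq_idx;
	struct net_iov		*bufs;		/* provider-owned buffers */
	unsigned int		nr_bufs;
};

static void io_zcrx_teardown(struct io_zcrx_ifq *ifq)
{
	unsigned int i;

	/*
	 * 1. Detach the memory provider and restart the queue. netdev
	 *    core sets up a fresh page pool for the rxq and begins
	 *    tearing down the old one.
	 */
	ifq->rxq->mp_params.mp_ops = NULL;
	ifq->rxq->mp_params.mp_priv = NULL;
	netdev_rx_queue_restart(ifq->dev, ifq->rxq_idx);

	/*
	 * 2. From this point no packet landing on this rxq can be
	 *    filled from our provider's buffers.
	 *
	 * 3. Userspace is gone (crashed or closed the ring), so drop
	 *    the user references we handed out -- same idea as what
	 *    scrub() does today.
	 */
	for (i = 0; i < ifq->nr_bufs; i++) {
		struct net_iov *niov = &ifq->bufs[i];

		if (io_zcrx_niov_has_uref(niov))	/* hypothetical */
			io_zcrx_return_niov(niov);	/* hypothetical */
	}

	/*
	 * 4/5. Buffers still sitting in skb frags come back when the
	 *      sockets holding them are closed, and the page pool's
	 *      delayed-release work reaps the old pool once everything
	 *      has been returned.
	 */
}

The property that matters to me is that the unref in step 3 happens at
teardown time and does not wait for the pp itself to be destroyed,
which is what my concern below is about.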
The critical point is, as I said above: if you free the memory only
when the pp is destroyed, then the memory lives on from one io_uring ZC
instance to the next. The next instance will see a reduced address
space because the previously destroyed io_uring ZC connection did not
free the memory. You could have users in production opening thousands
of io_uring ZC connections between rxq resets, and not cleaning up
those connections. In that case I think eventually they'll run out of
memory, as the memory leaks until it's cleaned up with a pp destroy
(driver reset?).

--
Thanks,
Mina