On 3/9/25 12:42, Toke Høiland-Jørgensen wrote:
Mina Almasry <almasrymina@xxxxxxxxxx> writes:
On Sat, Mar 8, 2025 at 6:55 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
When enabling DMA mapping in page_pool, pages are kept DMA mapped until
they are released from the pool, to avoid the overhead of re-mapping the
pages every time they are used. This causes problems when a device is
torn down, because the page pool can't unmap the pages until they are
returned to the pool. This causes resource leaks and/or crashes when
there are pages still outstanding while the device is torn down, because
page_pool will attempt an unmap of a non-existent DMA device on the
subsequent page return.
To fix this, implement a simple tracking of outstanding dma-mapped pages
in page pool using an xarray. This was first suggested by Mina[0], and
turns out to be fairly straight forward: We simply store pointers to
pages directly in the xarray with xa_alloc() when they are first DMA
mapped, and remove them from the array on unmap. Then, when a page pool
is torn down, it can simply walk the xarray and unmap all pages still
present there before returning, which also allows us to get rid of the
get/put_device() calls in page_pool.
THANK YOU!! I had been looking at the other proposals to fix this here
and there and I had similar feelings to you. They add lots of code
changes and the code changes themselves were hard for me to
understand. I hope we can make this simpler approach work.
You're welcome :)
And yeah, me too!
Using xa_cmpxchg(), no additional
synchronisation is needed, as a page will only ever be unmapped once.
Very clever. I had been wondering how to handle the concurrency. I
also think this works.
Thanks!
To avoid having to walk the entire xarray on unmap to find the page
reference, we stash the ID assigned by xa_alloc() into the page
structure itself, in the field previously called '_pp_mapping_pad' in
the page_pool struct inside struct page. This field overlaps with the
page->mapping pointer, which may turn out to be problematic, so an
alternative is probably needed. Sticking the ID into some of the upper
bits of page->pp_magic may work as an alternative, but that requires
further investigation. Using the 'mapping' field works well enough as
a demonstration for this RFC, though.
I'm unsure about this. I think page->mapping may be used when we map
the page to the userspace in TCP zerocopy, but I'm really not sure.
Yes, finding somewhere else to put the id would be ideal. Do we really
need a full unsigned long for the pp_magic?
No, pp_magic was also my backup plan (see the other thread). Tried
actually doing that now, and while there's a bit of complication due to
the varying definitions of POISON_POINTER_DELTA across architectures,
but it seems that this can be defined at compile time. I'll send a v2
RFC with this change.
FWIW, personally I like this one much more than an extra indirection
to pp.
If we're out of space in the page, why can't we use struct page *
as indices into the xarray? Ala
struct page *p = ...;
xa_store(xarray, index=(unsigned long)p, p);
Indices wouldn't be nicely packed, but it's still a map. Is there
a problem with that I didn't consider?
--
Pavel Begunkov