Mina Almasry <almasrymina@xxxxxxxxxx> writes:

> On Sun, Mar 9, 2025 at 5:50 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>>
>> When enabling DMA mapping in page_pool, pages are kept DMA mapped until
>> they are released from the pool, to avoid the overhead of re-mapping the
>> pages every time they are used. This causes problems when a device is
>> torn down, because the page pool can't unmap the pages until they are
>> returned to the pool. This causes resource leaks and/or crashes when
>> there are pages still outstanding while the device is torn down, because
>> page_pool will attempt an unmap of a non-existent DMA device on the
>> subsequent page return.
>>
>> To fix this, implement a simple tracking of outstanding dma-mapped pages
>> in page pool using an xarray. This was first suggested by Mina[0], and
>> turns out to be fairly straightforward: We simply store pointers to
>> pages directly in the xarray with xa_alloc() when they are first DMA
>> mapped, and remove them from the array on unmap. Then, when a page pool
>> is torn down, it can simply walk the xarray and unmap all pages still
>> present there before returning, which also allows us to get rid of the
>> get/put_device() calls in page_pool. Using xa_cmpxchg(), no additional
>> synchronisation is needed, as a page will only ever be unmapped once.
>>
>> To avoid having to walk the entire xarray on unmap to find the page
>> reference, we stash the ID assigned by xa_alloc() into the page
>> structure itself, using the upper bits of the pp_magic field. This
>> requires a couple of defines to avoid conflicting with the
>> POINTER_POISON_DELTA define, but this is all evaluated at compile-time,
>> so should not affect run-time performance.
>>
>> Since all the tracking is performed on DMA map/unmap, no additional code
>> is needed in the fast path, meaning the performance overhead of this
>> tracking is negligible. The extra memory needed to track the pages is
>> neatly encapsulated inside xarray, which uses the 'struct xa_node'
>> structure to track items. This structure is 576 bytes long, with slots
>> for 64 items, meaning that a full node incurs only 9 bytes of overhead
>> per slot it tracks (in practice, it probably won't be this efficient,
>> but in any case it should be an acceptable overhead).
>>
>> [0] https://lore.kernel.org/all/CAHS8izPg7B5DwKfSuzz-iOop_YRbk3Sd6Y4rX7KBG9DcVJcyWg@xxxxxxxxxxxxxx/
>>
>> Fixes: ff7d6b27f894 ("page_pool: refurbish version of page_pool code")
>> Reported-by: Yonglong Liu <liuyonglong@xxxxxxxxxx>
>> Suggested-by: Mina Almasry <almasrymina@xxxxxxxxxx>
>> Reviewed-by: Jesper Dangaard Brouer <hawk@xxxxxxxxxx>
>> Tested-by: Jesper Dangaard Brouer <hawk@xxxxxxxxxx>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx>
>
> I only have nits and suggestions for improvement. With and without those:
>
> Reviewed-by: Mina Almasry <almasrymina@xxxxxxxxxx>

Thanks! Fixed your nits and a couple of others and pushed here:
https://git.kernel.org/toke/c/df6248a71f85

I'll subject it to some testing and submit a non-RFC version once I've
verified that it works and doesn't introduce any new problems :)

-Toke
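
[Editor's note: for readers following along, below is a minimal sketch of the
xarray tracking pattern the quoted commit message describes. The struct and
helper names (pp_tracking, track_dma_map(), track_dma_unmap(), track_teardown())
are invented for illustration and are not the names used in the actual patch;
the real code also stashes the xa_alloc() ID in the upper bits of the page's
pp_magic field rather than passing it around explicitly, and uses different
GFP flags and locking context.]

#include <linux/xarray.h>
#include <linux/mm.h>

/* Illustrative container; the real patch embeds the xarray in struct page_pool. */
struct pp_tracking {
	struct xarray dma_mapped;	/* id -> struct page * */
};

static void track_init(struct pp_tracking *t)
{
	/* XA_FLAGS_ALLOC is required so xa_alloc() can hand out IDs. */
	xa_init_flags(&t->dma_mapped, XA_FLAGS_ALLOC);
}

static int track_dma_map(struct pp_tracking *t, struct page *page, u32 *id)
{
	/*
	 * Store the page pointer and get back an ID; the real patch
	 * stashes this ID in page->pp_magic so the unmap path can find
	 * the entry without walking the whole array.
	 */
	return xa_alloc(&t->dma_mapped, id, page, xa_limit_32b, GFP_KERNEL);
}

static void track_dma_unmap(struct pp_tracking *t, struct page *page, u32 id)
{
	/*
	 * xa_cmpxchg() removes the entry only if it still points at this
	 * page, so exactly one caller wins even if pool teardown races
	 * with a late page return; no extra locking is needed.
	 */
	if (xa_cmpxchg(&t->dma_mapped, id, page, NULL, GFP_KERNEL) != page)
		return;

	/* ... the winner performs the actual dma_unmap_page_attrs() here ... */
}

static void track_teardown(struct pp_tracking *t)
{
	struct page *page;
	unsigned long id;

	/* Unmap everything still outstanding before the DMA device goes away. */
	xa_for_each(&t->dma_mapped, id, page)
		track_dma_unmap(t, page, id);

	xa_destroy(&t->dma_mapped);
}

Because both map and unmap already go through the slow path, the only fast-path
cost of this scheme is the ID stored in the page itself, which matches the
"no additional code in the fast path" claim in the commit message.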