Re: [PATCH net v2 2/2] page_pool: fix IOMMU crash when driver has already unbound

On 10/1/2024 9:32 PM, Paolo Abeni wrote:
On 9/25/24 09:57, Yunsheng Lin wrote:
Networking drivers with page_pool support may hand over pages that
are still DMA-mapped to the network stack, and try to reuse those
pages after the network stack is done with them and passes them
back to the page_pool, to avoid the penalty of DMA
mapping/unmapping. With all the caching in the network stack, some
pages may be held in the network stack without being returned to
the page_pool soon enough, and with a VF disable causing the
driver to unbind, the page_pool does not stop the driver from
doing its unbinding work. Instead, the page_pool uses a workqueue
to periodically check whether any pages have come back from the
network stack; if so, it does the DMA-unmapping-related cleanup
work.

As mentioned in [1], attempting DMA unmaps after the driver
has already unbound may leak resources or at worst corrupt
memory. Fundamentally, the page pool code cannot allow DMA
mappings to outlive the driver they belong to.

Currently there seem to be at least two cases where pages are not
released fast enough, causing the DMA unmapping to be done after
the driver has already unbound:
1. IPv4 packet defragmentation timeout: this seems to cause
    delays of up to 30 secs.
2. skb_defer_free_flush(): this may cause an infinite delay if
    nothing triggers net_rx_action().

In order not to do the DMA unmapping after the driver has already
unbound, and to avoid stalling the unloading of the networking
driver, add the pool->items array to record all the pages,
including the ones handed over to the network stack, so the
page_pool can do the DMA unmapping for those pages when
page_pool_destroy() is called. As pool->items needs to be large
enough to avoid performance degradation, add an 'item_full' stat
to indicate allocation failures due to unavailability of
pool->items.

This looks really invasive, with room for potentially large performance regressions or worse. At the very least it does not look suitable for net.

I am open to targeting this at net-next; it can be backported once some
testing has been done over one or two kernel versions, if there is still
interest in backporting it then.

Or perhaps there is some non-invasive way to fix this.


Is the problem only tied to VF drivers? It's a pity all the page_pool users will have to pay the bill for it...

I am afraid it is not only tied to VF drivers, as:
attempting DMA unmaps after the driver has already unbound may leak
resources or at worst corrupt memory.

Unloading a PF's driver might cause the above problems too. I guess the
probability of crashing is low for the PF, as a PF cannot be disabled
unless it can be hot-unplugged, but the probability of leaking the
resources behind the DMA mapping might be similar.


/P






