On Wed, Feb 12, 2025 at 1:34 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:
>
> Networking drivers with page_pool support may hand over pages that
> still hold a DMA mapping to the network stack and try to reuse those
> pages after the network stack is done with them and passes them back
> to the page_pool, to avoid the penalty of DMA mapping/unmapping.
> With all the caching in the network stack, some pages may be held in
> the network stack without returning to the page_pool soon enough,
> and with a VF disable causing the driver to be unbound, the
> page_pool does not stop the driver from doing its unbinding work;
> instead the page_pool uses a workqueue to periodically check whether
> any pages have come back from the network stack, and if so, it does
> the DMA unmapping related cleanup work.
>
> As mentioned in [1], attempting DMA unmaps after the driver has
> already unbound may leak resources or at worst corrupt memory.
> Fundamentally, the page pool code cannot allow DMA mappings to
> outlive the driver they belong to.
>
> Currently there seem to be at least two cases in which a page is not
> released fast enough, causing the DMA unmapping to be done after the
> driver has already been unbound:
> 1. IPv4 packet defragmentation timeout: this seems to cause a delay
>    of up to 30 secs.
> 2. skb_defer_free_flush(): this may cause an infinite delay if there
>    is nothing to trigger net_rx_action().
>
> In order not to call the DMA APIs to do DMA unmapping after the
> driver has already been unbound, and not to stall the unloading of
> the networking driver, use pre-allocated item blocks to record
> inflight pages, including the ones handed over to the network stack,
> so that the page_pool can do the DMA unmapping for those pages when
> page_pool_destroy() is called. As the pre-allocated item blocks need
> to be large enough to avoid performance degradation, add an
> 'item_fast_empty' stat to indicate the unavailability of the
> pre-allocated item blocks.
>
> By using the 'struct page_pool_item' referenced by page->pp_item,
> the page_pool is not only able to keep track of inflight pages and
> do the DMA unmapping for pages still held in the networking stack
> when page_pool_destroy() is called, but the networking stack is also
> able to find the page_pool owning a page when returning pages back
> into the page_pool:
> 1. When a page is added to the page_pool, an item is taken from
>    pool->hold_items and its item->state and item->pp_netmem are set
>    accordingly, with pp_netmem pointing to that page, in order to
>    keep track of it; refill from pool->release_items when
>    pool->hold_items is empty, or use an item from pool->slow_items
>    when the fast items run out.
> 2. When a page is released from the page_pool, the page_pool this
>    page belongs to can be found by masking off the lower bits of the
>    'struct page_pool_item *item' pointer, as the 'struct
>    page_pool_item_block' is stored at the top of a page. After
>    clearing pp_item->state, the item for the released page is added
>    back to pool->release_items so that it can be reused for new
>    pages, or simply freed when it came from pool->slow_items.
> 3. When page_pool_destroy() is called, item->state is used to tell
>    whether a specific item is in use/DMA mapped by scanning all the
>    item blocks in pool->item_blocks, and item->pp_netmem can then be
>    used to do the DMA unmapping if the corresponding inflight page
>    is DMA mapped.
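As I read steps 1-3 above, the owner lookup in step 2 boils down to
something like the sketch below. This is only my reconstruction from
the description and the item_block hunk quoted further down, not code
from the patch; the page_pool_item layout and the helper name are
assumptions for illustration.

/* Sketch only: reconstructed from the commit message above. */
#include <linux/list.h>
#include <linux/mm.h>		/* PAGE_MASK */
#include <net/netmem.h>		/* netmem_ref */

struct page_pool;

struct page_pool_item {
	unsigned long state;	/* used at destroy time to tell whether the
				 * item is in use / DMA mapped
				 */
	netmem_ref pp_netmem;	/* the inflight page being tracked */
};

/* As in the quoted hunk below: an item_block always spans exactly
 * PAGE_SIZE, so the block header for any item can be recovered with
 * 'item & PAGE_MASK'.
 */
struct page_pool_item_block {
	struct page_pool *pp;
	struct list_head list;
	struct page_pool_item items[];
};

static inline struct page_pool *
page_pool_item_to_pp(struct page_pool_item *item)
{
	struct page_pool_item_block *block;

	/* The block starts at a page boundary and the items live in the
	 * same page, so masking recovers the owning pool.
	 */
	block = (struct page_pool_item_block *)((unsigned long)item & PAGE_MASK);
	return block->pp;
}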
>
> The overhead of tracking inflight pages is about 10ns~20ns, which
> causes about a 10% performance degradation for the
> time_bench_page_pool03_slow() test case in [2].
>
> Note, the devmem patchset seems to make the bug harder to fix, and
> may make backporting harder too. As there is no actual user of
> devmem yet and how to fix this for devmem is unclear for now, this
> patch does not try to fix the devmem case yet.
>
> 1. https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@xxxxxxxxxx/T/
> 2. https://github.com/netoptimizer/prototype-kernel
> CC: Robin Murphy <robin.murphy@xxxxxxx>
> CC: Alexander Duyck <alexander.duyck@xxxxxxxxx>
> CC: IOMMU <iommu@xxxxxxxxxxxxxxx>
> Fixes: f71fec47c2df ("page_pool: make sure struct device is stable")
> Signed-off-by: Yunsheng Lin <linyunsheng@xxxxxxxxxx>
> Tested-by: Yonglong Liu <liuyonglong@xxxxxxxxxx>

[...]

> +
> +/* The size of item_block is always PAGE_SIZE, so that the address of item_block
> + * for a specific item can be calculated using 'item & PAGE_MASK'
> + */
> +struct page_pool_item_block {
> +	struct page_pool *pp;
> +	struct list_head list;
> +	struct page_pool_item items[];
> +};
> +

I think this feedback was raised in earlier iterations of the series:
can we not hold a struct list_head in the page_pool that keeps track
of the inflight netmems that we need to DMA-unmap on
page_pool_destroy()? Why do we have to modify the pp entry in struct
page and struct net_iov? The decision to modify the pp entry in
struct page and struct net_iov makes this patchset bigger and harder
to review IMO (a rough sketch of what I mean is below).

--
Thanks,
Mina
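P.S. To make the question above a bit more concrete, the rough shape
I have in mind is sketched below. Every name in it is invented for
illustration, a plain spinlock stands in for whatever synchronization
the pool would really use, and the allocation/refill side is omitted;
the only point is that the tracking state lives in the pool rather
than in struct page / struct net_iov.

/* Hand-wavy sketch of the alternative: the pool keeps its own list of
 * inflight, DMA-mapped netmems and unmaps whatever is still
 * outstanding in page_pool_destroy(). All names are made up.
 */
#include <linux/dma-mapping.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <net/netmem.h>		/* netmem_ref */

struct pp_dma_entry {
	struct list_head list;
	netmem_ref netmem;	/* the inflight page */
	dma_addr_t dma_addr;
};

struct pp_dma_tracker {
	spinlock_t lock;
	struct list_head inflight;	/* of struct pp_dma_entry */
};

/* Record a mapped netmem when it is handed over to the stack. */
static void pp_dma_track(struct pp_dma_tracker *t, struct pp_dma_entry *e)
{
	spin_lock_bh(&t->lock);
	list_add_tail(&e->list, &t->inflight);
	spin_unlock_bh(&t->lock);
}

/* Drop the entry when the netmem comes back to the pool. */
static void pp_dma_untrack(struct pp_dma_tracker *t, struct pp_dma_entry *e)
{
	spin_lock_bh(&t->lock);
	list_del(&e->list);
	spin_unlock_bh(&t->lock);
	kfree(e);
}

/* Called from page_pool_destroy(): unmap anything still inflight so
 * that no DMA mapping outlives the driver.
 */
static void pp_dma_unmap_all(struct pp_dma_tracker *t, struct device *dev)
{
	struct pp_dma_entry *e, *tmp;

	spin_lock_bh(&t->lock);
	list_for_each_entry_safe(e, tmp, &t->inflight, list) {
		dma_unmap_page_attrs(dev, e->dma_addr, PAGE_SIZE,
				     DMA_BIDIRECTIONAL,
				     DMA_ATTR_SKIP_CPU_SYNC);
		list_del(&e->list);
		kfree(e);
	}
	spin_unlock_bh(&t->lock);
}

The real thing would of course need to avoid a per-page allocation on
the fast path, but that seems orthogonal to where the state is kept.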