On 2024/11/21 21:44, Robin Murphy wrote:
> On 21/11/2024 8:04 am, Yunsheng Lin wrote:
>> On 2024/11/21 0:17, Robin Murphy wrote:
>>> On 20/11/2024 10:34 am, Yunsheng Lin wrote:
>>>> Skip dma sync operation for inflight pages before the
>>>> page_pool_destroy() returns to the driver as DMA API
>>>> expects to be called with a valid device bound to a
>>>> driver as mentioned in [1].
>>>>
>>>> After page_pool_destroy() is called, the page is not
>>>> expected to be recycled back to pool->alloc cache and
>>>> dma sync operation is not needed when the page is not
>>>> recyclable or pool->ring is full, so only skip the dma
>>>> sync operation for the inflight pages by clearing the
>>>> pool->dma_sync under protection of rcu lock when page
>>>> is recycled to pool->ring to ensure that there is no
>>>> dma sync operation called after page_pool_destroy() is
>>>> returned.
>>>
>>> Something feels off here - either this is a micro-optimisation which
>>> I wouldn't really expect to be meaningful, or it means patch #2
>>> doesn't actually do what it claims. If it really is possible to
>>> attempt to dma_sync a page *after* page_pool_inflight_unmap() has
>>> already reclaimed and unmapped it, that represents yet another DMA
>>> API lifecycle issue, which as well as being even more obviously
>>> incorrect usage-wise, could also still lead to the same crash (if
>>> the device is non-coherent).
>>
>> For a page_pool owned page, it mostly goes through the below steps:
>> 1. page_pool calls the buddy allocator API to allocate a page, and
>> calls the DMA mapping and sync_for_device APIs for it, if its pool
>> is empty. Otherwise it reuses a page from the pool.
>>
>> 2. The driver calls the page_pool API to allocate the page, and
>> passes the page to the network stack after the packet is dma'ed
>> into the page and the sync_for_cpu API is called.
>>
>> 3. The network stack is done with the page and calls the page_pool
>> API to free the page.
>>
>> 4. page_pool does the dma unmapping and releases the page back to
>> the buddy allocator if the page is not recyclable. Otherwise it
>> does the sync_for_device and puts the page in its pool; the page
>> might go through step 1 again if the driver calls the page_pool
>> allocate API.
>>
>> The calling of the dma mapping and dma sync APIs is controlled by
>> pool->dma_map and pool->dma_sync respectively; the previous patch
>> only clears pool->dma_map after doing the dma unmapping. This patch
>> ensures that there is no dma_sync for the recycle case of step 4 by
>> clearing pool->dma_sync.
>
> But *why* does it want to ensure that? Is there some possible race
> where one thread can attempt to sync and recycle a page while another
> thread is attempting to unmap and free it, such that you can't
> guarantee the correctness of dma_sync calls after
> page_pool_inflight_unmap() has started, and skipping them is a
> workaround for that? If so, then frankly I think that would want
> solving properly, but at the very least this change would need to
> come before patch #2.

The race window is something like below: page_pool_destroy() and
page_pool_put_page() can be called concurrently. Patch 2 only uses a
spinlock to synchronise page_pool_inflight_unmap() with
page_pool_return_page() called by page_pool_put_page() in order to
avoid concurrent dma unmapping; there is no synchronization between
page_pool_destroy() and page_pool_dma_sync_for_device() called by
page_pool_put_page():

         CPU0                              CPU1
           .                                 .
  page_pool_destroy()              page_pool_put_page()
           .                                 .
   synchronize_rcu()                         .
           .                                 .
page_pool_inflight_unmap()                   .
           .                                 .
           .                       __page_pool_put_page()
           .                                 .
           .                 page_pool_dma_sync_for_device()
           .                                 .
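To make the unsynchronized part concrete, the reader side before this
patch is roughly the below (a simplified sketch, not the exact
page_pool code):

/* Simplified sketch of the reader side before this patch: nothing
 * orders the pool->dma_sync check against page_pool_destroy()
 * unmapping the page, so CPU1 can pass the check and then sync a page
 * that CPU0 has already unmapped via page_pool_inflight_unmap().
 */
static void page_pool_dma_sync_for_device(const struct page_pool *pool,
					  struct page *page,
					  u32 dma_sync_size)
{
	if (pool->dma_sync)	/* unsynchronized with destroy path */
		__page_pool_dma_sync_for_device(pool, page,
						dma_sync_size);
}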
After this patch, page_pool_dma_sync_for_device() is protected by the
rcu read lock, pool->dma_sync is cleared before synchronize_rcu(), and
page_pool_inflight_unmap() is called after synchronize_rcu(), which
ensures that page_pool_dma_sync_for_device() will not call the dma
sync API after synchronize_rcu() returns:

         CPU0                              CPU1
           .                                 .
  page_pool_destroy()              page_pool_put_page()
           .                                 .
 pool->dma_sync = false                      .
           .                                 .
   synchronize_rcu()                         .
           .                                 .
page_pool_inflight_unmap()                   .
           .                                 .
           .                   page_pool_recycle_in_ring()
           .                                 .
           .                        rcu_read_lock()
           .                 page_pool_dma_sync_for_device()
           .                       rcu_read_unlock()

Previously patches 2 & 3 were combined as one patch; this version
splits them out to make them more reviewable. I am not sure the patch
order matters that much, as the fix doesn't seem to be complete unless
both patches are included.
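For completeness, the pattern after both patches is roughly the below,
a simplified sketch of the idea rather than the exact diff (the real
code has more checks around the flag and the recycle path):

/* Writer side, in page_pool_destroy(), simplified: */
	pool->dma_sync = false;          /* no new dma syncs after this */
	synchronize_rcu();               /* wait for in-flight readers */
	page_pool_inflight_unmap(pool);  /* now safe to unmap pages */

/* Reader side, on the recycle path, simplified: */
	rcu_read_lock();
	/* Either dma_sync is still seen as true and the sync completes
	 * before the synchronize_rcu() above returns, or it is seen as
	 * false and the sync is skipped; either way no dma sync API is
	 * called after page_pool_inflight_unmap() starts.
	 */
	if (pool->dma_sync)
		__page_pool_dma_sync_for_device(pool, page,
						dma_sync_size);
	rcu_read_unlock();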