On 2024/10/15 8:14, Jakub Kicinski wrote: > On Sat, 12 Oct 2024 20:05:31 +0800 Yunsheng Lin wrote: >> 1. Semantics changing of supporting unlimited inflight pages >> to limited inflight pages that are as large as the pool_size >> of page_pool. > > How can this possibly work? As a similar comment in [1], do we really need unlimited inflight pages for the page_pool to work? If we do, it seems there is something really need fixing here. I am agreed changing of semantics here might introduce regressions here because there may be some subsystem depending on the previous semantics or incorrect calculating of how many inflight pages it is needed, so I am agreed that it might be better to target the net-next tree to give some cycles of testing before backporting it. 1. https://lore.kernel.org/all/842c8cc6-f716-437a-bc98-70bc26d6fd38@xxxxxxxxxx/ > > The main thing stopping me from reposting my fix that it'd be nice to > figure out if a real IOMMU device is bound or not. If we don't have device_iommu_mapped() might be used to check if a real IOMMU device is bound or not. I am afraid it is not just about IOMMU here as there might be other resource behind the dma mapping, like the bounce buffer memory as below: https://elixir.bootlin.com/linux/v6.7-rc8/source/drivers/iommu/dma-iommu.c#L1204 https://elixir.bootlin.com/linux/v6.7-rc8/source/kernel/dma/direct.h#L125 And we may argue is_swiotlb_active() can be used check if there is any bounce buffer memory behind the dma mapping as the device_iommu_mapped() does, but I am not sure if there is any other resource besides iommu and bounce buffer. > real per-device mappings we presumably don't have to wait. If we can For not having to wait part: I am not sure if the page_pool_destroy()/__page_pool_release_page_dma() need to synchronize with arch_teardown_dma_ops() or how to synchronize with it, as it seems to be called when driver unloading even if we have ensured that there is no IOMMU or bounce buffer memory behind the device by the above checking: __device_release_driver -> device_unbind_cleanup -> arch_teardown_dma_ops > check this condition we are guaranteed not to introduce regressions, > since we would be replacing a crash by a wait, which is strictly better. For the waiting part: The problem is how much time we need to wait when device_iommu_mapped() or is_swiotlb_active() return true here, as mentioned in [2], [3]. And currently the waiting might be infinite as the testing in [4]. > > If we'd need to fiddle with too many internals to find out if we have > to wait - let's just always wait and see if anyone complains. Does the testing report in [4] classify as someone complaining? As the driver unloading seems to be stalling forever, and the cause of the infinite stalling is skb_attempt_defer_free() by debugging as mentioned in [2]. 2. https://lore.kernel.org/all/2c5ccfff-6ab4-4aea-bff6-3679ff72cc9a@xxxxxxxxxx/ 3. https://lore.kernel.org/netdev/d50ac1a9-f1e2-49ee-b89b-05dac9bc6ee1@xxxxxxxxxx/ 4. https://lore.kernel.org/netdev/758b4d47-c980-4f66-b4a4-949c3fc4b040@xxxxxxxxxx/ >