Hi Will,

As a follow-up to the VFIO/IOMMU/PCI "Dual Stage SMMUv3 Status" session, please find some further justification for the SMMUv3 nested stage enablement series. In the text below I only discuss use cases featuring VFIO-assigned devices, where the physical IOMMU is actually involved.

The virtio-iommu solution, as currently specified, is expected to work efficiently as long as the guest IOMMU mappings are static. This hopefully corresponds to the DPDK use case, where the overhead of trapping on each MAP/UNMAP is then close to 0.

I see two main use cases where the guest uses dynamic mappings:
1) native drivers using DMA ops in the guest,
2) shared virtual addressing (SVA) in the guest.

1) can be addressed with the current virtio-iommu spec. However, the performance will be very poor: it behaves like the Intel IOMMU with the driver operating in caching mode with strict mode set (an 80% performance downgrade is observed versus no IOMMU). This use case can be tested very easily. The dual stage implementation should bring much better results here.

2) The natural implementation for this is nested paging. Jean planned to introduce extensions to the current virtio-iommu spec to set up the stage 1 config. As far as I understand, this will require the exact same SMMUv3 driver modifications I introduced in my series. If this happens, then after the specification process, the virtio-iommu driver upgrade, and the virtio-iommu QEMU device upgrade, we will face the same problems as the ones encountered in my series. This use case cannot be tested easily: there are in-flight series to support substream IDs in the SMMU driver and SVA on ARM, but none of that code is upstream. Also, I don't know whether any PASID-capable device is easily available at the moment. So, while during the uC you said you would prefer this use case to be addressed first, in my opinion this brings a lot of extra complexity and dependencies, and the above series are stalled due to that exact same issue.
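To make the contrast concrete, here is a minimal, purely illustrative Python model of two-stage (nested) translation; it is not SMMUv3 code, and all names in it are made up. Stage 1 (guest-owned) translates an IOVA to an IPA, stage 2 (host-owned) translates the IPA to a host PA. Since the hardware walks both tables, the guest can update its stage 1 mappings without trapping to the host on each MAP/UNMAP (only invalidations need to be propagated), unlike the virtio-iommu MAP/UNMAP request path.

```python
# Toy two-stage (nested) translation model, page-granular.
# Stage 1 (guest-owned): IOVA -> IPA; stage 2 (host-owned): IPA -> PA.
PAGE = 0x1000

def translate(stage1, stage2, iova):
    """Resolve an IOVA through both stages."""
    off = iova % PAGE
    ipa = stage1.get(iova - off)
    if ipa is None:
        raise RuntimeError("stage 1 fault")
    pa = stage2.get(ipa)
    if pa is None:
        raise RuntimeError("stage 2 fault")
    return pa + off

# The host sets up stage 2 once, when the device is assigned.
stage2 = {0x8000_0000: 0x4000_0000}

# The guest then maps/unmaps in stage 1 at will -- with nesting,
# no trap to the host is needed for these updates.
stage1 = {}
stage1[0x1000] = 0x8000_0000                   # guest dma_map
print(hex(translate(stage1, stage2, 0x1234)))  # -> 0x40000234
del stage1[0x1000]                             # guest dma_unmap
```

With trapped MAP/UNMAP, every one of those stage 1 updates would instead be a round trip to the host, which is where the overhead in use case 1) comes from.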
HW nested paging should satisfy all use cases, including guest static mappings. At the moment it is difficult to run comparative benchmarks. First, as you may know, virtio-iommu also suffers from some FW integration delays, and its QEMU VFIO integration needs to be rebased. Also, I have access to some systems that feature a dual stage SMMUv3, but I am not sure their cache/TLB structures are dimensioned for exercising the two stages (that's a chicken-and-egg issue: no SW integration, no HW).

If you consider these use cases insufficient to justify investing time now, I have no problem pausing this development. We can re-open the topic later, when actual users show up who are interested in reviewing and testing with production HW and workloads. Of course, if there are any people/companies interested in getting this upstream in a decent timeframe, now is the right moment to let us know!

Thanks

Eric

References:
[1] [PATCH v9 00/11] SMMUv3 Nested Stage Setup (IOMMU part)
    https://patchwork.kernel.org/cover/11039871/
[2] [PATCH v9 00/14] SMMUv3 Nested Stage Setup (VFIO part)
    https://patchwork.kernel.org/cover/11039995/
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm