On Thu, Feb 10, 2022 at 12:15:58PM +0100, Niklas Schnelle wrote:
> In a KVM or z/VM guest the guest is informed that IOMMU translations
> need to be refreshed even for previously invalid IOVAs. With this the
> guest builds its IOMMU translation tables as normal but then does a
> RPCIT for the IOVA range it touched. In the hypervisor we can then
> simply walk the translation tables, pin the guest pages and map them in
> the host IOMMU. Prior to this series this happened in QEMU which does
> the map via vfio-iommu-type1 from user-space. This works and will
> remain as a fallback. Sadly it is quite slow and has a large impact on
> performance as we need to do a lot of mapping operations as the DMA API
> of the guest goes through the virtual IOMMU. This series thus adds the
> same functionality but as a KVM intercept of RPCIT. Now I think this
> neatly fits into KVM, we're emulating an instruction after all and most
> of its work is KVM specific pinning of guest pages. Importantly all
> other handling like IOMMU domain attachment still goes through vfio-
> iommu-type1 and we just fast path the map/unmap operations.

So you create an iommu_domain and then hand it over to KVM, which then
does map/unmap operations on it under the covers? How does the page
pinning work?

In the design we are trying to reach, I would say this needs to be
modeled as a special iommu_domain that has this automatic map/unmap
behavior from following user pages. Creating it would specify the kvm
and the in-guest base address of the guest's page table.

Then the magic kernel code you describe can operate on its own domain
without becoming confused with a normal map/unmap domain.

It is like the HW nested translation other CPUs are doing, but instead
of HW nested, it is SW nested.

Jason
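
To make that a bit more concrete, below is a rough sketch of the kind of
interface I am imagining. None of these names exist today; they are
invented purely for illustration of the "special domain" idea, with the
kvm and the guest table address fixed at creation time and the RPCIT
intercept only ever touching this domain type:

/*
 * Sketch only - hypothetical, not an existing API.
 */
#include <linux/iommu.h>
#include <linux/kvm_host.h>

/* A KVM-backed domain that shadows the guest's DMA translation table. */
struct iommu_domain_kvm {
	struct iommu_domain domain;	/* attached through VFIO as today */
	struct kvm *kvm;		/* guest whose pages get pinned */
	gpa_t guest_table;		/* in-guest base of the guest's table */
};

/* Creation specifies the kvm and the guest table up front. */
struct iommu_domain *iommu_domain_alloc_kvm(struct device *dev,
					     struct kvm *kvm,
					     gpa_t guest_table);

/*
 * The RPCIT intercept walks the guest table for the invalidated IOVA
 * range, pins the guest pages via kvm, and updates the host I/O page
 * table - no iommu_map()/iommu_unmap() calls from userspace needed.
 */
int iommu_domain_kvm_refresh(struct iommu_domain *domain,
			     dma_addr_t iova, size_t size);

The point being that a normal map/unmap domain never sees any of this,
and the kvm-driven update path has its own entry point.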