On Thu, Feb 10, 2022 at 03:06:35PM +0100, Niklas Schnelle wrote:
> > How does the page pinning work?
>
> The pinning is done directly in the RPCIT interception handler pinning
> both the IOMMU tables and the guest pages mapped for DMA.

And if pinning fails?

> > Then the
> > magic kernel code you describe can operate on its own domain without
> > becoming confused with a normal map/unmap domain.
>
> This sounds like an interesting idea. Looking at
> drivers/iommu/s390_iommu.c most of that is pretty trivial domain
> handling. I wonder if we could share this by marking the existing
> s390_iommu_domain type with kind of a "lent out to KVM" flag.

Lu has posted a series here:

https://lore.kernel.org/linux-iommu/20220208012559.1121729-1-baolu.lu@xxxxxxxxxxxxxxx

Which allows the iommu driver to create a domain with unique ops, so
you'd just fork the entire thing, have your own struct
s390_kvm_iommu_domain and related ops.

When the special creation flow is triggered you'd just create one of
these with the proper ops already set up.

We are imagining a special ioctl to create these things and each IOMMU
HW driver can supply a unique implementation suited to their HW
design.

> KVM RPCIT intercept and vice versa. I.e. while the domain is under
> control of KVM's RPCIT handling we make all IOMMU map/unmap fail.

It is not "under the control of"; the domain would be created as linked
to kvm and would never, ever, be anything else.

> To me this more direct involvement of IOMMU and KVM on s390x is also a
> direct consequence of it using special instructions. Naturally those
> instructions can be intercepted or run under hardware accelerated
> virtualization.

Well, no, you've just created a kernel-side SW emulated nested
translation scheme. Other CPUs have talked about doing this too, but
nobody has attempted it.

You can make the same argument for any CPU's scheme: a trapped mmio
store is not fundamentally any different from a special instruction
that traps, other than how the information is transferred.

> Yes very good analogy. Has any of that nested IOMMU translations work
> been merged yet?

No. We are making quiet progress, slowly though. I'll add your
interest to my list.

> too. Basically we would then execute RPCIT without leaving the
> hardware virtualization mode (SIE). We believe that that would
> require pinning all of guest memory though because HW can't really
> pin pages.

Right, this is what other iommu HW will have to do.

Jason
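
To illustrate the "domain with unique ops" idea discussed above, here is a minimal userspace C sketch. It is not kernel code, and every name in it (struct domain, struct domain_ops, struct kvm_linked_domain, kvm_domain_shadow_update and the init helpers) is hypothetical; none of it is the API from Lu's series or from drivers/iommu/s390_iommu.c. It only shows the shape of the argument: the ops are fixed when the domain is created, so a KVM-linked domain is never a normal map/unmap domain and needs no "lent out" flag.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct domain;

/* Per-domain-type operations, chosen once at creation time. */
struct domain_ops {
    int (*map)(struct domain *d, uint64_t iova, uint64_t paddr);
    int (*unmap)(struct domain *d, uint64_t iova);
};

struct domain {
    const struct domain_ops *ops;
    uint64_t mapped; /* live mapping count; stands in for real IO page tables */
};

/* Hypothetical KVM-linked domain: embeds the base and records its owner. */
struct kvm_linked_domain {
    struct domain base;
    const void *kvm; /* opaque handle, assumption for illustration only */
};

static int normal_map(struct domain *d, uint64_t iova, uint64_t paddr)
{
    (void)iova; (void)paddr;
    d->mapped++;
    return 0;
}

static int normal_unmap(struct domain *d, uint64_t iova)
{
    (void)iova;
    if (d->mapped == 0)
        return -ENOENT;
    d->mapped--;
    return 0;
}

/* The generic map/unmap path is simply not offered for this domain type. */
static int kvm_map(struct domain *d, uint64_t iova, uint64_t paddr)
{
    (void)d; (void)iova; (void)paddr;
    return -EPERM;
}

static int kvm_unmap(struct domain *d, uint64_t iova)
{
    (void)d; (void)iova;
    return -EPERM;
}

static const struct domain_ops normal_ops = { normal_map, normal_unmap };
static const struct domain_ops kvm_ops = { kvm_map, kvm_unmap };

static void normal_domain_init(struct domain *d)
{
    d->ops = &normal_ops;
    d->mapped = 0;
}

/* Stands in for the "special creation flow": ops are set up here, for good. */
static void kvm_domain_init(struct kvm_linked_domain *kd, const void *kvm)
{
    kd->base.ops = &kvm_ops;
    kd->base.mapped = 0;
    kd->kvm = kvm;
}

/* Generic entry points dispatch through whatever ops the domain was born with. */
static int domain_map(struct domain *d, uint64_t iova, uint64_t paddr)
{
    assert(d != NULL && d->ops != NULL);
    return d->ops->map(d, iova, paddr);
}

static int domain_unmap(struct domain *d, uint64_t iova)
{
    assert(d != NULL && d->ops != NULL);
    return d->ops->unmap(d, iova);
}

/* Only path that changes a KVM-linked domain; models the intercept handler. */
static int kvm_domain_shadow_update(struct kvm_linked_domain *kd,
                                    uint64_t iova, uint64_t paddr)
{
    (void)iova; (void)paddr;
    kd->base.mapped++;
    return 0;
}
```

In this sketch, calling domain_map() on a domain set up by kvm_domain_init() always fails, while kvm_domain_shadow_update() is the only way its mappings change. There is no state to toggle, which is the difference from flagging an existing s390_iommu_domain as lent out.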