> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Sent: Friday, June 4, 2021 5:44 AM
>
> > Based on that observation we can say as soon as the user wants to use
> > an IOMMU that does not support DMA_PTE_SNP in the guest we can still
> > share the IO page table with IOMMUs that do support DMA_PTE_SNP.

Page table sharing between incompatible IOMMUs is not critical. I
prefer disallowing sharing in that case as the starting point, i.e. the
user needs to create separate IOASIDs for such devices.

>
> If your goal is to prioritize IO page table sharing, sure. But because
> we cannot atomically transition from one to the other, each device is
> stuck with the pages tables it has, so the history of the VM becomes a
> factor in the performance characteristics.
>
> For example if device {A} is backed by an IOMMU capable of blocking
> no-snoop and device {B} is backed by an IOMMU which cannot block
> no-snoop, then booting VM1 with {A,B} and later removing device {B}
> would result in ongoing wbinvd emulation versus a VM2 only booted with
> {A}.
>
> Type1 would use separate IO page tables (domains/ioasids) for these such
> that VM1 and VM2 have the same characteristics at the end.
>
> Does this become user defined policy in the IOASID model? There's
> quite a mess of exposing sufficient GET_INFO for an IOASID for the user
> to know such properties of the IOMMU, plus maybe we need mapping flags
> equivalent to IOMMU_CACHE exposed to the user, preventing sharing an
> IOASID that could generate IOMMU faults, etc.

IOMMU_CACHE is a fixed attribute of a given IOMMU, so it's better to
convey this info to userspace via GET_INFO for a device_label, before
creating any IOASID. But overall I agree that careful thought is
required about how to organize this information reporting (per-fd,
per-device, per-ioasid) to userspace.
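
To illustrate the "separate IOASIDs" starting point, here is a rough toy
model in Python (not kernel or Qemu code). The function names and the
boolean capability are made up for illustration; the boolean stands in
for whatever a per-device GET_INFO would report about snoop enforcement.

```python
from collections import defaultdict


def plan_ioasids(devices):
    """Group devices into one IOASID per snoop-enforcement capability.

    `devices` maps a device label to True if its IOMMU can block
    no-snoop traffic, False otherwise (hypothetical GET_INFO result).
    Returns {capability: [device labels]}.
    """
    groups = defaultdict(list)
    for label, can_enforce in sorted(devices.items()):
        groups[can_enforce].append(label)
    return dict(groups)


def needs_wbinvd_emulation(plan):
    # Assumption: wbinvd emulation is only needed while some device
    # sits behind an IOMMU that cannot block no-snoop.
    return bool(plan.get(False))


vm1 = {"A": True, "B": False}
print(plan_ioasids(vm1), needs_wbinvd_emulation(plan_ioasids(vm1)))

# Hot-unplug B: VM1 now looks exactly like a VM2 booted with only {A}.
del vm1["B"]
print(plan_ioasids(vm1), needs_wbinvd_emulation(plan_ioasids(vm1)))
```

With no sharing across incompatible IOMMUs, removing {B} leaves no
leftover state on {A}'s page table, so the VM history stops mattering.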
>
> > > > It doesn't solve the problem to connect kvm to AP and kvmgt though
> > >
> > > It does not, we'll probably need a vfio ioctl to gratuitously announce
> > > the KVM fd to each device. I think some devices might currently fail
> > > their open callback if that linkage isn't already available though, so
> > > it's not clear when that should happen, ie. it can't currently be a
> > > VFIO_DEVICE ioctl as getting the device fd requires an open, but this
> > > proposal requires some availability of the vfio device fd without any
> > > setup, so presumably that won't yet call the driver open callback.
> > > Maybe that's part of the attach phase now... I'm not sure, it's not
> > > clear when the vfio device uAPI starts being available in the process
> > > of setting up the ioasid. Thanks,
> > >
> > At a certain point we maybe just have to stick to backward compat, I
> > think. Though it is useful to think about green field alternates to
> > try to guide the backward compat design..
>
> I think more to drive the replacement design; if we can't figure out
> how to do something other than backwards compatibility trickery in the
> kernel, it's probably going to bite us. Thanks,
>

I'm a bit lost on the flow you have in mind. Here is one flow based on
my understanding of this discussion. Please comment on whether it
matches your thinking:

0) ioasid_fd is created and registered to KVM via KVM_ADD_IOASID_FD;

1) Qemu binds dev1 to ioasid_fd;

2) Qemu calls IOASID_GET_DEV_INFO for dev1. This will carry IOMMU_CACHE
   info, i.e. whether the underlying IOMMU can enforce snoop;

3) Qemu plans to create a gpa_ioasid and attach dev1 to it. Here Qemu
   needs to figure out whether dev1 wants to do no-snoop. This might be
   based on a fixed vendor/class list or specified by the user;

4) gpa_ioasid = ioctl(ioasid_fd, IOASID_ALLOC); At this point a 'snoop'
   flag is specified to decide the page table format, which is supposed
   to match dev1;

5) Qemu attaches dev1 to gpa_ioasid via VFIO_ATTACH_IOASID. At this
   point, snoop/no-snoop is specified again. If it is not supported by
   the related IOMMU or differs from what gpa_ioasid has, the attach
   fails;

6) Qemu calls KVM to update the snoop requirement via
   KVM_UPDATE_IOASID_FD. This triggers ioasidfd_for_each_ioasid();

Later, when dev2 is attached to gpa_ioasid, the same flow is followed.
This implies that KVM_UPDATE_IOASID_FD is called only when a new IOASID
is created or an existing IOASID is destroyed, because all devices
under an IOASID should have the same snoop requirement.

Thanks
Kevin
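
For reference, steps 0-6 above can be sketched as a toy Python model.
This is only a sketch of my understanding, not a uAPI proposal: the
class, method names and error conditions are invented for illustration,
and I am assuming that requesting snoop on an IOMMU which cannot enforce
it is the "not supported" case in step 5.

```python
class IoasidFd:
    """Toy stand-in for ioasid_fd; not a real kernel interface."""

    def __init__(self):                       # step 0 (KVM registration elided)
        self.dev_info = {}    # device -> can its IOMMU enforce snoop?
        self.ioasids = {}     # ioasid -> {"snoop": bool, "devs": set}
        self.next_id = 1

    def bind(self, dev, iommu_enforces_snoop):        # step 1
        self.dev_info[dev] = iommu_enforces_snoop

    def get_dev_info(self, dev):                      # step 2
        return self.dev_info[dev]

    def alloc(self, snoop):                           # step 4
        ioasid = self.next_id
        self.next_id += 1
        self.ioasids[ioasid] = {"snoop": snoop, "devs": set()}
        return ioasid

    def attach(self, dev, ioasid, snoop):             # step 5
        entry = self.ioasids[ioasid]
        if snoop != entry["snoop"]:
            raise ValueError("snoop setting differs from the IOASID")
        if snoop and not self.dev_info[dev]:
            raise ValueError("IOMMU cannot enforce snoop")
        entry["devs"].add(dev)

    def kvm_update(self):                             # step 6
        # Walk every IOASID, as ioasidfd_for_each_ioasid() would, and
        # report whether any of them permits no-snoop DMA.
        return any(not e["snoop"] for e in self.ioasids.values())


fd = IoasidFd()
fd.bind("dev1", iommu_enforces_snoop=True)
wants_no_snoop = False                                # step 3: user policy
snoop = fd.get_dev_info("dev1") and not wants_no_snoop
gpa_ioasid = fd.alloc(snoop)
fd.attach("dev1", gpa_ioasid, snoop)
print("no-snoop possible:", fd.kvm_update())
```

Since attach rejects a mismatched snoop setting, the per-IOASID flag
alone is enough for the KVM update, which is why the update is only
needed when an IOASID is created or destroyed.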