On Fri, Jun 04, 2021 at 08:38:26AM +0000, Tian, Kevin wrote:
> > I think more to drive the replacement design; if we can't figure out
> > how to do something other than backwards compatibility trickery in the
> > kernel, it's probably going to bite us. Thanks,
>
> I'm a bit lost on the desired flow in your minds. Here is one flow based
> on my understanding of this discussion. Please comment whether it
> matches your thinking:
>
> 0) ioasid_fd is created and registered to KVM via KVM_ADD_IOASID_FD;
>
> 1) Qemu binds dev1 to ioasid_fd;
>
> 2) Qemu calls IOASID_GET_DEV_INFO for dev1. This will carry
>    IOMMU_CACHE info i.e. whether underlying IOMMU can enforce snoop;
>
> 3) Qemu plans to create a gpa_ioasid, and attach dev1 to it. Here Qemu
>    needs to figure out whether dev1 wants to do no-snoop. This might
>    be based a fixed vendor/class list or specified by user;
>
> 4) gpa_ioasid = ioctl(ioasid_fd, IOASID_ALLOC); At this point a 'snoop'
>    flag is specified to decide the page table format, which is supposed
>    to match dev1;
> 5) Qemu attaches dev1 to gpa_ioasid via VFIO_ATTACH_IOASID. At this
>    point, specify snoop/no-snoop again. If not supported by related
>    iommu or different from what gpa_ioasid has, attach fails.

Why do we need to specify it again?

If the IOASID was created with the "block no-snoop" flag then it is
blocked in that IOASID, and that blocking sets the page table format.

The only question is if we can successfully attach a device to the
page table, or not.

The KVM interface is a bit tricky because Alex said this is partially
security, wbinvd is only enabled if someone has a FD to a device that
can support no-snoop.

Personally I think this got way too complicated, the KVM interface
should simply be

   ioctl(KVM_ALLOW_INCOHERENT_DMA, ioasidfd, device_label)
   ioctl(KVM_DISALLOW_INCOHERENT_DMA, ioasidfd, device_label)

and let qemu sort it out based on command flags, detection, whatever.

'ioasidfd, device_label' is the security proof that Alex asked for.
This needs to be some device in the ioasidfd that declares it is
capable of no-snoop. E.g. vfio_pci would always declare it is capable
of no-snoop.

No kernel callbacks, no kernel auto-sync/etc. If qemu mismatches the
IOASID block no-snoop flag with the KVM_x_INCOHERENT_DMA state then it
is just a kernel-harmless userspace bug.

Then user space can decide which of the various axes it wants to
optimize for.

Jason
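
For illustration only, the semantics argued for above could be modeled
in plain userspace C roughly as below. This is a sketch under stated
assumptions, not kernel code and not a proposed uAPI: every struct,
field and function name is hypothetical, and the single wbinvd flag
(rather than per-device tracking) is a simplification. It shows that
attach takes no snoop argument, because the IOASID's own flag decides,
and that the KVM side only checks that the caller presents a device
which declares no-snoop capability.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; none of these exist in the kernel. */
struct model_device {
    bool iommu_can_enforce_snoop; /* IOMMU behind the device can block no-snoop */
    bool declares_no_snoop;       /* e.g. vfio_pci would always declare this */
};

struct model_ioasid {
    bool block_no_snoop;          /* fixed at allocation, sets page table format */
};

struct model_kvm {
    bool wbinvd_enabled;          /* assumption: one flag, no per-device tracking */
};

/* Attach carries no snoop argument: it either fits the page table or fails. */
bool model_attach(const struct model_ioasid *as, const struct model_device *dev)
{
    assert(as && dev);
    if (as->block_no_snoop && !dev->iommu_can_enforce_snoop)
        return false;             /* IOMMU cannot honor the IOASID's format */
    return true;
}

/* Models KVM_ALLOW_INCOHERENT_DMA: the device acts as the security proof. */
bool model_kvm_allow_incoherent_dma(struct model_kvm *kvm,
                                    const struct model_device *dev)
{
    assert(kvm && dev);
    if (!dev->declares_no_snoop)
        return false;             /* no proof, so wbinvd stays off */
    kvm->wbinvd_enabled = true;
    return true;
}

/* Models KVM_DISALLOW_INCOHERENT_DMA for the same device. */
bool model_kvm_disallow_incoherent_dma(struct model_kvm *kvm,
                                       const struct model_device *dev)
{
    assert(kvm && dev);
    if (!dev->declares_no_snoop)
        return false;
    kvm->wbinvd_enabled = false;
    return true;
}
```

Nothing in this model ties model_attach() to the KVM state. Keeping the
two consistent is left to the caller, which mirrors the point that a
mismatch is a userspace bug rather than something the kernel syncs.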