[Cc +Paolo] On Fri, 4 Jun 2021 09:28:30 -0300 Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > On Fri, Jun 04, 2021 at 08:38:26AM +0000, Tian, Kevin wrote: > > > I think more to drive the replacement design; if we can't figure out > > > how to do something other than backwards compatibility trickery in the > > > kernel, it's probably going to bite us. Thanks, > > > > I'm a bit lost on the desired flow in your minds. Here is one flow based > > on my understanding of this discussion. Please comment whether it > > matches your thinking: > > > > 0) ioasid_fd is created and registered to KVM via KVM_ADD_IOASID_FD; > > > > 1) Qemu binds dev1 to ioasid_fd; > > > > 2) Qemu calls IOASID_GET_DEV_INFO for dev1. This will carry IOMMU_ > > CACHE info i.e. whether underlying IOMMU can enforce snoop; > > > > 3) Qemu plans to create a gpa_ioasid, and attach dev1 to it. Here Qemu > > needs to figure out whether dev1 wants to do no-snoop. This might > > be based a fixed vendor/class list or specified by user; > > > > 4) gpa_ioasid = ioctl(ioasid_fd, IOASID_ALLOC); At this point a 'snoop' > > flag is specified to decide the page table format, which is supposed > > to match dev1; > > > 5) Qemu attaches dev1 to gpa_ioasid via VFIO_ATTACH_IOASID. At this > > point, specify snoop/no-snoop again. If not supported by related > > iommu or different from what gpa_ioasid has, attach fails. > > Why do we need to specify it again? My thought as well. > If the IOASID was created with the "block no-snoop" flag then it is > blocked in that IOASID, and that blocking sets the page table format. > > The only question is if we can successfully attach a device to the > page table, or not. > > The KVM interface is a bit tricky because Alex said this is partially > security, wbinvd is only enabled if someone has a FD to a device that > can support no-snoop. > > Personally I think this got way too complicated, the KVM interface > should simply be > > ioctl(KVM_ALLOW_INCOHERENT_DMA, ioasidfd, device_label) > ioctl(KVM_DISALLOW_INCOHERENT_DMA, ioasidfd, device_label) > > and let qemu sort it out based on command flags, detection, whatever. > > 'ioasidfd, device_label' is the security proof that Alex asked > for. This needs to be some device in the ioasidfd that declares it is > capabale of no-snoop. Eg vfio_pci would always declare it is capable > of no-snoop. > > No kernel call backs, no kernel auto-sync/etc. If qemu mismatches the > IOASID block no-snoop flag with the KVM_x_INCOHERENT_DMA state then it > is just a kernel-harmless uerspace bug. > > Then user space can decide which of the various axis's it wants to > optimize for. Let's make sure the KVM folks are part of this decision; a re-cap for them, KVM currently automatically enables wbinvd emulation when potentially non-coherent devices are present which is determined solely based on the IOMMU's (or platform's, as exposed via the IOMMU) ability to essentially force no-snoop transactions from a device to be cache coherent. This synchronization is triggered via the kvm-vfio device, where QEMU creates the device and adds/removes vfio group fd descriptors as an additionally layer to prevent the user from enabling wbinvd emulation on a whim. IIRC, this latter association was considered a security/DoS issue to prevent a malicious guest/userspace from creating a disproportionate system load. Where would KVM stand on allowing more direct userspace control of wbinvd behavior? Would arbitrary control be acceptable or should we continue to require it only in association to a device requiring it for correct operation. A wrinkle in "correct operation" is that while the IOMMU may be able to force no-snoop transactions to be coherent, in the scenario described in the previous reply, the user may intend to use non-coherent DMA regardless of the IOMMU capabilities due to their own optimization policy. There's a whole spectrum here, including aspects we can't determine around the device driver's intentions to use non-coherent transactions, the user's policy in trading hypervisor overhead for cache coherence overhead, etc. Thanks, Alex