> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Thursday, October 14, 2021 11:43 PM
>
> > > > I think the key is whether other archs allow driver to decide DMA
> > > > coherency and indirectly the underlying I/O page table format.
> > > > If yes, then I don't see a reason why such decision should not be
> > > > given to userspace for passthrough case.
> > >
> > > The choice all comes down to if the other arches have cache
> > > maintenance instructions in the VM that *don't work*
> >
> > Looks vfio always sets IOMMU_CACHE on all platforms as long as
> > iommu supports it (true on all platforms except intel iommu which
> > is dedicated for GPU):
> >
> > vfio_iommu_type1_attach_group()
> > {
> > 	...
> > 	if (iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY))
> > 		domain->prot |= IOMMU_CACHE;
> > 	...
> > }
> >
> > Should above be set according to whether a device is coherent?
>
> For IOMMU_CACHE there are two questions related to the overloaded
> meaning:
>
>  - Should VFIO ask the IOMMU to use non-coherent DMA (ARM meaning)
>    This depends on how the VFIO user expects to operate the DMA.
>    If the VFIO user can issue cache maintenance ops then IOMMU_CACHE
>    should be controlled by the user. I have no idea what platforms
>    support user space cache maintenance ops.

But just like you said for the Intel meaning below, even if those ops
are privileged, a uAPI can be provided to support such usage if necessary.

>
>  - Should VFIO ask the IOMMU to suppress no-snoop (Intel meaning)
>    This depends if the VFIO user has access to wbinvd or not.
>
>    wbinvd is a privileged instruction so normally userspace will not
>    be able to access it.
>
>    Per Paolo recommendation there should be a uAPI someplace that
>    allows userspace to issue wbinvd - basically the suppress no-snoop
>    is also user controllable.
>
> The two things are very similar and ultimately are a choice userspace
> should be making.

yes

>
> From something like a qemu perspective things are more murky - eg on
> ARM qemu needs to co-ordinate with the guest. Whatever IOMMU_CACHE
> mode VFIO is using must match the device coherent flag in the Linux
> guest. I'm guessing all Linux guest VMs only use coherent DMA for all
> devices today. I don't know if the cache maintenance ops are even
> permitted in an ARM VM.
>

I'll leave it to Jean to confirm. If only coherent DMA can be used in
the guest on other platforms, I suppose VFIO should not blindly set
IOMMU_CACHE and, in concept, it should deny assigning a non-coherent
device since no co-ordination with the guest exists today.

So the bottom line is that we'll keep this no-snoop thing Intel-specific.
For the basic skeleton we'll not support no-snoop, thus the user needs
to set the enforce-snoop flag when creating an IOAS, like this RFC v1
does. We also need to introduce a new flag instead of abusing
IOMMU_CACHE in the kernel. For other platforms it may need a fix to
deny non-coherent devices (based on the open question above) for now.

Thanks
Kevin
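
For illustration only, the policy in the last paragraph could be sketched
roughly as below. This is not kernel code and not what the RFC implements:
every name here (the SKETCH_* flags, struct sketch_attach_req,
sketch_attach_prot) is hypothetical, and folding the Intel-specific
enforce-snoop check and the "deny non-coherent device" check for other
platforms into one function is an assumption made to keep the example short.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flags: two separate bits instead of one overloaded IOMMU_CACHE. */
#define SKETCH_PROT_CACHE         (1u << 0) /* coherent DMA (the ARM meaning) */
#define SKETCH_PROT_ENFORCE_SNOOP (1u << 1) /* suppress no-snoop (the Intel meaning) */

/* Hypothetical inputs to the attach decision. */
struct sketch_attach_req {
	bool dev_coherent;       /* the device performs coherent DMA */
	bool user_enforce_snoop; /* user set enforce-snoop when creating the IOAS */
};

/*
 * Returns 0 and fills *prot on success, or a negative errno value when the
 * attach should be denied. *prot is left as 0 on failure.
 */
static int sketch_attach_prot(const struct sketch_attach_req *req,
			      unsigned int *prot)
{
	*prot = 0;

	/* Basic skeleton: no-snoop is not supported, so the flag is mandatory. */
	if (!req->user_enforce_snoop)
		return -EINVAL;

	/* No co-ordination with the guest exists, so deny non-coherent devices. */
	if (!req->dev_coherent)
		return -EPERM;

	*prot = SKETCH_PROT_CACHE | SKETCH_PROT_ENFORCE_SNOOP;
	return 0;
}
```

The point of the two bits is only that the coherent-DMA question and the
no-snoop question can be answered independently; the error codes are
placeholders.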