On Tue, Aug 20, 2024 at 07:52:53PM +0000, Mostafa Saleh wrote: > On Tue, Aug 20, 2024 at 09:01:02AM -0300, Jason Gunthorpe wrote: > > On Tue, Aug 20, 2024 at 08:30:05AM +0000, Mostafa Saleh wrote: > > > Hi Jason, > > > > > > On Tue, Aug 06, 2024 at 08:41:15PM -0300, Jason Gunthorpe wrote: > > > > Force Write Back (FWB) changes how the S2 IOPTE's MemAttr field > > > > works. When S2FWB is supported and enabled the IOPTE will force cachable > > > > access to IOMMU_CACHE memory and deny cachable access otherwise. > > > > > > > > This is not especially meaningful for simple S2 domains, it apparently > > > > doesn't even force PCI no-snoop access to be coherent. > > > > > > > > However, when used with a nested S1, FWB has the effect of preventing the > > > > guest from choosing a MemAttr that would cause ordinary DMA to bypass the > > > > cache. Consistent with KVM we wish to deny the guest the ability to become > > > > incoherent with cached memory the hypervisor believes is cachable so we > > > > don't have to flush it. > > > > > > > > Turn on S2FWB whenever the SMMU supports it and use it for all S2 > > > > mappings. > > > > > > I have been looking into this recently from the KVM side as it will > > > use FWB for the CPU stage-2 unconditionally for guests(if supported), > > > however that breaks for non-coherent devices when assigned, and > > > limiting assigned devices to be coherent seems too restrictive. > > > > kvm's CPU S2 doesn't care about non-DMA-coherent devices though? That > > concept is only relevant to the SMMU. > > Why not? That would be a problem if a device is not dma coherent, > and the VM knows that and maps it’s DMA memory as non cacheable. > But it would be overridden by FWB in stage-2 to be cacheable, > it would lead to coherency issues. Oh, from that perspective yes, but the entire point of S2FWB is that VM's can not create non-coherent access so it is a bit nonsense to ask for both S2FWB and try to assign a non-DMA coherent device. > Yes, that also breaks (although I think this is an easier problem to > solve) Well, it is easy to solve, just don't use S2FWB and manually flush the caches before the hypervisor touches any memory. :) > What I mean is the master itself not the SMMU (the SID basically), > so in that case the STE shouldn’t have FWB enabled. That doesn't matter, those cases will not pass in IOMMU_CACHE and they will work fine with S2FWB turned on. > > Also bear in mind VFIO won't run unless ARM_SMMU_FEAT_COHERENCY is set > > so we won't even get a chance to ask for a S2 domain. > > Oh, I think that is only for the SMMU, not for the master, the > SMMU can be coherent (for pte, ste …) but the master can still be > non coherent. Looking at how VFIO uses it, that seems to be a bug? If there are mixes of SMMU feature and dev_is_dma_coherent() then it would be a bug yes.. I recall we started out trying to use dev_is_dma_coherent() but Christoph explained it doesn't work that generally: https://lore.kernel.org/kvm/20220406135150.GA21532@xxxxxx/ Seems we sort of gave up on it, too complicated. Robin had a nice observation of the complexity: Disregarding the complete disaster of PCIe No Snoop on Arm-Based systems, there's the more interesting effectively-opposite scenario where an SMMU bridges non-coherent devices to a coherent interconnect. It's not something we take advantage of yet in Linux, and it can only be properly described in ACPI, but there do exist situations where IOMMU_CACHE is capable of making the device's traffic snoop, but dev_is_dma_coherent() - and device_get_dma_attr() for external users - would still say non-coherent because they can't assume that the SMMU is enabled and programmed in just the right way. Anyhow, for the purposes of KVM and VFIO, devices that don't work with IOMMU_CACHE are not allowed. From an API perspective IOMMU_CAP_CACHE_COHERENCY is supposed to return if the struct device can use IOMMU_CACHE. The corner case where we have a ARM_SMMU_FEAT_COHERENCY SMMU but somehow specific devices don't support IOMMU_CACHE is not properly reflected in IOMMU_CAP_CACHE_COHERENCY. I don't know how to fix that, and we've been ignoring it for a long time now :) Jason