On Tue, Aug 20, 2024 at 05:21:38PM -0300, Jason Gunthorpe wrote: > On Tue, Aug 20, 2024 at 07:52:53PM +0000, Mostafa Saleh wrote: > > On Tue, Aug 20, 2024 at 09:01:02AM -0300, Jason Gunthorpe wrote: > > > On Tue, Aug 20, 2024 at 08:30:05AM +0000, Mostafa Saleh wrote: > > > > Hi Jason, > > > > > > > > On Tue, Aug 06, 2024 at 08:41:15PM -0300, Jason Gunthorpe wrote: > > > > > Force Write Back (FWB) changes how the S2 IOPTE's MemAttr field > > > > > works. When S2FWB is supported and enabled the IOPTE will force cachable > > > > > access to IOMMU_CACHE memory and deny cachable access otherwise. > > > > > > > > > > This is not especially meaningful for simple S2 domains, it apparently > > > > > doesn't even force PCI no-snoop access to be coherent. > > > > > > > > > > However, when used with a nested S1, FWB has the effect of preventing the > > > > > guest from choosing a MemAttr that would cause ordinary DMA to bypass the > > > > > cache. Consistent with KVM we wish to deny the guest the ability to become > > > > > incoherent with cached memory the hypervisor believes is cachable so we > > > > > don't have to flush it. > > > > > > > > > > Turn on S2FWB whenever the SMMU supports it and use it for all S2 > > > > > mappings. > > > > > > > > I have been looking into this recently from the KVM side as it will > > > > use FWB for the CPU stage-2 unconditionally for guests(if supported), > > > > however that breaks for non-coherent devices when assigned, and > > > > limiting assigned devices to be coherent seems too restrictive. > > > > > > kvm's CPU S2 doesn't care about non-DMA-coherent devices though? That > > > concept is only relevant to the SMMU. > > > > Why not? That would be a problem if a device is not dma coherent, > > and the VM knows that and maps it’s DMA memory as non cacheable. > > But it would be overridden by FWB in stage-2 to be cacheable, > > it would lead to coherency issues. > > Oh, from that perspective yes, but the entire point of S2FWB is that > VM's can not create non-coherent access so it is a bit nonsense to ask > for both S2FWB and try to assign a non-DMA coherent device. Yes, but KVM sets FWB unconditionally and would use cacheable mapping for stage-2, and I expect the same for the nested SMMU. > > > Yes, that also breaks (although I think this is an easier problem to > > solve) > > Well, it is easy to solve, just don't use S2FWB and manually flush the > caches before the hypervisor touches any memory. :) Yes, although that means virtualized devices would have worse performance :/ but I guess there is nothing more to do here. I have some ideas about that, I can send patches to the kvm list as an RFC. > > > What I mean is the master itself not the SMMU (the SID basically), > > so in that case the STE shouldn’t have FWB enabled. > > That doesn't matter, those cases will not pass in IOMMU_CACHE and they > will work fine with S2FWB turned on. > But that won’t be the case in nested? Otherwise why we use FWB in the first place. > > > Also bear in mind VFIO won't run unless ARM_SMMU_FEAT_COHERENCY is set > > > so we won't even get a chance to ask for a S2 domain. > > > > Oh, I think that is only for the SMMU, not for the master, the > > SMMU can be coherent (for pte, ste …) but the master can still be > > non coherent. Looking at how VFIO uses it, that seems to be a bug? > > If there are mixes of SMMU feature and dev_is_dma_coherent() then it > would be a bug yes.. > I think there is a bug, I was able to assign a “non-coherent” device with VFIO with no issues, and it allows it as long as the SMMU is coherent. > I recall we started out trying to use dev_is_dma_coherent() but > Christoph explained it doesn't work that generally: > > https://lore.kernel.org/kvm/20220406135150.GA21532@xxxxxx/ > > Seems we sort of gave up on it, too complicated. Robin had a nice > observation of the complexity: > > Disregarding the complete disaster of PCIe No Snoop on Arm-Based > systems, there's the more interesting effectively-opposite scenario > where an SMMU bridges non-coherent devices to a coherent interconnect. > It's not something we take advantage of yet in Linux, and it can only be > properly described in ACPI, but there do exist situations where > IOMMU_CACHE is capable of making the device's traffic snoop, but > dev_is_dma_coherent() - and device_get_dma_attr() for external users - > would still say non-coherent because they can't assume that the SMMU is > enabled and programmed in just the right way. > > Anyhow, for the purposes of KVM and VFIO, devices that don't work with > IOMMU_CACHE are not allowed. From an API perspective > IOMMU_CAP_CACHE_COHERENCY is supposed to return if the struct device > can use IOMMU_CACHE. > > The corner case where we have a ARM_SMMU_FEAT_COHERENCY SMMU but > somehow specific devices don't support IOMMU_CACHE is not properly > reflected in IOMMU_CAP_CACHE_COHERENCY. I don't know how to fix that, > and we've been ignoring it for a long time now :) Thanks a lot for the extra context! Maybe the SMMUv3 .capable, should be changed to check if the device is coherent (instead of using dev_is_dma_coherent, it can use lower level functions from the supported buses) Also, I think supporting IOMMU_CACHE is not enough, as the SMMU can support it but the device is still not coherent. Thanks, Mostafa > > Jason