On Tue, Sep 10, 2024 at 05:22:51PM -0300, Jason Gunthorpe wrote: > On Tue, Sep 10, 2024 at 10:55:51AM +0000, Mostafa Saleh wrote: > > On Tue, Sep 03, 2024 at 08:33:40PM -0300, Jason Gunthorpe wrote: > > > On Tue, Sep 03, 2024 at 07:57:01AM +0000, Mostafa Saleh wrote: > > > > > > > Basically, I believe we shouldn’t set FWB blindly just because it’s supported, > > > > I don’t see how it’s useful for stage-2 only domains. > > > > > > And the only problem we can see is some niche scenario where incoming > > > memory attributes that are already requesting cachable combine to a > > > different kind of cachable? > > > > No, it’s not about the niche scenario, as I mentioned I don’t think > > we should enable FWB because it just exists. One can argue the opposite, > > if S2FWB is no different why enable it? > > Well, I'd argue that it provides more certainty for the kernel that > the DMA API behavior is matched by HW behavior. But I don't feel strongly. > > I adjusted the patch to only enable it for nesting parents. > > > AFAIU, FWB would be useful in cases where the hypervisor(or VMM) knows > > better than the VM, for example some devices MMIO space are emulated so > > they are normal memory and it’s more efficient to use memory attributes. > > Not quite, the purpose of FWB is to allow the hypervisor to avoid > costly cache flushing. It is specifically to protect the hypervisor > against a VM causing the caches to go incoherent. > > Caches that are unexpectedly incoherent are a security problem for the > hypervisor. I see, thanks for explaining, I got confused about the device emulation case, it’s also about corruption because of a mismatch of memory attributes, something like: https://bugzilla.redhat.com/show_bug.cgi?id=1679680 At the moment, I see KVM doesn’t really touch guest memory, but it does CMO for guest map(in case memslot had already some data) and on unmap, which I believe has significant performance improvement. > > > > > and we should only set FWB for coherent > > > > devices in nested setup only where the VMM(or hypervisor) knows better than > > > > the VM. > > > > > > I don't want to touch the 'only coherent devices' question. Last time > > > I tried to do that I got told every option was wrong. > > > > > > I would be fine to only enable for nesting parent domains. It is > > > mandatory here and we definitely don't support non-cachable nesting > > > today. Can we agree on that? > > > > Why is it mandatory? > > Because iommufd/vfio doesn't have cache flushing. > I see. > > I think a supporting point for this, is that KVM does the same for > > the CPU, where it enables FWB for VMs if supported. I have this on > > my list to study if that can be improved. But may be if we are out > > of options that would be a start. > > When KVM turns on S2FWB it stops doing cache flushing. As I understand > it S2FWB is significantly a performance optimization. > > On the VFIO side we don't have cache flushing at all. So enforcing > cache consistency is mandatory for security. > > For native VFIO we set IOMMU_CACHE and expect that the contract with > the IOMMU is that no cache flushing is required. > > For nested we set S2FWB/CANWBS to prevent the VM from disabling VFIO's > IOMMU_CACHE and again the contract with the HW is that no cache > flushing is required. > > Thus VFIO is security correct even though it doesn't cache flush. > > None of this has anything to do with device coherence capability. It > is why I keep saying incoherent devices must be blocked from VFIO > because it cannot operate them securely/correctly. > > Fixing that is a whole other topic, Yi has a series for it on x86 at > least.. I see, that makes sense to only support it for nested domains on the assumption they are only used for VFIO/IOMMUFD till we figure out non-coherent devices, I guess you are referring to: https://lore.kernel.org/all/ZltQ3PyHKiQmN9SU@xxxxxxxxxx/t/#me702dd242782393eb7769000c96702a0fed7f6ca Thanks, Mostafa > > Jason