On Tue, Mar 18, 2025 at 08:09:09PM -0300, Jason Gunthorpe wrote:
> > It's far more problematic the other way around, e.g. the host knows
> > that something needs a Device-* attribute and the VM has done
> > something cacheable. The endpoint for that PA could, for example,
> > fall over when lines pulled in by the guest are written back, which
> > of course can't always be traced back to the offending VM.
> >
> > OTOH, if the host knows that a PA is cacheable and the guest does
> > something non-cacheable, you 'just' have to deal with the usual
> > mismatched attributes problem as laid out in the ARM ARM.
>
> I think the issue is that KVM doesn't do that usual stuff (i.e. cache
> flushing) for memory that doesn't have a struct page backing.

Indeed, I clearly paged that out. What I said is how we arrived at the
Device-* v. Normal-NC distinction.

> So nothing in the hypervisor does any cache flushing, and I believe
> you end up with a situation where the VMM could have zeroed this
> cacheable memory using cacheable stores to sanitize it across VMs,
> and then KVM can put that memory into the VM as uncached and the VM
> could then access stale non-zeroed data from a prior VM. Yes? This is
> a security problem.

Pedantic, but KVM only cares about cache maintenance in response to the
primary MM, not the VMM. After a stage-2 mapping has been established,
userspace cannot expect KVM to do cache maintenance on its behalf.

You have a very good point that KVM is broken for cacheable PFNMAP'd
crap since we demote to something non-cacheable, and maybe that
deserves fixing first. Hopefully nobody notices that we've taken away
the toys...

> So I think the logic we want here in the fault handler is to:
>
> Get the mm's PTE
> If it is cacheable:
>   Check if it has a struct page:
>     Yes - KVM flushes it and can use a non-FWB path
>     No  - KVM either fails to install it, or installs it using FWB
>           to force cacheability. KVM never allows degrading cacheable
>           to non-cacheable when it can't do flushing.
> Not cacheable:
>   Install it with Normal-NC as was previously discussed and merged

We still need to test the VMA flag here to select Normal-NC v. Device.
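Roughly, as a compilable toy model (the enum and helper names below are
illustrative, not kernel API; in KVM proper this decision would live in
user_mem_abort(), consulting the primary MM's PTE, pfn_valid(),
FEAT_FWB support, and the VM_ALLOW_ANY_UNCACHED VMA flag):

/*
 * Toy model of the proposed fault-handler logic. Illustrative only.
 */
#include <stdbool.h>

enum s2_attr {
	S2_NORMAL_WB,	/* cacheable at stage-2 */
	S2_NORMAL_NC,
	S2_DEVICE,
	S2_REJECT,	/* refuse to install the mapping */
};

static enum s2_attr stage2_pick_attr(bool s1_cacheable, bool has_struct_page,
				     bool has_fwb, bool vma_allows_nc)
{
	if (s1_cacheable) {
		/* struct page backed: KVM can flush, a non-FWB path is fine */
		if (has_struct_page)
			return S2_NORMAL_WB;

		/*
		 * No struct page means no cache maintenance, so never
		 * degrade cacheable to non-cacheable: either force
		 * cacheability with FWB or fail the fault.
		 */
		return has_fwb ? S2_NORMAL_WB : S2_REJECT;
	}

	/* Non-cacheable: the VMA flag selects Normal-NC v. Device */
	return vma_allows_nc ? S2_NORMAL_NC : S2_DEVICE;
}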
> > Userspace should be stating intentions on the memslot with the sort
> > of mapping that it wants to create, and a memslot flag to say "I
> > allow cacheable mappings" seems to fit the bill.
>
> I'm not sure about this, I don't see that the userspace has any
> choice. As above, KVM has to follow whatever is in the PTEs, the
> userspace can't ask for something different here. At best you could
> make non-struct page cacheable memory always fail unless the flag is
> given - but why?
>
> It seems sufficient for fast fail to check if the VMA has PFNMAP and
> pgprot cacheable then !FEAT_FWB fails the memslot. There is no real
> recovery from this, the VMM is doing something that cannot be
> supported.

I'm less worried about recovery and more focused on userspace being
able to understand what happened. Otherwise we may get folks
complaining about the ioctl failing "randomly" on certain machines.

But we might need to just expose the FWB-ness of the MMU to userspace
since it can already encounter mismatched attributes when poking struct
page-backed guest memory.
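Something like the following for the fast fail (the function name and
errno choice are mine, purely illustrative), so the refusal shows up at
memslot registration time rather than as a late, hard-to-diagnose
fault:

#include <errno.h>
#include <stdbool.h>

/*
 * Toy model: reject the memslot up front when the VMA is cacheable
 * PFNMAP but the CPU lacks FEAT_FWB. Illustrative only.
 */
static int memslot_pfnmap_check(bool vma_pfnmap, bool vma_cacheable,
				bool has_fwb)
{
	if (vma_pfnmap && vma_cacheable && !has_fwb)
		return -EINVAL;	/* errno choice is illustrative */

	return 0;
}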
Thanks,
Oliver