On Mon, Mar 17, 2025 at 09:27:52AM +0000, Marc Zyngier wrote:
> On Mon, 17 Mar 2025 05:55:55 +0000,
> Ankit Agrawal <ankita@xxxxxxxxxx> wrote:
> >
> > >> For my education, what is an accepted way to communicate this? Please let
> > >> me know if there are any relevant examples that you may be aware of.
> > >
> > > A KVM capability is what is usually needed.
> >
> > I see. If IIUC, this would involve a corresponding Qemu (usermode) change
> > to fetch the new KVM cap. Then it could fail in case the FWB is not
> > supported with some additional conditions (so that the currently supported
> > configs with !FWB won't break on usermode).
> >
> > The proposed code change is to map in S2 as NORMAL when vma flags
> > has VM_PFNMAP. However, Qemu cannot know that driver is mapping
> > with PFNMAP or not. So how may Qemu decide whether it is okay to
> > fail for !FWB or not?
>
> This is not about FWB as far as userspace is concerned. This is about
> PFNMAP as non-device memory. If the host doesn't have FWB, then the
> "PFNMAP as non-device memory" capability doesn't exist, and userspace
> fails early.
>
> Userspace must also have some knowledge of what device it obtains the
> mapping from, and whether that device requires some extra host
> capability to be assigned to the guest.
>
> You can then check whether the VMA associated with the memslot is
> PFNMAP or not, if the memslot has been enabled for PFNMAP mappings
> (either globally or on a per-memslot basis, I don't really care).

Trying to page this back in, I think there are three stages:

1. A KVM cap that the VMM can use to check for non-device PFNMAP (or
   rather cacheable PFNMAP, since we already support Normal NC).

2. Memslot registration - we need a way for the VMM to require such
   cacheable PFNMAP and for KVM to check. The current patch relies on
   (a) the stage 1 vma attributes, which I'm not a fan of.
   An alternative I suggested was (b) a VM_FORCE_CACHEABLE vma flag, on
   the assumption that the vfio driver knows whether it supports
   cacheable mappings (it's a bit of a stretch trying to make this
   generic). Yet another option is (c) a KVM_MEM_CACHEABLE flag that
   the VMM passes at memslot registration.

3. user_mem_abort() - follows the above logic (whatever we decide),
   maybe with some extra check and WARN in case we got the logic wrong.

The problems in (2) are that we need to know that the device supports
cacheable mappings and that we don't introduce additional issues or end
up with FWB on a PFNMAP that does not support cacheable. Without any
vma flag like the current VM_ALLOW_ANY_UNCACHED, the next best thing is
relying on the stage 1 attributes. But we don't know them at memslot
registration, only later in step (3) after a GUP on the VMM address
space.

So in (2), when !FWB, we only want to reject VM_PFNMAP slots if we know
they are going to be mapped as cacheable. We need this information
somehow, either from vma->vm_flags or slot->flags.

-- 
Catalin