Re: Interaction between host-side mprotect() and KVM MMU

On Tuesday, 21.05.2019 at 07:02, Sean Christopherson wrote:
> > Questions:
> > 
> > a. Is this the intended behaviour, and can it be relied on? Note that
> > KVM/aarch64 behaves the same for me.
> > 
> > b. Why does case (1) fail but case (2) succeed? I spent a day reading
> > through the KVM MMU code, but failed to understand how this is implemented.
> 
> Case (1) fails because KVM explicitly grabs WRITE permissions when
> retrieving the HPA.  See __gfn_to_pfn_memslot() and hva_to_pfn().
> Note, KVM also allows userspace to set a guest memslot as RO
> independent of mprotect().

Thanks for the pointers. I'm aware of the ability to set a memslot as RO,
but currently we use a single memslot + mprotect() as it suits our loader
architecture better (see below).
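
To make the setup concrete, here is a minimal sketch of the host-side part:
one anonymous mapping backing a single memslot, with mprotect() applied per
segment after the image is copied in. The struct and function names are
hypothetical and are not taken from the Solo5 sources; a real loader would
derive the segments from the ELF program headers.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical segment descriptor (illustration only). */
struct segment {
    size_t offset;  /* page-aligned offset into guest memory */
    size_t len;     /* page-aligned length */
    int    prot;    /* PROT_* flags to apply on the host side */
};

/* Allocate one anonymous region that would back the single memslot. */
static uint8_t *guest_mem_alloc(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (uint8_t *)p;
}

/* Apply per-segment protections once the guest image is in place.
 * Returns 0 on success, -1 if any mprotect() call fails. */
static int guest_mem_protect(uint8_t *mem, const struct segment *segs,
                             size_t n)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    for (size_t i = 0; i < n; i++) {
        assert(segs[i].offset % page == 0 && segs[i].len % page == 0);
        if (mprotect(mem + segs[i].offset, segs[i].len, segs[i].prot) != 0)
            return -1;
    }
    return 0;
}
```

As discussed in this thread, only the write side of these protections is
seen by the guest: dropping PROT_WRITE makes guest writes fail, while
dropping PROT_EXEC has no effect on guest execution.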

> Case (2) doesn't fault because KVM doesn't support execute protection,
> i.e. all pages are executable in the guest (at least on x86).  My guess
> is that execute protection isn't supported because there isn't a strong
> use case for traditional virtualization and so no one has gone through
> the effort to add NX support.  E.g. the vast majority of system memory
> can be dynamically allocated (for userspace code), which practically
> speaking leaves only the guest kernel's data sections, and marking those
> NX requires at a minimum:
> 
>   - knowing exactly what kernel will be loaded
>   - no ASLR in the physical domain
>   - no transient execution, e.g. in vBIOS or trampoline code

In the Solo5 case we're using hardware virtualization in a non-traditional
sense, as an isolation layer for a static guest (i.e. no changes to
physical memory layout or page protections after "boot"). The guest is
considered untrusted and all [*] the setup is performed by the loader/VMM
("tender" in our terminology), which has all the knowledge of what gets
loaded into the VM available up front. So your points above are not an
issue.

[*] Well, almost all: the guest sets up its own IDT in order to report
exceptions and abort.

> 
> > c. In order to enforce W^X both ways I'd like to have case (2) also fail
> > with EFAULT, is this possible?
> 
> Not without modifying KVM and the kernel (if you want to do it through
> mprotect()).

Hooking up the full EPT protection bits available to KVM via mprotect()
would be the best solution for us, and could also give us the ability to
have execute-only pages on x86, which is a nice defence against ROP attacks
in the guest. However, I can see now that this is not a trivial
undertaking, especially across the various MMU models (tdp, softmmu) and
architectures dealt with by the core KVM code.
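
To illustrate what "hooking up the protection bits" would mean on Intel, the
sketch below maps host PROT_* flags onto the three low permission bits of an
EPT entry (read, write, execute, as documented in the Intel SDM). The helper
names are hypothetical; this is not KVM code and says nothing about how KVM
would have to be restructured to do it.

```c
#include <stdint.h>
#include <sys/mman.h>

/* Low permission bits of an Intel EPT entry. */
#define EPT_READ   (UINT64_C(1) << 0)
#define EPT_WRITE  (UINT64_C(1) << 1)
#define EPT_EXEC   (UINT64_C(1) << 2)

/* Hypothetical helper: translate host PROT_* flags into EPT permissions. */
static uint64_t ept_perms_from_prot(int prot)
{
    uint64_t perms = 0;

    if (prot & PROT_READ)
        perms |= EPT_READ;
    if (prot & PROT_WRITE)
        perms |= EPT_WRITE;
    if (prot & PROT_EXEC)
        perms |= EPT_EXEC;
    return perms;
}

/* Check a permission combination against the EPT misconfiguration rules:
 * write without read is never allowed, and execute-only is allowed only
 * if the CPU advertises support for it (have_exec_only != 0). */
static int ept_perms_valid(uint64_t perms, int have_exec_only)
{
    if ((perms & EPT_WRITE) && !(perms & EPT_READ))
        return 0;
    if (perms == EPT_EXEC && !have_exec_only)
        return 0;
    return 1;
}
```

Note that execute-only (PROT_EXEC alone) is the one combination that depends
on a CPU capability, so any such interface would need a fallback for hosts
without it.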

N.B. We also have tender implementations for bhyve and OpenBSD vmm, and at
least in the OpenBSD case some community contributors are looking into
developing an "ept_mprotect" for precisely this use-case, though their vmm
code is much simpler (and does less) compared to KVM.

I take it there's no other way to mark a range of pages as NX for the guest
from the host side, so if we want this without modifying KVM and the
kernel, the only way to get it would be to set up "real" page tables inside
the guest ...?
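
For reference, the guest page table alternative would boil down to something
like the following sketch for 4 KiB x86-64 leaf entries. The segment kinds
and the make_pte() helper are hypothetical names for illustration; the bit
positions are the architectural ones.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_P   (UINT64_C(1) << 0)   /* present */
#define PTE_RW  (UINT64_C(1) << 1)   /* writable */
#define PTE_NX  (UINT64_C(1) << 63)  /* no-execute; requires EFER.NXE = 1 */

/* Hypothetical classification of what a guest page holds. */
enum seg_kind { SEG_TEXT, SEG_RODATA, SEG_DATA };

/* Build a 4 KiB leaf PTE enforcing W^X for the given page.
 * Since the guest runs in ring 0, read-only mappings are only enforced
 * against the guest itself if CR0.WP is also set. */
static uint64_t make_pte(uint64_t paddr, enum seg_kind kind)
{
    assert((paddr & 0xfff) == 0);      /* must be page-aligned */

    uint64_t pte = paddr | PTE_P;

    switch (kind) {
    case SEG_TEXT:                     /* r-x */
        break;
    case SEG_RODATA:                   /* r-- */
        pte |= PTE_NX;
        break;
    case SEG_DATA:                     /* rw- */
        pte |= PTE_RW | PTE_NX;
        break;
    }
    return pte;
}
```

The catch is that the untrusted guest could rewrite these tables unless the
pages holding them are themselves write-protected from the host side.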

Martin


