Re: [RFC PATCH v2 0/1] VM introspection

Mihai Donțu <mdontu@xxxxxxxxxxxxxxx> · Mon, 07 Aug 2017 18:28:53 +0300

On Fri, 2017-07-07 at 19:29 +0200, Paolo Bonzini wrote:
> On 07/07/2017 16:34, Adalbert Lazar wrote:
> > One bit of code that has passed (maybe) unnoticed in the RFC is a new
> > function added to Linux' mm called vm_replace_page() which, much like KSM's
> > replace_page(), gets two processes to share a page (read-write, no-COW):
> > 
> > https://marc.info/?l=kvm&m=149762056518799&w=2
> > 
> > This is used to quickly scan and patch the guest software.
> 
> Thanks for pointing this out.
> 
> In my review of patch 1 I suggested using only read/write, but it's slow.
> 
> I think we need to figure out a safe way to map foreign memory, as I'm
> worried about TOC/TOU races for obvious reasons.

Would it be possible to describe the race you are referring to? I would
imagine that grabbing hold of the task and/or mm descriptor would be
enough to ensure the gfn stays available for mapping (and locking),
though it's true that qemu might (does it ever?) shuffle things around,
in which case the introspector would end up with a mapping pointing to
the wrong thing.

> One thing I was thinking about (but didn't have much time to completely
> think through) is a special /dev/kvmmem device, where you could do
> 
>     kvmmem_fd = open("/dev/kvmmem", O_RDWR);
>     ptr = ioctl(kvmmem_fd, KVMMEM_MAP_MEMORY, { token, size });
>     ioctl(kvmmem_fd, KVMMEM_UNMAP_MEMORY, { ptr, size });

The device idea solves one of our problems with
KVMI_MAP_PHYSICAL_PAGE_TO_GUEST: the introspector has to undo the
mapping, otherwise the VM will crash when the kernel reuses the gfn.
That cleanup does not happen if the introspector crashes ...

We can make it so that when kvmmem_fd is closed, all mappings are
automatically undone.
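
To make the close-time cleanup concrete, here is a rough userspace model
of the bookkeeping. It is only a sketch under assumptions: the struct and
function names (kvmmem_handle, handle_map, handle_release, ...) are
hypothetical and nothing like them exists in the RFC. In a real driver the
release step would presumably hang off the file_operations .release hook,
which the kernel runs when the last reference to the file goes away,
including when the owning process dies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_MAPPINGS 16   /* hypothetical per-handle limit */

/* One foreign mapping tracked by a handle (hypothetical layout). */
struct mapping {
    uint64_t gpa;
    uint64_t size;
    int      live;
};

/* Stand-in for the per-open state behind kvmmem_fd. */
struct kvmmem_handle {
    struct mapping maps[MAX_MAPPINGS];
    int            is_open;
};

static void handle_open(struct kvmmem_handle *h)
{
    memset(h, 0, sizeof(*h));
    h->is_open = 1;
}

/* Number of mappings that have not been undone yet. */
static int live_count(const struct kvmmem_handle *h)
{
    int n = 0;
    for (int i = 0; i < MAX_MAPPINGS; i++)
        n += h->maps[i].live;
    return n;
}

/* Record a mapping; returns a slot index, or -1 on failure. */
static int handle_map(struct kvmmem_handle *h, uint64_t gpa, uint64_t size)
{
    if (!h->is_open || size == 0)
        return -1;
    for (int i = 0; i < MAX_MAPPINGS; i++) {
        if (!h->maps[i].live) {
            h->maps[i] = (struct mapping){ gpa, size, 1 };
            return i;
        }
    }
    return -1; /* table full */
}

/* Explicit undo of one mapping; returns 0 on success, -1 otherwise. */
static int handle_unmap(struct kvmmem_handle *h, int slot)
{
    if (slot < 0 || slot >= MAX_MAPPINGS || !h->maps[slot].live)
        return -1;
    h->maps[slot].live = 0; /* a real driver would restore the gfn here */
    return 0;
}

/*
 * Models what would run when the fd is closed (or its owner crashes):
 * every mapping still live is undone. Returns how many were undone.
 */
static int handle_release(struct kvmmem_handle *h)
{
    int undone = 0;
    for (int i = 0; i < MAX_MAPPINGS; i++) {
        if (handle_unmap(h, i) == 0)
            undone++;
    }
    h->is_open = 0;
    return undone;
}
```

The point of the model is that the undo no longer depends on the
introspector behaving well: whatever it forgot to unmap is swept up by
the release path, and the handle refuses new mappings afterwards.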

> The map/unmap memory operation would be a hypercall, not a socket
> command, but the random "token" would be returned on the socket via some
> KVMI_MAP_PHYSICAL_PAGE_TO_GUEST command (or more accurately, a
> replacement accepting {gpa, size} instead of {gpa, gfn_dest}).  Handles
> can be short-lived, e.g. you could have at most a small number of
> tokens per host created (and passed back via KVMI) but not yet used by
> the hypercall.  Once it's used by the hypercall, the token is not
> needed anymore, so this is not a strong limitation.

I understand that the token is used to make sure the mapping hypercall
is only used by VMs and applications doing introspection.
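
For illustration, the token lifecycle described above (issued over the
socket, redeemed once by the hypercall, with a small cap on outstanding
tokens) could be modelled roughly as below. This is a sketch under
assumptions, not a proposed implementation: token_issue, token_redeem,
MAX_PENDING and the table layout are all made-up names, and the 256-bit
width is taken from the size floated further down in the quoted mail.
The only real API used is getrandom(2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/random.h>   /* getrandom(), Linux/glibc >= 2.25 */

#define TOKEN_BYTES 32    /* 256 bits */
#define MAX_PENDING 8     /* hypothetical cap on issued-but-unused tokens */

/* A token that was handed out but not yet redeemed (hypothetical). */
struct pending_token {
    uint8_t  token[TOKEN_BYTES];
    uint64_t gpa;
    uint64_t size;
    int      in_use;
};

static struct pending_token pending[MAX_PENDING];

/* Compare without an early exit, so timing does not leak a prefix match. */
static int token_equal(const uint8_t *a, const uint8_t *b)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < TOKEN_BYTES; i++)
        diff |= (uint8_t)(a[i] ^ b[i]);
    return diff == 0;
}

/*
 * Socket side: create a random token for {gpa, size}.
 * Returns 0 on success, -1 if the cap is reached or the RNG fails.
 */
static int token_issue(uint64_t gpa, uint64_t size, uint8_t out[TOKEN_BYTES])
{
    assert(out != NULL);
    for (int i = 0; i < MAX_PENDING; i++) {
        if (pending[i].in_use)
            continue;
        if (getrandom(pending[i].token, TOKEN_BYTES, 0) != TOKEN_BYTES)
            return -1;
        pending[i].gpa = gpa;
        pending[i].size = size;
        pending[i].in_use = 1;
        memcpy(out, pending[i].token, TOKEN_BYTES);
        return 0;
    }
    return -1; /* too many tokens outstanding */
}

/*
 * Hypercall side: look the token up and consume it, so it works once.
 * Returns 0 and fills gpa/size on success, -1 for an unknown token.
 */
static int token_redeem(const uint8_t token[TOKEN_BYTES],
                        uint64_t *gpa, uint64_t *size)
{
    int found = -1;
    for (int i = 0; i < MAX_PENDING; i++) {
        if (pending[i].in_use && token_equal(pending[i].token, token))
            found = i;
    }
    if (found < 0)
        return -1;
    *gpa = pending[found].gpa;
    *size = pending[found].size;
    memset(&pending[found], 0, sizeof(pending[found])); /* single use */
    return 0;
}
```

Consuming the entry on redemption is what keeps the table small: a token
only occupies a slot between the socket reply and the hypercall.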

> After KVMMEM_MAP_MEMORY, you'd get a SIGSEGV if the guest memory layout
> changes (userfaultfd can be used by the introspector to simplify the
> handling and retry). You'd have to re-map the memory explicitly.

"guest memory layout changes" - is this the guest _being_ introspected?
If so, how would an introspector running in a separate VM get a
SIGSEGV? Would KVM inject a page fault into the introspector if the host
encounters a #PF for the VMA representing the foreign mapping
(presumably it knows the gva as well)?

> Alas I have no idea how to verify the handle securely on the host, since
> the host is not supposed to know which guests are introspectors and
> which host got which token.  But maybe if the token namespace is big
> enough (256 bits?) and random, it's okay to ignore the possibility that
> a guest tries to guess.  (This idea is roughly based on how SCSI
> offloaded copies work).
> 
> Andy, does it look like utter BS or could it have some merit?

-- 
Mihai Donțu