Re: [RFC PATCH v2 0/1] VM introspection

On 07/08/2017 17:28, Mihai Donțu wrote:
> On Fri, 2017-07-07 at 19:29 +0200, Paolo Bonzini wrote:
>> On 07/07/2017 16:34, Adalbert Lazar wrote:
>>> One bit of code that has passed (maybe) unnoticed in the RFC is a new
>>> function added to Linux' mm called vm_replace_page() which, much like KSM's
>>> replace_page(), gets two processes to share a page (read-write, no-COW):
>>>
>>> https://marc.info/?l=kvm&m=149762056518799&w=2
>>>
>>> This is used to quickly scan and patch the guest software.
>>
>> Thanks for pointing this out.
>>
>> In my review of patch 1 I suggested using only read/write, but it's slow.
>>
>> I think we need to figure out a safe way to map foreign memory, as I'm
>> worried about TOCTOU races for obvious reasons.
> 
> Would it be possible to describe the race you are referring to? I would
> imagine that grabbing a hold of the task and/or mm descriptor would be
> enough to ensure the gfn is available for mapping (and locking), though
> it's true that qemu might (does it ever?) shuffle things around so the
> introspector would end up with a mapping pointing to the wrong thing.

Yes, QEMU can shuffle things around.  Usually it's BARs that are
shuffled around, but there can be special cases where RAM changes in the
memory map (for example SMRAM).

>> One thing I was thinking about (but didn't have much time to completely
>> think through) is a special /dev/kvmmem device, where you could do
>>
>>     kvmmem_fd = open("/dev/kvmmem", O_RDWR);
>>     ptr = ioctl(kvmmem_fd, KVMMEM_MAP_MEMORY, { token, size });
>>     ioctl(kvmmem_fd, KVMMEM_UNMAP_MEMORY, { ptr, size });
> 
> The idea with a device solves one of our problems with
> KVMI_MAP_PHYSICAL_PAGE_TO_GUEST: the introspector has to undo the
> mapping, otherwise the VM will crash when the kernel reuses the gfn.
> This does not happen if the introspector crashes ...
> 
> We can make it so that when kvmmem_fd is closed, all mappings are
> automatically undone.

Yes, that's a nice side effect.
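
To make the pseudo-code quoted above slightly more concrete, here is a rough
userspace sketch of what the two calls could look like.  Everything in it is
hypothetical: /dev/kvmmem does not exist, and the struct layouts, the ioctl
numbers, the 32-byte token size and the helper names are assumptions made up
for illustration only.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical ABI: none of these definitions exist in any kernel. */
#define KVMMEM_TOKEN_LEN 32

struct kvmmem_map_req {
    uint8_t  token[KVMMEM_TOKEN_LEN]; /* token handed out over the KVMI socket */
    uint64_t size;                    /* length of the region, in bytes */
    uint64_t addr;                    /* out: address the region was mapped at */
};

struct kvmmem_unmap_req {
    uint64_t addr;
    uint64_t size;
};

#define KVMMEM_MAP_MEMORY   _IOWR('k', 0x01, struct kvmmem_map_req)
#define KVMMEM_UNMAP_MEMORY _IOW('k', 0x02, struct kvmmem_unmap_req)

/* Map the region named by `token`.
 * Returns 0 and stores the address in *out, or -1 with errno set. */
int kvmmem_map(int fd, const uint8_t token[KVMMEM_TOKEN_LEN],
               uint64_t size, void **out)
{
    struct kvmmem_map_req req;

    memset(&req, 0, sizeof(req));
    memcpy(req.token, token, KVMMEM_TOKEN_LEN);
    req.size = size;

    if (ioctl(fd, KVMMEM_MAP_MEMORY, &req) < 0)
        return -1;

    *out = (void *)(uintptr_t)req.addr;
    return 0;
}

/* Undo a mapping explicitly; closing the fd would undo all of them. */
int kvmmem_unmap(int fd, void *ptr, uint64_t size)
{
    struct kvmmem_unmap_req req;

    memset(&req, 0, sizeof(req));
    req.addr = (uint64_t)(uintptr_t)ptr;
    req.size = size;

    return ioctl(fd, KVMMEM_UNMAP_MEMORY, &req);
}
```

The mapped address comes back through the request struct rather than as the
ioctl return value, because ioctl() returns an int and cannot carry a 64-bit
pointer.  A caller would open("/dev/kvmmem", O_RDWR), call kvmmem_map() with
the token received on the socket, and rely on close() to drop any mappings it
did not unmap itself.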

>> The map/unmap memory operation would be a hypercall, not a socket
>> command, but the random "token" would be returned on the socket via some
>> KVMI_MAP_PHYSICAL_PAGE_TO_GUEST command (or more accurately, a
>> replacement accepting {gpa, size} instead of {gpa, gfn_dest}).  Handles
>> can be short lived, e.g. you could have at most a small number of tokens
>> per host created (and passed back via KVMI) but not yet used by the
>> hypercall.  Once it's used by the hypercall, the token is not needed
>> anymore, so this is not a strong limitation.
> 
> I understand that the token is used to make sure the mapping hypercall
> is only used by VM-s and applications doing introspection.

Yes.

>> After KVMMEM_MAP_MEMORY, you'd get a SIGSEGV if the guest memory layout
>> changes (userfaultfd can be used by the introspector to simplify the
>> handling and retry). You'd have to re-map the memory explicitly.
> 
> "guest memory layout changes" - is this the guest _being_ introspected?

Yes.

> If so, how would an introspector running in a separate VM get a
> SIGSEGV? Have KVM inject a page fault if the host encounters a #PF for
> the VMA representing the foreign mapping (presumably it knows the gva
> as well)?

When the foreign mapping is invalidated by the actions of the guest
being introspected, the pages become inaccessible in the EPT page tables
of the introspector VM.

Then, an access in the introspector VM will result in an EPT violation,
which KVM can translate to a page fault, or better still a #VE.  (Even
if you don't modify KVM to use a hardware #VE, KVM's MMU can choose to
inject the exception as a #VE rather than a #PF.)

That said, I think this should come later; to begin with, only
read/write should be included.

Paolo

>> Alas I have no idea how to verify the handle securely on the host, since
>> the host is not supposed to know which guests are introspectors and
>> which host got which token.  But maybe if the token namespace is big
>> enough (256 bits?) and random, it's okay to ignore the possibility that
>> a guest tries to guess.  (This idea is roughly based on how SCSI
>> offloaded copies work).
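
As a minimal sketch of the "big random token" idea quoted above, assuming
256-bit tokens: the userspace C below draws a token from /dev/urandom and
compares two tokens in constant time, so that a guessing guest learns nothing
from how long a rejection takes.  The function names are invented for
illustration and are not part of KVMI or of the RFC.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TOKEN_LEN 32 /* 256 bits */

/* Fill tok with 256 random bits from the kernel's random pool.
 * Returns 0 on success, -1 on error. */
int token_generate(uint8_t tok[TOKEN_LEN])
{
    FILE *f = fopen("/dev/urandom", "rb");
    size_t n;

    if (!f)
        return -1;
    n = fread(tok, 1, TOKEN_LEN, f);
    fclose(f);
    return n == TOKEN_LEN ? 0 : -1;
}

/* Compare two tokens without an early exit on the first mismatch.
 * Returns 1 if they are equal, 0 otherwise. */
int token_equal(const uint8_t a[TOKEN_LEN], const uint8_t b[TOKEN_LEN])
{
    uint8_t diff = 0;
    size_t i;

    for (i = 0; i < TOKEN_LEN; i++)
        diff |= (uint8_t)(a[i] ^ b[i]);
    return diff == 0;
}
```

Inside the kernel the equivalent pieces would presumably be
get_random_bytes() and crypto_memneq(); the sketch leaves out the bookkeeping
that would make each token single-use.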


