Re: [RFC PATCH v2 1/1] kvm: Add documentation and ABI/API header for VM introspection

On 27/07/2017 18:23, Mihai Donțu wrote:
> On Thu, 2017-07-13 at 11:15 +0200, Paolo Bonzini wrote:
>> On 13/07/2017 10:36, Mihai Donțu wrote:
>>> On Fri, 2017-07-07 at 18:52 +0200, Paolo Bonzini wrote:
>>>> Worse, KVM is not able to distinguish userspace that has paused the VM
>>>> from userspace that is doing MMIO or userspace that has a bug and hung
>>>> somewhere.  And even worse, there are cases where userspace wants to
>>>> modify registers while doing port I/O (the awful VMware RPCI port).  So
>>>> I'd rather avoid this.
>>>
>>> I should give more details here: we don't need to pause the vCPU-s in
>>> the sense widely understood but just prevent them from entering the
>>> guest for a short period of time. In our particular case, we need this
>>> when we start introspecting a VM that's already running. For this we
>>> kick the vCPU-s out of the guest so that our scan of the memory does
>>> not race with the guest kernel/applications.
>>>
>>> Another use case is when we inject applications into a running guest.
>>> We also kick the vCPU-s out while we atomically make changes to kernel
>>> specific structures.
>>
>> This is not possible to do in KVM, because KVM doesn't control what
>> happens to the memory outside KVM_RUN (and of course it doesn't control
>> devices doing DMA).  You need to talk to QEMU in order to do this.
> 
> Maybe add a new exit reason (e.g. KVM_EXIT_PAUSE) and have QEMU wait on
> the already opened file descriptor to /dev/kvm for an event?

Nope.  QEMU might be running and writing to memory in another thread.  I
don't see how this can be reliable on other hypervisors too, actually.

>> To do atomic changes to kernel specific structures, I would change the
>> page tables to inaccessible instead, but that also doesn't protect them
>> from devices doing DMA into them.
> 
> If we have QEMU pull all vCPU-s out of the guest and wait for a signal
> from the KVMI subsystem, then that looks sufficient. Devices accessing
> the memory (passed-through devices, I assume) should be no problem as
> we're never interested in device memory.

You're certainly interested in bus-master DMA from those devices though.

>> Another issue: say a VM is waiting for a reply from the introspector,
>> and the reply is delayed so the VM gets a signal and needs to get out to
>> QEMU with EINTR.  I don't think it is always possible to retry the
>> instruction on the next KVM_RUN, because the introspector might be
>> making partial changes.  Add live migration to the mix if you want to
>> make things even more complicated. :)
>>
>> I think we need a way to mark a set of commands for atomic application.
>> That is, the structure of the command stream needs to be
>>
>>     command 1
>>     command 2
>>     event reply 1
>>     transaction end marker
>>     command 3
>>     transaction end marker
>>     command 4
>>     event reply 2
>>     transaction end marker
> 
> This should be covered by a previous email exchange.

Correct.
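
For illustration only, a consumer of the transaction-marked stream quoted
above could buffer messages and apply them only when the end marker
arrives. This is a minimal sketch: the message kinds and the
`apply_batch` callback are hypothetical names, not part of the proposed
KVMI ABI.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical message kinds; the real KVMI wire format is not defined here.
COMMAND = "command"
EVENT_REPLY = "event_reply"
TXN_END = "transaction_end"

Message = Tuple[str, str]  # (kind, payload)


def apply_transactions(
    stream: Iterable[Message],
    apply_batch: Callable[[List[Message]], None],
) -> List[Message]:
    """Buffer messages and hand them over only at a transaction end marker.

    Returns the messages that were still pending when the stream ended,
    i.e. an incomplete transaction that must not be applied yet.
    """
    pending: List[Message] = []
    for kind, payload in stream:
        if kind == TXN_END:
            # Everything since the previous marker is applied as one unit.
            apply_batch(pending)
            pending = []
        else:
            pending.append((kind, payload))
    return pending
```

With this shape, a vCPU that has to leave KVM_RUN with EINTR never sees a
half-applied batch: whatever is still in the pending list has simply not
happened yet.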

>>>>> +8. KVMI_GET_MTRR_TYPE
>>>>> +---------------------
>>>>
>>>> What is this used for?  KVM ignores the guest MTRRs, so if possible I'd
>>>> rather avoid it.
>>>
>>> We use it to identify cacheable memory which usually indicates device
>>> memory, something we don't want to touch. We are also looking into
>>> making use of the page attributes (PAT) or other PTE-bits instead of
>>> MTRR, but for the time being MTRR-s are still being relied upon.
>>
>> Fair enough.  But you can compute it yourself from the MTRRs, can't you?
>> A separate command is just adding attack surface in the hypervisor.
> 
> I think we can make some basic MTRR info available via GET_REGISTERS
> and do the rest in the introspection tool.

Ok.
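
As a rough sketch of "the rest", the introspection tool could derive the
memory type from raw MTRR MSR values itself. The function below is an
assumption-laden illustration: it handles only the variable-range MTRRs
and the default type, ignores the fixed-range MTRRs that cover the first
1 MiB, and the function name and argument layout are made up for this
example.

```python
from typing import Iterable, Optional, Tuple

# x86 memory type encodings used in the MTRR type fields.
UC, WC, WT, WP, WB = 0, 1, 4, 5, 6


def mtrr_type(
    gpa: int,
    def_type: int,
    var_ranges: Iterable[Tuple[int, int]],
    maxphyaddr: int = 36,
) -> Optional[int]:
    """Return the MTRR memory type for gpa, or None if the overlap is undefined.

    def_type   -- raw IA32_MTRR_DEF_TYPE value (bit 11 = MTRRs enabled)
    var_ranges -- raw (PHYSBASE, PHYSMASK) pairs (PHYSMASK bit 11 = valid)
    maxphyaddr -- physical address width; 36 is only an assumed default
    """
    if not (def_type >> 11) & 1:
        return UC  # with MTRRs disabled, all memory is treated as UC

    addr_bits = ((1 << maxphyaddr) - 1) & ~0xFFF
    matches = set()
    for base, mask in var_ranges:
        if not (mask >> 11) & 1:
            continue  # range not valid
        m = mask & addr_bits
        if (gpa & m) == (base & m):
            matches.add(base & 0xFF)

    if not matches:
        return def_type & 0xFF  # no range matched: default type applies
    if UC in matches:
        return UC  # UC wins over any overlapping type
    if matches == {WT, WB}:
        return WT  # WT wins over WB
    if len(matches) == 1:
        return matches.pop()
    return None  # any other overlap is architecturally undefined
```

Keeping this logic in the tool means the hypervisor only has to expose
the raw register values.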

>>>>> +Injects a breakpoint for the specified vCPU. This command is usually sent in
>>>>> +response to an event and as such the proper GPR-s will be set with the reply.
>>>>
>>>> What is a "breakpoint" in this context?
>>>
>>> A simple INT3. It's what most of our patches consist of. We keep track
>>> of them and handle the ones we own while passing on (reinjecting) the
>>> ones used by the guest (debuggers or whatnot).
>>
>> Why can't they be written with KVMI_READ/WRITE_PHYSICAL?  (I would keep
>> those two as they provide a more direct interface than map/unmap, and
>> they work even with introspectors that are not sibling guests of the
>> introspected VM).
> 
> They can, nothing is stopping that. Also, we can keep the plain
> read/write interfaces around. It just seemed easier to implement them
> on top of an eventual mmap/munmap interface.

I prefer to keep the simple interface and drop the breakpoint one.
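
To make that concrete, here is a minimal sketch of how an introspector
might manage its own INT3 patches on top of plain physical reads and
writes. The `read_physical` and `write_physical` callbacks are
hypothetical stand-ins for whatever wraps KVMI_READ_PHYSICAL and
KVMI_WRITE_PHYSICAL; their signatures are assumptions made for this
example.

```python
from typing import Callable, Dict

INT3 = 0xCC  # x86 one-byte breakpoint opcode


class BreakpointTable:
    """Track INT3 patches placed by the introspector.

    read_physical(gpa, size) -> bytes and write_physical(gpa, data) are
    supplied by the caller; both are hypothetical wrappers around the
    physical read/write commands.
    """

    def __init__(
        self,
        read_physical: Callable[[int, int], bytes],
        write_physical: Callable[[int, bytes], None],
    ) -> None:
        self._read = read_physical
        self._write = write_physical
        self._saved: Dict[int, bytes] = {}  # gpa -> original byte

    def insert(self, gpa: int) -> None:
        if gpa in self._saved:
            return  # already patched; keep the first saved byte
        self._saved[gpa] = self._read(gpa, 1)
        self._write(gpa, bytes([INT3]))

    def remove(self, gpa: int) -> None:
        # Restore the byte that was there before the patch.
        self._write(gpa, self._saved.pop(gpa))

    def owns(self, gpa: int) -> bool:
        # False means the INT3 belongs to the guest and should be reinjected.
        return gpa in self._saved
```

On a breakpoint event the tool would check `owns()` for the faulting
address and either handle the event or ask for the exception to be
reinjected into the guest.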

Paolo


