Re: [RFC PATCH v2 1/1] kvm: Add documentation and ABI/API header for VM introspection

On Thu, 2017-07-13 at 11:15 +0200, Paolo Bonzini wrote:
> On 13/07/2017 10:36, Mihai Donțu wrote:
> > On Fri, 2017-07-07 at 18:52 +0200, Paolo Bonzini wrote:
> > > Worse, KVM is not able to distinguish userspace that has paused the VM
> > > from userspace that is doing MMIO or userspace that has a bug and hung
> > > somewhere.  And even worse, there are cases where userspace wants to
> > > modify registers while doing port I/O (the awful VMware RPCI port).  So
> > > I'd rather avoid this.
> > 
> > I should give more details here: we don't need to pause the vCPU-s in
> > the widely understood sense, but just prevent them from entering the
> > guest for a short period of time. In our particular case, we need this
> > when we start introspecting a VM that's already running. For this we
> > kick the vCPU-s out of the guest so that our scan of the memory does
> > not race with the guest kernel/applications.
> > 
> > Another use case is when we inject applications into a running guest.
> > We also kick the vCPU-s out while we atomically make changes to
> > kernel-specific structures.
> 
> This is not possible to do in KVM, because KVM doesn't control what
> happens to the memory outside KVM_RUN (and of course it doesn't control
> devices doing DMA).  You need to talk to QEMU in order to do this.

Maybe add a new exit reason (e.g. KVM_EXIT_PAUSE) and have QEMU wait
for an event on the already-open file descriptor to /dev/kvm?
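A rough sketch of the QEMU side of this idea, assuming a hypothetical KVM_EXIT_PAUSE: when the vCPU thread sees that exit, it blocks on an event descriptor until the introspection side writes a resume event. KVM_EXIT_PAUSE, HYPO_RESUME_EVENT and wait_for_resume() are made up for illustration; only poll() and read() are real, and in a test a pipe can stand in for the descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical resume event value; not part of any existing KVM ABI. */
#define HYPO_RESUME_EVENT 1u

/*
 * Block until the (hypothetical) introspection subsystem says the vCPU
 * may re-enter the guest.  Returns 1 on a resume event, 0 on timeout
 * (still paused), -1 on error or an unexpected event.
 */
static int wait_for_resume(int event_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = event_fd, .events = POLLIN };

	for (;;) {
		int n = poll(&pfd, 1, timeout_ms);
		if (n < 0) {
			if (errno == EINTR)
				continue;        /* interrupted by a signal: wait again */
			return -1;
		}
		if (n == 0)
			return 0;                /* timed out, vCPU stays out of the guest */

		uint32_t ev;
		ssize_t r = read(event_fd, &ev, sizeof(ev));
		if (r != (ssize_t)sizeof(ev))
			return -1;               /* short read, EOF or error */
		return ev == HYPO_RESUME_EVENT ? 1 : -1;
	}
}
```

A timeout of 0 only checks whether the resume event has already arrived. In this sketch a signal merely restarts the wait; a real vCPU thread would probably have to go back to its main loop to service it.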

> To do atomic changes to kernel specific structures, I would change the
> page tables to inaccessible instead, but that also doesn't protect them
> from devices doing DMA into them.

If we have QEMU pull all vCPU-s out of the guest and wait for a signal
from the KVMI subsystem, that looks sufficient. Devices accessing the
memory (passed-through devices, I assume) should not be a problem, as
we are never interested in device memory.

> Another issue: say a VM is waiting for a reply from the introspector,
> and the reply is delayed so the VM gets a signal and needs to get out to
> QEMU with EINTR.  I don't think it is always possible to retry the
> instruction on the next KVM_RUN, because the introspector might be
> making partial changes.  Add live migration to the mix if you want to
> make things even more complicated. :)
> 
> I think we need a way to mark a set of commands for atomic application.
> That is, the structure of the command stream needs to be
> 
>     command 1
>     command 2
>     event reply 1
>     transaction end marker
>     command 3
>     transaction end marker
>     command 4
>     event reply 2
>     transaction end marker

This should be covered by a previous email exchange. Commands targeting
vCPU-s for which events are pending or currently being handled should go
through a structure that acts like a cache, so that when the event reply
reaches KVM, all pending modifications are applied in one go. This
affects all register manipulations. Commands targeting memory are
unaffected by the state of the vCPU, though this might change once we
factor in EPT views. But, alas, we have no clear view on that topic as
of yet.
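A minimal sketch of such a per-vCPU cache, assuming only a three-register subset matters and that the event reply carries an explicit commit/drop decision. enum reg_id, struct vcpu_state, event_begin(), cmd_set_reg() and event_reply() are illustrative names, not part of the proposed ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of the vCPU register file. */
enum reg_id { REG_RAX, REG_RBX, REG_RIP, NR_REGS };

struct vcpu_state {
	uint64_t live[NR_REGS];    /* state the vCPU will run with */
	uint64_t staged[NR_REGS];  /* cache edited by introspector commands */
	bool event_pending;        /* event sent, reply not yet received */
};

/* Event sent: start a fresh cache from the live state. */
static void event_begin(struct vcpu_state *v)
{
	memcpy(v->staged, v->live, sizeof(v->live));
	v->event_pending = true;
}

/* Register command while the event is pending: only the cache changes. */
static int cmd_set_reg(struct vcpu_state *v, enum reg_id id, uint64_t val)
{
	if (!v->event_pending || (unsigned)id >= (unsigned)NR_REGS)
		return -1;
	v->staged[id] = val;
	return 0;
}

/* Event reply: apply every cached change in one go, or drop them all. */
static void event_reply(struct vcpu_state *v, bool commit)
{
	assert(v->event_pending);
	if (commit)
		memcpy(v->live, v->staged, sizeof(v->live));
	v->event_pending = false;
}
```

In this sketch the live registers change only when the reply is applied, so a vCPU that has to leave for userspace before the reply arrives still carries its pre-event state.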

> > > > +8. KVMI_GET_MTRR_TYPE
> > > > +---------------------
> > > 
> > > What is this used for?  KVM ignores the guest MTRRs, so if possible I'd
> > > rather avoid it.
> > 
> > We use it to identify uncacheable memory, which usually indicates
> > device memory, something we don't want to touch. We are also looking
> > into making use of the page attributes (PAT) or other PTE bits instead
> > of MTRR-s, but for the time being MTRR-s are still being relied upon.
> 
> Fair enough.  But you can compute it yourself from the MTRRs, can't you?
> A separate command is just adding attack surface in the hypervisor.

I think we can make some basic MTRR info available via GET_REGISTERS
and do the rest in the introspection tool.
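Computing the type in the tool amounts to applying the Intel SDM's MTRR precedence rules to the raw MSR values. A minimal sketch, assuming the tool receives IA32_MTRR_DEF_TYPE and the IA32_MTRR_PHYSBASEn/PHYSMASKn pairs (for instance through the GET_REGISTERS extension mentioned above); fixed-range MTRRs and PAT are ignored, and struct var_mtrr and mtrr_type() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Memory type encodings (Intel SDM, MTRR chapter). */
enum { MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6 };
#define MT_UNDEFINED     (-1)          /* overlap the SDM leaves undefined */
#define NO_MATCH         (-2)

#define MTRR_DEF_TYPE_E  (1ULL << 11)  /* IA32_MTRR_DEF_TYPE: MTRRs enabled */
#define MTRR_PHYSMASK_V  (1ULL << 11)  /* IA32_MTRR_PHYSMASKn: range valid */
#define MTRR_ADDR_MASK   (~0xFFFULL)   /* address bits start at bit 12 */

struct var_mtrr {
	uint64_t base;  /* raw IA32_MTRR_PHYSBASEn */
	uint64_t mask;  /* raw IA32_MTRR_PHYSMASKn */
};

/*
 * Effective MTRR type of a guest-physical address, using only the default
 * type and the variable ranges (fixed ranges below 1 MiB are ignored).
 */
static int mtrr_type(uint64_t def_type, const struct var_mtrr *var,
		     size_t n, uint64_t gpa)
{
	if (!(def_type & MTRR_DEF_TYPE_E))
		return MT_UC;                  /* MTRRs disabled: everything UC */

	int first = NO_MATCH;
	bool all_same = true, only_wt_wb = true;

	for (size_t i = 0; i < n; i++) {
		if (!(var[i].mask & MTRR_PHYSMASK_V))
			continue;
		uint64_t m = var[i].mask & MTRR_ADDR_MASK;
		if ((gpa & m) != (var[i].base & m))
			continue;              /* range does not cover gpa */

		int t = (int)(var[i].base & 0xFF);
		if (t == MT_UC)
			return MT_UC;          /* UC wins any overlap */
		if (first == NO_MATCH)
			first = t;
		else if (t != first)
			all_same = false;
		if (t != MT_WT && t != MT_WB)
			only_wt_wb = false;
	}

	if (first == NO_MATCH)
		return (int)(def_type & 0xFF); /* no range matched: default type */
	if (all_same)
		return first;
	return only_wt_wb ? MT_WT : MT_UNDEFINED;  /* WT + WB -> WT */
}
```

The sketch only reads MSR values, which is Paolo's point: nothing here needs a dedicated hypervisor command.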

> > > > +14. KVMI_INJECT_BREAKPOINT
> > > > +--------------------------
> > > > +
> > > > +:Architectures: all
> > > > +:Versions: >= 1
> > > > +:Parameters: ↴
> > > > +
> > > > +::
> > > > +
> > > > +	struct kvmi_inject_breakpoint {
> > > > +		__u16 vcpu;
> > > > +		__u16 padding[3];
> > > > +	};
> > > > +
> > > > +:Returns: ↴
> > > > +
> > > > +::
> > > > +
> > > > +	struct kvmi_error_code {
> > > > +		__s32 err;
> > > > +		__u32 padding;
> > > > +	};
> > > > +
> > > > +Injects a breakpoint for the specified vCPU. This command is usually sent in
> > > > +response to an event and as such the proper GPR-s will be set with the reply.
> > > 
> > > What is a "breakpoint" in this context?
> > 
> > A simple INT3. It's what most of our patches consist of. We keep track
> > of them and handle the ones we own, while passing on (reinjecting) the
> > ones used by the guest (debuggers or whatnot).
> 
> Why can't they be written with KVMI_READ/WRITE_PHYSICAL?  (I would keep
> those two as they provide a more direct interface than map/unmap, and
> they work even with introspectors that are not sibling guests of the
> introspected VM).

They can, nothing is stopping that. Also, we can keep the plain
read/write interfaces around. It just seemed easier to implement them
on top of an eventual mmap/munmap interface.
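For illustration, plain read/write helpers layered on a mapped window of guest memory, with INT3 bookkeeping done purely through them. struct gpa_window, the helpers and the fixed-size breakpoint table are assumptions for this sketch; it also assumes the breakpoint event already reports the guest-physical address of the INT3 byte (in practice the guest's linear address would have to be translated first).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INT3_OPCODE 0xCC
#define MAX_BPS     64

/* Hypothetical window of guest-physical memory mapped locally. */
struct gpa_window {
	uint8_t *hva;        /* local address of the window */
	uint64_t gpa_start;  /* first guest-physical address covered */
	size_t   len;        /* window size in bytes */
};

/* Local pointer for [gpa, gpa + n), or NULL if it leaves the window. */
static uint8_t *window_ptr(const struct gpa_window *w, uint64_t gpa, size_t n)
{
	if (gpa < w->gpa_start)
		return NULL;
	uint64_t off = gpa - w->gpa_start;
	if (off > w->len || n > w->len - off)
		return NULL;
	return w->hva + off;
}

static int read_physical(const struct gpa_window *w, uint64_t gpa,
			 void *buf, size_t n)
{
	const uint8_t *p = window_ptr(w, gpa, n);
	if (!p)
		return -1;
	memcpy(buf, p, n);
	return 0;
}

static int write_physical(const struct gpa_window *w, uint64_t gpa,
			  const void *buf, size_t n)
{
	uint8_t *p = window_ptr(w, gpa, n);
	if (!p)
		return -1;
	memcpy(p, buf, n);
	return 0;
}

/* Breakpoints planted by the introspector and the bytes they replaced. */
struct owned_bp {
	uint64_t gpa;
	uint8_t  orig;
	bool     used;
};
static struct owned_bp bps[MAX_BPS];

static struct owned_bp *bp_find(uint64_t gpa)
{
	for (size_t i = 0; i < MAX_BPS; i++)
		if (bps[i].used && bps[i].gpa == gpa)
			return &bps[i];
	return NULL;
}

/* Save the original byte, then overwrite it with INT3. */
static int bp_insert(const struct gpa_window *w, uint64_t gpa)
{
	static const uint8_t int3 = INT3_OPCODE;

	if (bp_find(gpa))
		return -1;                     /* already ours */
	for (size_t i = 0; i < MAX_BPS; i++) {
		if (bps[i].used)
			continue;
		uint8_t orig;
		if (read_physical(w, gpa, &orig, 1) ||
		    write_physical(w, gpa, &int3, 1))
			return -1;
		bps[i] = (struct owned_bp){ .gpa = gpa, .orig = orig, .used = true };
		return 0;
	}
	return -1;                             /* table full */
}

/* Restore the original byte and forget the breakpoint. */
static int bp_remove(const struct gpa_window *w, uint64_t gpa)
{
	struct owned_bp *bp = bp_find(gpa);

	if (!bp || write_physical(w, gpa, &bp->orig, 1))
		return -1;
	bp->used = false;
	return 0;
}

/* On a breakpoint event: true if ours to handle, false to reinject. */
static bool bp_is_ours(uint64_t gpa)
{
	return bp_find(gpa) != NULL;
}
```

A breakpoint event at an address that bp_is_ours() does not recognize would be reinjected into the guest, which matches the handling described above for guest debuggers.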

-- 
Mihai Donțu



