On Tue, Aug 1, 2017 at 4:40 AM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> On 27/07/2017 19:19, Mihai Donțu wrote:
>> On Thu, 2017-07-27 at 18:52 +0200, Paolo Bonzini wrote:
>>> On 27/07/2017 18:23, Mihai Donțu wrote:
>>>> On Thu, 2017-07-13 at 11:15 +0200, Paolo Bonzini wrote:
>>>>> On 13/07/2017 10:36, Mihai Donțu wrote:
>>>>>> On Fri, 2017-07-07 at 18:52 +0200, Paolo Bonzini wrote:
>>>>>>> Worse, KVM is not able to distinguish userspace that has paused the VM
>>>>>>> from userspace that is doing MMIO or userspace that has a bug and hung
>>>>>>> somewhere. And even worse, there are cases where userspace wants to
>>>>>>> modify registers while doing port I/O (the awful VMware RPCI port). So
>>>>>>> I'd rather avoid this.
>>>>>>
>>>>>> I should give more details here: we don't need to pause the vCPU-s in
>>>>>> the sense widely understood but just prevent them from entering the
>>>>>> guest for a short period of time. In our particular case, we need this
>>>>>> when we start introspecting a VM that's already running. For this we
>>>>>> kick the vCPU-s out of the guest so that our scan of the memory does
>>>>>> not race with the guest kernel/applications.
>>>>>>
>>>>>> Another use case is when we inject applications into a running guest.
>>>>>> We also kick the vCPU-s out while we atomically make changes to kernel
>>>>>> specific structures.
>>>>>
>>>>> This is not possible to do in KVM, because KVM doesn't control what
>>>>> happens to the memory outside KVM_RUN (and of course it doesn't control
>>>>> devices doing DMA). You need to talk to QEMU in order to do this.
>>>>
>>>> Maybe add a new exit reason (eg. KVM_EXIT_PAUSE) and have qemu wait on
>>>> the already opened file descriptor to /dev/kvm for an event?
>>>
>>> Nope. QEMU might be running and writing to memory in another thread. I
>>> don't see how this can be reliable on other hypervisors too, actually.
>>
>> I assume it largely depends on knowing what's possible to do and what
>> not with the guest memory even while the vCPU-s are suspended. The
>> price of breaking this rule will be something any KVMI user will have
>> to be very aware of.
>
> If you actually pause the whole VM (through QEMU's monitor commands
> "stop" and "cont") everything should be safe. Of course there can be
> bugs and PCI passthrough devices should be problematic, but in general
> the device emulation is quiescent. This however is not the case when
> only the VCPUs are paused.

IMHO for some use-cases it is sufficient to have the guest itself be
limited in what it can modify in memory. For example, if just a vCPU is
paused, there are areas of memory that you can interact with without
having to worry about them changing underneath the introspecting
application (i.e. thread-specific data structures like the KPCR, etc.).

If the introspecting application needs access to areas that non-paused
vCPUs, QEMU, or a pass-through device may touch, then it should be the
introspecting app's decision whether to pause the VM completely. It may
instead choose to do some error detection on reads/writes to catch
inconsistent accesses and simply retry the operation until it succeeds.
This may have less of an impact on the performance of the VM, as no
full VM pause has to be performed. It is all very application-specific,
so having options is always a good thing. A rough sketch of the retry
approach I have in mind is appended after my signature.

Tamas
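
P.S. Untested and only illustrative: read_guest_phys() below is a
hypothetical helper standing in for whatever guest-memory read
primitive the introspection interface ends up exposing, and the retry
bound is arbitrary. The idea is simply to read the target region
twice and only accept the result when the two snapshots agree:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define KVMI_READ_RETRIES 8

/*
 * Hypothetical helper: copy 'len' bytes of guest physical memory at
 * 'gpa' into 'buf'.  Returns 0 on success, negative on error.  The
 * real introspection read primitive may look different.
 */
int read_guest_phys(uint64_t gpa, void *buf, size_t len);

/*
 * Read a guest structure twice and accept the result only when both
 * snapshots match, i.e. nothing modified the region in between.
 * Retry a bounded number of times; if that fails, the caller can
 * still fall back to pausing the whole VM.
 */
static bool read_consistent(uint64_t gpa, void *out, size_t len)
{
	uint8_t a[len], b[len];	/* fine for small structures like the KPCR */
	int i;

	for (i = 0; i < KVMI_READ_RETRIES; i++) {
		if (read_guest_phys(gpa, a, len) < 0 ||
		    read_guest_phys(gpa, b, len) < 0)
			return false;

		if (memcmp(a, b, len) == 0) {
			memcpy(out, a, len);
			return true;	/* two identical snapshots */
		}
		/* The guest (or QEMU, or DMA) changed the data under us. */
	}
	return false;
}

If the retries are exhausted, the application still has the option of
doing a full "stop"/"cont" of the VM and repeating the read.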