On Mon, 2017-08-07 at 17:56 +0200, Paolo Bonzini wrote:
> On 07/08/2017 16:12, Mihai Donțu wrote:
> > On Mon, 2017-08-07 at 15:49 +0200, Paolo Bonzini wrote:
> > > On 07/08/2017 15:25, Mihai Donțu wrote:
> > > > > "Pause all VCPUs and stop all DMA" would definitely be a layering
> > > > > violation, so it cannot be added.
> > > > >
> > > > > "Pause all VCPUs" is basically a shortcut for many "pause the VCPU with
> > > > > a given id" commands. I lean towards omitting it.
> > > >
> > > > The case where the introspector wants to scan the guest memory needs a
> > > > KVMI_PAUSE_VM, which, as discussed in a previous email, can be the
> > > > actual qemu 'pause' command.
> > >
> > > Do you mean it needs to stop DMA as well?
> >
> > No, DMA can proceed normally. I remain of the opinion that KVMI users
> > must know what guest memory ranges are OK to access by looking at
> > MTRRs, PAT or guest kernel structures, or a combination of all three.
>
> Ok, good. Sorry if I am dense on the DMA/no-DMA cases.

I think it's OK to restate things, especially since my (our) view on
these matters might not match the KVM reality that you know far better.

> (But, I don't understand your remark about guest memory ranges; the point of
> bus-master DMA is that it works on any memory, and cache snooping makes
> it even easier for hypothetical malware to do memory writes via
> bus-master DMA).

This is where I separated things in my head: I merely limited myself to
accessing memory, while leaving the reality of DMA-based attacks a
problem to be solved separately. There is some research work being
tested internally on that, but I have yet to get in touch with the
people involved in it. As soon as I get some details, maybe we can
connect something in KVM.

> > > > However, we would like to limit the communication channels we have
> > > > with the host and not use qmp (or libvirt/etc. if qmp is not
> > > > exposed). Instead, have a command that triggers a KVM_RUN exit to
> > > > qemu, which in turn will call the underlying pause function used by
> > > > qmp. Would that be OK with you?
> > >
> > > You would have to send back something on completion, and then I am
> > > worried about races and deadlocks. Plus, pausing a VM at the QEMU level is
> > > a really expensive operation, so I don't think it's a good idea to let
> > > the introspector do this. You can pause all VCPUs, or use memory page
> > > permissions.
> >
> > Pausing all vCPUs was my first thought; I was just trying to follow
> > your statement: "I lean towards omitting it". :-)
>
> Yes, and I still do because a hypothetical "pause all VCPUs" command
> still has the issue that you could get other events before the command
> completes. So I am not convinced that a specialized command actually
> makes the introspector code much simpler.
>
> I hope you understand that I want to keep the trusted base (not just the
> code I maintain, though that is a secondary benefit ;)) as simple as
> possible.
>
> > It will take a bit of user-space-fu, in that after issuing N vCPU pause
> > commands in a row we will have to wait for N events, which might race
> > with other events (MSR, CRx etc.) which need handling, otherwise the
> > pause ones will not arrive.
>
> The same issue would be there in QEMU or KVM though.
>
> If you can always request "pause all vCPUs" from an event handler,
> avoiding deadlocks is relatively easy.
> If you cannot ensure that, for example because of work that is scheduled
> periodically, you can send a KVM_PAUSE command to ensure the work is done
> in a safe condition.
>
> Then you get the following pseudocode algorithm:
>
> // a vCPU is not executing guest code, and it's going to check
> // num_pause_vm_requests before going back to guest code
> vcpu_not_running(id) {
>     unmark vCPU "id" as running
>     if (num vcpus running == 0)
>         cond_broadcast(no_running_vcpus)
> }
>
> pause_vcpu(id) {
>     mark vCPU "id" as being-paused
>     send a "pause" command for the vCPU
> }
>
> // return only when no vCPU is in KVM_RUN
> pause_vm() {
>     if this vCPU is running
>         if not in an event handler
>             // caller should do pause_vcpu and defer the work
>             return
>
>         // we know this vCPU is not in KVM_RUN
>         vcpu_not_running()
>
>     num_pause_vm_requests++
>     if (num vcpus running > 0)
>         for each vCPU that is running and not being-paused
>             pause_vcpu(id)
>     while (num vcpus running > 0)
>         cond_wait(no_running_vcpus)
> }
>
> // tell paused vCPUs that they can resume
> resume_vm() {
>     num_pause_vm_requests--
>     if (num_pause_vm_requests == 0)
>         cond_broadcast(no_pending_pause_vm_requests)
>     // either we're in an event handler, or a "pause" command was
>     // sent for this vCPU. in any case we're guaranteed to do an
>     // event_reply sooner or later, which will again mark the vCPU
>     // as running
> }
>
> // after an event reply, the vCPU goes back to KVM_RUN. therefore
> // an event reply can act as a synchronization point for pause-vm
> // requests: delay the reply if there's such a request
> event_reply(id, data) {
>     if (num_pause_vm_requests > 0) {
>         if vCPU "id" is running
>             vcpu_not_running(id)
>         while (num_pause_vm_requests > 0)
>             cond_wait(no_pending_pause_vm_requests)
>     }
>     mark vCPU "id" as running
>     send event reply on KVMI socket
> }
>
> // this is what you do when KVM tells you that the guest is either
> // in userspace, or waiting to be woken up ("paused" event). from
> // the introspector POV the two are the same.
> vcpu_ack_pause(id) {
>     vcpu_not_running(id)
>     unmark vCPU "id" as being-paused
>
>     // deferred work presumably calls pause_vm/resume_vm, and this
>     // vCPU is not running now, so this is a nice point to flush it
>     if any deferred work exists, do it now
> }
>
> and on the KVMI read handler:
>
>     on reply to "pause" command:
>         if reply says the vCPU is currently in userspace
>             // we'll get a KVMI_PAUSED event as soon as the host
>             // reenters KVM with KVM_RUN, but we can already say the
>             // CPU is not running
>             vcpu_ack_pause()
>
>     on "paused" event:
>         vcpu_ack_pause()
>         event_reply()

Thank you for this!

-- 
Mihai Donțu
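
For illustration, here is a minimal C sketch of the bookkeeping in the
pseudocode above, restricted to the deadlock-free case where pause_vm()
is only called from an event handler. The kvmi_send_pause_vcpu() and
kvmi_send_event_reply() helpers are hypothetical placeholders for the
real KVMI socket layer, not its actual API; everything else is plain
pthreads.

/*
 * Minimal sketch only: the kvmi_send_*() helpers are hypothetical
 * stand-ins for the real KVMI socket layer, and pause_vm() assumes it
 * is called from an event handler (the deadlock-free case above).
 */
#include <pthread.h>
#include <stdbool.h>

#define MAX_VCPUS 64

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t no_running_vcpus = PTHREAD_COND_INITIALIZER;
static pthread_cond_t no_pending_pause_vm_requests = PTHREAD_COND_INITIALIZER;

static bool running[MAX_VCPUS];      /* vCPU is (or goes back) in KVM_RUN */
static bool being_paused[MAX_VCPUS]; /* a "pause" command is in flight */
static int num_running;
static int num_pause_vm_requests;

/* Hypothetical placeholders for writing to the KVMI socket. */
static void kvmi_send_pause_vcpu(int id) { (void)id; }
static void kvmi_send_event_reply(int id, const void *data) { (void)id; (void)data; }

/* Caller holds 'lock': the vCPU is known not to be executing guest code. */
static void vcpu_not_running(int id)
{
    if (running[id]) {
        running[id] = false;
        if (--num_running == 0)
            pthread_cond_broadcast(&no_running_vcpus);
    }
}

/* Caller holds 'lock'. */
static void pause_vcpu(int id)
{
    being_paused[id] = true;
    kvmi_send_pause_vcpu(id);
}

/* Return only when no vCPU is in KVM_RUN; call from an event handler. */
void pause_vm(int current_id)
{
    pthread_mutex_lock(&lock);
    /* we are handling an event, so this vCPU is not in KVM_RUN */
    vcpu_not_running(current_id);

    num_pause_vm_requests++;
    for (int id = 0; id < MAX_VCPUS; id++)
        if (running[id] && !being_paused[id])
            pause_vcpu(id);
    while (num_running > 0)
        pthread_cond_wait(&no_running_vcpus, &lock);
    pthread_mutex_unlock(&lock);
}

/* Tell the vCPUs blocked in event_reply() that they can resume. */
void resume_vm(void)
{
    pthread_mutex_lock(&lock);
    if (--num_pause_vm_requests == 0)
        pthread_cond_broadcast(&no_pending_pause_vm_requests);
    pthread_mutex_unlock(&lock);
}

/* The event reply sends the vCPU back to KVM_RUN, so it doubles as the
 * synchronization point for pending pause-vm requests. */
void event_reply(int id, const void *data)
{
    pthread_mutex_lock(&lock);
    if (num_pause_vm_requests > 0) {
        vcpu_not_running(id);
        while (num_pause_vm_requests > 0)
            pthread_cond_wait(&no_pending_pause_vm_requests, &lock);
    }
    if (!running[id]) {
        running[id] = true;
        num_running++;
    }
    pthread_mutex_unlock(&lock);

    kvmi_send_event_reply(id, data);
}

/* Called on the reply to a "pause" command (vCPU was in userspace) or on
 * the "paused" event itself: either way the vCPU is not running now. */
void vcpu_ack_pause(int id)
{
    pthread_mutex_lock(&lock);
    vcpu_not_running(id);
    being_paused[id] = false;
    pthread_mutex_unlock(&lock);
    /* a good point to flush deferred work that calls pause_vm()/resume_vm() */
}

The single mutex plus two condition variables mirror the
no_running_vcpus / no_pending_pause_vm_requests waits in the pseudocode;
the deferred-work path is left out for brevity.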