Avi, Marcelo, Jan? Any thoughts on this? Alex On 01.08.2012, at 05:17, Benjamin Herrenschmidt wrote: > Hi Avi ! > > We identified a problem on powerpc which seems to actually be a generic > issue, and Alex suggested we propose a generic fix. I want to make sure > we are on the right track first before proposing an actual patch as we > would like the patch to go in ASAP (ie not waiting the next merge > window) as it will fix an actual nasty bug with reset in KVM. > > So the basic issue has to do with doing a machine reset as a result of a > hypervisor call, but the same problem should happen with MMIO/PIO > emulation. > > After we do an exit as a result of such an operation, at the next > KVM_RUN, KVM will fetch the "results" of the operation (in the hypercall > case that's a bunch of register values, in the MMIO read emulation case > it's a single register value usually, x86 might have more subtle cases) > and we update the VCPU state (ie. registers) with that data. > > However, what happens is that if a reset happens in between, we end up > clobbering the reset state. > > IE. What happens in qemu is roughtly: > > - The hcall or MMIO that triggers the reset happens, goes to qemu, > which eventually calls qemu_system_reset_request() > > - This sets the global reset pending flag and wakes up the main loop. > It also does a stop of the current vcpu, so we do not return to the > kernel at this stage. > > - The main loop gets the flag, starts the reset process, which begins > with stopping all the VCPUs. > > - The reset handlers are called, which includes resetting the CPU > state, which in our case (powerpc) results in a SET_REGS ioctl to > establish a new fresh state for booting. > > - The generic code then restarts all VCPUs, which then return into > VCPU_RUN. > > - The VCPU(s) that did an exit as a result of MMIO emulation, > hypercall, or similiar (typically the one that triggered the reset but > possibly others) then gets some of their register state "updated" by the > result of the operation (in the hcall case, it's a field in the mmap'ed > run structure that clobbers GPR3 among others). > > Now this is generally not a big issue as -usually- machines don't care > much about the state of registers on reset. > > However it is for us, at least in a couple of cases because we build a > device-tree in qemu and we pass a pointer to it in GPR3 before we start > the firmware or the guest kernel. > > This pointer is set in env->gpr[3] which then gets written by a reset > handler to KVM with SET_REGS, but that ends up being clobbered > eventually by the return from the hypercall. > > My original idea to fix that was a ppc specific "trick" where we would > make SET_REGS clear the flags indicating that we had a pending hypercall > (or similar) and thus have SET_REGS have the side-effect of aborting a > current operation. > > However, Alex reckons we should instead have a generic ioctl > (VCPU_ABORT_OPERATION) which qemu could use when there's a reset, that > we call to generically abort all such operations (MMIO emu, hcalls, > etc...) on all archs when qemu decides to reset the processors. > > Alex and I have looked at various possible ways to try to fix that from > inside qemu but they are all extremely horrible & gross and might not > even work in the general case. > > Avi, before I actually do a patch (that adds the generic definition, > doc, and powerpc implementation), I'd like to confirm with you that this > is the right approach and that this can benefit from expedited > upstreaming :-) > > Cheers, > Ben. > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html