On 08/02/2012 03:59 PM, Alexander Graf wrote:
>
> On 02.08.2012, at 14:35, Avi Kivity wrote:
>
>> On 08/01/2012 06:17 AM, Benjamin Herrenschmidt wrote:
>>> Hi Avi !
>>>
>>> We identified a problem on powerpc which seems to actually be a generic
>>> issue, and Alex suggested we propose a generic fix. I want to make sure
>>> we are on the right track first before proposing an actual patch as we
>>> would like the patch to go in ASAP (ie not waiting the next merge
>>> window) as it will fix an actual nasty bug with reset in KVM.
>>>
>>> So the basic issue has to do with doing a machine reset as a result of a
>>> hypervisor call, but the same problem should happen with MMIO/PIO
>>> emulation.
>>>
>>> After we do an exit as a result of such an operation, at the next
>>> KVM_RUN, KVM will fetch the "results" of the operation (in the hypercall
>>> case that's a bunch of register values, in the MMIO read emulation case
>>> it's a single register value usually, x86 might have more subtle cases)
>>> and we update the VCPU state (ie. registers) with that data.
>>>
>>> However, what happens is that if a reset happens in between, we end up
>>> clobbering the reset state.
>>>
>>> IE. What happens in qemu is roughly:
>>>
>>> - The hcall or MMIO that triggers the reset happens, goes to qemu,
>>> which eventually calls qemu_system_reset_request()
>>>
>>> - This sets the global reset pending flag and wakes up the main loop.
>>> It also does a stop of the current vcpu, so we do not return to the
>>> kernel at this stage.
>>>
>>> - The main loop gets the flag, starts the reset process, which begins
>>> with stopping all the VCPUs.
>>>
>>> - The reset handlers are called, which includes resetting the CPU
>>> state, which in our case (powerpc) results in a SET_REGS ioctl to
>>> establish a new fresh state for booting.
>>>
>>> - The generic code then restarts all VCPUs, which then return into
>>> KVM_RUN.
>>>
>>> - The VCPU(s) that did an exit as a result of MMIO emulation,
>>> hypercall, or similar (typically the one that triggered the reset but
>>> possibly others) then get some of their register state "updated" by the
>>> result of the operation (in the hcall case, it's a field in the mmap'ed
>>> run structure that clobbers GPR3 among others).
>>>
>>> Now this is generally not a big issue as -usually- machines don't care
>>> much about the state of registers on reset.
>>>
>>
>> This is actually documented in api.txt, though not in relation to reset:
>>
>> NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO and KVM_EXIT_OSI, the
>> corresponding operations are complete (and guest state is consistent)
>> only after userspace has re-entered the kernel with KVM_RUN. The
>> kernel side will first finish incomplete operations and then check
>> for pending signals. Userspace can re-enter the guest with an
>> unmasked signal pending to complete pending operations.
>>
>> For x86 the issue was with live migration - you can't copy guest
>> register state in the middle of an I/O operation. Reset is actually
>> similar, but it involves writing state (which can then be overwritten)
>> instead of reading it.
>
> Yeah, we stumbled over this chunk as well. So you're saying we should
> delay the reset by invoking a self-signal if we're in such an operation?

Yes. Qemu of course already supports this for migration, so it should
be easy to add.

-- 
error compiling committee.c: too many arguments to function
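
For illustration only, here is a toy model of the ordering problem and of the "complete the pending operation first, then reset" approach agreed on above. It is a sketch in plain C, not code against the real KVM ioctls or QEMU internals: every name in it (toy_vcpu, toy_complete_pending, TOY_RESET_GPR3, and so on) is hypothetical, and toy_complete_pending() merely stands in for re-entering KVM_RUN with an unmasked signal pending so that the kernel finishes the incomplete operation and returns immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical value the reset handler wants in GPR3. */
#define TOY_RESET_GPR3 0x1000u

/* Toy stand-in for a VCPU; not the real KVM or QEMU structures. */
struct toy_vcpu {
    uint64_t gpr3;       /* register the hcall result lands in */
    bool     io_pending; /* exited to userspace, result not yet applied */
    uint64_t io_result;  /* value applied on the next (toy) KVM_RUN */
};

/* Models an exit to userspace for an hcall/MMIO: the result is still pending. */
static void toy_exit_to_userspace(struct toy_vcpu *v, uint64_t result)
{
    assert(!v->io_pending); /* one outstanding operation at a time */
    v->io_pending = true;
    v->io_result = result;
}

/* Models the first phase of KVM_RUN: finish the incomplete operation. */
static void toy_complete_pending(struct toy_vcpu *v)
{
    if (v->io_pending) {
        v->gpr3 = v->io_result;
        v->io_pending = false;
    }
}

/* Models the reset handler writing a fresh register state (SET_REGS). */
static void toy_set_reset_state(struct toy_vcpu *v)
{
    v->gpr3 = TOY_RESET_GPR3;
}

/* Problematic order: reset first, then re-enter; the stale result wins. */
static uint64_t toy_reset_naive(struct toy_vcpu *v)
{
    toy_set_reset_state(v);
    toy_complete_pending(v); /* next KVM_RUN clobbers the reset state */
    return v->gpr3;
}

/* Deferred order: drain the pending operation, then reset, then run. */
static uint64_t toy_reset_deferred(struct toy_vcpu *v)
{
    toy_complete_pending(v); /* stand-in for KVM_RUN with a signal pending */
    toy_set_reset_state(v);
    toy_complete_pending(v); /* nothing left to apply on the real re-entry */
    return v->gpr3;
}
```

With a VCPU that has just exited with a pending result of 0xdead, toy_reset_naive() leaves 0xdead in gpr3, while toy_reset_deferred() leaves TOY_RESET_GPR3. The real fix would of course live in qemu's reset path and reuse whatever it already does for migration.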