Re: Reset problem vs. MMIO emulation, hypercalls, etc...

Avi Kivity <avi@xxxxxxxxxx> · Thu, 02 Aug 2012 16:05:53 +0300

On 08/02/2012 03:59 PM, Alexander Graf wrote:
> 
> On 02.08.2012, at 14:35, Avi Kivity wrote:
> 
>> On 08/01/2012 06:17 AM, Benjamin Herrenschmidt wrote:
>>> Hi Avi !
>>> 
>>> We identified a problem on powerpc which seems to actually be a generic
>>> issue, and Alex suggested we propose a generic fix. I want to make sure
>>> we are on the right track first before proposing an actual patch as we
>>> would like the patch to go in ASAP (i.e. not waiting for the next merge
>>> window) as it will fix an actual nasty bug with reset in KVM.
>>> 
>>> So the basic issue has to do with doing a machine reset as a result of a
>>> hypervisor call, but the same problem should happen with MMIO/PIO
>>> emulation.
>>> 
>>> After we do an exit as a result of such an operation, at the next
>>> KVM_RUN, KVM will fetch the "results" of the operation (in the hypercall
>>> case that's a bunch of register values, in the MMIO read emulation case
>>> it's usually a single register value, x86 might have more subtle cases)
>>> and update the VCPU state (i.e. registers) with that data.
>>> 
>>> However, if a reset happens in between, that update ends up
>>> clobbering the freshly reset state.
>>> 
>>> I.e., what happens in qemu is roughly:
>>> 
>>> - The hcall or MMIO that triggers the reset happens, goes to qemu,
>>> which eventually calls qemu_system_reset_request()
>>> 
>>> - This sets the global reset pending flag and wakes up the main loop.
>>> It also stops the current vcpu, so we do not return to the
>>> kernel at this stage.
>>> 
>>> - The main loop gets the flag, starts the reset process, which begins
>>> with stopping all the VCPUs.
>>> 
>>> - The reset handlers are called, which includes resetting the CPU
>>> state, which in our case (powerpc) results in a SET_REGS ioctl to
>>> establish a new fresh state for booting.
>>> 
>>> - The generic code then restarts all VCPUs, which then re-enter
>>> KVM_RUN.
>>> 
>>> - The VCPU(s) that did an exit as a result of MMIO emulation,
>>> hypercall, or similar (typically the one that triggered the reset but
>>> possibly others) then get some of their register state "updated" by the
>>> result of the operation (in the hcall case, it's a field in the mmap'ed
>>> run structure that clobbers GPR3 among others).
>>> 
>>> Now this is generally not a big issue as -usually- machines don't care
>>> much about the state of registers on reset.
>>> 
>> 
>> This is actually documented in api.txt, though not in relation to reset:
>> 
>>  NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO and KVM_EXIT_OSI, the
>>  corresponding operations are complete (and guest state is consistent)
>>  only after userspace has re-entered the kernel with KVM_RUN.  The
>>  kernel side will first finish incomplete operations and then check
>>  for pending signals.  Userspace can re-enter the guest with an
>>  unmasked signal pending to complete pending operations.
>> 
>> For x86 the issue was with live migration - you can't copy guest
>> register state in the middle of an I/O operation.  Reset is actually
>> similar, but it involves writing state (which can then be overwritten)
>> instead of reading it.
> 
> Yeah, we stumbled over this chunk as well. So you're saying we should delay the reset by invoking a self-signal if we're in such an operation?

Yes.  Qemu of course already supports this for migration, so it should
be easy to add.

-- 
error compiling committee.c: too many arguments to function