Re: [PATCH 2/2] KVM: PPC: Book3S HV: Better handling of exceptions that happen in real mode

Alexander Graf <agraf@xxxxxxx> · Fri, 4 Oct 2013 14:56:31 +0200

On 04.10.2013, at 14:33, Paul Mackerras wrote:

> On Fri, Oct 04, 2013 at 01:59:25PM +0200, Alexander Graf wrote:
>> 
>> On 04.10.2013, at 13:45, Paul Mackerras wrote:
>> 
>>> When an interrupt or exception happens in the guest that comes to the
>>> host, the CPU goes to hypervisor real mode (MMU off) to handle the
>>> exception but doesn't change the MMU context.  After saving a few
>>> registers, we then clear the "in guest" flag.  If, for any reason,
>>> we get an exception in the real-mode code, that then gets handled
>>> by the normal kernel exception handlers, which turn the MMU on.  This
>>> is disastrous if the MMU is still set to the guest context, since we
>>> end up executing instructions from random places in the guest kernel
>>> with hypervisor privilege.
>>> 
>>> In order to catch this situation, we define a new value for the "in guest"
>>> flag, KVM_GUEST_MODE_HOST_HV, to indicate that we are in hypervisor real
>>> mode with guest MMU context.  If the "in guest" flag is set to this value,
>>> we branch off to an emergency handler.  For the moment, this just does
>>> a branch to self to stop the CPU from doing anything further.
>> 
>> I don't understand how you get there. The only case I can imagine where you'd hit a normal Linux handler while in guest MMU context is a bug in the complex real mode handling code.
> 
> A bug is the usual case.  I think it is also possible (though very
> unlikely) to get a machine check interrupt, since they can come at any
> time.
> 
>> So basically what you're doing is you're changing the "guest mode" bit to HOST_HV while you're executing these.
>> 
>> The other change this patch does is it postpones the return to GUEST_MODE_NONE to after fast-path handling of interrupt exits.
>> 
>> What if you simply don't introduce a new mode but instead only postpone the GUEST_MODE_NONE switch to later? Worst case that can happen is that your bug spins the CPU into handling that exit in a tight loop - not much different from your explicit spin, no?
> 
> I did it like that so that we have a chance to save away the register
> state for the point where the exception happened separately from the
> guest state.  It can be very useful for debugging to have both sets.
> The other thing of course is that if I did what you suggest and then
> happened not to hit the exception on the second time through, we would
> end up with corrupted guest state and no indication that it was
> corrupted (since the register state for the bad exception would get
> saved away in the vcpu struct).
> 
> I admit I haven't written the code to save away the register state
> when one of these bad exceptions happens; that's partly because in the
> lab we have ways of getting the register state directly from the CPU,
> but I'm certainly intending to write that code soon.

Fair enough, but I think running that additional code at a point where we only have a single register available, and then even stalling the CPU on a memory write to store away and load the state, doesn't really help performance.
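
To make the scheme under discussion concrete, here is a minimal user-space sketch in C of how I read it. It is not the patch itself: the real code is real-mode assembly, and every name below (MODE_*, struct vcpu_sketch, on_exception) is hypothetical and chosen only for illustration. It assumes the flag goes GUEST -> HOST_HV on a guest exit, and that an exception taken while in HOST_HV is saved into a separate area so the guest state is left alone.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the "in guest" flag values. */
enum in_guest_mode {
    MODE_NONE = 0,  /* host MMU context, normal handlers are safe */
    MODE_GUEST,     /* guest is running */
    MODE_HOST_HV,   /* hypervisor real mode, guest MMU context still loaded */
};

/* Simplified register snapshot; the real state is much larger. */
struct regs {
    unsigned long gpr[32];
    unsigned long pc;
    unsigned long msr;
};

struct vcpu_sketch {
    enum in_guest_mode in_guest;
    struct regs guest;    /* guest state saved on a normal exit */
    struct regs bad_exc;  /* state at an unexpected real-mode exception */
    bool bad_exc_valid;
};

/*
 * Returns true if the exception may go to the normal host handlers
 * (which turn the MMU on), false if it must stay on the KVM path.
 */
static bool on_exception(struct vcpu_sketch *v, const struct regs *now)
{
    switch (v->in_guest) {
    case MODE_GUEST:
        /* Normal guest exit: save guest state, mark real-mode window. */
        v->guest = *now;
        v->in_guest = MODE_HOST_HV;
        return false;
    case MODE_HOST_HV:
        /*
         * Exception inside the real-mode code. Keep it apart from the
         * guest state so both sets are available for debugging. The
         * patch as described just branches to self at this point.
         */
        v->bad_exc = *now;
        v->bad_exc_valid = true;
        return false;
    default:
        /* MODE_NONE: host context, normal handlers are fine. */
        return true;
    }
}
```

With only the postponed GUEST_MODE_NONE switch and no separate mode, the second case would fall into the first one and overwrite v->guest, which is the silent corruption Paul describes above.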

Either way, applied to ppc-next.

Alex
