Re: [PATCH 2/2] KVM: PPC: Book3S HV: Better handling of exceptions that happen in real mode

Paul Mackerras <paulus@xxxxxxxxx> · Fri, 4 Oct 2013 22:33:29 +1000

On Fri, Oct 04, 2013 at 01:59:25PM +0200, Alexander Graf wrote:
> 
> On 04.10.2013, at 13:45, Paul Mackerras wrote:
> 
> > When an interrupt or exception happens in the guest that comes to the
> > host, the CPU goes to hypervisor real mode (MMU off) to handle the
> > exception but doesn't change the MMU context.  After saving a few
> > registers, we then clear the "in guest" flag.  If, for any reason,
> > we get an exception in the real-mode code, that then gets handled
> > by the normal kernel exception handlers, which turn the MMU on.  This
> > is disastrous if the MMU is still set to the guest context, since we
> > end up executing instructions from random places in the guest kernel
> > with hypervisor privilege.
> > 
> > In order to catch this situation, we define a new value for the "in guest"
> > flag, KVM_GUEST_MODE_HOST_HV, to indicate that we are in hypervisor real
> > mode with guest MMU context.  If the "in guest" flag is set to this value,
> > we branch off to an emergency handler.  For the moment, this just does
> > a branch to self to stop the CPU from doing anything further.
> 
> I don't understand how you get there. The only case I can imagine where you'd hit a normal Linux handler while in guest MMU context is a bug in the complex real mode handling code.

A bug is the usual case.  I think it is also possible (though very
unlikely) to get a machine check interrupt, since they can come at any
time.

> So basically what you're doing is you're changing the "guest mode" bit to HOST_HV while you're executing these.
> 
> The other change this patch does is it postpones the return to GUEST_MODE_NONE to after fast-path handling of interrupt exits.
> 
> What if you simply don't introduce a new mode but instead only postpone the GUEST_MODE_NONE switch to later? Worst case that can happen is that your bug spins the CPU into handling that exit in a tight loop - not much different from your explicit spin, no?

I did it like that so that we have a chance to save away the register
state for the point where the exception happened separately from the
guest state.  It can be very useful for debugging to have both sets.
The other thing of course is that if I did what you suggest and then
happened not to hit the exception on the second time through, we would
end up with corrupted guest state and no indication that it was
corrupted (since the register state for the bad exception would get
saved away in the vcpu struct).
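
As a rough illustration of why the extra flag value helps, here is a
small user-space C model of the dispatch.  It is only a sketch, not the
real-mode assembly from the patch: the enum names and values, the
struct layout and the function names are all made up for illustration.
The point it shows is that an exception taken while the flag says
"hypervisor real mode, guest MMU context" goes to a separate save area
and never overwrites the guest state captured at the original exit.

```c
#include <assert.h>

/* Hypothetical stand-ins for the "in guest" flag values.  The names and
 * numeric values are assumptions, not copied from the kernel headers. */
enum guest_mode {
    MODE_NONE = 0,   /* host context, normal handlers are safe */
    MODE_GUEST,      /* executing guest code */
    MODE_HOST_HV     /* hypervisor real mode, MMU still in guest context */
};

/* Cut-down register snapshot, just enough to tell two states apart. */
struct regs {
    unsigned long pc;
    unsigned long gpr[4];
};

struct vcpu_model {
    enum guest_mode in_guest;
    struct regs guest;     /* state at the point the guest exited */
    struct regs bad_exc;   /* state at an unexpected real-mode exception */
    int bad_exc_valid;
};

enum action {
    ACT_HOST_HANDLER,   /* normal kernel handler, may turn the MMU on */
    ACT_GUEST_EXIT,     /* regular KVM exit path, still in real mode */
    ACT_EMERGENCY       /* bad exception: record state and stop */
};

/* Decide what to do with an exception, based on the "in guest" flag. */
static enum action on_exception(struct vcpu_model *v, const struct regs *now)
{
    switch (v->in_guest) {
    case MODE_GUEST:
        v->guest = *now;             /* save guest state */
        v->in_guest = MODE_HOST_HV;  /* not MODE_NONE yet */
        return ACT_GUEST_EXIT;
    case MODE_HOST_HV:
        v->bad_exc = *now;           /* kept apart from v->guest */
        v->bad_exc_valid = 1;
        return ACT_EMERGENCY;
    default:
        return ACT_HOST_HANDLER;
    }
}

/* Called once the real-mode exit handling is done and the MMU context
 * has been switched back to the host. */
static void finish_exit(struct vcpu_model *v)
{
    assert(v->in_guest == MODE_HOST_HV);
    v->in_guest = MODE_NONE;
}
```

With only two flag values and a postponed switch to "none", the second
exception in this model would land in the MODE_GUEST case and overwrite
v->guest, which is the silent corruption described above.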

I admit I haven't written the code to save away the register state
when one of these bad exceptions happens; that's partly because in the
lab we have ways of getting the register state directly from the CPU,
but I'm certainly intending to write that code soon.

Paul.