On Fri, Oct 04, 2013 at 02:56:31PM +0200, Alexander Graf wrote:
> 
> On 04.10.2013, at 14:33, Paul Mackerras wrote:
> 
> > On Fri, Oct 04, 2013 at 01:59:25PM +0200, Alexander Graf wrote:
> >> 
> >> On 04.10.2013, at 13:45, Paul Mackerras wrote:
> >> 
> >>> When an interrupt or exception happens in the guest that comes to the
> >>> host, the CPU goes to hypervisor real mode (MMU off) to handle the
> >>> exception but doesn't change the MMU context.  After saving a few
> >>> registers, we then clear the "in guest" flag.  If, for any reason,
> >>> we get an exception in the real-mode code, that then gets handled
> >>> by the normal kernel exception handlers, which turn the MMU on.  This
> >>> is disastrous if the MMU is still set to the guest context, since we
> >>> end up executing instructions from random places in the guest kernel
> >>> with hypervisor privilege.
> >>> 
> >>> In order to catch this situation, we define a new value for the "in guest"
> >>> flag, KVM_GUEST_MODE_HOST_HV, to indicate that we are in hypervisor real
> >>> mode with guest MMU context.  If the "in guest" flag is set to this value,
> >>> we branch off to an emergency handler.  For the moment, this just does
> >>> a branch to self to stop the CPU from doing anything further.
> >> 
> >> I don't understand how you get there. The only case I can imagine where you'd hit a normal Linux handler while in guest MMU context is a bug in the complex real mode handling code.
> > 
> > A bug is the usual case.  I think it is also possible (though very
> > unlikely) to get a machine check interrupt, since they can come at any
> > time.
> > 
> >> So basically what you're doing is you're changing the "guest mode" bit to HOST_NV while you're executing these.
> >> 
> >> The other change this patch does is it postpones the return to GUEST_MODE_NONE to after fast-path handling of interrupt exits.
> >> 
> >> What if you simply don't introduce a new mode but instead only postpone the GUEST_MODE_NONE switch to later? Worst case that can happen is that your bug spins the CPU into handling that exit in a tight loop - not much different from your explicit spin, no?
> > 
> > I did it like that so that we have a chance to save away the register
> > state for the point where the exception happened separately from the
> > guest state.  It can be very useful for debugging to have both sets.
> > The other thing of course is that if I did what you suggest and then
> > happened not to hit the exception on the second time through, we would
> > end up with corrupted guest state and no indication that it was
> > corrupted (since the register state for the bad exception would get
> > saved away in the vcpu struct).
> > 
> > I admit I haven't written the code to save away the register state
> > when one of these bad exceptions happens; that's partly because in the
> > lab we have ways of getting the register state directly from the CPU,
> > but I'm certainly intending to write that code soon.
> 
> Fair enough, but I think doing that additional code when we only have a single register available and then even stall the CPU on a memory write to store away and load the state doesn't really help performance.

That's what register renaming, branch prediction and speculative
execution are for. :)

> Either way, applied to ppc-next.

Thanks,
Paul.
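
For anyone following the thread, the dispatch described in the patch description can be modelled roughly as below. This is only an illustrative C sketch under stated assumptions, not the patch itself (the real code is PowerPC assembly in the exception entry path). The numeric values, the GUEST_MODE_GUEST name, the route enum and the route_exception() helper are all hypothetical; only GUEST_MODE_NONE and KVM_GUEST_MODE_HOST_HV are named in the discussion above.

```c
/* Illustrative model only; the real code is PowerPC assembly. */

/* Numeric values are placeholders, not taken from the kernel headers. */
enum in_guest {
    GUEST_MODE_NONE    = 0, /* host context: normal handlers are safe */
    GUEST_MODE_GUEST   = 1, /* assumed name: CPU was running the guest */
    GUEST_MODE_HOST_HV = 2  /* hypervisor real mode, guest MMU context */
};

/* Hypothetical labels for the three possible destinations. */
enum route {
    ROUTE_HOST_HANDLER, /* normal kernel exception handler (turns MMU on) */
    ROUTE_GUEST_EXIT,   /* KVM real-mode guest exit path */
    ROUTE_EMERGENCY     /* bad exception: stop the CPU (branch to self) */
};

/* Hypothetical helper: pick a destination from the "in guest" flag. */
static enum route route_exception(enum in_guest flag)
{
    switch (flag) {
    case GUEST_MODE_GUEST:
        return ROUTE_GUEST_EXIT;
    case GUEST_MODE_HOST_HV:
        /*
         * Turning the MMU on here would execute guest memory with
         * hypervisor privilege, so the normal handler must not run.
         */
        return ROUTE_EMERGENCY;
    case GUEST_MODE_NONE:
    default:
        return ROUTE_HOST_HANDLER;
    }
}
```

The point of keeping HOST_HV as a separate state, rather than just delaying the switch to GUEST_MODE_NONE, is visible in the model: an exception taken in the real-mode code gets its own destination, so its register state need not overwrite the guest state saved in the vcpu struct.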