On Sat, 2009-10-03 at 20:59 +1000, Benjamin Herrenschmidt wrote:
> On Sat, 2009-10-03 at 12:08 +0200, Avi Kivity wrote:
> >
> > So these MSRs can be modified by the hypervisor? Otherwise you'd cache
> > them in the guest with no hypervisor involvement, right? (just making
> > sure :)
>
> There's one MSR :-) Among others, it can be altered by the act of
> taking an interrupt (for example, it contains the PR bit, which means
> user vs. supervisor, things like that).

For a bit more context... On PowerPC, all those "special" registers are called "SPR"s (special registers, surprise! :-) They are generally accessed via mfspr/mtspr instructions that encode the SPR number, though some of them can also have dedicated instructions, or be set as a side effect of some instructions or events, etc...

MSR is a bit special here because it's not per se an SPR. It's the Machine State Register: in the core it sits in the fast path of a whole bunch of pipeline stages, and it contains state such as the current privilege level, whether MMU translation is enabled for instructions (I) and data (D), the interrupt enable bit, etc... It's accessed via the dedicated mfmsr/mtmsr instructions (to simplify: other instructions also modify the MSR as a side effect, interrupts do that too, etc...).

So the MSR warrants special treatment for KVM. Other SPRs may or may not, depending on what they are. Some are just storage, like the SPRGs; some hold a copy of the previous PC and MSR when taking an interrupt (SRR0 and SRR1), which the rfi instruction uses to restore them when returning from the interrupt; and some are totally unrelated, such as the decrementer (our core timer facility) or other processor-specific registers containing various things like cache configuration, etc...

The main issue with kernel entry/exit performance, though, revolves around MSR, SPRG and SRR0/1 accesses.
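To make the interrupt/rfi interplay concrete, here is a simplified toy model (not actual KVM code) of what taking an interrupt and executing rfi do to PC, MSR and SRR0/SRR1. The VcpuState class and the deliver_interrupt()/emulate_rfi() helpers are hypothetical names; the MSR masks assume the Book3S bit positions used in Linux's asm/reg.h, 0xC00 is assumed as the system call vector, and real hardware also sets interrupt-specific bits in SRR1 and touches more MSR bits than shown here.

```python
from dataclasses import dataclass

# Assumed MSR bit masks (Book3S positions, as in Linux's asm/reg.h).
MSR_EE = 1 << 15  # external interrupt enable
MSR_PR = 1 << 14  # problem state: 1 = user, 0 = supervisor
MSR_IR = 1 << 5   # instruction address translation enabled
MSR_DR = 1 << 4   # data address translation enabled

SYSCALL_VECTOR = 0xC00  # assumed system call vector (classic Book3S)


@dataclass
class VcpuState:
    """Hypothetical per-vCPU state a hypervisor tracks for the guest."""
    pc: int          # address execution resumes at
    msr: int
    srr0: int = 0
    srr1: int = 0


def deliver_interrupt(v: VcpuState, vector: int) -> None:
    """Simplified interrupt entry: save PC/MSR, then enter supervisor
    mode with translation and external interrupts off at the vector."""
    v.srr0 = v.pc    # where rfi will resume
    v.srr1 = v.msr   # real hardware also ORs in interrupt-specific bits
    v.msr &= ~(MSR_EE | MSR_PR | MSR_IR | MSR_DR)
    v.pc = vector


def emulate_rfi(v: VcpuState) -> None:
    """Simplified rfi: restore PC and MSR from SRR0/SRR1."""
    v.pc = v.srr0
    v.msr = v.srr1


if __name__ == "__main__":
    # User mode with the MMU on and interrupts enabled.
    v = VcpuState(pc=0x10000000, msr=MSR_EE | MSR_PR | MSR_IR | MSR_DR)
    deliver_interrupt(v, SYSCALL_VECTOR)
    print(f"handler: pc={v.pc:#x} msr={v.msr:#x} srr0={v.srr0:#x} srr1={v.srr1:#x}")
    emulate_rfi(v)
    print(f"back:    pc={v.pc:#x} msr={v.msr:#x}")
```

In a pure trap-and-emulate setup, the hypervisor does something like deliver_interrupt() when it directs an interrupt to the guest and something like emulate_rfi() when the guest kernel's rfi traps; the guest kernel's own reads and writes of MSR and SRR0/SRR1 in between are further traps.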
SPRGs could -almost- be entirely guest cached, but since the goal is to save a register to use as scratch at a time when no register can be clobbered, saving a register to them must fit in one instruction that has no side effect. The typical option we are thinking about here is a store-absolute to an address that KVM can then map to some per-CPU storage page.

Things like SRR0/SRR1 can be replaced by similar loads/stores, as long as the HV sets them appropriately with the original MSR (or emulated MSR) and PC when directing an interrupt to the guest, and knows where to retrieve the content set by the kernel when emulating an "rfi" instruction.

The guest can always read the MSR from the cache, as long as the HV knows how to alter the cached value when directing an interrupt to the guest or emulating another of the instructions that can affect it (such as rfi, of course), etc...

So in our case, that (relatively small) level of paravirt provides a tremendous performance boost, since every guest interrupt (syscall, etc...) goes down from something like a good dozen emulation traps to maybe a couple, just for the base entry/exit path from the kernel.

This is very different from the PV issues you guys had in the x86 world related to MMU emulation. In our case PV may also prove useful for the MMU, as our MMU structure is very different, but that is a completely orthogonal matter.

Cheers,
Ben.