Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4

Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> · Sat, 03 Oct 2009 21:10:18 +1000

On Sat, 2009-10-03 at 20:59 +1000, Benjamin Herrenschmidt wrote:
> On Sat, 2009-10-03 at 12:08 +0200, Avi Kivity wrote:
> > 
> > So these MSRs can be modified by the hypervisor?  Otherwise you'd cache 
> > them in the guest with no hypervisor involvement, right?  (just making 
> > sure :)
> 
> There's one MSR :-) Among others, it can be altered by the act of
> taking an interrupt (for example, it contains the PR bit, which means
> user vs. supervisor, things like that).

For a bit more context...

On PowerPC, all those "special" registers are called "SPR"s (special
purpose registers, surprise! :-)

They are generally accessed via mfspr/mtspr instructions that encode
the SPR number, though some of them can also have dedicated instructions
or be set as a side effect of some instructions or events etc...

MSR is a bit special here because it's not per-se an SPR. It's the
Machine State Register. In the core, it sits in the fast path of a whole
bunch of pipeline stages, and it contains the state of things such as
the current privilege level, the state of MMU translation for I and D,
the interrupt enable bit, etc... It's accessed via specific mfmsr/mtmsr
instructions (I'm simplifying: other instructions modify the MSR as a
side effect, interrupts do that too, etc...).
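
To make the MSR bits above a bit more concrete, here is a rough sketch
in C. The bit positions are the ones the Linux powerpc headers use for
EE, PR, IR and DR. The helper names are made up for illustration, and
msr_on_interrupt() is a simplification that only models those four bits
(a real interrupt changes others as well).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions counted from the least significant bit. */
#define MSR_EE (1ULL << 15) /* external interrupt enable */
#define MSR_PR (1ULL << 14) /* problem state: 1 = user, 0 = supervisor */
#define MSR_IR (1ULL << 5)  /* instruction address translation enabled */
#define MSR_DR (1ULL << 4)  /* data address translation enabled */

/* Hypothetical helpers, named for this sketch only. */
static inline bool msr_is_user(uint64_t msr)
{
    return (msr & MSR_PR) != 0;
}

static inline bool msr_irqs_enabled(uint64_t msr)
{
    return (msr & MSR_EE) != 0;
}

/*
 * Simplified model of what taking an interrupt does to the MSR:
 * drop to supervisor, mask external interrupts, and turn off
 * translation for both I and D.
 */
static inline uint64_t msr_on_interrupt(uint64_t msr)
{
    return msr & ~(MSR_EE | MSR_PR | MSR_IR | MSR_DR);
}

/* Usage example: a user-mode MSR before and after an interrupt. */
static inline void msr_example(void)
{
    uint64_t user = MSR_EE | MSR_PR | MSR_IR | MSR_DR;

    assert(msr_is_user(user));
    assert(!msr_is_user(msr_on_interrupt(user)));
}
```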

So the MSR warrants special treatment for KVM. Other SPRs may or may not
depending on what they are. Some are just storage like the SPRGs, some
contain a copy of the previous PC and MSR when taking an interrupt (SRR0
and SRR1) and are used by the rfi instruction to restore them when
returning from an interrupt, and some are totally unrelated (such as
the decrementer which is our core timer facility) or other processor
specific registers containing various things like cache configuration
etc...

The main issues with kernel entry / exit performance, though, revolve
around MSR, SPRG and SRR0/1 accesses. SPRGs could -almost- be entirely
guest cached, but since their purpose is to save a register for use as
scratch at a time when no register can be clobbered, saving a register
to them must fit in one instruction that has no side effects. The typical
option we are thinking about here is a store-absolute to an address that
KVM can then map to some per-CPU storage page.
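
As a rough illustration of the difference (a model in plain C, not code
from the patch series), compare an SPRG write that traps into the HV
with a plain store into a per-CPU page. This assumes the guest kernel
runs deprivileged so that privileged SPR accesses trap. The struct
layout and all names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of the per-CPU page the HV maps for the guest. */
struct pv_scratch_page {
    uint64_t scratch[4]; /* stand-ins for SPRG0..SPRG3 */
};

/* Toy model of one virtual CPU, for counting traps only. */
struct scratch_vcpu {
    uint64_t sprg[4];            /* SPRG values as emulated by the HV */
    struct pv_scratch_page page; /* storage the guest can write directly */
    unsigned traps;              /* emulation traps taken so far */
};

/* Unmodified guest: the SPRG write traps and the HV emulates it. */
static void guest_mtsprg_trapping(struct scratch_vcpu *v, unsigned n,
                                  uint64_t val)
{
    assert(n < 4);
    v->traps++;       /* one trip through the emulation path */
    v->sprg[n] = val;
}

/*
 * Paravirtualized guest: a single plain store to a fixed address.
 * No trap is taken and no other register is needed or clobbered.
 */
static void guest_save_scratch_pv(struct scratch_vcpu *v, unsigned n,
                                  uint64_t val)
{
    assert(n < 4);
    v->page.scratch[n] = val;
}
```

In the model, calling guest_mtsprg_trapping() bumps the trap counter
once per access, while guest_save_scratch_pv() leaves it untouched.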

Things like SRR0/SRR1 can be replaced by similar load/stores as long as
the HV sets them appropriately with the original MSR (or emulated MSR)
and PC when directing an interrupt to the guest, and knows where to
retrieve the content set by the kernel when emulating an "rfi"
instruction. The guest can always read the MSR from a cached copy, as
long as the HV knows how to alter that cached value when directing
an interrupt to the guest or emulating another of those instructions
that can affect it (such as rfi of course), etc...
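
Here is a minimal sketch of that protocol as a C model, under a few
assumptions: the HV and the guest share a page holding the cached MSR
and the SRR0/SRR1 stand-ins, only the EE/PR/IR/DR bits are modelled,
and SRR1 is taken to be a straight copy of the old MSR. The struct and
function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_EE (1ULL << 15) /* external interrupt enable */
#define MSR_PR (1ULL << 14) /* 1 = user, 0 = supervisor */
#define MSR_IR (1ULL << 5)  /* instruction translation enabled */
#define MSR_DR (1ULL << 4)  /* data translation enabled */

/* Hypothetical guest-visible page shared with the HV. */
struct pv_shared {
    uint64_t msr;  /* cached (emulated) MSR, readable without a trap */
    uint64_t srr0; /* PC saved when an interrupt is delivered */
    uint64_t srr1; /* MSR saved when an interrupt is delivered */
};

struct irq_vcpu {
    uint64_t pc;
    struct pv_shared shared;
};

/* HV side: direct an interrupt to the guest at the given vector. */
static void hv_deliver_interrupt(struct irq_vcpu *v, uint64_t vector)
{
    assert((vector & 3) == 0); /* instructions are 4-byte aligned */
    v->shared.srr0 = v->pc;
    v->shared.srr1 = v->shared.msr;
    /* Keep the cached MSR in sync with what the interrupt changes. */
    v->shared.msr &= ~(MSR_EE | MSR_PR | MSR_IR | MSR_DR);
    v->pc = vector;
}

/* HV side: emulate rfi using whatever the guest left in the page. */
static void hv_emulate_rfi(struct irq_vcpu *v)
{
    v->pc = v->shared.srr0;
    v->shared.msr = v->shared.srr1;
}

/* Guest side: reading the MSR becomes a plain load. */
static uint64_t guest_read_msr(const struct irq_vcpu *v)
{
    return v->shared.msr;
}
```

The guest handler is free to rewrite srr0/srr1 in the page with plain
stores before the rfi. Only the rfi itself still needs the HV.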

So in our case, that (relatively small) level of paravirt provides a
tremendous performance boost, since every guest interrupt (syscall,
etc...) goes down from something like a good dozen emulation traps
to maybe a couple just for the base entry/exit path from the kernel.

This is very different from the PV issues you guys had in the x86
world, which were related to MMU emulation. In our case PV may also
prove useful there, as our MMU structure is very different, but that
is a completely orthogonal matter.

Cheers,
Ben.
