Re: kvm BookE and SPRGs

Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> · Fri, 10 Jul 2009 19:09:40 +1000

On Fri, 2009-07-10 at 10:42 +0200, Alexander Graf wrote:
> 
> IMHO paravirt stuff can be really useful, but should stay in the  
> guest. I don't really like the idea of adding binary patching of  
> guests in the hypervisor more than for dcbz where I didn't see another
> way to do it.
> 
I wasn't talking about that sort of binary patching :-)

There are two ways to do it:

 - One is, when you fault on an instruction like mtsprg2, to patch
-that- instruction and replace it with a magic stwa to the "shared"
page. However, I prefer -real- paravirt, which is:

 - The guest can use the existing binary self-patching facility we have
to replace its own SPR access instructions with instructions that access
the magic shared page.
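
Whichever side does the patching (the HV on a fault, or the guest on
itself), the rewrite is the same: an mfspr/mtspr of a shadowed SPR becomes
an lwz/stw on the shared page. Below is a minimal sketch of that rewrite,
assuming a 32-bit guest, a hypothetical page layout and a page mapped at
0xFFFFF000; none of these are an existing KVM ABI. Putting the page at the
top of the address space is what makes it a one-for-one swap: with rA = 0,
a D-form load/store uses the sign-extended 16-bit displacement as the
effective address, so no base register has to be set up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared-page layout; not an existing KVM ABI. */
struct shared_page {
    uint32_t sprg[4];  /* shadow SPRG0..SPRG3 */
    uint32_t srr0;     /* shadow SRR0 */
    uint32_t srr1;     /* shadow SRR1 */
};

/* Assumed guest effective address: the last 4 KiB of a 32-bit space. */
#define SHARED_PAGE_EA 0xFFFFF000u

#define SPRN_SRR0  26
#define SPRN_SRR1  27
#define SPRN_SPRG0 272   /* SPRG1..SPRG3 are 273..275 */

#define XO_MFSPR 339
#define XO_MTSPR 467
#define OP_LWZ   32
#define OP_STW   36

/* Byte offset of the shadow for an SPR, or -1 if it is not shadowed. */
static int shadow_offset(unsigned spr)
{
    if (spr >= SPRN_SPRG0 && spr <= SPRN_SPRG0 + 3)
        return (int)(offsetof(struct shared_page, sprg) + 4 * (spr - SPRN_SPRG0));
    if (spr == SPRN_SRR0)
        return (int)offsetof(struct shared_page, srr0);
    if (spr == SPRN_SRR1)
        return (int)offsetof(struct shared_page, srr1);
    return -1;
}

/* Encode "mfspr reg,spr" or "mtspr spr,reg". The 10-bit SPR number is
 * stored in the instruction with its two 5-bit halves swapped. */
static uint32_t encode_xfx(unsigned xo, unsigned reg, unsigned spr)
{
    return (31u << 26) | (reg << 21) | ((spr & 0x1fu) << 16)
         | (((spr >> 5) & 0x1fu) << 11) | (xo << 1);
}

/* Rewrite an mfspr/mtspr of a shadowed SPR into lwz/stw on the shared
 * page. Returns 1 if the word was patched, 0 if it was left alone. */
static int patch_spr_access(uint32_t *insn)
{
    uint32_t i = *insn;
    unsigned xo = (i >> 1) & 0x3ffu;
    if ((i >> 26) != 31 || (xo != XO_MFSPR && xo != XO_MTSPR))
        return 0;
    unsigned spr = ((i >> 16) & 0x1fu) | (((i >> 11) & 0x1fu) << 5);
    int off = shadow_offset(spr);
    if (off < 0)
        return 0;                 /* not shadowed: keep trapping */
    assert(off < 0x1000);         /* must stay inside the page */
    /* rA = 0 means EA = EXTS(d): a negative d reaches the top of memory. */
    uint32_t disp = (SHARED_PAGE_EA + (uint32_t)off) & 0xffffu;
    uint32_t op = (xo == XO_MFSPR) ? OP_LWZ : OP_STW;
    /* rT of mfspr and rS of mtspr live in the same field (bits 6-10). */
    *insn = (op << 26) | (((i >> 21) & 0x1fu) << 21) | disp;
    return 1;
}
```

For example, mfspr r3,SPRG2 (0x7C7242A6) becomes lwz r3,-4088(0)
(0x8060F008), and mtspr SRR0,r4 (0x7C9A03A6) becomes stw r4,-4080(0)
(0x9080F010). A 64-bit guest would need ld/std and 8-byte slots instead.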

> Linux does provide pv_ops for such purposes, or maybe you could use  
> the magic kernel patches itself hacks that exist in the power port  
> today already.

pv_ops are useful for higher level things. We don't necessarily need
them anyway, as we already have various hooks for our existing
hypervisors, which are all some kind of paravirt. But the problem we have
now, running supervisor instructions in user mode, is too low level
and performance sensitive for something like pv_ops.

My proposed scheme would be much more efficient while remaining
reasonably simple.

> So then newer guests would be fast, older guests would be slow. Sounds
> like a good tradeoff to me :-).

Right :-)

> Maybe we could also do the hacks in the hypervisor, but #ifdef them  
> out by default. I always get stomachaches from patching guests by  
> default ;-).

I don't like patching the guest from the HV that much either; I prefer
paravirt for things like that. The case where we may -have- to do it
would be if we tried to run legacy closed-source OSes like MacOS, to
handle things like cache line size issues, but even then it should be a
special option that has to be explicitly enabled via some sort of
flag passed from userspace.

For example, the userspace tools could let you enable special
"MacOS 9 compatibility hacks" when creating a VM.

But let's deal with that later; right now, the focus is Linux on Linux.
I was just proposing a simple paravirt approach that would significantly
speed up a whole bunch of existing low-level exception entry/exit
code paths.

Another approach would be to do that at a higher level, by having more
C-like entry points for the HV to call into the guest, but that seems
too inflexible and complicated to me.

> [...]

> That seems to be guest responsibility, no?

Yes, mostly. The host-side KVM code would have to provide the shared
"page" which contains the shadows of SPRGs, SRRs, MSR, etc., properly
context switch and update it, and provide a way to map it at the top of
the address space (i.e., we should make it appear in pseudo real mode
too on KVM "server"; on existing KVM BookE, I suppose the guest can make
an explicit call to the HV to instantiate it).
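
On the host side, "properly context switch and update it" mostly means
that whatever the hardware would do to the real SRR0/SRR1/MSR, the host
now does to the shadow copies, so the guest's patched exception code can
read them without trapping. A minimal sketch, assuming a hypothetical page
layout and vCPU structure (not the actual KVM data structures) and
assuming the guest's rfi still traps to the host:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the page shared between host and guest. */
struct shared_page {
    uint32_t sprg[4];  /* guest's SPRG0..SPRG3 */
    uint32_t srr0;     /* guest's SRR0 */
    uint32_t srr1;     /* guest's SRR1 */
    uint32_t msr;      /* guest's view of its MSR */
};

/* Hypothetical per-vCPU host state. */
struct vcpu {
    uint32_t pc;                 /* guest program counter */
    struct shared_page *shared;  /* host-side mapping of the shared page */
};

#define MSR_EE 0x00008000u  /* external interrupt enable */
#define MSR_PR 0x00004000u  /* problem (user) state */

/* Deliver an interrupt to the guest by updating the shadows the way the
 * hardware would update the real registers, then jumping to the handler.
 * Simplified: real BookE interrupts clear more MSR bits than EE and PR. */
static void deliver_interrupt(struct vcpu *v, uint32_t handler)
{
    struct shared_page *sp = v->shared;
    assert(sp != NULL);
    sp->srr0 = v->pc;                /* where to resume */
    sp->srr1 = sp->msr;              /* MSR to restore on rfi */
    sp->msr &= ~(MSR_EE | MSR_PR);   /* supervisor, interrupts off */
    v->pc = handler;
}

/* Emulate a trapped rfi: resume at SRR0 with the MSR taken from SRR1. */
static void emulate_rfi(struct vcpu *v)
{
    struct shared_page *sp = v->shared;
    assert(sp != NULL);
    v->pc = sp->srr0;
    sp->msr = sp->srr1;
}
```

The host never has to read guest SPRGs on these paths; they live only in
the page, which is the point of the scheme.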

But the actual replacement of the various instructions with accesses
to this page would be the responsibility of the guest, which patches
itself; we already have appropriate mechanisms for that, so it should be
reasonably easy.
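
For the guest side, here is a minimal sketch of self-patching at boot,
loosely modeled on the build-time fixup tables the kernel already uses for
CPU feature sections. The table format, the function name and the
detection flag are hypothetical; a real implementation would record the
sites via linker sections and flush the caches after patching. If no
shared page is detected (bare metal, or a host without this feature),
nothing is patched and the original mfspr/mtspr stay in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixup entry, one per SPR access site, emitted at build
 * time: the address of the instruction and the word to put there when
 * running on a host that provides the shared page. */
struct pv_fixup {
    uint32_t *site;
    uint32_t replacement;
};

/* Patch every recorded site, early in boot. Returns how many words were
 * actually rewritten; already-patched sites are skipped. On real hardware
 * each patched word also needs a dcache flush and icache invalidate. */
static size_t apply_pv_fixups(const struct pv_fixup *tbl, size_t n,
                              bool have_shared_page)
{
    assert(tbl != NULL || n == 0);
    if (!have_shared_page)
        return 0;  /* keep the original, trapping instructions */
    size_t done = 0;
    for (size_t k = 0; k < n; k++) {
        if (*tbl[k].site != tbl[k].replacement) {
            *tbl[k].site = tbl[k].replacement;
            done++;
        }
    }
    return done;
}
```

Keeping the decision in one boot-time pass means the same kernel image
runs unmodified on bare metal, on an older host, and on a host with the
shared page.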

Cheers,
Ben.
