[PATCH] (with benchmarks) binary patching of paravirt_ops call sites

rusty at rustcorp.com.au (Rusty Russell) · Fri, 02 Jun 2006 13:51:24 +1000

On Thu, 2006-06-01 at 17:01 -0700, Zachary Amsden wrote:
> Rusty Russell wrote:
> > Hi all,
> >
> > 	Sorry for the delay.  This implements binary patching of call sites for
> > interrupt-related paravirt ops, since no doubt Andi wasn't the only one
> > to believe this approach is slow.
> >
> 
> Sorry to take so long to look over this.  I believe this is another good 
> step.  But you do need more - I believe the following are extremely 
> sensitive to context switch latency:
> 
> > Lmbench pipe bandwidth:
> >   normal 2522.2
> >   paravirt 2335.5 [-7.402%]
> >   paravirt-patch 2401 [-4.805%]
> > Lmbench UNIX socket bandwidth:
> >   normal 2935
> >   paravirt 2617 [-10.834%]
> >   paravirt-patch 2788.2 [-5.001%]
> 
> This means you'll probably need to inline / patch everything on the 
> common path in switch_to, which includes GDT updates and a reload of CR3.

Thanks, I'll do them next.  It's pretty easy.

> So if you have to do inlining for both read and write CR accessors, 
> doesn't it seem easier to just do them all and be gone with the stub 
> implementations?  Having a common approach is what led us down the path 
> of full-blown patching, as it was easier to maintain than an ad-hoc set 
> of interfaces selected simply by virtue of being on the critical path.  
> The critical paths are quite a bit different on 64-bit as well, which 
> means things like CR8 and WRMSR become important to inline.

I still think there's significant benefit in the call implementations.
The first is that we can apply the patches over them at almost any time,
and the second is that their implementation is optional (good for
debugging).  Mainly, though, it's because having them there is trivial.

I'm not convinced that the maintenance burden of only patching some
insns is greater than patching them all...
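
A rough userspace model of that argument, for illustration only.  None
of it is taken from the actual patch: the struct, the function names and
the idea of standing in for instruction bytes with a function pointer
are all assumptions.  It only tries to show that a site left unpatched
keeps working through the call implementation, and that patching one
site does not force patching the others.

```c
#include <assert.h>

/* Hypothetical, cut-down ops table; the real one has many more entries. */
struct pv_ops_model {
	void (*irq_disable)(void);
	void (*irq_enable)(void);
};

static int irq_flag = 1;	/* models the CPU interrupt-enable flag */

static void native_irq_disable(void) { irq_flag = 0; }	/* models "cli" */
static void native_irq_enable(void)  { irq_flag = 1; }	/* models "sti" */

static struct pv_ops_model pv_ops = {
	.irq_disable = native_irq_disable,
	.irq_enable  = native_irq_enable,
};

/* The always-present call implementations: one indirect call each. */
static void stub_irq_disable(void) { pv_ops.irq_disable(); }
static void stub_irq_enable(void)  { pv_ops.irq_enable(); }

/* A call site; "target" stands in for the instruction bytes at that site. */
struct call_site {
	void (*target)(void);
	int patched;
};

static struct call_site sites[] = {
	{ stub_irq_disable, 0 },
	{ stub_irq_enable,  0 },
};

/* Execute whatever currently lives at the site. */
static void run_site(const struct call_site *s) { s->target(); }

/*
 * "Patch" a site: swap the stub for the direct implementation it would
 * have reached anyway.  Returns 1 if patched, 0 if the site does not
 * currently hold a stub we know how to replace.
 */
static int patch_site(struct call_site *s)
{
	if (s->target == stub_irq_disable)
		s->target = pv_ops.irq_disable;
	else if (s->target == stub_irq_enable)
		s->target = pv_ops.irq_enable;
	else
		return 0;
	s->patched = 1;
	return 1;
}
```

Calling patch_site(&sites[0]) alone leaves sites[1] on the stub, and
run_site() has the same effect on irq_flag either way; in this model the
only difference is one level of indirection.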

> In either case, letting the kernel decide which interfaces to complicate 
> with inline patching could be the best solution - but we'd have to be 
> careful to require non-virtualizable interfaces (or interfaces which 
> require memory trapping) to always provide patchable alternatives.

I don't understand the last half of this sentence.  What were you
thinking of?  AFAICT patching is always a performance optimization,
never a correctness requirement?

Rusty.
-- 
 ccontrol: http://ccontrol.ozlabs.org