Rusty Russell wrote:
> Hi all,
>
> Sorry for the delay. This implements binary patching of call sites for
> interrupt-related paravirt ops, since no doubt Andi wasn't the only one
> to believe this approach is slow.

Sorry to take so long to look over this. I believe this is another good
step. But you do need more - I believe the following are extremely
sensitive to context switch latency:

> Lmbench pipe bandwidth:
> normal          2522.2
> paravirt        2335.5 [-7.402%]
> paravirt-patch  2401   [-4.805%]
>
> Lmbench UNIX socket bandwidth:
> normal          2935
> paravirt        2617   [-10.834%]
> paravirt-patch  2788.2 [-5.001%]

This means you'll probably need to inline / patch everything on the
common path in switch_to, which includes GDT updates and a reload of
CR3. You'll probably also want to inline / patch the read of CR2 in the
page fault path.

So if you have to do inlining for both the read and write CR accessors,
doesn't it seem easier to just do them all and be done with the stub
implementations? Having a common approach is what led us down the path
of full-blown patching, as it was easier to maintain than an ad-hoc set
of interfaces selected simply by virtue of being on the critical path.

The critical paths are quite a bit different on 64-bit as well, which
means things like CR8 and WRMSR become important to inline. In either
case, letting the kernel decide which interfaces to complicate with
inline patching could be the best solution - but we'd have to be careful
to require non-virtualizable interfaces (or interfaces which require
memory trapping) to always provide patchable alternatives.

Zach