Rusty Russell wrote:
> Hi all,
>
> Sorry for the delay. This implements binary patching of call sites for
> interrupt-related paravirt ops, since no doubt Andi wasn't the only one
> to believe this approach is slow.

Sorry to take so long to look over this. I believe this is another good
step. But you do need more - I believe the following are extremely
sensitive to context switch latency:

> Lmbench pipe bandwidth:
> normal          2522.2
> paravirt        2335.5 [-7.402%]
> paravirt-patch  2401   [-4.805%]
>
> Lmbench UNIX socket bandwidth:
> normal          2935
> paravirt        2617   [-10.834%]
> paravirt-patch  2788.2 [-5.001%]

This means you'll probably need to inline / patch everything on the
common path in switch_to, which includes GDT updates and a reload of
CR3. You'll probably also want to inline / patch the read of CR2 in the
page fault path.

So if you have to do inlining for both the read and write CR accessors,
doesn't it seem easier to just do them all and be done with the stub
implementations? Having a common approach is what led us down the path
of full-blown patching, as it was easier to maintain than an ad-hoc set
of interfaces selected simply by virtue of being on the critical path.

The critical paths are quite a bit different on 64-bit as well, which
means things like CR8 and WRMSR become important to inline. In either
case, letting the kernel decide which interfaces to complicate with
inline patching could be the best solution - but we'd have to be careful
to require non-virtualizable interfaces (or interfaces which require
memory trapping) to always provide patchable alternatives.

Zach