Re: [PATCH] Add MCE support to KVM

Avi Kivity <avi@xxxxxxxxxx> · Mon, 20 Apr 2009 16:45:03 +0300

Gerd Hoffmann wrote:
On 04/20/09 14:43, Avi Kivity wrote:
Gerd Hoffmann wrote:
That said, I'd like to be able to emulate the Xen HVM hypercalls.  But in
any case, the hypercall implementation has to be in the kernel,

No. With Xenner the xen hypercall emulation code lives in guest
address space.

In this case the guest ring-0 code should trap the #GP, and install the
hypercall page (which uses sysenter/syscall?). No kvm or qemu changes
needed.

Doesn't fly.

Reason #1: In the pv-on-hvm case the guest runs on ring0.

Sure, in this case you need to trap the MSR in the kernel (or qemu).  
But the handler is no longer in the guest address space, and you do need 
to update the opcode.

Let's not confuse the two cases.

Reason #2: Chicken-egg issue:  For the pv-on-hvm case only few,
           simple hypercalls are needed.  The code to handle them
           is small enough that it can be loaded directly into the
           hypercall page(s).

Please elaborate.  What hypercalls are so simple that an exit into the 
hypervisor is not necessary?

Is there any reason to? I *think* xen does it for better scheduling
latency. But with xen emulation sitting in guest address space we can
schedule the guest at will anyway.

It also improves latency within the guest itself. At least I think that
is what the Hyper-V spec is saying. You can interrupt the execution of
a long hypercall, inject an interrupt, and resume. Sort of like a
rep/movs instruction, which the cpu can and will interrupt.
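
A minimal sketch of that interrupt-and-resume pattern, in plain C and outside any real hypervisor. All names here (struct long_call, run_long_call, CALL_AGAIN) are hypothetical and not taken from kvm, Xen or the Hyper-V spec. The assumption is that progress is kept in caller-visible state, the way rep/movs keeps its count and pointers in registers, so the call can stop when an interrupt is pending and simply be reissued later:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical argument block. Progress is stored here, so a
 * reissued call picks up where the previous one stopped. */
struct long_call {
    size_t done;   /* units of work completed so far */
    size_t total;  /* units of work requested */
};

enum call_status { CALL_DONE, CALL_AGAIN };

/* Do at most `budget` units of work, and stop early if *irq_pending
 * becomes true. Returns CALL_AGAIN when the caller should inject the
 * interrupt (if any) and then reissue the call. */
static enum call_status run_long_call(struct long_call *c, size_t budget,
                                      const volatile bool *irq_pending)
{
    assert(c->done <= c->total);
    while (c->done < c->total) {
        if (budget == 0 || *irq_pending)
            return CALL_AGAIN;
        c->done++;      /* stand-in for one unit of real work */
        budget--;
    }
    return CALL_DONE;
}
```

The caller loops on CALL_AGAIN; nothing is lost across the interruption because no progress is held in the callee.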

Hmm.  Needs investigation.  I'd expect the main source of latencies 
is page table walking.  Xen works very differently from kvm+xenner here ...

kvm is mostly O(1).  We need to limit rmap chains, but we're fairly 
close.  The kvm paravirt mmu calls are not O(1), but we can easily use 
continuations there (and they're disabled on newer processors anyway).

Another area that worries me is virtio notification, which can take a 
long time.  It won't be trivial, but we can make it work:

- for the existing pio-to-userspace notification, add a bit that tells 
the kernel to repeat the instruction instead of continuing.  The 'outl' 
instruction is idempotent, so we can do partial work, and return to the 
kernel.
- if using hypercallfd/piofd to a pipe, we're offloading everything to 
another thread anyway, so we can return immediately
- if using hypercallfd/piofd to a kernel virtio server, it can return 0 
bytes written, indicating it needs a retry.  kvm can try to inject an 
interrupt if it sees this.
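
To make the retry convention in the last bullet concrete, here is a small simulation in plain C. It is only a sketch: hypercallfd/piofd are proposals in this thread, and struct notify_sink, sink_write and handle_notify are hypothetical names, not an existing kvm interface. The assumption is that the server reports "0 bytes written" when it cannot take the notification yet, and the exit handler then leaves the instruction pointer alone so the guest re-executes the same 'outl':

```c
#include <assert.h>
#include <stddef.h>

enum exit_action { RESUME_NEXT_INSN, REPEAT_INSN };

/* Hypothetical in-kernel notification consumer. */
struct notify_sink {
    int busy_rounds;   /* writes that will still report "not yet" */
    int kicks;         /* notifications actually accepted */
};

/* Returns the number of bytes accepted; 0 means "retry later". */
static size_t sink_write(struct notify_sink *s, const void *buf, size_t len)
{
    (void)buf;
    if (s->busy_rounds > 0) {
        s->busy_rounds--;
        return 0;
    }
    s->kicks++;
    return len;
}

/* Decide what to do with the trapped notification write. Repeating is
 * safe only because the notification is idempotent: re-executing it
 * has the same effect as executing it once. */
static enum exit_action handle_notify(struct notify_sink *s, unsigned int queue)
{
    size_t n = sink_write(s, &queue, sizeof(queue));

    assert(n == 0 || n == sizeof(queue));
    return n == 0 ? REPEAT_INSN : RESUME_NEXT_INSN;
}
```

On REPEAT_INSN the caller would get a chance to inject a pending interrupt before the guest retries, which is where the latency win comes from.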

--
error compiling committee.c: too many arguments to function
