Re: Where is the entry of hypercalls in kvm?

Anthony Liguori <anthony@xxxxxxxxxxxxx> · Wed, 30 Jun 2010 11:34:19 -0500

On 06/30/2010 11:28 AM, Peter Teoh wrote:
Thank you, Alex, for the reply. Very glad to know you!!!

On Wed, Jun 30, 2010 at 4:56 PM, Alexander Graf <agraf@xxxxxxx> wrote:

On 30.06.2010, at 10:17, Peter Teoh wrote:

Your question is answered here:

http://www.spinics.net/lists/kvm/msg37526.html

And check this paper out:

http://ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf

The general concept to remember is that QEMU and KVM just execute the
input as a binary stream... they do not know what "functions" they are
executing, so the binary stream can be any OS (Windows, Linux,
etc.). QEMU just sets up the basic block translation mechanism and
then executes the code block by block. Each block, by definition, is
terminated by a branch/jump etc. Within a block, if there is any
privileged instruction (e.g., writing MSRs or loading the LDT), a
transition is made from the guest in QEMU into KVM to update the
VMCB/VMCS information (these terms are from the AMD/Intel manuals).

Eh, no.

There are two modes of operation:

1) TCG
2) KVM

Now it is clear to me: translate-all.c vs kvm-all.c are the two main
files in QEMU. Thanks for that!

In mode 1, qemu goes through target-xxx/translate.c and converts the basic blocks you were talking about above to native machine code on the host system using tcg (see the tcg directory). No KVM is involved, everything happens in user mode.
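To make mode 1 concrete, here is a toy sketch in C of block-at-a-time translation with a translation cache. It is not QEMU's TCG: the two-byte guest ISA (OP_LOADI, OP_DEC, OP_JNZ, OP_HALT), `struct block`, `translate()` and `run_guest()` are all invented for illustration, and a "translated" block is just an array of decoded instructions standing in for the host machine code TCG would emit. What it does show is the shape of the idea: decode until a branch ends the basic block, cache the result by guest PC, and reuse it when execution comes back to that PC.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy guest ISA: 2-byte instructions (opcode, operand). */
enum { OP_LOADI, OP_DEC, OP_JNZ, OP_HALT };

#define MAX_BLOCK  16
#define CACHE_SIZE 64

struct insn { uint8_t op, arg; };

/* One translated basic block. TCG would emit host machine code here;
 * this toy just stores the decoded instructions. */
struct block {
    int valid;
    size_t start_pc, n;
    struct insn ops[MAX_BLOCK];
};

static struct block cache[CACHE_SIZE];
static int translations, blocks_run;   /* statistics for the demo */

/* Translate guest code from pc up to the next branch or halt, reusing a
 * cached translation when one already exists for this pc. */
static struct block *translate(const uint8_t *code, size_t pc)
{
    struct block *b = &cache[pc % CACHE_SIZE];
    if (b->valid && b->start_pc == pc)
        return b;                            /* cache hit */
    b->valid = 1;
    b->start_pc = pc;
    b->n = 0;
    translations++;
    for (;;) {
        struct insn i = { code[pc], code[pc + 1] };
        b->ops[b->n++] = i;
        pc += 2;
        if (i.op == OP_JNZ || i.op == OP_HALT || b->n == MAX_BLOCK)
            break;                           /* a branch ends the block */
    }
    return b;
}

/* Execute the guest one translated block at a time; returns acc at HALT. */
static int run_guest(const uint8_t *code)
{
    int acc = 0;
    size_t pc = 0;
    for (;;) {
        struct block *b = translate(code, pc);
        size_t next = b->start_pc + 2 * b->n;    /* fall-through address */
        blocks_run++;
        for (size_t k = 0; k < b->n; k++) {
            struct insn i = b->ops[k];
            switch (i.op) {
            case OP_LOADI: acc = i.arg; break;
            case OP_DEC:   acc -= i.arg; break;
            case OP_JNZ:   if (acc != 0) next = i.arg; break;
            case OP_HALT:  return acc;
            }
        }
        pc = next;
    }
}
```

For the loop `{ OP_LOADI, 3, OP_DEC, 1, OP_JNZ, 2, OP_HALT, 0 }`, the block at PC 2 is translated once and then reused on the next iteration, so three translations serve four block executions.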

In mode 2, qemu executes _everything_ by calling KVM. There is no guest code interpreted, looked at or whatever in qemu. Whenever the guest CPU runs, it runs because qemu called ioctl(KVM_RUN) on its kvm vcpu fd.
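The userspace side of mode 2 comes down to a handful of ioctls on /dev/kvm. Below is a minimal sketch written against the KVM API in <linux/kvm.h> (KVM_CREATE_VM, KVM_SET_USER_MEMORY_REGION, KVM_CREATE_VCPU, KVM_RUN); it is not QEMU's kvm-all.c. The 12-byte real-mode guest, guest-physical address 0x1000, port 0x3f8 and the function name `run_tiny_guest()` are arbitrary choices for the demo. It assumes an x86 Linux host and returns -1 if /dev/kvm is unavailable. Every time KVM_RUN returns, userspace looks at `run->exit_reason`; those are the exits KVM chose not to finish in the kernel.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/kvm.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/* Runs a tiny real-mode guest that writes "4\n" to port 0x3f8 and halts.
 * Stores the guest's output in out (NUL-terminated; outlen must be >= 1)
 * and returns its length, or -1 on any error. fds and mappings are left
 * open for brevity. x86 only. */
int run_tiny_guest(char *out, size_t outlen)
{
#if defined(__x86_64__) || defined(__i386__)
    static const uint8_t code[] = {
        0xba, 0xf8, 0x03, /* mov $0x3f8, %dx */
        0x00, 0xd8,       /* add %bl, %al    */
        0x04, '0',        /* add $'0', %al   */
        0xee,             /* out %al, (%dx)  */
        0xb0, '\n',       /* mov $'\n', %al  */
        0xee,             /* out %al, (%dx)  */
        0xf4,             /* hlt             */
    };
    size_t n = 0;
    int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
    if (kvm < 0 || ioctl(kvm, KVM_GET_API_VERSION, 0UL) != 12)
        return -1;
    int vm = ioctl(kvm, KVM_CREATE_VM, 0UL);
    uint8_t *mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (vm < 0 || mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, sizeof code);
    struct kvm_userspace_memory_region region = {
        .slot = 0, .guest_phys_addr = 0x1000, .memory_size = 0x1000,
        .userspace_addr = (uint64_t)(uintptr_t)mem,
    };
    if (ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region) < 0)
        return -1;
    int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0UL);
    int run_size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0UL);
    if (vcpu < 0 || run_size <= 0)
        return -1;
    /* Shared page through which KVM reports why KVM_RUN returned. */
    struct kvm_run *run = mmap(NULL, (size_t)run_size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, vcpu, 0);
    struct kvm_sregs sregs;
    if (run == MAP_FAILED || ioctl(vcpu, KVM_GET_SREGS, &sregs) < 0)
        return -1;
    sregs.cs.base = 0;                  /* flat real-mode code segment */
    sregs.cs.selector = 0;
    struct kvm_regs regs = { .rip = 0x1000, .rax = 2, .rbx = 2, .rflags = 0x2 };
    if (ioctl(vcpu, KVM_SET_SREGS, &sregs) < 0 ||
        ioctl(vcpu, KVM_SET_REGS, &regs) < 0)
        return -1;

    for (;;) {                          /* userspace half of the vcpu loop */
        if (ioctl(vcpu, KVM_RUN, 0UL) < 0)
            return -1;
        switch (run->exit_reason) {
        case KVM_EXIT_HLT:              /* guest executed hlt: done */
            out[n] = '\0';
            return (int)n;
        case KVM_EXIT_IO:               /* port I/O forwarded to userspace */
            if (run->io.direction == KVM_EXIT_IO_OUT && run->io.size == 1 &&
                run->io.port == 0x3f8 && run->io.count == 1 && n + 1 < outlen)
                out[n++] = *((char *)run + run->io.data_offset);
            break;
        default:                        /* anything else: give up */
            return -1;
        }
    }
#else
    (void)out;
    (void)outlen;
    return -1;
#endif
}
```

With a working /dev/kvm this should leave "4\n" in the buffer (2 + 2 added to '0'); on a host without KVM it simply reports -1.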

Now I don't understand..... guest code usually has two parts --> one
running in ring 3 and another in ring 0, so if we were running
everything in KVM, wouldn't it pose a security risk? As far as I
know, VMware uses ring 1 to run ALL the guest code, and transitions to
ring 0 whenever a privileged instruction is encountered.

This is not quite accurate anymore.

VT and SVM introduce what's often called compressed ring 0 mode.  In 
this new mode, you can execute code in ring 0, 1, 2, or 3 but trap any 
operations that are potentially sensitive (like IO operations).  The act 
of trapping these events results in a transition from compressed ring 0 
to normal ring 0.  This transition is called a vmexit.

KVM enables compressed ring 0 mode and runs all guest code in that 
mode.  It directly handles all vmexits and decides to pass a subset of 
those exits down to qemu for further handling.  Typically, this subset 
includes anything that requires device emulation.

Regards,

Anthony Liguori

So what is the
equivalent mechanism in qemu? The key issue I am facing here is
basically "privileged instructions": only these should be executing in
the kvm module, which runs in ring 0, and the rest is best run at a
lower privilege level?

I have not seen any IOCTL calls in QEMU,

See kvm*.c and target-xxx/kvm.c

but I suspect it ultimately drops to a VMRUN instruction (AMD; Intel
calls it VMLAUNCH or VMRESUME) inside KVM, which can be found here:

arch/x86/kvm/

The AMD-specific virtualization is done in svm.c, whereas the
Intel-specific code is in vmx.c.

Quoting the comment in vmx.c:

/*
* The exit handlers return 1 if the exit was handled fully and guest execution
* may resume.  Otherwise they set the kvm_run parameter to indicate what needs
* to be done to userspace and return 0.
*/
static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
        [EXIT_REASON_EXCEPTION_

After reading the Intel manual, you will understand that "exit" here
actually refers to the special set of privileged Intel instructions
which, when executed by the guest OS, immediately cause a VMEXIT;
these exits are handled by the handlers above in kvm.ko.

Actually, in kvm-xxx.ko (kvm-intel.ko or kvm-amd.ko) for x86.
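The return-value rule in the vmx.c comment quoted above (return 1 when the exit was handled fully and the guest may resume; otherwise fill in kvm_run and return 0) can be modeled in a few lines. This is a toy, not kernel code: the exit reasons, `struct toy_vcpu`, the handlers and `toy_run()` are invented for illustration, and the real VMX exit numbering is different. Only the 1/0 convention comes from the comment. A CPUID-style exit is finished "in the kernel" and never leaves the loop, while an I/O exit needs device emulation and is handed to userspace, which is the subset qemu gets to see.

```c
#include <stddef.h>

/* Hypothetical exit reasons for a toy vcpu (not the real VMX numbers). */
enum toy_exit { TOY_EXIT_CPUID, TOY_EXIT_IO, TOY_EXIT_MAX };
enum toy_user_reason { TOY_USER_NONE, TOY_USER_IO, TOY_USER_UNKNOWN };

struct toy_vcpu {
    enum toy_user_reason user_reason;  /* stands in for kvm_run->exit_reason */
    int handled_in_kernel;             /* statistics for the demo */
};

/* Handled fully in the "kernel": return 1 so the guest resumes. */
static int handle_cpuid(struct toy_vcpu *v)
{
    v->handled_in_kernel++;
    return 1;
}

/* Needs device emulation: record what userspace must do, return 0. */
static int handle_io(struct toy_vcpu *v)
{
    v->user_reason = TOY_USER_IO;
    return 0;
}

/* Exit-reason-indexed table, in the style of kvm_vmx_exit_handlers[]. */
static int (*toy_exit_handlers[TOY_EXIT_MAX])(struct toy_vcpu *) = {
    [TOY_EXIT_CPUID] = handle_cpuid,
    [TOY_EXIT_IO]    = handle_io,
};

/* Feed a sequence of exits through the table until one has to go to
 * userspace; returns that exit's index, or -1 if all were handled. */
static int toy_run(struct toy_vcpu *v, const enum toy_exit *exits, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        enum toy_exit e = exits[i];
        int (*h)(struct toy_vcpu *) =
            e < TOY_EXIT_MAX ? toy_exit_handlers[e] : NULL;
        if (h == NULL) {                 /* unknown exit: punt to userspace */
            v->user_reason = TOY_USER_UNKNOWN;
            return (int)i;
        }
        if (h(v) == 0)                   /* handler asked for userspace */
            return (int)i;
    }
    return -1;
}
```

Feeding it CPUID, CPUID, IO, CPUID stops at index 2 with two exits absorbed in the "kernel", which is roughly how KVM only returns from the KVM_RUN ioctl once a handler returns 0.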

Also, please don't top post :)

Alex

Thanks again.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html