Re: tlb flush after each vm_exit, also virtual interrupts injection

"Charls D. Chap" <chapcharls@xxxxxxxxx> · Wed, 3 Aug 2016 17:43:33 +0300

On Tue, Aug 2, 2016 at 8:33 PM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
>> 1) I've seen some slides, back in '08, which describe how the use of
>> VPID solves the problem of a TLB flush after each VM_EXIT.
>> But I see from the code that it actually does a flush after a VM_EXIT.
>>
>> Obviously, I am wrong, so I need some help:
>> where should I look (which lines of code) in order to figure out what
>> is happening with TLB flushes and VM_EXITs?
>
> You are saying that you "see from the code that it actually does a flush
> after a VM_EXIT".  Where is this?
>
>> 2) System call from ring 0 (non-root) to ring 0 (root)
>> Could a guest OS make a system call to the host OS?
>
> No.  You'd need a program running on the host, and a channel between
> this program and a guest (such as a socket or a serial port).
>
>> 3) what is the mechanism of virtual interrupt injection
>> What is the mechanism that is used for a virtual interrupt injection,
>> in full virtualization?
>>
>> The host injects an interrupt into the guest: HOW?  E.g. as a hardware
>> interrupt?  At which point in the guest?  The guest's complete_bh?
>
> Interrupt injection happens through ioctls on the KVM file descriptors
> (the CPU file descriptor for KVM_INTERRUPT, the VM file descriptor for the others).
>
> When the LAPIC is emulated by userspace (not the common case) this is
> done with the KVM_INTERRUPT ioctl.  When the LAPIC is emulated in kernel,
> there are various mechanisms.
>
> ioctl                   when?                interrupt kind
> ------------------------------------------------------------------------
> KVM_INTERRUPT           i8259 in userspace   EXTINT
> KVM_SET_GSI_ROUTING     (always)             IOAPIC
> KVM_SIGNAL_MSI          (always)             MSI
> KVM_SET_GSI_ROUTING     (always)             MSI
> KVM_IRQFD                                    any that can use KVM_SET_GSI_ROUTING
>
> After KVM_SET_GSI_ROUTING, the host invokes another ioctl on the VM
> file descriptor (either KVM_IRQ_LINE or KVM_IRQ_LINE_STATUS) in order
> to trigger the interrupt.  In QEMU this corresponds to qemu_irq_raise,
> pci_set_irq or msi_notify.
>
What do you mean by "this corresponds"?
Is there a kvm_vcpu_ioctl from the host kernel to the guest,
or a kvm_vcpu_ioctl from the host kernel to host userspace (QEMU) and then to the guest?

Why not call vcpu_enter_guest(struct kvm_vcpu *vcpu) directly,
avoiding the switch to QEMU?

So, in the case of a write I/O using virtio-blk with dataplane=off:
for the return I/O path, which qemu/host, host/qemu, and qemu/guest transitions are there?

Do the above ioctls go from host KVM to QEMU, and then QEMU
notifies the guest? How?
ioctl(KVM_SET_GSI_ROUTING)
ioctl(KVM_IRQFD)

For the return path: what happens after the real I/O completes on the
host? Is the host's completion bh executed? Do we go through the
iothread to the guest in order to execute the virtio-blk complete
request?

One last question, about the vmentry and vmexit code: it seems to me that
vmentry and vmexit share the same asm block.
I understand that at line 8719 we switch to non-root guest mode,
so lines 8720 and below are not executed. Is this the vmentry?

And when a vmexit happens, are the instructions from line 8721 onwards
the vmexit part?
How did the context change? I mean, which instruction made the jump,
so that we are now at the line "mov %0, %c[wordsize](%%" _ASM_SP ")
\n\t"?

--------------------------------------

/* Enter guest mode */
8716                 "jne 1f \n\t"
8717                 __ex(ASM_VMX_VMLAUNCH) "\n\t"
8718                 "jmp 2f \n\t"
8719                 "1: " __ex(ASM_VMX_VMRESUME) "\n\t"
8720                 "2: "
8721                 /* Save guest registers, load host registers, keep flags */
8722                 "mov %0, %c[wordsize](%%" _ASM_SP ") \n\t"
8723                 "pop %0 \n\t"
8724                 "mov %%" _ASM_AX ", %c[rax](%0) \n\t"
8725                 "mov %%" _ASM_BX ", %c[rbx](%0) \n\t"
8726                 __ASM_SIZE(pop) " %c[rcx](%0) \n\t"
8727                 "mov %%" _ASM_DX ", %c[rdx](%0) \n\t"
8728                 "mov %%" _ASM_SI ", %c[rsi](%0) \n\t"
8729                 "mov %%" _ASM_DI ", %c[rdi](%0) \n\t"
8730                 "mov %%" _ASM_BP ", %c[rbp](%0) \n\t"

> After KVM_IRQFD, the host writes to an eventfd in order to trigger the
> interrupt.  In QEMU this corresponds to event_notifier_set.
>
> (For MSI, KVM_SIGNAL_MSI is preferred to KVM_IRQ_LINE/KVM_IRQ_LINE_STATUS
> because it's faster, but they provide the same functionality).
>
>> 4)
>> I've seen in the literature that KVM operates in protection ring -1.
>> What does it mean? Is there a HW implementation of that ring?
>>
>> Why not in ring 0?
>
> Ring -1 is not a particularly good name.  A more accurate description
> is that KVM operates in ring 0 of VMX root mode, while the guest
> operates in VMX non-root mode (which can be any of ring 0-1-2-3
> depending on the current privilege level of the guest).
>
> Paolo

thanks
Charls