Re: tlb flush after each vm_exit, also virtual interrupts injection


 



On Wed, Aug 3, 2016 at 6:56 PM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
>
> On 03/08/2016 16:43, Charls D. Chap wrote:
>> On Tue, Aug 2, 2016 at 8:33 PM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>>>> 3) what is the mechanism of virtual interrupt injection
>>>> What is the mechanism that is used for a virtual interrupt injection,
>>>> in full virtualization?
>>>>
>>>> Host injects an interrupt to guest, HOW?  eg. hardware interrupt?
>>>> to which point of guest? guest complete_bh?
>>>
>>> Interrupt injection happens through ioctls on the KVM file descriptors
>>> (the vCPU file descriptor for KVM_INTERRUPT, the VM file descriptor for the others).
>>>
>>> ioctl                   when?                interrupt kind
>>> ------------------------------------------------------------------------
>>> KVM_INTERRUPT           i8259 in userspace   EXTINT
>>> KVM_SET_GSI_ROUTING     (always)             IOAPIC
>>> KVM_SIGNAL_MSI          (always)             MSI
>>> KVM_SET_GSI_ROUTING     (always)             MSI
>>> KVM_IRQFD                                    any that can use KVM_SET_GSI_ROUTING
>>>
>>> After KVM_SET_GSI_ROUTING, the host invokes another ioctl on the VM
>>> file descriptor (either KVM_IRQ_LINE or KVM_IRQ_LINE_STATUS) in order
>>> to trigger the interrupt.  In QEMU this corresponds to qemu_irq_raise,
>>> pci_set_irq or msi_notify.
>>>
>> What do you mean by "this corresponds"?
>> Is there a kvm_vcpu_ioctl from the host kernel to the guest?
>> Or a kvm_vcpu_ioctl from the host kernel to host userspace (QEMU) to the guest?
>
> It's kvm_vcpu_ioctl or kvm_vm_ioctl, and it goes from host userspace to
> host kernel (ioctl is a syscall).  The ioctl is invoked when QEMU
> generates an interrupt with qemu_irq_raise (sometimes called directly,
> sometimes through pci_set_irq) or msi_notify.
>
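
For reference, the userspace side of KVM_IRQ_LINE can be sketched roughly as
below. This is a minimal sketch, not QEMU's code: set_irq_line is a
hypothetical helper name, and vm_fd is assumed to be a VM file descriptor
(from KVM_CREATE_VM) on which an in-kernel irqchip has already been created.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical helper (not a QEMU or KVM function): assert (level != 0) or
 * deassert (level == 0) a GSI on the in-kernel irqchip of a VM.
 * Returns the ioctl result: -1 with errno set on failure. */
static int set_irq_line(int vm_fd, uint32_t gsi, int level)
{
    struct kvm_irq_level irq = { 0 };

    irq.irq = gsi;
    irq.level = level ? 1 : 0;

    /* ioctl is a syscall, so this goes from host userspace to host kernel. */
    return ioctl(vm_fd, KVM_IRQ_LINE, &irq);
}
```

With a real vm_fd, raising and then lowering GSI 5 would be
set_irq_line(vm_fd, 5, 1) followed by set_irq_line(vm_fd, 5, 0).
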
>> Why not call directly  vcpu_enter_guest(struct kvm_vcpu *vcpu)
>> avoiding the switch to QEMU?
>
> Two reasons.  First, it's QEMU that wants to generate the interrupt.
> The ioctl or eventfd is how KVM receives the signal.
>
> For kernel event sources, those that are part of KVM such as i8254.c
> generate the interrupt through kvm_set_irq.  But this is the exception,
> not the rule.  In general, KVM wants to be self-contained and exposes
> interfaces to connect other parts of the kernel to KVM.  irqfd is the
> main such interface; it is used by both vhost and VFIO, for example.
>
>> So in the case of write I/O using virtio-blk dataplane=off
>> [...] What happens after the real I/O completion on the host?
>> Is the host completion bh executed?  Do we go
>> through the iothread to the guest, in order to execute the
>> virtio-blk-complete request?
>

How did control transfer to QEMU userspace, and which thread is
running at that point (a vcpu thread or a worker)?
->virtio_blk_device_realize
-> virtio_blk_req_complete
Was it the "real" I/O completion interrupt from the device?

Which QEMU thread executes the code you mentioned: a vcpu thread or a
worker (iothread or main_loop)?  When does the iothread finish its work?




>   virtio_blk_req_complete
>   -> virtio_notify
>   -> virtio_pci_notify
>   -> either msix_notify or pci_set_irq
>
> The paths then are different.  Assuming you are using the kernel's LAPIC
> implementation (which has a QEMU "bridge" in hw/i386/kvm/apic.c), for
> msix_notify it goes like this:
>
>   msix_notify
>   -> msi_send_message
>   -> address_space_stl_le
>   -> ...
>   -> kvm_apic_mem_write
>   -> kvm_irqchip_send_msi
>   -> kvm_vm_ioctl
>
> while for pci_set_irq:
>
>   pci_set_irq
>   -> pci_irq_handler
>   -> pci_change_irq_level
>   -> piix3_set_irq
>   -> piix3_set_irq_level
>   -> piix3_set_irq_pic
>   -> qemu_set_irq(piix3->pic[pic_irq], ...)
>   -> kvm_pc_gsi_handler
>   -> qemu_set_irq(s->i8259_irq[n], ...)
>   -> kvm_pic_set_irq
>   -> kvm_set_irq
>   -> kvm_vm_ioctl
>
>> One last question about the vmentry and vmexit code: it seems to me that
>> vmentry and vmexit share the same asm block.
>> I can understand that at line 8719 we switch to non-root guest mode
>> and that lines 8720 and below are not executed. Is this the vmentry?
>
> Yes, it's either line 8717 or line 8719.
>
>> And when a vmexit happens, are the instructions from line 8721 onward the
>> vmexit part?
>> How did the context change? I mean, which instruction made the jump,
>> so that we are now at the line "mov %0, %c[wordsize](%%" _ASM_SP ")
>> \n\t"?


I know that there are many exit reasons, but it's not clear to me
HOW exactly control is transferred from the execution of one of these
instructions
to the VMEXIT point, which is "vmx_return: " _ASM_PTR " 2b \n\t".
Where does this transition happen, and how do we jump to this label?
Is it inside the corresponding ioctl implementation?

I guess the answer is: "read the manual", which is fine with me, because
you already helped me a lot :)


>
> It can be one of many conditions, only some of which correspond to
> particular instructions.  All the reasons for vmexit are listed in the
> SDM.  They include instructions (e.g. moves to control registers,
> RDMSR/WRMSR, HLT, CPUID, etc.), exceptions raised in the guest,
> external interrupts arriving at the host, page faults on EPT pages,
> conditions that the processor cannot handle (triple fault, task switch),
> conditions requested previously by the hypervisor ("interrupt window"
> and "NMI window"), etc.
>
> You really need to read the manual. :)
>
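
As a rough illustration of the list above, the sketch below maps a few VMX
basic exit reasons to names and marks which ones are caused by the guest
executing a specific instruction. The struct, table and function names are
hypothetical, and the numbers are the basic exit reason codes as I recall them
from the SDM's exit reason appendix; please check them against the manual
before relying on them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry, for illustration only. */
struct vmx_exit_info {
    uint16_t basic_reason;  /* low 16 bits of the exit reason field */
    const char *name;
    int instruction;        /* 1 if caused by a specific guest instruction */
};

/* A small subset of the basic exit reasons; the SDM has the full list. */
static const struct vmx_exit_info vmx_exit_table[] = {
    {  0, "exception or NMI",        0 },
    {  1, "external interrupt",      0 },
    {  2, "triple fault",            0 },
    {  7, "interrupt window",        0 },
    {  8, "NMI window",              0 },
    {  9, "task switch",             0 },
    { 10, "CPUID",                   1 },
    { 12, "HLT",                     1 },
    { 28, "control-register access", 1 },
    { 31, "RDMSR",                   1 },
    { 32, "WRMSR",                   1 },
    { 48, "EPT violation",           0 },
};

/* Look up an exit reason; returns NULL if it is not in the subset above.
 * Only the low 16 bits are the basic reason; higher bits are flags. */
static const struct vmx_exit_info *vmx_exit_lookup(uint32_t raw_exit_reason)
{
    uint16_t basic = (uint16_t)(raw_exit_reason & 0xffffu);
    size_t n = sizeof(vmx_exit_table) / sizeof(vmx_exit_table[0]);

    for (size_t i = 0; i < n; i++) {
        if (vmx_exit_table[i].basic_reason == basic)
            return &vmx_exit_table[i];
    }
    return NULL;
}
```

For example, vmx_exit_lookup(12) describes HLT (instruction-driven), while
vmx_exit_lookup(1) describes an external interrupt, which is unrelated to
whatever instruction the guest happened to be executing.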



> Paolo
>
>> --------------------------------------
>>
>> /* Enter guest mode */
>> 8716                 "jne 1f \n\t"
>> 8717                 __ex(ASM_VMX_VMLAUNCH) "\n\t"
>> 8718                 "jmp 2f \n\t"
>> 8719                 "1: " __ex(ASM_VMX_VMRESUME) "\n\t"
>> 8720                 "2: "
>> 8721                 /* Save guest registers, load host registers, keep flags */
>> 8722                 "mov %0, %c[wordsize](%%" _ASM_SP ") \n\t"
>> 8723                 "pop %0 \n\t"
>> 8724                 "mov %%" _ASM_AX ", %c[rax](%0) \n\t"
>> 8725                 "mov %%" _ASM_BX ", %c[rbx](%0) \n\t"
>> 8726                 __ASM_SIZE(pop) " %c[rcx](%0) \n\t"
>> 8727                 "mov %%" _ASM_DX ", %c[rdx](%0) \n\t"
>> 8728                 "mov %%" _ASM_SI ", %c[rsi](%0) \n\t"
>> 8729                 "mov %%" _ASM_DI ", %c[rdi](%0) \n\t"
>> 8730                 "mov %%" _ASM_BP ", %c[rbp](%0) \n\t"
>>
>>
>>
>>
>>
>>> After KVM_IRQFD, the host writes to an eventfd in order to trigger the
>>> interrupt.  In QEMU this corresponds to event_notifier_set.
>>>
>>> (For MSI, KVM_SIGNAL_MSI is preferred to KVM_IRQ_LINE/KVM_IRQ_LINE_STATUS
>>> because it's faster, but they provide the same functionality).
>>>
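
The eventfd side of this can be tried without KVM at all. The sketch below is
an assumption-laden illustration, not QEMU's event_notifier_set: the helper
names are hypothetical, and the read() at the end merely stands in for the
consumer (with KVM_IRQFD, the kernel consumes the event instead of userspace).

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* Signal the eventfd once by adding 1 to its counter.
 * Returns 0 on success, -1 on failure. */
static int notify_eventfd(int fd)
{
    uint64_t one = 1;

    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Hypothetical demo: signal a fresh eventfd n times, then read the counter
 * back as a consumer would. Returns the counter value, or -1 on error
 * (including n == 0, where the non-blocking read has nothing to return). */
static int64_t eventfd_roundtrip(int n)
{
    uint64_t count = 0;
    int fd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);

    if (fd < 0)
        return -1;

    for (int i = 0; i < n; i++) {
        if (notify_eventfd(fd) < 0) {
            close(fd);
            return -1;
        }
    }

    /* Without EFD_SEMAPHORE, read() returns the accumulated counter
     * and resets it to zero. */
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        close(fd);
        return -1;
    }

    close(fd);
    return (int64_t)count;
}
```

Several writes before the consumer runs are coalesced into a single counter
value, which is why the demo reads back n rather than seeing n separate events.
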
>>>> 4)
>>>> I've seen in the literature that KVM operates in protection ring -1.
>>>> What does it mean? Is there a HW implementation for that ring?
>>>>
>>>> Why not in ring 0?
>>>
>>> Ring -1 is not a particularly good name.  More precisely, KVM
>>> operates in ring 0 of VMX root mode, while the guest operates in VMX
>>> non-root mode (in any of rings 0 to 3, depending on the
>>> current privilege level of the guest).
>>>
>>> Paolo
>>
>> thanks
>> Charls
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>


