> On 26 Mar 2019, at 15:48, Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
>
> Liran Alon <liran.alon@xxxxxxxxxx> writes:
>
>>> On 26 Mar 2019, at 15:07, Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
>>> - Instead of putting the temporary HF_SMM_MASK drop to
>>> rsm_enter_protected_mode() (as was suggested by Liran), move it to
>>> emulator_set_cr() modifying its interface. emulate.c seems to be
>>> vcpu-specifics-free at this moment, we may want to keep it this way.
>>> - It seems that Hyper-V+UEFI on KVM is still broken, I'm observing sporadic
>>> hangs even with this patch. These hangs, however, seem to be unrelated to
>>> rsm.
>>
>> Feel free to share details on these hangs ;)
>>
>
> You've asked for it)
>
> The immediate issue I'm observing is some sort of a lockup which is easy
> to trigger with e.g. "-usb -device usb-tablet" on Qemu command line; it
> seems we get too many interrupts and combined with preemption timer for
> L2 we're not making any progress:
>
>  kvm_userspace_exit: reason KVM_EXIT_IOAPIC_EOI (26)
>  kvm_set_irq: gsi 18 level 1 source 0
>  kvm_msi_set_irq: dst 0 vec 177 (Fixed|physical|level)
>  kvm_apic_accept_irq: apicid 0 vec 177 (Fixed|edge)
>  kvm_fpu: load
>  kvm_entry: vcpu 0
>  kvm_exit: reason VMRESUME rip 0xfffff80000848115 info 0 0
>  kvm_entry: vcpu 0
>  kvm_exit: reason PREEMPTION_TIMER rip 0xfffff800f4448e01 info 0 0
>  kvm_nested_vmexit: rip fffff800f4448e01 reason PREEMPTION_TIMER info1 0 info2 0 int_info 0 int_info_err 0
>  kvm_nested_vmexit_inject: reason EXTERNAL_INTERRUPT info1 0 info2 0 int_info 800000b1 int_info_err 0
>  kvm_entry: vcpu 0
>  kvm_exit: reason APIC_ACCESS rip 0xfffff8000081fe11 info 10b0 0
>  kvm_apic: apic_write APIC_EOI = 0x0
>  kvm_eoi: apicid 0 vector 177
>  kvm_fpu: unload
>  kvm_userspace_exit: reason KVM_EXIT_IOAPIC_EOI (26)
>  ...
>  (and the pattern repeats)
>
> Maybe it is a usb-only/Qemu-only problem, maybe not.
>
> --
> Vitaly

The trace of kvm_apic_accept_irq should indicate that __apic_accept_irq() was called to inject an interrupt into the L1 guest. (I know that we are running in L1 at this point because the next exit is a VMRESUME.)

However, it is surprising to see that on the next entry to the guest, no interrupt was injected by vmx_inject_irq(). It may be because the L1 guest is currently running with interrupts disabled and therefore only an IRQ-window was requested. (Too bad we don’t have a trace for this…)

Next, we got an exit from the L1 guest on VMRESUME. As part of its handling, the active VMCS was changed from vmcs01 to vmcs02. I believe the immediate exit later on preemption-timer was because the immediate-exit-request mechanism was invoked, which is now implemented by setting a VMX preemption-timer with a value of 0 (thanks to Sean). (See vmx_vcpu_run() -> vmx_update_hv_timer() -> vmx_arm_hv_timer(vmx, 0)). (Note that the pending interrupt was evaluated because of a recent patch of mine to nested_vmx_enter_non_root_mode() to request KVM_REQ_EVENT when vmcs01 has requested an IRQ-window.)

Therefore, when entering L2, you immediately get an exit on PREEMPTION_TIMER, which will eventually cause L0 to call vmx_check_nested_events(). That function now notices the pending interrupt that should have been injected into L1 earlier, and exits from L2 to L1 on EXTERNAL_INTERRUPT with vector 0xb1.

Then L1 handles the interrupt by performing an EOI to the LAPIC, which propagates an EOI to the IOAPIC, which immediately re-injects the interrupt (after clearing the remote_irr) as the irq-line is still set. i.e. QEMU’s ioapic_eoi_broadcast() calls ioapic_service() immediately after it clears remote-irr for this pin.

Also note that in the trace we see only a single kvm_set_irq to level 1, but we don’t see another kvm_set_irq to level 0 immediately afterwards. This should indicate that in QEMU’s IOAPIC redirection-table, this pin is configured as a level-triggered interrupt.
However, the trace of kvm_apic_accept_irq indicates that this interrupt is raised as an edge-triggered interrupt.

To sum up:
1) I would create a patch to add a trace to vcpu_enter_guest() when calling enable_smi_window() / enable_nmi_window() / enable_irq_window().
2) It is worth investigating why MSI trigger-mode is edge-triggered instead of level-triggered.
3) If this is indeed a level-triggered interrupt, it is worth investigating how the interrupt source behaves. i.e. What causes this device to lower the irq-line? (As we don’t see any I/O Port or MMIO access by the L1 guest interrupt-handler before performing the EOI.)
4) Does this issue also reproduce when running with kernel-irqchip? (Instead of split-irqchip)

-Liran
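
P.S. To make the EOI re-injection loop described above concrete, here is a minimal toy model of a single level-triggered IOAPIC pin. It is only a sketch under my own assumptions: the class and method names (IoapicPin, service, set_irq, eoi_broadcast) are hypothetical and only loosely mirror the QEMU functions mentioned above; this is not QEMU or KVM code. It shows that while the irq-line stays asserted, every EOI clears remote-irr and immediately causes another delivery.

```python
from dataclasses import dataclass, field


@dataclass
class IoapicPin:
    """Toy model of one level-triggered IOAPIC pin (hypothetical, not QEMU code)."""
    vector: int
    line_asserted: bool = False   # current state of the irq-line
    remote_irr: bool = False      # set from delivery until the EOI arrives
    delivered: list = field(default_factory=list)

    def service(self):
        # Deliver only if the line is high and no delivery is outstanding.
        if self.line_asserted and not self.remote_irr:
            self.remote_irr = True
            self.delivered.append(self.vector)

    def set_irq(self, level):
        # The device raises or lowers the irq-line.
        self.line_asserted = bool(level)
        self.service()

    def eoi_broadcast(self, vector):
        # EOI clears remote-irr, then the pin is serviced again right away.
        if vector == self.vector and self.remote_irr:
            self.remote_irr = False
            self.service()


pin = IoapicPin(vector=0xB1)
pin.set_irq(1)                    # single "level 1" event, as in the trace
for _ in range(3):
    pin.eoi_broadcast(0xB1)       # guest EOIs, but the line is still high
stuck_deliveries = len(pin.delivered)

pin.set_irq(0)                    # device finally lowers the line
pin.eoi_broadcast(0xB1)           # this EOI does not re-inject
final_deliveries = len(pin.delivered)

print(stuck_deliveries, final_deliveries)
```

In this model the guest sees one delivery for the initial assertion plus one per EOI, and the loop only ends once something lowers the line before the EOI, which is why point 3 above matters.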