> On 1 Apr 2019, at 11:39, Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
>
> Paolo Bonzini <pbonzini@xxxxxxxxxx> writes:
>
>> On 29/03/19 16:32, Liran Alon wrote:
>>> Paolo I am not sure this is the case here. Please read my other
>>> replies in this email thread.
>>>
>>> I think this is just a standard issue of a level-triggered interrupt
>>> handler in L1 (Hyper-V) that performs EOI before it lowers the
>>> irq-line. I don’t think vector 96 is even related to the issue at
>>> hand here. This is why after it was already handled, the loop of
>>> EXTERNAL_INTERRUPT happens on vector 80 and not vector 96.
>>
>> Hmm... Vitaly, what machine were you testing on---does it have APIC-v?
>> If not, then you should have seen either an EOI for irq 96 or a TPR
>> below threshold vmexit. However, if it has APIC-v then you wouldn't
>> have seen any of this (you only see the EOI for irq 80 because it's
>> level triggered) and Liran is probably right.
>>
>
> It does, however, the issue is reproducible with and without
> it. Moreover, I think the second simultaneous IRQ is just a red herring;
> Here is another trace (enable_apicv).
> Posting it non-stripped and hope
> your eyes will catch something I'm missing:
>
> [001] 513675.736316: kvm_exit: reason VMRESUME rip 0xfffff80002cae115 info 0 0
> [001] 513675.736321: kvm_entry: vcpu 0
> [001] 513675.736565: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xfffff80362dcd26d info 0 800000ec
> [001] 513675.736566: kvm_nested_vmexit: rip fffff80362dcd26d reason EXTERNAL_INTERRUPT info1 0 info2 0 int_info 800000ec int_info_err 0
> [001] 513675.736568: kvm_entry: vcpu 0
> [001] 513675.736650: kvm_exit: reason EPT_VIOLATION rip 0xfffff80362dcd230 info 182 0
> [001] 513675.736651: kvm_nested_vmexit: rip fffff80362dcd230 reason EPT_VIOLATION info1 182 info2 0 int_info 0 int_info_err 0
> [001] 513675.736651: kvm_page_fault: address 261200000 error_code 182
>
> -> injecting
>
> [008] 513675.737059: kvm_set_irq: gsi 23 level 1 source 0
> [008] 513675.737061: kvm_msi_set_irq: dst 0 vec 80 (Fixed|physical|level)
> [008] 513675.737062: kvm_apic_accept_irq: apicid 0 vec 80 (Fixed|edge)
> [001] 513675.737233: kvm_nested_vmexit_inject: reason EXTERNAL_INTERRUPT info1 0 info2 0 int_info 80000050 int_info_err 0
> [001] 513675.737239: kvm_entry: vcpu 0
> [001] 513675.737243: kvm_exit: reason EOI_INDUCED rip 0xfffff80002c85e1a info 50 0
>
> -> immediate EOI causing re-injection (even preemption timer is not
> involved here).
>
> [001] 513675.737244: kvm_eoi: apicid 0 vector 80
> [001] 513675.737245: kvm_fpu: unload
> [001] 513675.737246: kvm_userspace_exit: reason KVM_EXIT_IOAPIC_EOI (26)
> [001] 513675.737256: kvm_set_irq: gsi 23 level 1 source 0
> [001] 513675.737259: kvm_msi_set_irq: dst 0 vec 80 (Fixed|physical|level)
> [001] 513675.737260: kvm_apic_accept_irq: apicid 0 vec 80 (Fixed|edge)
> [001] 513675.737264: kvm_fpu: load
> [001] 513675.737265: kvm_entry: vcpu 0
> [001] 513675.737271: kvm_exit: reason VMRESUME rip 0xfffff80002cae115 info 0 0
> [001] 513675.737278: kvm_entry: vcpu 0
> [001] 513675.737282: kvm_exit: reason PREEMPTION_TIMER rip 0xfffff80362dcc2d0 info 0 0
> [001] 513675.737283: kvm_nested_vmexit: rip fffff80362dcc2d0 reason PREEMPTION_TIMER info1 0 info2 0 int_info 0 int_info_err 0
> [001] 513675.737285: kvm_nested_vmexit_inject: reason EXTERNAL_INTERRUPT info1 0 info2 0 int_info 80000050 int_info_err 0
> [001] 513675.737289: kvm_entry: vcpu 0
> [001] 513675.737293: kvm_exit: reason EOI_INDUCED rip 0xfffff80002c85e1a info 50 0
> [001] 513675.737293: kvm_eoi: apicid 0 vector 80
> [001] 513675.737294: kvm_fpu: unload
> [001] 513675.737295: kvm_userspace_exit: reason KVM_EXIT_IOAPIC_EOI (26)
> [001] 513675.737299: kvm_set_irq: gsi 23 level 1 source 0
> [001] 513675.737299: kvm_msi_set_irq: dst 0 vec 80 (Fixed|physical|level)
> [001] 513675.737300: kvm_apic_accept_irq: apicid 0 vec 80 (Fixed|edge)
> [001] 513675.737302: kvm_fpu: load
> [001] 513675.737303: kvm_entry: vcpu 0
> [001] 513675.737307: kvm_exit: reason VMRESUME rip 0xfffff80002cae115 info 0 0
>
> ...
>
> --
> Vitaly

So to sum up: this matches what I mentioned in my previous emails, right?
Vector 96 is not related, and the only issue here is that the
level-triggered interrupt handler for vector 80 does EOI before lowering
the irq-line, which causes vector 80 to be injected in an infinite loop.
And this is not even related to this being a nested virtualization workload.
It’s just an issue in the Hyper-V (L1) interrupt handler for vector 80.

Therefore the only action items are:
1) Microsoft to fix the Hyper-V vector 80 interrupt handler to lower the
   irq-line before EOI.
2) Patch the QEMU IOAPIC implementation to have a mechanism similar to
   KVM’s to delay injection of a level-triggered interrupt in case we are
   injecting the same interrupt X times in a row.

-Liran
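
P.S. To make the two action items concrete, here is a minimal toy model in
Python. It is a sketch of the reasoning above, not QEMU or KVM code: the
class, the method names, the `budget` bound and the threshold of 10 are all
made up for illustration, and it assumes the re-injected vector preempts the
handler before it reaches the code that lowers the irq-line.

```python
class LevelPin:
    """Toy model of one level-triggered IOAPIC pin (illustration only)."""

    def __init__(self, max_successive=None):
        self.line = False          # current level of the device's irq-line
        self.remote_irr = False    # set from delivery until the guest EOIs
        self.deliveries = 0        # how many times the vector was injected
        self.successive = 0        # EOIs in a row that found the line still high
        self.deferred = False      # True when a re-injection was postponed
        self.max_successive = max_successive  # hypothetical threshold "X"

    def _try_deliver(self):
        # Deliver only if the line is asserted and no delivery is in flight.
        if self.line and not self.remote_irr:
            self.remote_irr = True
            self.deliveries += 1
            return True
        return False

    def set_line(self, level):
        self.line = level
        if level:
            self._try_deliver()

    def eoi(self):
        """Guest EOI. Returns True if the vector was re-injected right away."""
        self.remote_irr = False
        if not self.line:
            self.successive = 0
            return False
        self.successive += 1
        if self.max_successive is not None and self.successive >= self.max_successive:
            self.successive = 0
            self.deferred = True   # a real device model would arm a timer here
            return False
        return self._try_deliver()

    def run_deferred(self):
        """Stand-in for the timer callback: deliver if the line is still high."""
        if self.deferred:
            self.deferred = False
            self._try_deliver()


def eoi_first_handler(pin, budget=100):
    """Handler that EOIs before lowering the line (action item 1 is to fix this).

    Returns True if the handler ever got far enough to lower the line.
    """
    pin.set_line(True)
    for _ in range(budget):
        if pin.eoi():            # line still high: same vector comes straight back
            continue             # assumed: handler is preempted and starts over
        pin.set_line(False)      # only reached when re-injection was held off
        return True
    return False


def lower_first_handler(pin):
    """Handler with the order fixed: lower the line, then EOI."""
    pin.set_line(True)
    pin.set_line(False)
    pin.eoi()
    return pin.deliveries
```

With `LevelPin()` the EOI-first handler never reaches the point where it
lowers the line within its budget; with `LevelPin(max_successive=10)` the
tenth EOI in a row postpones the re-injection, the handler gets to lower the
line, and the later `run_deferred()` finds nothing left to deliver.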