Re: [PATCH RFC] KVM: x86: vmx: throttle immediate exit through preemption timer to assist buggy guests

Paolo Bonzini <pbonzini@xxxxxxxxxx> · Fri, 29 Mar 2019 15:18:56 +0100

On 28/03/19 21:31, Vitaly Kuznetsov wrote:
> 
> The 'hang' scenario develops like this:
> 1) Hyper-V boots and QEMU is trying to inject two IRQs simultaneously. One
>  of them is level-triggered. KVM injects the edge-triggered one and
>  requests an immediate exit to inject the level-triggered one:
> 
>  kvm_set_irq:          gsi 23 level 1 source 0
>  kvm_msi_set_irq:      dst 0 vec 80 (Fixed|physical|level)
>  kvm_apic_accept_irq:  apicid 0 vec 80 (Fixed|edge)
>  kvm_msi_set_irq:      dst 0 vec 96 (Fixed|physical|edge)
>  kvm_apic_accept_irq:  apicid 0 vec 96 (Fixed|edge)
>  kvm_nested_vmexit_inject: reason EXTERNAL_INTERRUPT info1 0 info2 0 int_info 80000060 int_info_err 0
> 
> 2) Hyper-V needs one of its VMs to run to handle the situation, but an
>  immediate exit happens:
> 
>  kvm_entry:            vcpu 0
>  kvm_exit:             reason VMRESUME rip 0xfffff80006a40115 info 0 0
>  kvm_entry:            vcpu 0
>  kvm_exit:             reason PREEMPTION_TIMER rip 0xfffff8022f3d8350 info 0 0
>  kvm_nested_vmexit:    rip fffff8022f3d8350 reason PREEMPTION_TIMER info1 0 info2 0 int_info 0 int_info_err 0
>  kvm_nested_vmexit_inject: reason EXTERNAL_INTERRUPT info1 0 info2 0 int_info 80000050 int_info_err 0

I suppose that before this there was an EOI for vector 96?

The main issue with your patch is that the preemption timer is buggy on
some processors (it runs too fast), and on those processors we shouldn't
use it with a nonzero deadline.  In particular, because it runs too fast,
it may not hide the bug.
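To make the "runs too fast" concern concrete, here is a small sketch of how a nanosecond deadline maps onto the VMX preemption timer. Per the Intel SDM, the timer decrements once each time bit X of the TSC changes, where X is reported in IA32_VMX_MISC[4:0]. The function names, and modeling the erratum as a smaller-than-advertised shift, are assumptions for illustration; this is not KVM code and reads no real MSRs.

```c
#include <stdint.h>

/*
 * Convert a relative deadline (ns) into a preemption timer value.
 * tsc_khz and vmx_misc are caller-supplied inputs (hypothetical API).
 */
static uint32_t ns_to_preemption_timer(uint64_t delta_ns, uint64_t tsc_khz,
                                       uint64_t vmx_misc)
{
	unsigned int rate = vmx_misc & 0x1f;                 /* IA32_VMX_MISC[4:0] */
	uint64_t tsc_cycles = delta_ns * tsc_khz / 1000000;  /* ns -> TSC cycles */
	uint64_t ticks = tsc_cycles >> rate;                 /* TSC cycles -> timer ticks */

	/* The VMCS field is 32 bits wide; clamp instead of wrapping. */
	return ticks > UINT32_MAX ? UINT32_MAX : (uint32_t)ticks;
}

/*
 * Delay actually observed before the exit if the timer really ticks
 * once every 2^real_rate TSC cycles.  On a CPU whose timer runs faster
 * than advertised, real_rate is smaller than the IA32_VMX_MISC value
 * used above, so the exit fires early.
 */
static uint64_t effective_delay_ns(uint32_t timer_value, uint64_t tsc_khz,
                                   unsigned int real_rate)
{
	uint64_t tsc_cycles = (uint64_t)timer_value << real_rate;

	return tsc_cycles * 1000000 / tsc_khz;
}
```

For example, with a 2 GHz TSC and an advertised rate of 5, a 1 ms deadline becomes 62500 ticks; if the timer actually decremented every 2^3 cycles instead, the exit would arrive after only 250 us, i.e. the intended throttling delay shrinks by the same factor.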

I think level-triggered interrupts are required for the bug to show up.
Edge-triggered interrupts usually have to be acknowledged with a device
register before the host device will trigger another interrupt; or at
least the interrupt event, for example an incoming network packet, must
happen again.  This way, when the guest hangs, it puts some back pressure
on the host.
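A toy model of the back-pressure argument, assuming a guest that EOIs the interrupt without ever servicing the device (so the device never lowers its line). All names are hypothetical and this is not KVM or QEMU code; it only shows that a level-triggered pin is redelivered on every EOI, while an edge-triggered pin needs a new device event.

```c
#include <stdbool.h>

/* One interrupt pin in a toy interrupt controller. */
struct toy_pin {
	bool level_triggered;
	bool line_asserted;   /* device's IRQ line is still high */
	bool pending;         /* waiting to be delivered to the guest */
};

/* Device signals an interrupt (for edge: a new event, e.g. a packet). */
static void toy_raise(struct toy_pin *p)
{
	p->pending = true;
	/* An edge is a one-shot event; only a level line stays asserted. */
	p->line_asserted = p->level_triggered;
}

/* Guest EOIs without touching the device, so the line is never lowered. */
static void toy_eoi(struct toy_pin *p)
{
	if (p->level_triggered && p->line_asserted)
		p->pending = true;   /* redelivered immediately */
}

/* Run up to `steps` deliver-then-EOI iterations and count deliveries. */
static unsigned int toy_count_deliveries(bool level_triggered, unsigned int steps)
{
	struct toy_pin p = { .level_triggered = level_triggered };
	unsigned int delivered = 0;

	toy_raise(&p);
	for (unsigned int i = 0; i < steps; i++) {
		if (p.pending) {
			p.pending = false;
			delivered++;
			toy_eoi(&p);
		}
	}
	return delivered;
}
```

With a level-triggered pin every iteration delivers an interrupt (a storm that leaves no room for the guest to run anything else); with an edge-triggered pin only the first one is delivered, and nothing more happens until the device produces another event.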

I think we should make the same fix in QEMU that was done in the in-kernel
IOAPIC.
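Assuming the in-kernel fix referred to here is the IOAPIC's interrupt-storm throttling (as I understand it: after many successive EOIs with a level-triggered line still asserted, reinjection is deferred by a short delay instead of happening immediately), a userspace equivalent could decide what to do on each EOI roughly as follows. The threshold, names and enum are illustrative assumptions, not QEMU's or KVM's actual code.

```c
#include <stdbool.h>

/* Illustrative threshold; a real implementation would choose its own. */
#define TOY_SUCCESSIVE_EOI_MAX 10000u

struct toy_storm_state {
	unsigned int successive_eois;   /* EOIs seen with the line still high */
};

enum toy_eoi_action {
	TOY_REINJECT_NOW,    /* deliver again immediately */
	TOY_REINJECT_LATER,  /* arm a short timer, deliver when it fires */
	TOY_NOTHING,         /* line low or masked: nothing to redeliver */
};

/*
 * Decide what to do on EOI of a level-triggered pin.  Deferring once
 * every TOY_SUCCESSIVE_EOI_MAX EOIs gives a buggy guest a window in
 * which it can make progress (e.g. let Hyper-V run the VM that owns
 * the device) instead of being stuck in back-to-back injections.
 */
static enum toy_eoi_action toy_on_level_eoi(struct toy_storm_state *s,
                                            bool line_asserted, bool masked)
{
	if (masked || !line_asserted) {
		s->successive_eois = 0;
		return TOY_NOTHING;
	}
	if (++s->successive_eois == TOY_SUCCESSIVE_EOI_MAX) {
		s->successive_eois = 0;
		return TOY_REINJECT_LATER;
	}
	return TOY_REINJECT_NOW;
}
```

In a device model, TOY_REINJECT_LATER would translate into arming a timer whose callback re-runs the normal delivery path; well-behaved guests, which lower the line before EOI, never reach the threshold and see no change.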

Paolo