On Thu, May 06, 2021 at 03:57:34PM -0300, Marcelo Tosatti wrote:
> For VMX, when a vcpu enters HLT emulation, pi_post_block will:
>
> 1) Add vcpu to per-cpu list of blocked vcpus.
>
> 2) Program the posted-interrupt descriptor "notification vector"
> to POSTED_INTR_WAKEUP_VECTOR
>
> With interrupt remapping, an interrupt will set the PIR bit for the
> vector programmed for the device on the CPU, test-and-set the
> ON bit on the posted interrupt descriptor, and if the ON bit is clear
> generate an interrupt for the notification vector.
>
> This way, the target CPU wakes upon a device interrupt and wakes up
> the target vcpu.
>
> Problem is that pi_post_block only programs the notification vector
> if kvm_arch_has_assigned_device() is true. It's possible for the
> following to happen:
>
> 1) vcpu V HLTs on pcpu P, kvm_arch_has_assigned_device is false,
> notification vector is not programmed
> 2) device is assigned to VM
> 3) device interrupts vcpu V, sets ON bit (notification vector not programmed,
> so pcpu P remains in idle)
> 4) vcpu 0 IPIs vcpu V (in guest), but since pi descriptor ON bit is set,
> kvm_vcpu_kick is skipped
> 5) vcpu 0 busy spins on vcpu V's response for several seconds, until
> RCU watchdog NMIs all vCPUs.
>
> To fix this, use the start_assignment kvm_x86_ops callback to program the
> notification vector when assigned device count changes from 0 to 1.
>
> Reported-by: Pei Zhang <pezhang@xxxxxxxxxx>
> Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>

Argh, the patch is missing the setting of vmx_pi_start_assignment; will resend as v2.
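
For readers following the sequence in steps 1) to 5) above, here is a rough
user-space toy model of the race. It is not kernel code and not the patch
itself: every toy_* name, the struct layout, and the vector values are
assumptions made up for illustration, and the "fix" branch only mirrors the
changelog's description (program the notification vector when the assigned
device count goes from 0 to 1).

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative vector numbers; only their being distinct matters here. */
#define TOY_POSTED_INTR_VECTOR        0xf2
#define TOY_POSTED_INTR_WAKEUP_VECTOR 0xf1

/* Hypothetical, heavily simplified state for one VM with one blocked vcpu V. */
struct toy_vm {
	int  assigned_devices; /* number of devices assigned to the VM */
	bool fix_enabled;      /* model the start_assignment hook? */
	bool on;               /* ON bit of vcpu V's PI descriptor */
	int  nv;               /* notification vector in V's PI descriptor */
	bool blocked;          /* vcpu V is in HLT emulation */
};

/* Step 1: V blocks; the wakeup vector is programmed only with a device. */
static void toy_vcpu_block(struct toy_vm *vm)
{
	assert(!vm->blocked);
	vm->blocked = true;
	if (vm->assigned_devices > 0)
		vm->nv = TOY_POSTED_INTR_WAKEUP_VECTOR;
}

/* Step 2: device assignment; the modeled fix reprograms NV on 0 -> 1. */
static void toy_assign_device(struct toy_vm *vm)
{
	if (++vm->assigned_devices == 1 && vm->fix_enabled && vm->blocked)
		vm->nv = TOY_POSTED_INTR_WAKEUP_VECTOR;
}

/* Step 3: posted device interrupt: test-and-set ON, notify if it was clear. */
static void toy_device_interrupt(struct toy_vm *vm)
{
	bool was_on = vm->on;

	vm->on = true;
	/* Only the wakeup vector gets the blocked vcpu running again. */
	if (!was_on && vm->nv == TOY_POSTED_INTR_WAKEUP_VECTOR)
		vm->blocked = false;
}

/* Step 4: IPI from another vcpu; the kick is skipped if ON is already set. */
static void toy_guest_ipi(struct toy_vm *vm)
{
	if (vm->on)
		return; /* assumes a notification is already in flight */
	vm->on = true;
	vm->blocked = false; /* stands in for kvm_vcpu_kick */
}

/* Runs steps 1 to 4 and reports whether vcpu V ended up awake. */
bool toy_scenario(bool fix_enabled)
{
	struct toy_vm vm = {
		.fix_enabled = fix_enabled,
		.nv = TOY_POSTED_INTR_VECTOR,
	};

	toy_vcpu_block(&vm);
	toy_assign_device(&vm);
	toy_device_interrupt(&vm);
	toy_guest_ipi(&vm);
	return !vm.blocked;
}
```

In this model toy_scenario(false) returns false: V stays blocked, which
corresponds to vcpu 0 spinning in step 5). toy_scenario(true) returns true,
because the device interrupt now arrives on the wakeup vector.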