On 06/11/19 18:56, Joao Martins wrote: > When vCPU enters block phase, pi_pre_block() inserts vCPU to a per pCPU > linked list of all vCPUs that are blocked on this pCPU. Afterwards, it > changes PID.NV to POSTED_INTR_WAKEUP_VECTOR which its handler > (wakeup_handler()) is responsible to kick (unblock) any vCPU on that > linked list that now has pending posted interrupts. > > While vCPU is blocked (in kvm_vcpu_block()), it may be preempted which > will cause vmx_vcpu_pi_put() to set PID.SN. If later the vCPU will be > scheduled to run on a different pCPU, vmx_vcpu_pi_load() will clear > PID.SN but will also *overwrite PID.NDST to this different pCPU*. > Instead of keeping it with original pCPU which vCPU had entered block > phase on. > > This results in an issue because when a posted interrupt is delivered, > the wakeup_handler() will be executed and fail to find blocked vCPU on > its per pCPU linked list of all vCPUs that are blocked on this pCPU. > Which is due to the vCPU being placed on a *different* per pCPU > linked list than the original pCPU that it had entered block phase. > > The regression is introduced by commit c112b5f50232 ("KVM: x86: > Recompute PID.ON when clearing PID.SN"). Therefore, partially revert > it and reintroduce the condition in vmx_vcpu_pi_load() responsible for > avoiding changing PID.NDST when loading a blocked vCPU. > > Fixes: c112b5f50232 ("KVM: x86: Recompute PID.ON when clearing PID.SN") > Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx> > Signed-off-by: Liran Alon <liran.alon@xxxxxxxxxx> Something wrong in the SoB line? Otherwise looks good. Paolo > --- > arch/x86/kvm/vmx/vmx.c | 14 ++++++++++++++ > arch/x86/kvm/vmx/vmx.h | 6 ++++++ > 2 files changed, 20 insertions(+) > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index 18b0bee662a5..75d903455e1c 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -1274,6 +1274,18 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu) > if (!pi_test_sn(pi_desc) && vcpu->cpu == cpu) > return; > > + /* > + * If the 'nv' field is POSTED_INTR_WAKEUP_VECTOR, do not change > + * PI.NDST: pi_post_block is the one expected to change PID.NDST and the > + * wakeup handler expects the vCPU to be on the blocked_vcpu_list that > + * matches PI.NDST. Otherwise, a vcpu may not be able to be woken up > + * correctly. > + */ > + if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR || vcpu->cpu == cpu) { > + pi_clear_sn(pi_desc); > + goto after_clear_sn; > + } > + > /* The full case. */ > do { > old.control = new.control = pi_desc->control; > @@ -1289,6 +1301,8 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu) > } while (cmpxchg64(&pi_desc->control, old.control, > new.control) != old.control); > > +after_clear_sn: > + > /* > * Clear SN before reading the bitmap. The VT-d firmware > * writes the bitmap and reads SN atomically (5.2.3 in the > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h > index bee16687dc0b..1e32ab54fc2d 100644 > --- a/arch/x86/kvm/vmx/vmx.h > +++ b/arch/x86/kvm/vmx/vmx.h > @@ -373,6 +373,12 @@ static inline void pi_clear_on(struct pi_desc *pi_desc) > (unsigned long *)&pi_desc->control); > } > > +static inline void pi_clear_sn(struct pi_desc *pi_desc) > +{ > + clear_bit(POSTED_INTR_SN, > + (unsigned long *)&pi_desc->control); > +} > + > static inline int pi_test_on(struct pi_desc *pi_desc) > { > return test_bit(POSTED_INTR_ON, >