Re: [PATCH v3] KVM: VMX: fix lockdep warning on posted intr wakeup

Sean Christopherson <seanjc@xxxxxxxxxx> · Thu, 30 Mar 2023 11:14:27 -0700

On Thu, Mar 30, 2023, Yan Zhao wrote:
> On Wed, Mar 29, 2023 at 01:51:23PM +0200, Paolo Bonzini wrote:
> > On 3/29/23 03:53, Yan Zhao wrote:
> > > Yes, there's no actual deadlock currently.
> > > 
> > > But without fixing this issue, debug_locks will be set to false along
> > > with below messages printed. Then lockdep will be turned off and any
> > > other lock detections like lockdep_assert_held()... will not print
> > > warning even when it's obviously violated.
> > 
> > Can you use lockdep subclasses, giving 0 to the sched_in path and 1 to the
> > sched_out path?
> 
> Yes, thanks for the suggestion!
> This can avoid this warning of "possible circular locking dependency".
> 
> I tried it like this:
> - in sched_out path:
>   raw_spin_lock_nested(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu), 1);
> 
> - in irq and sched_in paths:
>   raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
> 
> But I have a concern:
> If sched_in path removes vcpu A from wakeup list of its previous pcpu A,
> and at the mean time, sched_out path adds vcpu B to the wakeup list of
> pcpu A, the sched_in and sched_out paths should race for the same
> subclass of lock.
> But if sched_in path only holds subclass 0, and sched_out path holds
> subclass 1, then lockdep would not warn of "possible circular locking
> dependency" if someone made a change as below in sched_in path.
> 
> if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR) {
>             raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
>             list_del(&vmx->pi_wakeup_list);
> +            raw_spin_lock(&current->pi_lock);
> +            raw_spin_unlock(&current->pi_lock);
>             raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
> }
> 
> While with v3 of this patch (sched_in path holds both out_lock and in_lock),
> lockdep is still able to warn about this issue.

Couldn't we just add a manual assertion?  That'd also be a good location for a
comment to document all of this, and to clarify that current->pi_lock is a
completely different lock that has nothing to do with posted interrupts.

It's not foolproof, but any patches that substantially touch this code need a
ton of scrutiny as the scheduling interactions are gnarly, i.e. IMO a deadlock
bug sneaking in is highly unlikely.

diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index 94c38bea60e7..19325a10e42f 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -90,6 +90,7 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
         */
        if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR) {
                raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
+               lockdep_assert_not_held(&current->pi_lock);
                list_del(&vmx->pi_wakeup_list);
                raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
        }