On Wed, 2025-03-19 at 09:17 -0700, Sean Christopherson wrote:
> +James and Yan
> 
> On Tue, Mar 18, 2025, Maxim Levitsky wrote:
> > Hi!
> > 
> > I recently came upon an interesting failure in the CI pipeline:
> > 
> > 
> > [  592.704446] WARNING: possible circular locking dependency detected
> > [  592.710625] 6.12.0-36.el10.x86_64+debug #1 Not tainted
> > [  592.715764] ------------------------------------------------------
> > [  592.721946] swapper/19/0 is trying to acquire lock:
> > [  592.726823] ff110001b0e64ec0 (&p->pi_lock){-.-.}-{2:2}, at: try_to_wake_up+0xa7/0x15c0
> > [  592.734761]
> > [  592.734761] but task is already holding lock:
> > [  592.740596] ff1100079ec0c058 (&per_cpu(wakeup_vcpus_on_cpu_lock, cpu)){-...}-{2:2}, at: pi_wakeup_handler+0x60/0x130 [kvm_intel]
> > [  592.752185]
> > [  592.752185] which lock already depends on the new lock.
> 
> ...
> 
> > As far as I can see, there is no race, but lockdep doesn't understand this.
> 
> Yep :-(
> 
> This splat fires every time (literally) I run through my battery of tests on
> systems with IPI virtualization; it's basically an old friend at this point.
> 
> > Lockdep thinks that:
> > 
> > 1. pi_enable_wakeup_handler() is called from schedule(), which holds rq->lock,
> >    and it itself takes the wakeup_vcpus_on_cpu_lock lock.
> > 
> > 2. pi_wakeup_handler() takes wakeup_vcpus_on_cpu_lock and then calls
> >    try_to_wake_up(), which can eventually take rq->lock (at the start of
> >    that function there is a list of the cases in which it takes it).
> > 
> > I don't know lockdep well yet, but maybe a lockdep annotation would help,
> > if we could tell it that there are multiple 'wakeup_vcpus_on_cpu_lock' locks.
> 
> Yan posted a patch to fudge around the issue[*], but I strongly objected (and still
> object) to making a functional and confusing code change to fudge around a lockdep
> false positive.
> 
> James has also looked at the issue, and wasn't able to find a way to cleanly tell
> lockdep about the situation.
> Looking at this (yet) again, what if we temporarily tell lockdep that
> wakeup_vcpus_on_cpu_lock isn't held when calling kvm_vcpu_wake_up()?  Gross, but
> more surgical than disabling lockdep entirely on the lock.  This appears to squash
> the warning without any unwanted side effects.
> 
> diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
> index ec08fa3caf43..5984ad6f6f21 100644
> --- a/arch/x86/kvm/vmx/posted_intr.c
> +++ b/arch/x86/kvm/vmx/posted_intr.c
> @@ -224,9 +224,17 @@ void pi_wakeup_handler(void)
>  
>  	raw_spin_lock(spinlock);
>  	list_for_each_entry(vmx, wakeup_list, pi_wakeup_list) {
> -
> +		/*
> +		 * Temporarily lie to lockdep to avoid false positives due to
> +		 * lockdep not understanding that deadlock is impossible.  This
> +		 * is called only in IRQ context, and the problematic locks
> +		 * taken in the kvm_vcpu_wake_up() call chain are only acquired
> +		 * with IRQs disabled.
> +		 */
> +		spin_release(&spinlock->dep_map, _RET_IP_);
>  		if (pi_test_on(&vmx->pi_desc))
>  			kvm_vcpu_wake_up(&vmx->vcpu);
> +		spin_acquire(&spinlock->dep_map, 0, 0, _RET_IP_);
>  	}
>  	raw_spin_unlock(spinlock);
>  }
> 
> [*] https://lore.kernel.org/all/20230313111022.13793-1-yan.y.zhao@xxxxxxxxx

This is a good idea.

Reviewed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>

Best regards,
	Maxim Levitsky
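[Editor's illustration, not part of the thread: the false positive comes from lockdep tracking dependencies between lock *classes*, not lock instances, so every per-CPU wakeup_vcpus_on_cpu_lock (and every per-CPU rq->lock) collapses into a single graph node. A toy Python sketch of class-level cycle detection, which deliberately simplifies away everything else lockdep does, shows why the two call paths above are reported as circular even though the concrete lock instances can never deadlock:]

```python
class LockGraph:
    """Toy model of lockdep's class-level dependency graph (illustration only)."""

    def __init__(self):
        # lock class -> set of classes that were acquired while it was held
        self.deps = {}

    def acquire(self, held, new):
        """Record taking class `new` while `held` classes are held.

        Returns True if this creates a class-level cycle, i.e. the
        point at which real lockdep would emit a splat.
        """
        for h in held:
            self.deps.setdefault(h, set()).add(new)
        return self._reaches(new, set(held))

    def _reaches(self, src, targets, seen=None):
        seen = seen if seen is not None else set()
        if src in targets:
            return True
        seen.add(src)
        return any(self._reaches(n, targets, seen)
                   for n in self.deps.get(src, ()) if n not in seen)


graph = LockGraph()

# Path 1 (vCPU being scheduled out): schedule() holds this CPU's rq->lock,
# then pi_enable_wakeup_handler() takes this CPU's wakeup lock.
graph.acquire(held=["rq->lock"], new="wakeup_vcpus_on_cpu_lock")

# Path 2 (wakeup IPI): pi_wakeup_handler() holds the wakeup lock, then
# try_to_wake_up() takes *some* rq->lock.  At runtime that is a different
# instance, but it is the same class, so a cycle is reported.
splat = graph.acquire(held=["wakeup_vcpus_on_cpu_lock"], new="rq->lock")
print(splat)  # True: class-level cycle, even though no real deadlock exists
```

This is why Sean's spin_release()/spin_acquire() pair works: it hides the wakeup lock from lockdep across the kvm_vcpu_wake_up() call, so path 2's edge is never recorded and no class-level cycle forms.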