Re: [RFC] Catch dwmw2's deadlock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Dec 30, 2022, at 7:28 AM, David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:
> 
> On Fri, 2022-12-30 at 11:38 +0000, Matthew Wilcox wrote:
>> 
>> <dwmw2> why doesn't lockdep catch us calling synchronize_srcu() with
>>    a lock held, and elsewhere obtaining that lock within an srcu critical
>>    region ("read lock") ?
>> 
>> Because synchronize_srcu() doesn't acquire the lock, merely checks that
>> it isn't held.
>> 
>> Would this work?  Not even compile tested.
>> 
>> You can put my SoB on this if it works.
>> 
> 
> Thanks. Not sure if this addresses the issue I was talking about.
> 
> There exists a completely unrelated mutex, let's call it kvm->lock.
> 
> One code path *holds* kvm->lock while calling synchronize_srcu().
> Another code path is in a read-side section and attempts to obtain the
> same kvm->lock.
> 
> Described here:
> 
> https://lore.kernel.org/kvm/20221229211737.138861-1-mhal@xxxxxxx/T/#m594c1068d7a659f0a1aab3b64b0903836919bb0a

I think the patch from Matthew Wilcox will address it because the read side section already acquires the dep_map. So lockdep subsystem should be able to nail the dependency. One comment on that below:

> 
>> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
>> index ca4b5dcec675..e9c2ab8369c0 100644
>> --- a/kernel/rcu/srcutree.c
>> +++ b/kernel/rcu/srcutree.c
>> @@ -1267,11 +1267,11 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
>>  {
>>         struct rcu_synchronize rcu;
>>  
>> -       RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
>> -                        lock_is_held(&rcu_bh_lock_map) ||
>> +       rcu_lock_acquire(&ssp->dep_map);
>> +       RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
>>                          lock_is_held(&rcu_lock_map) ||
>>                          lock_is_held(&rcu_sched_lock_map),
>> -                        "Illegal synchronize_srcu() in same-type SRCU (or in RCU) read-side critical section");
>> +                        "Illegal synchronize_srcu() in RCU read-side critical section");
>>  
>>         if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
>>                 return;

Should you not release here?

Thanks,

- Joel 


>> @@ -1282,6 +1282,7 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
>>         __call_srcu(ssp, &rcu.head, wakeme_after_rcu, do_norm);
>>         wait_for_completion(&rcu.completion);
>>         destroy_rcu_head_on_stack(&rcu.head);
>> +       rcu_lock_release(&ssp->dep_map);
>>  
>>         /*
>>          * Make sure that later code is ordered after the SRCU grace
> 




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux