RE: [PATCH] drm/amdgpu: annotate a false positive recursive locking

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[AMD Public Use]

> [AMD Public Use]
>
>> [SNIP]
>>>> I think it is a limitation of init_rwsem.
>>> And exactly that's wrong, this is intentional and perfectly correct.
>>>
>>> [Dennis Li] I couldn't understand. Why is it a perfectly correct?
>>> For example, we define two rw_sem: a and b. If we don't check init_rwsem definition, we may think case#1 and case#2 have the same behavior, but in fact they are different.
>>>
>>> Case 1:
>>> init_rwsem(&a);
>>> init_rwsem(&b);
>>>
>>> Case2:
>>> void init_rwsem_ext(rw_sem* sem)
>>> {
>>>         init_rwsem(sem);
>>> }
>>> init_rwsem_ext(&a);
>>> init_rwsem_ext(&b);
>>>
>>> As far as I know it is perfectly possible that the locks in the hive are not always grabbed in the same order. And that's why lockdep is complaining about this.
>>> [Dennis Li] No. I takes a hive with two devices(a and b) to explain why lockdep complain.
>>>
>>> // Firstly driver lock the reset_sem of all devices 
>>> down_write(&a->reset_sem); do_suspend(a);
>>> down_write(&b->reset_sem);   // Because  b shared the same lock_class_key with a, lockdep will take a and b as the same rw_sem and complain here.
>>> do_suspend(b);
>>>
>>> // do recovery
>>> do_hive_recovery()
>>>
>>> // unlock the reset_sem of all devices do_resume(a); 
>>> up_write(&a->reset_sem); do_resume(b); up_write(&b->reset_sem);
>> Ah! Now I understand what you are working around. So the problem is the static lock_class_key in the macro?
>> [Dennis Li] Yes. The author of init_rwsem might not consider our similar use case.

Well this is also really not the intended use case.

When you lock the same rwsem class recursively you can easily run into deadlocks if you don't keep the order the same all the time.

And abstracting init functions behind your own layer is a no-go in the Linux kernel as well.

>>> What we should do instead is to make sure we have only a single lock for the complete hive instead.
>>> [Dennis Li] If we use a single lock, users will must wait for all devices resuming successfully, but in fact their tasks are only running in device a. It is not friendly to users.
>> Well that is intentional I would say. We can only restart submissions after all devices are resumed successfully, cause otherwise it can happen that a job on device A depends on memory on device B which isn't accessible.
>> [Dennis Li] Yes, you are right. Driver have make sure that the shared resources(such as the shard memory) are ready before unlock the lock of adev one by one. But each device still has private hardware resources such as video  and display.
> Yeah, but those are negligible and we have a team working on display support for XGMI. So this will sooner or later become a problem as well.
>
> I still think that a single rwsem for the whole hive is still the best option here.
>
> [Dennis Li] For your above concern, we can open a new thread to discuss it. If we make a decision to use a single after discussing, we can create another patch for it.

That's a really good argument, but I still hesitate to merge this patch. 
How severe is the lockdep splat?

At bare minimum we need a "/* TODO: ...." comment why we do this and how to remove the workaround again.
[Dennis Li] It is not a workaround. According to design of lockdep, each rw_semaphore should has a separated lock_class_key. I have explained that init_rwsem has the above limitation, so we must correct it. Core network driver (net/core/dev.c) has the similar use case with us.

Regards,
Christian.

>
> Best Regards
> Dennis lI
>> Regards,
>> Christian.
>>
>>> Regards,
>>> Christian.
>>>
>>> Am 06.08.20 um 11:15 schrieb Li, Dennis:
>>>> [AMD Official Use Only - Internal Distribution Only]
>>>>
>>>> Hi, Christian,
>>>>          For this case, it is safe to use separated lock key. Please see the definition of init_rwsem as the below. Every init_rwsem calling will use a new static key, but devices of  the hive share the same code to do initialization, so their lock_class_key are the same. I think it is a limitation of init_rwsem.  In our case, it should be correct that reset_sem of each adev has different  lock_class_key. BTW, this change doesn't effect dead-lock detection and just correct it.
>>>>
>>>> #define init_rwsem(sem)					\
>>>> do {								\
>>>> 	static struct lock_class_key __key;			\
>>>> 								\
>>>> 	__init_rwsem((sem), #sem, &__key);			\
>>>> } while (0)
>>>>
>>>> Best Regards
>>>> Dennis Li
>>>> -----Original Message-----
>>>> From: Koenig, Christian <Christian.Koenig@xxxxxxx>
>>>> Sent: Thursday, August 6, 2020 4:13 PM
>>>> To: Li, Dennis <Dennis.Li@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; 
>>>> Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Kuehling, Felix 
>>>> <Felix.Kuehling@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx>
>>>> Subject: Re: [PATCH] drm/amdgpu: annotate a false positive 
>>>> recursive locking
>>>>
>>>> Preventing locking problems during implementation is obviously a good approach, but lockdep has proven to be massively useful for finding and fixing problems.
>>>>
>>>> Disabling lockdep splat by annotating lock with separate classes is usually a no-go and only allowed if there is no other potential approach.
>>>>
>>>> In this case here we should really clean things up instead.
>>>>
>>>> Christian.
>>>>
>>>> Am 06.08.20 um 09:44 schrieb Li, Dennis:
>>>>> [AMD Official Use Only - Internal Distribution Only]
>>>>>
>>>>> Hi, Christian,
>>>>>            I agree with your concern. However we shouldn't rely on system to detect dead-lock, and should consider this when doing code implementation. In fact, dead-lock detection isn't enabled by default.
>>>>>            For your proposal to remove reset_sem into the hive structure, we can open a new topic to discuss it. Currently we couldn't make sure which is the best solution. For example, with your proposal, we must wait for all devices resuming successfully before resubmit an old task in one device, which will effect performance.
>>>>>
>>>>> Best Regards
>>>>> Dennis Li
>>>>> -----Original Message-----
>>>>> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of 
>>>>> Christian König
>>>>> Sent: Thursday, August 6, 2020 3:08 PM
>>>>> To: Li, Dennis <Dennis.Li@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; 
>>>>> Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Kuehling, Felix 
>>>>> <Felix.Kuehling@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx>
>>>>> Subject: Re: [PATCH] drm/amdgpu: annotate a false positive 
>>>>> recursive locking
>>>>>
>>>>> Am 06.08.20 um 09:02 schrieb Dennis Li:
>>>>>> [  584.110304] ============================================
>>>>>> [  584.110590] WARNING: possible recursive locking detected
>>>>>> [  584.110876] 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1 Tainted: G           OE
>>>>>> [  584.111164] --------------------------------------------
>>>>>> [  584.111456] kworker/38:1/553 is trying to acquire lock:
>>>>>> [  584.111721] ffff9b15ff0a47a0 (&adev->reset_sem){++++}, at:
>>>>>> amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [  584.112112]
>>>>>>                      but task is already holding lock:
>>>>>> [  584.112673] ffff9b1603d247a0 (&adev->reset_sem){++++}, at:
>>>>>> amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [  584.113068]
>>>>>>                      other info that might help us debug this:
>>>>>> [  584.113689]  Possible unsafe locking scenario:
>>>>>>
>>>>>> [  584.114350]        CPU0
>>>>>> [  584.114685]        ----
>>>>>> [  584.115014]   lock(&adev->reset_sem);
>>>>>> [  584.115349]   lock(&adev->reset_sem);
>>>>>> [  584.115678]
>>>>>>                       *** DEADLOCK ***
>>>>>>
>>>>>> [  584.116624]  May be due to missing lock nesting notation
>>>>>>
>>>>>> [  584.117284] 4 locks held by kworker/38:1/553:
>>>>>> [  584.117616]  #0: ffff9ad635c1d348 
>>>>>> ((wq_completion)events){+.+.},
>>>>>> at: process_one_work+0x21f/0x630 [  584.117967]  #1:
>>>>>> ffffac708e1c3e58 ((work_completion)(&con->recovery_work)){+.+.}, at:
>>>>>> process_one_work+0x21f/0x630 [  584.118358]  #2: ffffffffc1c2a5d0 
>>>>>> (&tmp->hive_lock){+.+.}, at: amdgpu_device_gpu_recover+0xae/0x1030 [amdgpu] [  584.118786]  #3: ffff9b1603d247a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [  584.119222]
>>>>>>                      stack backtrace:
>>>>>> [  584.119990] CPU: 38 PID: 553 Comm: kworker/38:1 Kdump: loaded Tainted: G           OE     5.6.0-deli-v5.6-2848-g3f3109b0e75f #1
>>>>>> [  584.120782] Hardware name: Supermicro 
>>>>>> SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019 [  584.121223] 
>>>>>> Workqueue: events amdgpu_ras_do_recovery [amdgpu] [  584.121638] Call Trace:
>>>>>> [  584.122050]  dump_stack+0x98/0xd5 [  584.122499]
>>>>>> __lock_acquire+0x1139/0x16e0 [  584.122931]  ?
>>>>>> trace_hardirqs_on+0x3b/0xf0 [  584.123358]  ?
>>>>>> cancel_delayed_work+0xa6/0xc0 [  584.123771]
>>>>>> lock_acquire+0xb8/0x1c0 [  584.124197]  ?
>>>>>> amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.124599]  down_write+0x49/0x120 [  584.125032]  ?
>>>>>> amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [  584.125472]
>>>>>> amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [  584.125910]  ?
>>>>>> amdgpu_ras_error_query+0x1b8/0x2a0 [amdgpu] [  584.126367]
>>>>>> amdgpu_ras_do_recovery+0x159/0x190 [amdgpu] [  584.126789]
>>>>>> process_one_work+0x29e/0x630 [  584.127208]
>>>>>> worker_thread+0x3c/0x3f0 [  584.127621]  ?
>>>>>> __kthread_parkme+0x61/0x90 [  584.128014]
>>>>>> kthread+0x12f/0x150 [  584.128402]  ? 
>>>>>> kthread+process_one_work+0x630/0x630 [
>>>>>> 584.128790]  ? kthread_park+0x90/0x90 [  584.129174]
>>>>>> ret_from_fork+0x3a/0x50
>>>>>>
>>>>>> Each adev has owned lock_class_key to avoid false positive 
>>>>>> recursive locking.
>>>>> NAK, that is not a false positive but a real problem.
>>>>>
>>>>> The issue here is that we have multiple reset semaphores, one for each device in the hive. If those are not acquired in the correct order we deadlock.
>>>>>
>>>>> The real solution would be to move the reset_sem into the hive structure and make sure that we lock it only once.
>>>>>
>>>>> Christian.
>>>>>
>>>>>> Signed-off-by: Dennis Li <Dennis.Li@xxxxxxx>
>>>>>> Change-Id: I7571efeccbf15483982031d00504a353031a854a
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>> index e97c088d03b3..766dc8f8c8a0 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>> @@ -967,6 +967,7 @@ struct amdgpu_device {
>>>>>>        	atomic_t                        in_gpu_reset;
>>>>>>        	enum pp_mp1_state               mp1_state;
>>>>>>        	struct rw_semaphore	reset_sem;
>>>>>> +	struct lock_class_key lock_key;
>>>>>>        	struct amdgpu_doorbell_index doorbell_index;
>>>>>>        
>>>>>>        	struct mutex			notifier_lock;
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> index 6c572db42d92..d78df9312d34 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> @@ -3037,6 +3037,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>>>>>        	mutex_init(&adev->virt.vf_errors.lock);
>>>>>>        	hash_init(adev->mn_hash);
>>>>>>        	init_rwsem(&adev->reset_sem);
>>>>>> +	lockdep_set_class(&adev->reset_sem, &adev->lock_key);
>>>>>>        	atomic_set(&adev->in_gpu_reset, 0);
>>>>>>        	mutex_init(&adev->psp.mutex);
>>>>>>        	mutex_init(&adev->notifier_lock);
>>>>> _______________________________________________
>>>>> amd-gfx mailing list
>>>>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2F
>>>>> l
>>>>> i
>>>>> s
>>>>> t
>>>>> s.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=02%7C01%
>>>>> 7
>>>>> C
>>>>> D
>>>>> e
>>>>> nnis.Li%40amd.com%7C56c95f939ddd441bd10408d839d77c9e%7C3dd8961fe48
>>>>> 8
>>>>> 4
>>>>> e
>>>>> 6
>>>>> 08e11a82d994e183d%7C0%7C0%7C637322944985436656&amp;sdata=%2FBoRyEW
>>>>> 3
>>>>> i
>>>>> K
>>>>> 5
>>>>> 9Y52ctLWd4y1lOmi2CInb6lpIgAF88i4%3D&amp;reserved=0
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux