[AMD Official Use Only - Internal Distribution Only] Quick grep says it is protected like that ... can you pls paste the full build error without {}? -Daniel [Dennis Li] hi, Daniel, the full build error as the following: make: Entering directory '/home/yajunl/workspace/amd/brahma-staging/BUILD/x86_64/linux' CC [M] drivers/gpu/drm/amd/amdgpu/amdgpu_device.o drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: In function ‘amdgpu_device_lock_adev’: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4155:2: error: ‘else’ without a previous ‘if’ else ^~~~ scripts/Makefile.build:267: recipe for target 'drivers/gpu/drm/amd/amdgpu/amdgpu_device.o' failed make[1]: *** [drivers/gpu/drm/amd/amdgpu/amdgpu_device.o] Error 1 Makefile:1683: recipe for target 'drivers/gpu/drm/amd/amdgpu' failed make: *** [drivers/gpu/drm/amd/amdgpu] Error 2 > > Let me take a look, > Christian. > > > > > Regards, > > Guchun > > > > -----Original Message----- > > From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of > > Christian König > > Sent: Tuesday, August 11, 2020 2:53 PM > > To: Li, Dennis <Dennis.Li@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; > > Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Kuehling, Felix > > <Felix.Kuehling@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx>; > > daniel@xxxxxxxx > > Subject: Re: [PATCH v3] drm/amdgpu: annotate a false positive > > recursive locking > > > > Am 11.08.20 um 04:12 schrieb Dennis Li: > >> [ 584.110304] ============================================ > >> [ 584.110590] WARNING: possible recursive locking detected > >> [ 584.110876] 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1 Tainted: G OE > >> [ 584.111164] -------------------------------------------- > >> [ 584.111456] kworker/38:1/553 is trying to acquire lock: > >> [ 584.111721] ffff9b15ff0a47a0 (&adev->reset_sem){++++}, at: > >> amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.112112] > >> but task is already holding lock: > >> [ 584.112673] ffff9b1603d247a0 (&adev->reset_sem){++++}, at: > >> amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.113068] > >> other info that might help us debug this: > >> [ 584.113689] Possible unsafe locking scenario: > >> > >> [ 584.114350] CPU0 > >> [ 584.114685] ---- > >> [ 584.115014] lock(&adev->reset_sem); > >> [ 584.115349] lock(&adev->reset_sem); > >> [ 584.115678] > >> *** DEADLOCK *** > >> > >> [ 584.116624] May be due to missing lock nesting notation > >> > >> [ 584.117284] 4 locks held by kworker/38:1/553: > >> [ 584.117616] #0: ffff9ad635c1d348 ((wq_completion)events){+.+.}, > >> at: process_one_work+0x21f/0x630 [ 584.117967] #1: > >> ffffac708e1c3e58 ((work_completion)(&con->recovery_work)){+.+.}, at: > >> process_one_work+0x21f/0x630 [ 584.118358] #2: ffffffffc1c2a5d0 > >> (&tmp->hive_lock){+.+.}, at: amdgpu_device_gpu_recover+0xae/0x1030 [amdgpu] [ 584.118786] #3: ffff9b1603d247a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.119222] > >> stack backtrace: > >> [ 584.119990] CPU: 38 PID: 553 Comm: kworker/38:1 Kdump: loaded Tainted: G OE 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1 > >> [ 584.120782] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, > >> BIOS 3.1 05/23/2019 [ 584.121223] Workqueue: events > >> amdgpu_ras_do_recovery [amdgpu] [ 584.121638] Call Trace: > >> [ 584.122050] dump_stack+0x98/0xd5 [ 584.122499] > >> __lock_acquire+0x1139/0x16e0 [ 584.122931] ? > >> trace_hardirqs_on+0x3b/0xf0 [ 584.123358] ? > >> cancel_delayed_work+0xa6/0xc0 [ 584.123771] > >> lock_acquire+0xb8/0x1c0 [ 584.124197] ? > >> amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.124599] down_write+0x49/0x120 [ 584.125032] ? > >> amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.125472] > >> amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.125910] ? > >> amdgpu_ras_error_query+0x1b8/0x2a0 [amdgpu] [ 584.126367] > >> amdgpu_ras_do_recovery+0x159/0x190 [amdgpu] [ 584.126789] > >> process_one_work+0x29e/0x630 [ 584.127208] > >> worker_thread+0x3c/0x3f0 [ 584.127621] ? > >> __kthread_parkme+0x61/0x90 [ 584.128014] > >> kthread+0x12f/0x150 [ 584.128402] ? process_one_work+0x630/0x630 > >> kthread+[ > >> 584.128790] ? kthread_park+0x90/0x90 [ 584.129174] > >> ret_from_fork+0x3a/0x50 > >> > >> Each adev has owned lock_class_key to avoid false positive > >> recursive locking. > >> > >> v2: > >> 1. register adev->lock_key into lockdep, otherwise lockdep will > >> report the below warning > >> > >> [ 1216.705820] BUG: key ffff890183b647d0 has not been registered! > >> [ 1216.705924] ------------[ cut here ]------------ [ 1216.705972] > >> DEBUG_LOCKS_WARN_ON(1) [ 1216.705997] WARNING: CPU: 20 PID: 541 at > >> kernel/locking/lockdep.c:3743 lockdep_init_map+0x150/0x210 > >> > >> v3: > >> change to use down_write_nest_lock to annotate the false dead-lock > >> warning. > >> > >> Signed-off-by: Dennis Li <Dennis.Li@xxxxxxx> > >> > >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > >> index 62ecac97fbd2..8a55b0bc044a 100644 > >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > >> @@ -4145,12 +4145,15 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, > >> return r; > >> } > >> > >> -static bool amdgpu_device_lock_adev(struct amdgpu_device *adev) > >> +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, > >> +struct amdgpu_hive_info *hive) > >> { > >> if (atomic_cmpxchg(&adev->in_gpu_reset, 0, 1) != 0) > >> return false; > >> > >> - down_write(&adev->reset_sem); > >> + if (hive) { > >> + down_write_nest_lock(&adev->reset_sem, &hive->hive_lock); > >> + } else > >> + down_write(&adev->reset_sem); > > Coding style nit pick: You should drop the {} here. > > > > Apart from that the patch is Reviewed-by: Christian König > > <christian.koenig@xxxxxxx> > > > >> > >> atomic_inc(&adev->gpu_reset_counter); > >> switch (amdgpu_asic_reset_method(adev)) { @@ -4312,7 +4315,7 > >> @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, > >> > >> /* block all schedulers and reset given job's ring */ > >> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { > >> - if (!amdgpu_device_lock_adev(tmp_adev)) { > >> + if (!amdgpu_device_lock_adev(tmp_adev, hive)) { > >> DRM_INFO("Bailing on TDR for s_job:%llx, as another already in progress", > >> job ? job->base.id : -1); > >> r = 0; > > _______________________________________________ > > amd-gfx mailing list > > amd-gfx@xxxxxxxxxxxxxxxxxxxxx > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fli > > sts.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01% > > 7CDennis.Li%40amd.com%7Cee5574efd1f2475112a908d83dca5daa%7C3dd8961fe > > 4884e608e11a82d994e183d%7C0%7C0%7C637327286674286678&sdata=K5Sx4 > > cO3DIQaxNRSo6q%2B8u%2BscGkCpy0ueQ0cOWyipFA%3D&reserved=0 > -- Daniel Vetter Software Engineer, Intel Corporation https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Fblog.ffwll.ch%2F&data=02%7C01%7CDennis.Li%40amd.com%7Cee5574efd1f2475112a908d83dca5daa%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637327286674286678&sdata=7tC0Gakzg7uFgxH6N9pvn5%2B5TCYL3aKK%2FhHTkTyCmtM%3D&reserved=0 _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx