[AMD Official Use Only]

Yes, as Christian mentioned, enabling CONFIG_LOCKDEP_SUPPORT will help debug this kind of deadlock issue.
Meanwhile, can you give the following change (drop the lock protections in amdgpu_dpm_compute_clocks) a try?

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
index c73fb73e9628..50e89f5659fa 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
@@ -423,9 +423,7 @@ void amdgpu_dpm_compute_clocks(struct amdgpu_device *adev)
 	if (!pp_funcs->pm_compute_clocks)
 		return;
 
-	mutex_lock(&adev->pm.mutex);
 	pp_funcs->pm_compute_clocks(adev->powerplay.pp_handle);
-	mutex_unlock(&adev->pm.mutex);
 }
 
 void amdgpu_dpm_enable_uvd(struct amdgpu_device *adev, bool enable)

BR
Evan

> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig@xxxxxxx>
> Sent: Friday, April 1, 2022 4:56 PM
> To: Arthur Marsh <arthur.marsh@xxxxxxxxxxxxxxxx>; Quan, Evan <Evan.Quan@xxxxxxx>
> Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Feng, Kenneth <Kenneth.Feng@xxxxxxx>; Lazar, Lijo <Lijo.Lazar@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH V4 17/17] drm/amd/pm: unified lock protections in amdgpu_dpm.c
>
> Hi Arthur,
>
> apart from blacklisting amdgpu I generally advise to SSH from another
> computer into the affected system if you have a problem like this.
>
> Additionally to what Evan said I suggest that you enable
> CONFIG_LOCKDEP_SUPPORT in your kernel configuration. This will yield
> warnings in your system log in case of deadlocks or accidentally
> forgetting to unlock something.
>
> Regards,
> Christian.
>
> On 01.04.22 at 10:49, Arthur Marsh wrote:
> > Hi Evan, this is what was logged (filtering for drm and amdgpu) when I
> > blacklisted amdgpu then manually did:
> >
> > modprobe amdgpu si_support=1 gpu_recovery=1
> >
> > Apr 1 18:31:14 am64 kernel: [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.17.0+ root=UUID=39706f53-7c27-4310-b22a-36c7b042d1a1 ro amdgpu.audio=1 amdgpu.si_support=1 radeon.si_support=0 page_owner=on amdgpu.gpu_recovery=1 udev.log-priority=info rd.udev.log-priority=info
> > Apr 1 18:31:14 am64 kernel: [ 0.059624] Kernel command line: BOOT_IMAGE=/vmlinuz-5.17.0+ root=UUID=39706f53-7c27-4310-b22a-36c7b042d1a1 ro amdgpu.audio=1 amdgpu.si_support=1 radeon.si_support=0 page_owner=on amdgpu.gpu_recovery=1 udev.log-priority=info rd.udev.log-priority=info
> >
> > Apr 1 18:33:43 am64 kernel: [ 245.724485] ACPI: bus type drm_connector registered
> > Apr 1 18:33:44 am64 kernel: [ 245.945020] [drm] amdgpu kernel modesetting enabled.
> > Apr 1 18:33:44 am64 kernel: [ 245.945140] amdgpu 0000:01:00.0: vgaarb: deactivate vga console
> > Apr 1 18:33:44 am64 kernel: [ 245.946413] [drm] initializing kernel modesetting (VERDE 0x1002:0x682B 0x1458:0x22CA 0x87).
> > Apr 1 18:33:44 am64 kernel: [ 245.946423] amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
> > Apr 1 18:33:44 am64 kernel: [ 245.946448] [drm] register mmio base: 0xFE8C0000
> > Apr 1 18:33:44 am64 kernel: [ 245.946451] [drm] register mmio size: 262144
> > Apr 1 18:33:44 am64 kernel: [ 245.946642] [drm] add ip block number 0 <si_common>
> > Apr 1 18:33:44 am64 kernel: [ 245.946657] [drm] add ip block number 1 <gmc_v6_0>
> > Apr 1 18:33:44 am64 kernel: [ 245.946660] [drm] add ip block number 2 <si_ih>
> > Apr 1 18:33:44 am64 kernel: [ 245.946663] [drm] add ip block number 3 <gfx_v6_0>
> > Apr 1 18:33:44 am64 kernel: [ 245.946666] [drm] add ip block number 4 <si_dma>
> > Apr 1 18:33:44 am64 kernel: [ 245.946668] [drm] add ip block number 5 <si_dpm>
> > Apr 1 18:33:44 am64 kernel: [ 245.946671] [drm] add ip block number 6 <dce_v6_0>
> > Apr 1 18:33:44 am64 kernel: [ 245.946674] [drm] add ip block number 7 <uvd_v3_1>
> > Apr 1 18:33:44 am64 kernel: [ 245.990113] [drm] BIOS signature incorrect 20 7
> > Apr 1 18:33:44 am64 kernel: [ 245.990146] amdgpu 0000:01:00.0: No more image in the PCI ROM
> > Apr 1 18:33:44 am64 kernel: [ 245.991510] amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from ROM BAR
> > Apr 1 18:33:44 am64 kernel: [ 245.991516] amdgpu: ATOM BIOS: xxx-xxx-xxx
> > Apr 1 18:33:44 am64 kernel: [ 245.991539] amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
> > Apr 1 18:33:44 am64 kernel: [ 245.991841] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
> > Apr 1 18:33:44 am64 kernel: [ 246.045705] amdgpu 0000:01:00.0: amdgpu: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
> > Apr 1 18:33:44 am64 kernel: [ 246.045719] amdgpu 0000:01:00.0: amdgpu: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
> > Apr 1 18:33:44 am64 kernel: [ 246.045736] [drm] Detected VRAM RAM=2048M, BAR=256M
> > Apr 1 18:33:44 am64 kernel: [ 246.045739] [drm] RAM width 128bits DDR3
> > Apr 1 18:33:44 am64 kernel: [ 246.045825] [drm] amdgpu: 2048M of VRAM memory ready
> > Apr 1 18:33:44 am64 kernel: [ 246.045829] [drm] amdgpu: 3072M of GTT memory ready.
> > Apr 1 18:33:44 am64 kernel: [ 246.045854] [drm] GART: num cpu pages 262144, num gpu pages 262144
> > Apr 1 18:33:44 am64 kernel: [ 246.046180] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 1024M enabled (table at 0x000000F400900000).
> > Apr 1 18:33:44 am64 kernel: [ 246.084159] [drm] Internal thermal controller with fan control
> > Apr 1 18:33:44 am64 kernel: [ 246.084180] [drm] amdgpu: dpm initialized
> > Apr 1 18:33:44 am64 kernel: [ 246.084264] [drm] AMDGPU Display Connectors
> > Apr 1 18:33:44 am64 kernel: [ 246.084268] [drm] Connector 0:
> > Apr 1 18:33:44 am64 kernel: [ 246.084270] [drm] HDMI-A-1
> > Apr 1 18:33:44 am64 kernel: [ 246.084272] [drm] HPD1
> > Apr 1 18:33:44 am64 kernel: [ 246.084274] [drm] DDC: 0x194c 0x194c 0x194d 0x194d 0x194e 0x194e 0x194f 0x194f
> > Apr 1 18:33:44 am64 kernel: [ 246.084279] [drm] Encoders:
> > Apr 1 18:33:44 am64 kernel: [ 246.084281] [drm] DFP1: INTERNAL_UNIPHY
> > Apr 1 18:33:44 am64 kernel: [ 246.084283] [drm] Connector 1:
> > Apr 1 18:33:44 am64 kernel: [ 246.084285] [drm] DVI-D-1
> > Apr 1 18:33:44 am64 kernel: [ 246.084287] [drm] HPD2
> > Apr 1 18:33:44 am64 kernel: [ 246.084289] [drm] DDC: 0x1950 0x1950 0x1951 0x1951 0x1952 0x1952 0x1953 0x1953
> > Apr 1 18:33:44 am64 kernel: [ 246.084293] [drm] Encoders:
> > Apr 1 18:33:44 am64 kernel: [ 246.084295] [drm] DFP2: INTERNAL_UNIPHY
> > Apr 1 18:33:44 am64 kernel: [ 246.084297] [drm] Connector 2:
> > Apr 1 18:33:44 am64 kernel: [ 246.084299] [drm] VGA-1
> > Apr 1 18:33:44 am64 kernel: [ 246.084301] [drm] DDC: 0x1970 0x1970 0x1971 0x1971 0x1972 0x1972 0x1973 0x1973
> > Apr 1 18:33:44 am64 kernel: [ 246.084305] [drm] Encoders:
> > Apr 1 18:33:44 am64 kernel: [ 246.084307] [drm] CRT1: INTERNAL_KLDSCP_DAC1
> > Apr 1 18:33:44 am64 kernel: [ 246.135615] [drm] Found UVD firmware Version: 64.0 Family ID: 13
> > Apr 1 18:33:44 am64 kernel: [ 246.137371] [drm] PCIE gen 2 link speeds already enabled
> > Apr 1 18:33:44 am64 kernel: [ 246.674277] [drm] UVD initialized successfully.
> > Apr 1 18:33:44 am64 kernel: [ 246.674849] amdgpu 0000:01:00.0: amdgpu: SE 1, SH per SE 2, CU per SH 5, active_cu_number 8
> > Apr 1 18:33:45 am64 kernel: [ 247.008964] [drm] Initialized amdgpu 3.46.0 20150101 for 0000:01:00.0 on minor 0
> > Apr 1 18:33:45 am64 kernel: [ 247.068412] fbcon: amdgpudrmfb (fb0) is primary device
> >
> > The monitor still went blank but the magic sysreq sync and boot
> > worked, allowing capture of the above log but nothing after the line above.
> >
> > Regards,
> >
> > Arthur Marsh.
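
A note on the kernel option discussed in this thread: CONFIG_LOCKDEP_SUPPORT is normally selected by the architecture rather than set by hand, so in practice the lock validator is usually switched on through CONFIG_PROVE_LOCKING. The fragment below is a minimal sketch under that assumption. None of these option lines come from the thread itself, and the exact dependencies should be checked against lib/Kconfig.debug in the tree being built.

```
# Assumed debug options for chasing the hang (not taken from the thread).
# PROVE_LOCKING depends on DEBUG_KERNEL and pulls in the lockdep machinery.
CONFIG_DEBUG_KERNEL=y
CONFIG_PROVE_LOCKING=y
# Optional: report tasks that stay blocked in uninterruptible sleep.
CONFIG_DETECT_HUNG_TASK=y
```

If the hang really is caused by a lock such as adev->pm.mutex being taken twice on the same path, a kernel built this way would be expected to log a lockdep report (for example "possible recursive locking detected") before the display goes blank, which is easiest to capture over SSH as Christian suggests.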