RE: [PATCH] drm/amdgpu: fix a GPU hang issue when remove device

[AMD Public Use]

Hi Dennis,

Thanks for digging this out. 

I'd like to understand where the MMIO register access comes from before the driver calls amdgpu_device_set_pg_state to disable GFXOFF in the ip_fini phase. I think we already moved the GFX PG ungate to a very early stage of device_fini. The only GC register access ahead of disabling GFXOFF that I can think of is disabling the GFX EOP interrupt.

The call stack shows a register write failure through KIQ, but that approach should be safe even with GFXOFF enabled.

Regards,
Hawking

-----Original Message-----
From: Dennis Li <Dennis.Li@xxxxxxx> 
Sent: Wednesday, December 30, 2020 19:51
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Chen, Jiansong (Simon) <Jiansong.Chen@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>
Cc: Li, Dennis <Dennis.Li@xxxxxxx>
Subject: [PATCH] drm/amdgpu: fix a GPU hang issue when remove device

When GFXOFF is enabled and the GPU is idle, the driver fails to access some registers. Therefore, disable GFXOFF before unloading the device.

amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.
amdgpu: cp queue pipe 4 queue 0 preemption failed
amdgpu 0000:03:00.0: amdgpu: failed to write reg 2890 wait reg 28a2
amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
amdgpu 0000:03:00.0: amdgpu: failed to write reg 2890 wait reg 28a2
amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706

Signed-off-by: Dennis Li <Dennis.Li@xxxxxxx>
Change-Id: I42431f5d0bf54909e1df888a0d72fc009d8e196c

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index e365c4fdcfe3..47d1291d5053 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -83,6 +83,8 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
 	if (adev == NULL)
 		return;
 
+	amdgpu_gfx_off_ctrl(adev, false);
+
 	amdgpu_unregister_gpu_instance(adev);
 
 	if (adev->rmmio == NULL)
--
2.17.1
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


