Mukul posted a patch for this already.

"drm/amdgpu: Fix module unload hang with RAS enabled"

Thanks,
Lijo

On 1/24/2024 9:09 AM, YiPeng Chai wrote:
> The following is the error message:
> [ 484.495995] task:rmmod state:D stack: 0 pid: 2195 ppid: 2194 flags:0x00004002
> [ 484.496000] Call Trace:
> [ 484.496002] <TASK>
> [ 484.496007] __schedule+0xaf8/0x1870
> [ 484.496015] ? update_load_avg+0x74/0x7b0
> [ 484.496021] schedule+0x58/0xc0
> [ 484.496022] schedule_timeout+0x276/0x480
> [ 484.496024] ? ttwu_do_activate+0x9f/0x570
> [ 484.496028] wait_for_completion+0x8b/0x130
> [ 484.496030] kthread_stop+0x71/0x1a0
> [ 484.496037] amdgpu_ras_pre_fini+0x5b/0xe0 [amdgpu]
> [ 484.496202] amdgpu_device_fini_hw+0x165/0x4fc [amdgpu]
> [ 484.496406] ? blocking_notifier_chain_unregister+0x56/0xb0
> [ 484.496409] amdgpu_driver_unload_kms+0x51/0x60 [amdgpu]
> [ 484.496522] amdgpu_pci_remove+0x3b/0x70 [amdgpu]
> [ 484.496627] pci_device_remove+0x39/0xa0
> [ 484.496631] device_remove+0x46/0x70
> [ 484.496634] device_release_driver_internal+0xcd/0x140
> [ 484.496636] driver_detach+0x4a/0x90
> [ 484.496638] bus_remove_driver+0x6c/0xf0
> [ 484.496641] driver_unregister+0x31/0x70
> [ 484.496643] pci_unregister_driver+0x40/0x90
> [ 484.496647] amdgpu_exit+0x15/0x22b [amdgpu]
> [ 484.496849] __x64_sys_delete_module+0x14a/0x260
> [ 484.496853] ? syscall_exit_to_user_mode+0x26/0x40
> [ 484.496856] ? __x64_sys_close+0x12/0x40
> [ 484.496860] do_syscall_64+0x5c/0x80
> [ 484.496861] ? __x64_sys_read+0x1a/0x20
> [ 484.496863] ? do_syscall_64+0x69/0x80
> [ 484.496864] ? syscall_exit_to_user_mode+0x26/0x40
> [ 484.496866] ? do_syscall_64+0x69/0x80
> [ 484.496866] ? exc_page_fault+0x87/0x170
> [ 484.496868] ? asm_exc_page_fault+0x8/0x30
> [ 484.496871] entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> Signed-off-by: YiPeng Chai <YiPeng.Chai@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 8c1cb3ec2762..768a98f4bd22 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2674,7 +2674,11 @@ static int amdgpu_ras_page_retirement_thread(void *param)
>  	while (!kthread_should_stop()) {
>  
>  		wait_event_interruptible(con->page_retirement_wq,
> -				atomic_read(&con->page_retirement_req_cnt));
> +				atomic_read(&con->page_retirement_req_cnt) ||
> +				kthread_should_stop());
> +
> +		if (kthread_should_stop())
> +			break;
>  
>  		dev_info(adev->dev, "Start processing page retirement. request:%d\n",
>  				atomic_read(&con->page_retirement_req_cnt));
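
For anyone reading the trace and the diff together: kthread_stop() sets the stop flag, wakes the thread, and then waits for it to exit. If the thread's wait condition only checks the request count, it re-evaluates the condition on wakeup, finds no requests, and goes back to sleep, so kthread_stop() never returns. The snippet below is only a rough userspace analogy of the stop-aware wait, using pthreads instead of the kernel API. The names struct worker, retirement_thread(), worker_stop() and run_demo() are hypothetical and are not part of amdgpu.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's RAS context (not amdgpu code). */
struct worker {
	pthread_mutex_t lock;
	pthread_cond_t wq;      /* plays the role of page_retirement_wq */
	int req_cnt;            /* plays the role of page_retirement_req_cnt */
	bool should_stop;       /* plays the role of kthread_should_stop() */
	int processed;
};

static void *retirement_thread(void *param)
{
	struct worker *w = param;

	pthread_mutex_lock(&w->lock);
	while (!w->should_stop) {
		/* Sleep until there is work OR a stop was requested. Without
		 * the should_stop test here, a stop request with req_cnt == 0
		 * would leave the thread asleep forever. */
		while (w->req_cnt == 0 && !w->should_stop)
			pthread_cond_wait(&w->wq, &w->lock);

		/* Woken for shutdown rather than for work: leave the loop. */
		if (w->should_stop)
			break;

		w->req_cnt--;
		w->processed++;
	}
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Analogue of kthread_stop(): set the flag, wake the thread, wait for exit. */
static int worker_stop(struct worker *w, pthread_t tid)
{
	pthread_mutex_lock(&w->lock);
	w->should_stop = true;
	pthread_cond_broadcast(&w->wq);
	pthread_mutex_unlock(&w->lock);
	return pthread_join(tid, NULL);
}

/* Start the worker with no pending requests and stop it right away,
 * which mirrors the module-unload case. Returns 0 on a clean join. */
static int run_demo(void)
{
	struct worker w = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.wq = PTHREAD_COND_INITIALIZER,
	};
	pthread_t tid;

	if (pthread_create(&tid, NULL, retirement_thread, &w))
		return -1;
	return worker_stop(&w, tid);
}
```

Calling run_demo() from a small main() and linking with -pthread should return promptly. Dropping the should_stop test from the inner wait loop would be expected to make the join block, which is the userspace counterpart of the rmmod task stuck in state D above.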