On 2019-10-18 11:40 a.m., Kuehling, Felix wrote:
> On 2019-10-18 10:27 a.m., Yang, Philip wrote:
>> If device is locked for suspend and resume, kfd open should return
>> failed -EAGAIN without creating process, otherwise the application exit
>> to release the process will hang to wait for resume is done if the suspend
>> and resume is stuck somewhere. This is backtrace:
>
> This doesn't fix processes that were created before suspend/resume got
> stuck. They would still get stuck with the same backtrace. So this is
> just a band-aid. The real underlying problem, that is not getting
> addressed, is suspend/resume getting stuck.
>
> Am I missing something?
>
This patch addresses applications getting stuck on exit after
suspend/resume got stuck. The real underlying suspend/resume issue
should be addressed separately.

I will submit a v2 patch to fix processes that were created before
suspend/resume got stuck.

Philip

> Regards,
>    Felix
>
>
>>
>> [Thu Oct 17 16:43:37 2019] INFO: task rocminfo:3024 blocked for more
>> than 120 seconds.
>> [Thu Oct 17 16:43:37 2019]       Not tainted
>> 5.0.0-rc1-kfd-compute-rocm-dkms-no-npi-1131 #1
>> [Thu Oct 17 16:43:37 2019] "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [Thu Oct 17 16:43:37 2019] rocminfo        D    0  3024   2947
>> 0x80000000
>> [Thu Oct 17 16:43:37 2019] Call Trace:
>> [Thu Oct 17 16:43:37 2019]  ? __schedule+0x3d9/0x8a0
>> [Thu Oct 17 16:43:37 2019]  schedule+0x32/0x70
>> [Thu Oct 17 16:43:37 2019]  schedule_preempt_disabled+0xa/0x10
>> [Thu Oct 17 16:43:37 2019]  __mutex_lock.isra.9+0x1e3/0x4e0
>> [Thu Oct 17 16:43:37 2019]  ? __call_srcu+0x264/0x3b0
>> [Thu Oct 17 16:43:37 2019]  ? process_termination_cpsch+0x24/0x2f0
>> [amdgpu]
>> [Thu Oct 17 16:43:37 2019]  process_termination_cpsch+0x24/0x2f0
>> [amdgpu]
>> [Thu Oct 17 16:43:37 2019]
>> kfd_process_dequeue_from_all_devices+0x42/0x60 [amdgpu]
>> [Thu Oct 17 16:43:37 2019]  kfd_process_notifier_release+0x1be/0x220
>> [amdgpu]
>> [Thu Oct 17 16:43:37 2019]  __mmu_notifier_release+0x3e/0xc0
>> [Thu Oct 17 16:43:37 2019]  exit_mmap+0x160/0x1a0
>> [Thu Oct 17 16:43:37 2019]  ? __handle_mm_fault+0xba3/0x1200
>> [Thu Oct 17 16:43:37 2019]  ? exit_robust_list+0x5a/0x110
>> [Thu Oct 17 16:43:37 2019]  mmput+0x4a/0x120
>> [Thu Oct 17 16:43:37 2019]  do_exit+0x284/0xb20
>> [Thu Oct 17 16:43:37 2019]  ? handle_mm_fault+0xfa/0x200
>> [Thu Oct 17 16:43:37 2019]  do_group_exit+0x3a/0xa0
>> [Thu Oct 17 16:43:37 2019]  __x64_sys_exit_group+0x14/0x20
>> [Thu Oct 17 16:43:37 2019]  do_syscall_64+0x4f/0x100
>> [Thu Oct 17 16:43:37 2019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> Signed-off-by: Philip Yang <Philip.Yang@xxxxxxx>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> index d9e36dbf13d5..40d75c39f08e 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> @@ -120,13 +120,13 @@ static int kfd_open(struct inode *inode, struct file *filep)
>>   		return -EPERM;
>>   	}
>>
>> +	if (kfd_is_locked())
>> +		return -EAGAIN;
>> +
>>   	process = kfd_create_process(filep);
>>   	if (IS_ERR(process))
>>   		return PTR_ERR(process);
>>
>> -	if (kfd_is_locked())
>> -		return -EAGAIN;
>> -
>>   	dev_dbg(kfd_device, "process %d opened, compat mode (32 bit) - %d\n",
>>   		process->pasid, process->is_32bit_user_mode);
>>
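To illustrate why the position of the lock check matters, here is a
minimal userspace sketch. It is not amdkfd code: every name prefixed
with model_ is hypothetical, the flag merely stands in for
kfd_is_locked(), and the counter stands in for the process objects that
kfd_create_process() would allocate. It only shows that checking the
lock first leaves nothing behind to tear down when open fails.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of this is amdkfd code. */
static bool model_locked;     /* plays the role of kfd_is_locked() */
static int  model_processes;  /* counts process objects created    */

static void model_set_locked(bool locked) { model_locked = locked; }
static int  model_process_count(void)     { return model_processes; }

/* Order before the patch: create the process, then check the lock. */
static int model_open_create_first(void)
{
	model_processes++;          /* process object now exists */
	if (model_locked)
		return -EAGAIN;     /* open fails, object stays behind */
	return 0;
}

/* Order after the patch: check the lock, then create the process. */
static int model_open_check_first(void)
{
	if (model_locked)
		return -EAGAIN;     /* nothing created, nothing to release */
	model_processes++;
	return 0;
}
```

With the model locked, model_open_create_first() returns -EAGAIN but
still leaves one process object to be released at exit, whereas
model_open_check_first() returns -EAGAIN with the count unchanged. As
Felix points out, neither order helps processes created before the lock
was taken.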