On 2019-10-18 10:27 a.m., Yang, Philip wrote:
> If device is locked for suspend and resume, kfd open should return
> failed -EAGAIN without creating process, otherwise the application exit
> to release the process will hang to wait for resume is done if the suspend
> and resume is stuck somewhere. This is backtrace:

This doesn't fix processes that were created before suspend/resume got
stuck. They would still get stuck with the same backtrace. So this is
just a band-aid. The real underlying problem, which is not being
addressed, is suspend/resume getting stuck.

Am I missing something?

Regards,
  Felix

>
> [Thu Oct 17 16:43:37 2019] INFO: task rocminfo:3024 blocked for more
> than 120 seconds.
> [Thu Oct 17 16:43:37 2019] Not tainted
> 5.0.0-rc1-kfd-compute-rocm-dkms-no-npi-1131 #1
> [Thu Oct 17 16:43:37 2019] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Thu Oct 17 16:43:37 2019] rocminfo D 0 3024 2947
> 0x80000000
> [Thu Oct 17 16:43:37 2019] Call Trace:
> [Thu Oct 17 16:43:37 2019] ? __schedule+0x3d9/0x8a0
> [Thu Oct 17 16:43:37 2019] schedule+0x32/0x70
> [Thu Oct 17 16:43:37 2019] schedule_preempt_disabled+0xa/0x10
> [Thu Oct 17 16:43:37 2019] __mutex_lock.isra.9+0x1e3/0x4e0
> [Thu Oct 17 16:43:37 2019] ? __call_srcu+0x264/0x3b0
> [Thu Oct 17 16:43:37 2019] ? process_termination_cpsch+0x24/0x2f0
> [amdgpu]
> [Thu Oct 17 16:43:37 2019] process_termination_cpsch+0x24/0x2f0
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]
> kfd_process_dequeue_from_all_devices+0x42/0x60 [amdgpu]
> [Thu Oct 17 16:43:37 2019] kfd_process_notifier_release+0x1be/0x220
> [amdgpu]
> [Thu Oct 17 16:43:37 2019] __mmu_notifier_release+0x3e/0xc0
> [Thu Oct 17 16:43:37 2019] exit_mmap+0x160/0x1a0
> [Thu Oct 17 16:43:37 2019] ? __handle_mm_fault+0xba3/0x1200
> [Thu Oct 17 16:43:37 2019] ? exit_robust_list+0x5a/0x110
> [Thu Oct 17 16:43:37 2019] mmput+0x4a/0x120
> [Thu Oct 17 16:43:37 2019] do_exit+0x284/0xb20
> [Thu Oct 17 16:43:37 2019] ? handle_mm_fault+0xfa/0x200
> [Thu Oct 17 16:43:37 2019] do_group_exit+0x3a/0xa0
> [Thu Oct 17 16:43:37 2019] __x64_sys_exit_group+0x14/0x20
> [Thu Oct 17 16:43:37 2019] do_syscall_64+0x4f/0x100
> [Thu Oct 17 16:43:37 2019] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Signed-off-by: Philip Yang <Philip.Yang@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index d9e36dbf13d5..40d75c39f08e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -120,13 +120,13 @@ static int kfd_open(struct inode *inode, struct file *filep)
>  		return -EPERM;
>  	}
>
> +	if (kfd_is_locked())
> +		return -EAGAIN;
> +
>  	process = kfd_create_process(filep);
>  	if (IS_ERR(process))
>  		return PTR_ERR(process);
>
> -	if (kfd_is_locked())
> -		return -EAGAIN;
> -
>  	dev_dbg(kfd_device, "process %d opened, compat mode (32 bit) - %d\n",
>  		process->pasid, process->is_32bit_user_mode);
>
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
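
To make the ordering question concrete, here is a minimal user-space sketch of the two variants of the open path. It is not amdkfd code: every `sketch_*` function and both globals are hypothetical stand-ins, with `device_locked` modelling what `kfd_is_locked()` reports and `live_processes` counting processes that would later need teardown. It only illustrates that checking the lock first avoids creating a process on the failing open, and that a process created before the lock was taken still exists either way.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for driver state; not the real amdkfd code. */
bool device_locked;   /* models the result of kfd_is_locked() */
int  live_processes;  /* processes that would need teardown on exit */

bool sketch_is_locked(void)
{
	return device_locked;
}

int sketch_create_process(void)
{
	live_processes++;
	return 0;
}

/* Ordering before the patch: create the process, then check the lock. */
int sketch_open_old(void)
{
	int ret = sketch_create_process();

	if (ret)
		return ret;
	if (sketch_is_locked())
		return -EAGAIN; /* the process was already created */
	return 0;
}

/* Ordering after the patch: check the lock before creating anything. */
int sketch_open_new(void)
{
	if (sketch_is_locked())
		return -EAGAIN; /* nothing was created, nothing to tear down */
	return sketch_create_process();
}
```

In this model, a process opened while the device is unlocked stays counted in `live_processes` after `device_locked` becomes true, which is the case the reordering does not cover.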