On 04/24/2018 12:23 PM, Eric W. Biederman wrote:
> Andrey Grodzovsky <andrey.grodzovsky at amd.com> writes:
>
>> Avoid calling wait_event_killable when you are possibly being called
>> from get_signal routine since in that case you end up in a deadlock
>> where you are already blocked in signal processing and trying to wait
>> on a new signal.
> I am curious what the call path is that is problematic here.

Here is the problematic call stack:

[<0>] drm_sched_entity_fini+0x10a/0x3a0 [gpu_sched]
[<0>] amdgpu_ctx_do_release+0x129/0x170 [amdgpu]
[<0>] amdgpu_ctx_mgr_fini+0xd5/0xe0 [amdgpu]
[<0>] amdgpu_driver_postclose_kms+0xcd/0x440 [amdgpu]
[<0>] drm_release+0x414/0x5b0 [drm]
[<0>] __fput+0x176/0x350
[<0>] task_work_run+0xa1/0xc0
[<0>] do_exit+0x48f/0x1280
[<0>] do_group_exit+0x89/0x140
[<0>] get_signal+0x375/0x8f0
[<0>] do_signal+0x79/0xaa0
[<0>] exit_to_usermode_loop+0x83/0xd0
[<0>] do_syscall_64+0x244/0x270
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff

On exit from a system call you process all the signals you received and
encounter a fatal signal, which triggers process termination.

>
> In general waiting seems wrong when the process has already been
> fatally killed as indicated by PF_SIGNALED.

So indeed this patch avoids the wait in this case.

>
> Returning -ERESTARTSYS seems wrong as nothing should make it back even
> to the edge of userspace here.

Can you clarify please: what should be returned here instead?

Andrey

>
> Given that this is the only use of PF_SIGNALED outside of bsd process
> accounting I find this code very suspicious.
>
> It looks like the code path that gets called during exit is buggy and needs
> to be sorted out.
>
> Eric
>
>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>> ---
>>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>> index 088ff2b..09fd258 100644
>> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>> @@ -227,9 +227,10 @@ void drm_sched_entity_do_release(struct drm_gpu_scheduler *sched,
>>   		return;
>>   	/**
>>   	 * The client will not queue more IBs during this fini, consume existing
>> -	 * queued IBs or discard them on SIGKILL
>> +	 * queued IBs or discard them when in death signal state since
>> +	 * wait_event_killable can't receive signals in that state.
>>   	*/
>> -	if ((current->flags & PF_SIGNALED) && current->exit_code == SIGKILL)
>> +	if (current->flags & PF_SIGNALED)
>>   		entity->fini_status = -ERESTARTSYS;
>>   	else
>>   		entity->fini_status = wait_event_killable(sched->job_scheduled,