Andrey Grodzovsky <Andrey.Grodzovsky@xxxxxxx> writes:

> On 04/24/2018 05:21 PM, Eric W. Biederman wrote:
>> Andrey Grodzovsky <Andrey.Grodzovsky@xxxxxxx> writes:
>>
>>> On 04/24/2018 03:44 PM, Daniel Vetter wrote:
>>>> On Tue, Apr 24, 2018 at 05:46:52PM +0200, Michel Dänzer wrote:
>>>>> Adding the dri-devel list, since this is driver independent code.
>>>>>
>>>>>
>>>>> On 2018-04-24 05:30 PM, Andrey Grodzovsky wrote:
>>>>>> Avoid calling wait_event_killable when you are possibly being called
>>>>>> from get_signal routine since in that case you end up in a deadlock
>>>>>> where you are alreay blocked in singla processing any trying to wait
>>>>> Multiple typos here, "[...] already blocked in signal processing and [...]"?
>>>>>
>>>>>
>>>>>> on a new signal.
>>>>>>
>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
>>>>>> ---
>>>>>> drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 +++--
>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>>> index 088ff2b..09fd258 100644
>>>>>> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>>> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>>> @@ -227,9 +227,10 @@ void drm_sched_entity_do_release(struct drm_gpu_scheduler *sched,
>>>>>> return;
>>>>>> /**
>>>>>> * The client will not queue more IBs during this fini, consume existing
>>>>>> - * queued IBs or discard them on SIGKILL
>>>>>> + * queued IBs or discard them when in death signal state since
>>>>>> + * wait_event_killable can't receive signals in that state.
>>>>>> */
>>>>>> - if ((current->flags & PF_SIGNALED) && current->exit_code == SIGKILL)
>>>>>> + if (current->flags & PF_SIGNALED)
>>>> You want fatal_signal_pending() here, instead of inventing your own broken
>>>> version.
>>> I rely on current->flags & PF_SIGNALED because this being set from
>>> within get_signal,
>> It doesn't mean that.
>> Unless you are called by do_coredump (you
>> aren't).
>
> Looking in latest code here
> https://elixir.bootlin.com/linux/v4.17-rc2/source/kernel/signal.c#L2449
> i see that current->flags |= PF_SIGNALED; is out side of
> if (sig_kernel_coredump(signr)) {...} scope

In small words.  You showed me the backtrace and I have read
the code.

PF_SIGNALED means you got killed by a signal.

get_signal
  do_coredump
  do_group_exit
    do_exit
      exit_signals
        sets PF_EXITING
      exit_mm
        calls fput on mmaps
          calls sched_task_work
      exit_files
        calls fput on open files
          calls sched_task_work
      exit_task_work
        task_work_run
          /* you are here */

So while strictly speaking you are inside of get_signal, it is not
meaningful to speak of yourself as within get_signal.

I am a little surprised to see task_work_run called so early.
I was mostly expecting it to happen when the dead task was
scheduling away, like normally happens.

Testing for PF_SIGNALED does not give you anything at all
that testing for PF_EXITING (the flag that says signal handling
is shut down) does not get you.

There is no point in distinguishing PF_SIGNALED from any other
path to do_exit.  do_exit never returns.  The task is dead.

Blocking indefinitely while shutting down a task is a bad idea.
Blocking indefinitely while closing a file descriptor is a bad idea.

The task has been killed; it can't get more dead.  SIGKILL is
meaningless at this point.

So you need a timeout, or not to wait at all.

Eric
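
To make the last point concrete, here is a minimal userspace sketch,
not kernel code and not a proposed patch.  The two flag values are
copied from include/linux/sched.h only so that the snippet compiles on
its own.  pick_wait_policy() and enum wait_policy are hypothetical
names made up for this illustration.  It assumes the fix is to fall
back to a bounded wait (in the kernel that could be something along
the lines of wait_event_timeout()) whenever the task is already
exiting.

```c
/* Userspace model only; values mirror include/linux/sched.h. */
#define PF_EXITING  0x00000004u  /* set by exit_signals() in do_exit */
#define PF_SIGNALED 0x00000400u  /* task was killed by a signal */

/* Hypothetical names, introduced for this sketch only. */
enum wait_policy {
	WAIT_KILLABLE,  /* task is alive, a fatal signal can still wake it */
	WAIT_BOUNDED    /* task is exiting, use a timeout or skip the wait */
};

/*
 * Decide how a release path may wait, given a copy of the task flags.
 * PF_SIGNALED is deliberately not tested: a task killed by a signal
 * reaches the release path through do_exit, so PF_EXITING is already
 * set by then, and the same holds for every other path into do_exit.
 */
static enum wait_policy pick_wait_policy(unsigned int task_flags)
{
	if (task_flags & PF_EXITING)
		return WAIT_BOUNDED;
	return WAIT_KILLABLE;
}
```

The only input that matters is PF_EXITING: a task that is exiting
because of a signal and a task that called exit on its own get the same
answer, which is the reason for not special-casing PF_SIGNALED.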