On 04/25/2018 03:14 AM, Daniel Vetter wrote:
> On Tue, Apr 24, 2018 at 05:37:08PM -0400, Andrey Grodzovsky wrote:
>>
>> On 04/24/2018 05:21 PM, Eric W. Biederman wrote:
>>> Andrey Grodzovsky <Andrey.Grodzovsky at amd.com> writes:
>>>
>>>> On 04/24/2018 03:44 PM, Daniel Vetter wrote:
>>>>> On Tue, Apr 24, 2018 at 05:46:52PM +0200, Michel Dänzer wrote:
>>>>>> Adding the dri-devel list, since this is driver independent code.
>>>>>>
>>>>>> On 2018-04-24 05:30 PM, Andrey Grodzovsky wrote:
>>>>>>> Avoid calling wait_event_killable when you are possibly being called
>>>>>>> from get_signal routine since in that case you end up in a deadlock
>>>>>>> where you are alreay blocked in singla processing any trying to wait
>>>>>> Multiple typos here, "[...] already blocked in signal processing and [...]"?
>>>>>>
>>>>>>> on a new signal.
>>>>>>>
>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>>>>>> ---
>>>>>>>  drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 +++--
>>>>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>>>> index 088ff2b..09fd258 100644
>>>>>>> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>>>> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>>>> @@ -227,9 +227,10 @@ void drm_sched_entity_do_release(struct drm_gpu_scheduler *sched,
>>>>>>>  		return;
>>>>>>>  	/**
>>>>>>>  	 * The client will not queue more IBs during this fini, consume existing
>>>>>>> -	 * queued IBs or discard them on SIGKILL
>>>>>>> +	 * queued IBs or discard them when in death signal state since
>>>>>>> +	 * wait_event_killable can't receive signals in that state.
>>>>>>>  	*/
>>>>>>> -	if ((current->flags & PF_SIGNALED) && current->exit_code == SIGKILL)
>>>>>>> +	if (current->flags & PF_SIGNALED)
>>>>> You want fatal_signal_pending() here, instead of inventing your own broken
>>>>> version.
>>>> I rely on current->flags & PF_SIGNALED because this being set from
>>>> within get_signal,
>>> It doesn't mean that.  Unless you are called by do_coredump (you
>>> aren't).
>> Looking in latest code here
>> https://elixir.bootlin.com/linux/v4.17-rc2/source/kernel/signal.c#L2449
>> i see that current->flags |= PF_SIGNALED; is out side of
>> if (sig_kernel_coredump(signr)) {...} scope
> Ok I read some more about this, and I guess you go through process exit
> and then eventually close. But I'm not sure.
>
> The code in drm_sched_entity_fini also looks strange: You unpark the
> scheduler thread before you remove all the IBs. At least from the comment
> that doesn't sound like what you want to do.

I think it should be safe for the dying scheduler entity, since before that
(in drm_sched_entity_do_release) we set its runqueue to NULL, so no new jobs
will be dequeued from it by the scheduler thread.

>
> But in general, PF_SIGNALED is really something deeply internal to the
> core (used for some book-keeping and accounting). The drm scheduler is the
> only thing looking at it, so smells like a layering violation. I suspect
> (but without knowing what you're actually trying to achive here can't be
> sure) you want to look at something else.
>
> E.g. PF_EXITING seems to be used in a lot more places to cancel stuff
> that's no longer relevant when a task exits, not PF_SIGNALED. There's the
> TIF_MEMDIE flag if you're hacking around issues with the oom-killer.
>
> This here on the other hand looks really fragile, and probably only does
> what you want to do by accident.
> -Daniel
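For reference, the fatal_signal_pending() check suggested above would amount to
something like this in drm_sched_entity_do_release() (a sketch only, not a
posted patch; fatal_signal_pending() comes from <linux/sched/signal.h>):

	if (fatal_signal_pending(current))
		/* a fatal signal (e.g. SIGKILL) is already queued for us */
		entity->fini_status = -ERESTARTSYS;
	else
		entity->fini_status = wait_event_killable(sched->job_scheduled,
					drm_sched_entity_is_idle(entity));

Note that, per Eric's analysis further down, a task tearing down its files from
do_exit() may already be past the point where signals are delivered, so this
check may never fire in that path; it mainly addresses the layering complaint
about peeking at PF_SIGNALED.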
Yes, that's what Eric also said, and in the V2 patches I will try to change to
PF_EXITING. Another issue is changing wait_event_killable to wait_event_timeout,
where I need to understand what timeout value is acceptable for all the drivers
using the scheduler, or maybe it should come as a property of drm_sched_entity.

Andrey

>
>> Andrey
>>
>>> The closing of files does not happen in do_coredump.
>>> Which means you are being called from do_exit.
>>> In fact you are being called after exit_files which closes
>>> the files.  The actual __fput processing happens in task_work_run.
>>>
>>>> meaning I am within signal processing in which case I want to avoid
>>>> any signal based wait for that task,
>>>> From what i see in the code, task_struct.pending.signal is being set
>>>> for other threads in same
>>>> group (zap_other_threads) or for other scenarios, those task are still
>>>> able to receive signals
>>>> so calling wait_event_killable there will not have problem.
>>> Excpet that you are geing called after from do_exit and after exit_files
>>> which is after exit_signal.  Which means that PF_EXITING has been set.
>>> Which implies that the kernel signal handling machinery has already
>>> started being torn down.
>>>
>>> Not as much as I would like to happen at that point as we are still
>>> left with some old CLONE_PTHREAD messes in the code that need to be
>>> cleaned up.
>>>
>>> Still given the fact you are task_work_run it is quite possible even
>>> release_task has been run on that task before the f_op->release method
>>> is called.  So you simply can not count on signals working.
>>>
>>> Which in practice leaves a timeout for ending your wait.  That code can
>>> legitimately be in a context that is neither interruptible nor killable.
>>>
>>>>>>>  		entity->fini_status = -ERESTARTSYS;
>>>>>>>  	else
>>>>>>>  		entity->fini_status = wait_event_killable(sched->job_scheduled,
>>>>> But really this smells like a bug in wait_event_killable, since
>>>>> wait_event_interruptible does not suffer from the same bug. It will return
>>>>> immediately when there's a signal pending.
>>>> Even when wait_event_interruptible is called as following -
>>>> ...->do_signal->get_signal->....->wait_event_interruptible ?
>>>> I haven't tried it but wait_event_interruptible is very much alike to
>>>> wait_event_killable so I would assume it will also
>>>> not be interrupted if called like that. (Will give it a try just out
>>>> of curiosity anyway)
>>> As PF_EXITING is set want_signal should fail and the signal state of the
>>> task should not be updatable by signals.
>>>
>>> Eric
>>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
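A minimal sketch of the V2 direction mentioned above (checking PF_EXITING and
bounding the wait with wait_event_timeout()) might look like the following. The
fini_timeout field on drm_sched_entity and the -ETIMEDOUT status are
illustrative assumptions, not part of any posted patch:

	/*
	 * When the task is exiting, signal delivery can no longer be relied
	 * upon, so bound the wait with a timeout instead of using
	 * wait_event_killable().
	 */
	if (current->flags & PF_EXITING) {
		/* entity->fini_timeout: hypothetical per-entity property (in
		 * jiffies), set by the driver that created the entity. */
		if (wait_event_timeout(sched->job_scheduled,
				       drm_sched_entity_is_idle(entity),
				       entity->fini_timeout) == 0)
			entity->fini_status = -ETIMEDOUT;
		else
			entity->fini_status = 0;
	} else {
		entity->fini_status = wait_event_killable(sched->job_scheduled,
					drm_sched_entity_is_idle(entity));
	}

Making the timeout a drm_sched_entity property, as suggested above, would let
each driver pick a value appropriate for its hardware instead of hard-coding
one in the scheduler core.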