Andrey Grodzovsky <Andrey.Grodzovsky at amd.com> writes:

> On 04/24/2018 12:42 PM, Eric W. Biederman wrote:
>> Andrey Grodzovsky <andrey.grodzovsky at amd.com> writes:
>>
>>> Currently calling wait_event_killable as part of exiting process
>>> will stall forever since SIGKILL generation is suppressed by PF_EXITING.
>>>
>>> In our particular case AMDGPU driver wants to flush all GPU jobs in
>>> flight before shutting down. But if some job hangs the pipe we still want to
>>> be able to kill it and avoid a process in D state.
>> I should clarify. This absolutely can not be done.
>> PF_EXITING is set just before a task starts tearing down its signal
>> handling.
>>
>> So delivering any signal, or otherwise depending on signal handling
>> after PF_EXITING is set can not be done. That abstraction is gone.
>
> I see, so you suggest it's the driver responsibility to avoid creating
> such code path that ends up
> calling wait_event_killable from exit call stack (PF_EXITING == 1) ?

I don't just suggest. I am saying clearly that any dependency on
receiving SIGKILL after PF_EXITING is set is a bug.

It looks safe (the bitmap is not freed) to use wait_event_killable on a
dual use code path, but you can't expect SIGKILL ever to be delivered
during f_op->release, as f_op->release is called from exit after signal
handling has been shut down.

The best generic code could do would be to always have
fatal_signal_pending return true after PF_EXITING is set.

Increasingly I am thinking that drm_sched_entity_fini should have a
wait_event_timeout or no wait at all. The cleanup code should have
a progress guarantee of its own.

Eric
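
To illustrate what a cleanup path with its own progress guarantee might
look like, here is a rough user-space analogy. It is not kernel code and
not the drm scheduler: struct job_queue and the job_queue_* helpers are
hypothetical names made up for this sketch, and pthread_cond_timedwait
merely stands in for the role wait_event_timeout would play. The point is
only that the flush gives up at a deadline instead of relying on a signal
to interrupt it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a scheduler entity's queue of in-flight jobs. */
struct job_queue {
	pthread_mutex_t lock;
	pthread_cond_t  idle;     /* broadcast when pending drops to zero */
	int             pending;  /* number of jobs still in flight */
};

static void job_queue_init(struct job_queue *q, int pending)
{
	assert(pending >= 0);
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->idle, NULL);
	q->pending = pending;
}

/* Called by whoever completes a job; wakes waiters once the queue is empty. */
static void job_queue_complete_one(struct job_queue *q)
{
	pthread_mutex_lock(&q->lock);
	if (q->pending > 0 && --q->pending == 0)
		pthread_cond_broadcast(&q->idle);
	pthread_mutex_unlock(&q->lock);
}

/*
 * Wait at most timeout_ms for the queue to drain.
 * Returns true if it drained, false if the deadline passed first.
 * The wait is bounded, so the caller always makes progress and can
 * fall back to forcibly dropping the remaining jobs.
 */
static bool job_queue_flush_bounded(struct job_queue *q, long timeout_ms)
{
	struct timespec deadline;
	bool drained;
	int rc = 0;

	/* Default condition variables measure time against CLOCK_REALTIME. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec  += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec  += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&q->lock);
	/* Loop to absorb spurious wakeups; stop once the deadline is hit. */
	while (q->pending > 0 && rc != ETIMEDOUT)
		rc = pthread_cond_timedwait(&q->idle, &q->lock, &deadline);
	drained = (q->pending == 0);
	pthread_mutex_unlock(&q->lock);

	return drained;
}
```

A release-style caller would call job_queue_flush_bounded() once and, if
it returns false, tear down the remaining jobs itself rather than wait
again. Whether a timeout or no wait at all is the better choice for the
real driver is exactly the open question above.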