Hi, Eric:

On 10/15/18 4:28 PM, Eric W. Biederman wrote:
> Enke Chen <enkechen@xxxxxxxxx> writes:
>
>> For simplicity and consistency, this patch provides an implementation
>> for signal-based fault notification prior to the coredump of a child
>> process. A new prctl command, PR_SET_PREDUMP_SIG, is defined that can
>> be used by an application to express its interest and to specify the
>> signal (SIGCHLD or SIGUSR1 or SIGUSR2) for such a notification. A new
>> signal code (si_code), CLD_PREDUMP, is also defined for SIGCHLD.
>>
>> Background:
>>
>> As the coredump of a process may take time, in certain time-sensitive
>> applications it is necessary for a parent process (e.g., a process
>> manager) to be notified of a child's imminent death before the coredump
>> so that the parent process can act sooner, such as re-spawning an
>> application process, or initiating a control-plane fail-over.
>
> You talk about time sensitive and then you talk about bash scripts.
> I don't think your definition of time-sensitive and my definition match.

It's certainly not my preference to have a process manager (or one for
each application) written in bash scripts. But they do work, and are
deployed.

>
> With that said I think the best solution would be to figure out how to
> allow the coredump to run in parallel with the usual exit signal, and
> exit code reaping of the process.
>
> That would solve the problem for everyone, and would not introduce any
> new complicated APIs.

That would certainly help. But given the huge deployment of Linux, I
don't think it would be feasible to change this fundamental behavior
(signal post coredump).

>
> Short of that having the prctl in the process that receives the signals
> the way you are doing is the right way to go.

Thanks for the encouragement.

>
> You are however calling do_notify_parent_predump from the wrong
> function, and frankly with the wrong locking. There are multiple paths
> to the do_coredump function so you really want this notification from
> do_coredump.

This makes two - Oleg also suggested doing it in do_coredump(). I will
look into it, perhaps also relocating proc_coredump_connector().

>
> But I still think it would be better to solve the root cause problem and
> change the coredump logic to be able to run in parallel with the normal
> exit notification and zombie reaping logic. Then the problem you are
> trying to solve goes away and everyone's code gets simpler.
>
> Eric
>

Thanks.

-- Enke