Re: [PATCH v5 1/2] kernel/signal: Signal-based pre-coredump notification

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi, Dave:

Thanks for your comments. You have indeed missed some of the prior reviews
and discussions. But that is OK.

Please see my replies inline.

On 11/28/18 7:19 AM, Dave Martin wrote:
> On Tue, Nov 27, 2018 at 10:54:41PM +0000, Enke Chen wrote:
>> [Repost as a series, as suggested by Andrew Morton]
>>
>> For simplicity and consistency, this patch provides an implementation
>> for signal-based fault notification prior to the coredump of a child
>> process. A new prctl command, PR_SET_PREDUMP_SIG, is defined that can
>> be used by an application to express its interest and to specify the
>> signal for such a notification.
>>
>> Changes to prctl(2):
>>
>>    PR_SET_PREDUMP_SIG (since Linux 4.20.x)
>>           Set the child pre-coredump signal of the calling process to
>>           arg2 (either a signal value in the range 1..maxsig, or 0 to
>>           clear). This is the signal that the calling process will get
>>           prior to the coredump of a child process. This value is
>>           cleared across execve(2), or for the child of a fork(2).
>>
>>    PR_GET_PREDUMP_SIG (since Linux 4.20.x)
>>           Return the current value of the child pre-coredump signal,
>>           in the location pointed to by (int *) arg2.
>>
>> Background:
>>
>> As the coredump of a process may take time, in certain time-sensitive
>> applications it is necessary for a parent process (e.g., a process
>> manager) to be notified of a child's imminent death before the coredump
>> so that the parent process can act sooner, such as re-spawning an
>> application process, or initiating a control-plane fail-over.
>>
>> One application is BFD. The early fault notification is a critical
>> component for maintaining BFD sessions (with a timeout value of
>> 50 msec or 100 msec) across a control-plane failure.
>>
>> Currently there are two ways for a parent process to be notified of a
>> child process's state change. One is to use the POSIX signal, and
>> another is to use the kernel connector module. The specific events and
>> actions are summarized as follows:
>>
>> Process Event    POSIX Signal                Connector-based
>> ----------------------------------------------------------------------
>> ptrace_attach()  do_notify_parent_cldstop()  proc_ptrace_connector()
>>                  SIGCHLD / CLD_STOPPED
>>
>> ptrace_detach()  do_notify_parent_cldstop()  proc_ptrace_connector()
>>                  SIGCHLD / CLD_CONTINUED
>>
>> pre_coredump/    N/A                         proc_coredump_connector()
>> get_signal()
>>
>> post_coredump/   do_notify_parent()          proc_exit_connector()
>> do_exit()        SIGCHLD / exit_signal
>> ----------------------------------------------------------------------
>>
>> As shown in the table, the signal-based pre-coredump notification is not
>> currently available. In some cases using a connector-based notification
>> can be quite complicated (e.g., when a process manager is written in shell
>> scripts and thus is subject to certain inherent limitations), and a
>> signal-based notification would be simpler and better suited.
> 
> Since this is a notification of a change of process status, would it be
> more natural to send it through SIGCHLD?
> 
> As with other supplementary child status events, a flag could be added
> for wait and sigaction.sa_flags to indicate whether the parent wants
> this event to be reported or not.
> 
> Then a suitable CLD_XXX could be defined for this, and we could
> piggyback on PR_{SET,GET}_PDEATHSIG rather than having to have something
> new.
> 

> (I hadn't been watching this thread closely, so apologies if this has
> been discussed already.)

Indeed, I defined the signal code CLD_PREDUMP for SIGCHLD initially, but it
was removed after discussion:

v3 --> v4:

Addressed review comments from Oleg Nesterov, and Eric W. Biederman,
including:
o remove the definition CLD_PREDUMP.
---

You can look at the discussions in the email thread, in particular several
issues pointed out by Eric Biederman, and my reply to Eric.

There are two models 1:1 (one process manager with one child process), and 1:N
(one process manager with many child processes). A legacy signal like SIGCHLD
would not work in the 1:N model due to compression/loss of legacy signals. One
need to use a RT signal in that case. 

One more point in my reply:

When an application chooses a signal for pre-coredump notification, it is much
simpler and robust for the signal to be dedicated for that purpose (in the parent)
and not be mixed with other semantics. The "signo + pid" should be sufficient for
the parent process in both 1:1 and 1:N models.

> 
>>
>> Signed-off-by: Enke Chen <enkechen@xxxxxxxxx>
>> Reviewed-by: Oleg Nesterov <oleg@xxxxxxxxxx>
>> ---
>> v4 -> v5:
>> Addressed review comments from Oleg Nesterov:
>> o use rcu_read_lock instead.
>> o revert back to notify the real_parent.
>>
>>  fs/coredump.c                | 23 +++++++++++++++++++++++
>>  fs/exec.c                    |  3 +++
>>  include/linux/sched/signal.h |  3 +++
>>  include/uapi/linux/prctl.h   |  4 ++++
>>  kernel/sys.c                 | 13 +++++++++++++
>>  5 files changed, 46 insertions(+)
>>
>> diff --git a/fs/coredump.c b/fs/coredump.c
>> index e42e17e..740b1bb 100644
>> --- a/fs/coredump.c
>> +++ b/fs/coredump.c
>> @@ -536,6 +536,24 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
>>  	return err;
>>  }
>>  
>> +/*
>> + * While do_notify_parent() notifies the parent of a child's death post
>> + * its coredump, this function lets the parent (if so desired) know about
>> + * the imminent death of a child just prior to its coredump.
>> + */
>> +static void do_notify_parent_predump(void)
>> +{
>> +	struct task_struct *parent;
>> +	int sig;
>> +
>> +	rcu_read_lock();
>> +	parent = rcu_dereference(current->real_parent);
>> +	sig = parent->signal->predump_signal;
>> +	if (sig != 0)
>> +		do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID);
> 
> Doesn't this send si_code == SI_USER.  That seems wrong: the receiving
> process wouldn't not be able to distinguish a real pre-coredump
> notification from a bogus one sent by kill(2) et
The receiving process (i.e., the process manager) knows the PID of all
its child processes. Thus it can easily tell whether the signal is from
a child or not.

> 
> SEND_SIG_PRIV also looks wrong, because it assumes that the sender is
> "the kernel" so there is no si_pid.

That is why SEND_SIG_NOINFO is chosen here for carrying the "pid". What
matters to the parent process is the "signo + pid" for identifying the
child process for the pre-coredump notification.

> 
> This may be another reason for building on top of SIGCHLD which already
> has the required (but weird) "si_pid == affected process" semantics,
> rather than si_pid indicating the sender.
> 
>> +	rcu_read_unlock();
>> +}
>> +
>>  void do_coredump(const kernel_siginfo_t *siginfo)
>>  {
>>  	struct core_state core_state;
>> @@ -590,6 +608,11 @@ void do_coredump(const kernel_siginfo_t *siginfo)
>>  	if (retval < 0)
>>  		goto fail_creds;
>>  
>> +	/*
>> +	 * Send the pre-coredump signal to the parent if requested.
>> +	 */
>> +	do_notify_parent_predump();
>> +
>>  	old_cred = override_creds(cred);
>>  
>>  	ispipe = format_corename(&cn, &cprm);
>> diff --git a/fs/exec.c b/fs/exec.c
>> index fc281b7..7714da7 100644
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1181,6 +1181,9 @@ static int de_thread(struct task_struct *tsk)
>>  	/* we have changed execution domain */
>>  	tsk->exit_signal = SIGCHLD;
>>  
>> +	/* Clear the pre-coredump signal before loading a new binary */
>> +	sig->predump_signal = 0;
>> +
>>  #ifdef CONFIG_POSIX_TIMERS
>>  	exit_itimers(sig);
>>  	flush_itimer_signals();
>> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
>> index 13789d1..728ef68 100644
>> --- a/include/linux/sched/signal.h
>> +++ b/include/linux/sched/signal.h
>> @@ -112,6 +112,9 @@ struct signal_struct {
>>  	int			group_stop_count;
>>  	unsigned int		flags; /* see SIGNAL_* flags below */
>>  
>> +	/* The signal sent prior to a child's coredump */
>> +	int			predump_signal;
>> +
>>  	/*
>>  	 * PR_SET_CHILD_SUBREAPER marks a process, like a service
>>  	 * manager, to re-parent orphan (double-forking) child processes
>> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
>> index c0d7ea0..79f0a8a 100644
>> --- a/include/uapi/linux/prctl.h
>> +++ b/include/uapi/linux/prctl.h
>> @@ -219,4 +219,8 @@ struct prctl_mm_map {
>>  # define PR_SPEC_DISABLE		(1UL << 2)
>>  # define PR_SPEC_FORCE_DISABLE		(1UL << 3)
>>  
>> +/* Whether to receive signal prior to child's coredump */
>> +#define PR_SET_PREDUMP_SIG	54
>> +#define PR_GET_PREDUMP_SIG	55
>> +
>>  #endif /* _LINUX_PRCTL_H */
>> diff --git a/kernel/sys.c b/kernel/sys.c
>> index 123bd73..39aa3b8 100644
>> --- a/kernel/sys.c
>> +++ b/kernel/sys.c
>> @@ -2476,6 +2476,19 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
>>  			return -EINVAL;
>>  		error = arch_prctl_spec_ctrl_set(me, arg2, arg3);
>>  		break;
>> +	case PR_SET_PREDUMP_SIG:
>> +		if (arg3 || arg4 || arg5)
> 
> glibc has
> 
> 	int prctl(int option, ...);
> 
> Some prctls() police extra arguments for zeros, but this means that
> the userspace caller also has to supply pointless 0 arguments.
> 
> It's debatable which is the preferred approach.  Did you have any
> particular rationale for your choice here?
>

The initial version did not check the values of these unused arguments.
But Jann Horn pointed out the new convention is to enforce the 0 values
so I followed ...

Thanks.   -- Enke



[Index of Archives]     [Linux Kernel]     [Kernel Newbies]     [x86 Platform Driver]     [Netdev]     [Linux Wireless]     [Netfilter]     [Bugtraq]     [Linux Filesystems]     [Yosemite Discussion]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]

  Powered by Linux