Re: [PATCH 1/6] signal: Remove the bogus sigkill_pending in ptrace_stop

Kees Cook <keescook@xxxxxxxxxxxx> · Fri, 24 Sep 2021 12:06:41 -0700

On Fri, Sep 24, 2021 at 10:48:18AM -0500, Eric W. Biederman wrote:
> Kees Cook <keescook@xxxxxxxxxxxx> writes:
> 
> > On Thu, Sep 23, 2021 at 07:09:34PM -0500, Eric W. Biederman wrote:
> >> 
> >> The existence of sigkill_pending is a little silly as it is
> >> functionally a duplicate of fatal_signal_pending that is used in
> >> exactly one place.
> >
> > sigkill_pending() checks for &tsk->signal->shared_pending.signal but
> > fatal_signal_pending() doesn't.
> 
> The extra test is unnecessary as all SIGKILLs visit complete_signal
> and immediately run the loop:
> 
> 			/*
> 			 * Start a group exit and wake everybody up.
> 			 * This way we don't have other threads
> 			 * running and doing things after a slower
> 			 * thread has the fatal signal pending.
> 			 */
> 			signal->flags = SIGNAL_GROUP_EXIT;
> 			signal->group_exit_code = sig;
> 			signal->group_stop_count = 0;
> 			t = p;
> 			do {
> 				task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
> 				sigaddset(&t->pending.signal, SIGKILL);
> 				signal_wake_up(t, 1);
> 			} while_each_thread(p, t);
> 			return;
> 
> Which sets SIGKILL in the task specific queue.  Which means only the
> non-shared queue needs to be tested.  Further fatal_signal_pending would
> be buggy if this was not the case.

Okay, so SIGKILL is special from the perspective of shared_pending. Why
was it tested for before? Or rather: how could SIGKILL ever have gotten
set in shared_pending?

Oh, I think I see what you mean about complete_signal() now: that's just
looking at sig, and doesn't care where it got written. i.e. SIGKILL gets
immediately written to pending, even if the prior path through
__send_signal() only wrote it to shared_pending.
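
To check my understanding, here is a userspace toy model of that fan-out.
It is not kernel code: every fan_* name is made up for illustration, and
the bitmask layout is an assumption of the sketch. It only shows that once
the loop has run, testing the per-thread set alone is enough.

```c
#include <assert.h>
#include <stdbool.h>

#define FAN_SIGKILL  9
#define FAN_NTHREADS 3

/* Hypothetical stand-in for the per-thread state touched by the loop. */
struct fan_task {
	unsigned long pending;	/* models t->pending.signal */
	bool tif_sigpending;	/* models TIF_SIGPENDING */
};

/* Hypothetical stand-in for the thread group. */
struct fan_group {
	unsigned long shared_pending;	/* models signal->shared_pending.signal */
	struct fan_task threads[FAN_NTHREADS];
};

static unsigned long fan_sigmask(int sig)
{
	assert(sig >= 1 && sig <= 32);
	return 1UL << (sig - 1);
}

/*
 * Models a process-directed SIGKILL: the signal lands in the shared set,
 * then the fan-out loop copies it into every thread's private set and
 * marks each thread as having a signal pending.
 */
static void fan_send_group_sigkill(struct fan_group *g)
{
	g->shared_pending |= fan_sigmask(FAN_SIGKILL);
	for (int i = 0; i < FAN_NTHREADS; i++) {
		g->threads[i].pending |= fan_sigmask(FAN_SIGKILL);
		g->threads[i].tif_sigpending = true;
	}
}

/* Models fatal_signal_pending(): looks at the private set only. */
static bool fan_fatal_signal_pending(const struct fan_task *t)
{
	return t->tif_sigpending &&
	       (t->pending & fan_sigmask(FAN_SIGKILL)) != 0;
}
```

In this model, fan_fatal_signal_pending() is true for every thread after
fan_send_group_sigkill(), without ever reading shared_pending.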

> 
> >> Checking for pending fatal signals and returning early in ptrace_stop
> >> is actively harmful.  It causes the ptrace_stop called by
> >> ptrace_signal to return early before setting current->exit_code.
> >> Later, when ptrace_signal reads the signal number from
> >> current->exit_code, the value is undefined, making it unpredictable
> >> what will happen.
> >> 
> >> Instead rely on the fact that schedule will not sleep if there is a
> >> pending signal that can awaken a task.
> >
> > This reasoning sounds fine, but I can't see where it's happening.
> > It looks like recalc_sigpending() is supposed to happen at the start
> > of scheduling? I see it at the end of ptrace_stop(), though, so it looks
> > like it's reasonable to skip checking shared_pending.
> >
> > (Does the scheduler deal with shared_pending directly?)
> 
> In the call of signal_pending_state from kernel/sched/core.c:__schedule().
> 
> ptrace_stop would actually be badly broken today if that was not the
> case as several places enter into ptrace_event without testing signals
> first.
> 
> >> Removing the explicit sigkill_pending test fixes ptrace_signal
> >> when ptrace_stop does not stop because current->exit_code is always
> >> set to signr.
> >> 
> >> Cc: stable@xxxxxxxxxxxxxxx
> >> Fixes: 3d749b9e676b ("ptrace: simplify ptrace_stop()->sigkill_pending() path")
> >> Fixes: 1a669c2f16d4 ("Add arch_ptrace_stop")
> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
> >> ---
> >>  kernel/signal.c | 18 ++++--------------
> >>  1 file changed, 4 insertions(+), 14 deletions(-)
> >> 
> >> diff --git a/kernel/signal.c b/kernel/signal.c
> >> index 952741f6d0f9..9f2dc9cf3208 100644
> >> --- a/kernel/signal.c
> >> +++ b/kernel/signal.c
> >> @@ -2182,15 +2182,6 @@ static inline bool may_ptrace_stop(void)
> >>  	return true;
> >>  }
> >>  
> >> -/*
> >> - * Return non-zero if there is a SIGKILL that should be waking us up.
> >> - * Called with the siglock held.
> >> - */
> >> -static bool sigkill_pending(struct task_struct *tsk)
> >> -{
> >> -	return sigismember(&tsk->pending.signal, SIGKILL) ||
> >> -	       sigismember(&tsk->signal->shared_pending.signal, SIGKILL);
> >> -}
> >>  
> >>  /*
> >>   * This must be called with current->sighand->siglock held.
> >> @@ -2217,17 +2208,16 @@ static void ptrace_stop(int exit_code, int why, int clear_code, kernel_siginfo_t
> >>  		 * calling arch_ptrace_stop, so we must release it now.
> >>  		 * To preserve proper semantics, we must do this before
> >>  		 * any signal bookkeeping like checking group_stop_count.
> >> -		 * Meanwhile, a SIGKILL could come in before we retake the
> >> -		 * siglock.  That must prevent us from sleeping in TASK_TRACED.
> >> -		 * So after regaining the lock, we must check for SIGKILL.
> >
> > Where is the sleep this comment is talking about?
> >
> > i.e. will recalc_sigpending() have been called before the above sleep
> > would happen? I assume it's after ptrace_stop() returns... But I want to
> > make sure the sleep isn't in ptrace_stop() itself somewhere I can't see.
> > I *do* see freezable_schedule() called, and that dumps us into
> > __schedule(), and I don't see a recalc before it checks
> > signal_pending_state().
> >
> > Does a recalc need to happen in place of the old sigkill_pending()
> > call?
> 
> You read that correctly: freezable_schedule is where ptrace_stop sleeps.
> 
> The call chain you are looking for looks something like:
> send_signal
>   complete_signal
>      signal_wake_up
>        signal_wake_up_state
>          set_tsk_thread_flag(t, TIF_SIGPENDING)
> 
> That is to say complete_signal sets TIF_SIGPENDING and
> the per task sigqueue SIGKILL entry.
> 
> Calling recalc_sigpending is only needed when a signal is removed from
> the queues, not when a signal is added.

Got it; thanks! Yeah, it was mainly that I didn't see where SIGKILL got
handled specially, and now I do. :)
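
For the archives, my mental model of why the sleep is skipped, again as a
userspace sketch and not the kernel source. The slp_* names and the state
bit values are assumptions chosen for the model; the shape of the check
follows my reading of signal_pending_state().

```c
#include <stdbool.h>

/* Model-only state bits; the values are assumptions of this sketch. */
#define SLP_TASK_INTERRUPTIBLE	0x0001u
#define SLP_TASK_TRACED_BIT	0x0008u
#define SLP_TASK_WAKEKILL	0x0100u
#define SLP_TASK_TRACED		(SLP_TASK_WAKEKILL | SLP_TASK_TRACED_BIT)

/* Hypothetical stand-in for the two facts the check consults. */
struct slp_task {
	bool tif_sigpending;	/* set on the signal_wake_up() path */
	bool private_sigkill;	/* SIGKILL bit in the per-task pending set */
};

/*
 * Models signal_pending_state(): a killable sleep is refused only when
 * a signal is flagged pending and SIGKILL sits in the private set; an
 * interruptible sleep is refused for any flagged signal.
 */
static bool slp_signal_pending_state(unsigned int state,
				     const struct slp_task *t)
{
	if (!(state & (SLP_TASK_INTERRUPTIBLE | SLP_TASK_WAKEKILL)))
		return false;
	if (!t->tif_sigpending)
		return false;
	return (state & SLP_TASK_INTERRUPTIBLE) || t->private_sigkill;
}

/* Models the scheduler's decision: sleep only if nothing wakes us. */
static bool slp_would_sleep(unsigned int state, const struct slp_task *t)
{
	return !slp_signal_pending_state(state, t);
}
```

With both fields set, as they are after the fan-out loop, a task asking to
sleep in the traced state is kept runnable. Adding a signal sets the flag
on the way in, so in this model no recalc is needed before the check.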

Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx>

-- 
Kees Cook