On Tue, May 14, 2019 at 06:01:59PM +0200, Oleg Nesterov wrote: > Roman, > > Sorry, I can't agree with this patch. And even the changelog doesn't > look right. > > On 05/13, Roman Gushchin wrote: > > > > The ptrace_stop() function contains the cgroup_enter_frozen() call, > > but no cgroup_leave_frozen(). When ptrace_stop() is called from the > > do_jobctl_trap() path, it's correct, because corresponding > > cgroup_leave_frozen() calls in get_signal() will guarantee that > > the task won't leave the signal handler loop frozen. > > > > However, if ptrace_stop() is called from ptrace_signal() or > > ptrace_notify(), there is no such guarantee, and the task may leave > > with the frozen bit set. > > ptrace_signal() looks fine in that the task can't return to user-mode, > get_signal() will be called again exactly because ->frozen == 1 means > TIF_SIGPENDING. So I an not surre I understand why ptrace_signal() does > ptrace_stop(false) with your patch. But this is minor. > > > It leads to the regression, reported by Alex Xu. Write system call > > gets mistakenly interrupted by fake TIF_SIGPENDING, which is set > > by recalc_sigpending_tsk() because of the set frozen bit. > > IMHO, the real problem is not that syscall was interrupted. The problem > is that a frozen task must never start the syscall. > > > --------------------------------------------------------------------------- > > Can't we add the unconditional leave_frozen() into ptrace_stop() for now ? > > Sure, this is not what we want. Debugger can disturb CGRP_FROZEN. > > But. The "may_remain_frozen" argument uglifies this code too much (imo) and > at the same time it doesn't solve the problem above: CGRP_FROZEN can be cleared > "for no reason". > > Say, why ptrace_event_pid() should do leave_frozen(true) ? And if there is any > reason, then why wait_for_vfork_done() can do leave_frozen(false) ? > > Or syscall-exit path. It can't miss get_signal(), it doesn't need leave_frozen(). > > > In short, I believe that compared to the unconditional leave_frozen() in ptrace_stop() > this patch buys almost nothing, but makes the code and the whole logic much uglier. > > Oleg. > Hi Oleg! I agree that "may_remain_frozen" adds a lot of ugliness, so let's fix the regression with the unconditional leave_frozen(true). The patch below. Please, let me know if it's not what you meant. The problem is that it makes the ptrace freezer kselftest to flap, so it's good only as a temporarily solution. But it looks like we agree here. Thank you! -- >From 2602261b066a06f6884057d2cd7369951768b9ed Mon Sep 17 00:00:00 2001 From: Roman Gushchin <guro@xxxxxx> Date: Tue, 14 May 2019 10:13:19 -0700 Subject: [PATCH] signal: unconditionally leave the frozen state in ptrace_stop() Alex Xu reported a regression in strace, caused by the introduction of the cgroup v2 freezer. The regression can be reproduced by stracing the following simple program: #include <unistd.h> int main() { write(1, "a", 1); return 0; } An attempt to run strace ./a.out leads to the infinite loop: [ pre-main omitted ] write(1, "a", 1) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) write(1, "a", 1) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) write(1, "a", 1) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) write(1, "a", 1) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) write(1, "a", 1) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) write(1, "a", 1) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) [ repeats forever ] The problem occurs because the traced task leaves ptrace_stop() (and the signal handling loop) with the frozen bit set. So let's call cgroup_leave_frozen(true) unconditionally after sleeping in ptrace_stop(). With this patch applied, strace works as expected: [ pre-main omitted ] write(1, "a", 1) = 1 exit_group(0) = ? +++ exited with 0 +++ Reported-by: Alex Xu <alex_y_xu@xxxxxxxx> Fixes: 76f969e8948d ("cgroup: cgroup v2 freezer") Signed-off-by: Roman Gushchin <guro@xxxxxx> Cc: Oleg Nesterov <oleg@xxxxxxxxxx> Cc: Tejun Heo <tj@xxxxxxxxxx> --- kernel/signal.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/signal.c b/kernel/signal.c index 8607b11ff936..565ba14d89d5 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2112,6 +2112,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, kernel_siginfo_t preempt_enable_no_resched(); cgroup_enter_frozen(); freezable_schedule(); + cgroup_leave_frozen(true); } else { /* * By the time we got the lock, our tracer went away. -- 2.20.1