Alexander Gordeev <agordeev@xxxxxxxxxxxxx> writes: > On Sat, Jun 25, 2022 at 11:34:46AM -0500, Eric W. Biederman wrote: >> I haven't gotten as far as reproducing this but I have started giving >> this issue some thought. >> >> This entire thing smells like a memory barrier is missing somewhere. >> However by definition the lock implementations in linux provide all the >> needed memory barriers, and in the ptrace_stop and ptrace_check_attach >> path I don't see cases where these values are sampled outside of a lock >> except in wait_task_inactive. Does doing that perhaps require a >> barrier? >> >> The two things I can think of that could shed light on what is going on >> is enabling lockdep, to enable the debug check in signal_wake_up_state >> and verifying bits of state that should be constant while the task >> is frozen for ptrace are indeed constant when task is frozen for ptrace. >> Something like my patch below. >> >> If you could test that when you have a chance that would help narrow >> down what is going on. >> >> Thank you, >> Eric >> >> diff --git a/kernel/ptrace.c b/kernel/ptrace.c >> index 156a99283b11..6467a2b1c3bc 100644 >> --- a/kernel/ptrace.c >> +++ b/kernel/ptrace.c >> @@ -268,9 +268,13 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state) >> } >> read_unlock(&tasklist_lock); >> >> - if (!ret && !ignore_state && >> - WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED))) >> + if (!ret && !ignore_state) { >> + WARN_ON_ONCE(!(child->jobctl & JOBCTL_PTRACE_FROZEN)); >> + WARN_ON_ONCE(!(child->joctctl & JOBCTL_TRACED)); >> + WARN_ON_ONCE(READ_ONCE(child->__state) != __TASK_TRACED); >> + WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)); >> ret = -ESRCH; >> + } >> >> return ret; >> } > > I modified your chunk a bit - hope that is what you had in mind: Yes. > diff --git a/kernel/ptrace.c b/kernel/ptrace.c > index 156a99283b11..f0e9a9a4d63c 100644 > --- a/kernel/ptrace.c > +++ b/kernel/ptrace.c > @@ -268,9 +268,19 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state) > } > read_unlock(&tasklist_lock); > > - if (!ret && !ignore_state && > - WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED))) > - ret = -ESRCH; > + if (!ret && !ignore_state) { > + unsigned int __state; > + > + WARN_ON_ONCE(!(child->jobctl & JOBCTL_PTRACE_FROZEN)); > + WARN_ON_ONCE(!(child->jobctl & JOBCTL_TRACED)); > + __state = READ_ONCE(child->__state); > + if (__state != __TASK_TRACED) { > + pr_err("%s(%d) __state %x", __FUNCTION__, __LINE__, __state); > + WARN_ON_ONCE(1); > + } > + if (WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED))) > + ret = -ESRCH; > + } > > return ret; > } > > > When WARN_ON_ONCE(1) hits the child __state is always zero/TASK_RUNNING, > as reported by the preceding pr_err(). Yet, in the resulting core dump > it is always __TASK_TRACED. Did you enable CONFIG_LOCKDEP? I am just wanting to ensure that every caller of signal_wake_up_state was holding siglock. > Removing WARN_ON_ONCE(1) while looping until (__state != __TASK_TRACED) > confirms the unexpected __state is always TASK_RUNNING. It never observed > more than one iteration and gets printed once in 30-60 mins. Hmm. This does smell lock a missing barrier. > So probably when the condition is entered __state is TASK_RUNNING more > often, but gets overwritten with __TASK_TRACED pretty quickly. Which kind > of consistent with my previous observation that kernel/sched/core.c:3305 > is where return 0 makes wait_task_inactive() fail. > > No other WARN_ON_ONCE() hit ever. Yes. This smells like something is missing. I am completely rusty at rolling barriers by hand but does something like the below clear up those mysterious warnings? diff --git a/kernel/ptrace.c b/kernel/ptrace.c index 156a99283b11..cb85bcf84640 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -202,6 +202,7 @@ static bool ptrace_freeze_traced(struct task_struct *task) spin_lock_irq(&task->sighand->siglock); if (task_is_traced(task) && !looks_like_a_spurious_pid(task) && !__fatal_signal_pending(task)) { + smp_rmb(); task->jobctl |= JOBCTL_PTRACE_FROZEN; ret = true; } diff --git a/kernel/signal.c b/kernel/signal.c index edb1dc9b00dc..bcd576e9de66 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2233,6 +2233,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message, return exit_code; set_special_state(TASK_TRACED); + smp_wmb(); current->jobctl |= JOBCTL_TRACED; /* Eric