From: Oleg Nesterov <oleg@xxxxxxxxxx> [ Upstream commit 18f649ef344127ef6de23a5a4272dbe2fdb73dde ] The PF_EXITING check in task_wants_autogroup() is no longer needed. Remove it, but see the next patch. However the comment is correct in that autogroup_move_group() must always change task_group() for every thread so the sysctl_ check is very wrong; we can race with cgroups and even sys_setsid() is not safe because a task running with task_group() == ag->tg must participate in refcounting: int main(void) { int sctl = open("/proc/sys/kernel/sched_autogroup_enabled", O_WRONLY); assert(sctl > 0); if (fork()) { wait(NULL); // destroy the child's ag/tg pause(); } assert(pwrite(sctl, "1\n", 2, 0) == 2); assert(setsid() > 0); if (fork()) pause(); kill(getppid(), SIGKILL); sleep(1); // The child has gone, the grandchild runs with kref == 1 assert(pwrite(sctl, "0\n", 2, 0) == 2); assert(setsid() > 0); // runs with the freed ag/tg for (;;) sleep(1); return 0; } crashes the kernel. It doesn't really need sleep(1), it doesn't matter if autogroup_move_group() actually frees the task_group or this happens later. ----- Without this commit, 4.4.y + hikey kernel crashes while running 'autogroup01' test in ltp (it seems to be written to catch this break itself): Unable to handle kernel NULL pointer dereference at virtual address 00000080 pgd = ffffffc050b11000 [00000080] *pgd=0000000070744003, *pud=0000000070744003, *pmd=0000000000000000 Internal error: Oops: 96000006 [#1] PREEMPT SMP Modules linked in: bnep arc4 wl18xx wlcore mac80211 cfg80211 wlcore_sdio spidev btwilink st_drv bluetooth rfkill CPU: 7 PID: 7965 Comm: autogroup01 Tainted: G W 4.4.92-rc1+ #1 Hardware name: HiKey Development Board (DT) task: ffffffc04ccf0000 ti: ffffffc0491ec000 task.ti: ffffffc0491ec000 PC is at pick_next_task_fair+0x620/0x8f0 LR is at pick_next_task_fair+0x620/0x8f0 pc : [<ffffffc0000f3dd0>] lr : [<ffffffc0000f3dd0>] pstate: 600001c5 sp : ffffffc0491efcc0 ... ... ... Call trace: [<ffffffc0000f3dd0>] pick_next_task_fair+0x620/0x8f0 [<ffffffc00080fb7c>] __schedule+0x124/0x768 [<ffffffc000810204>] schedule+0x44/0xb8 [<ffffffc0008148c4>] do_nanosleep+0x94/0x168 [<ffffffc00012340c>] hrtimer_nanosleep+0x94/0x148 [<ffffffc000123534>] SyS_nanosleep+0x74/0xa0 [<ffffffc000085e30>] el0_svc_naked+0x24/0x28 Code: 54ffdae1 aa1303e0 aa1403e1 97ffdb61 (f9404013) With this, the test case passes: hikey:/opt/ltp# ./testcases/bin/autogroup01 tst_test.c:934: INFO: Timeout per run is 0h 05m 00s autogroup01.c:71: PASS: Bug not reproduced Summary: passed 1 failed 0 skipped 0 warnings 0 Reported-by: Vern Lovejoy <vlovejoy@xxxxxxxxxx> Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> Cc: Mike Galbraith <efault@xxxxxx> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> Cc: hartsjc@xxxxxxxxxx Cc: vbendel@xxxxxxxxxx Link: http://lkml.kernel.org/r/20161114184609.GA15965@xxxxxxxxxx Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx> Signed-off-by: Sumit Semwal <sumit.semwal@xxxxxxxxxx> [sumits: submit to 4.4 LTS, post testing on Hikey] --- kernel/sched/auto_group.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c index 750ed601ddf7..8620fd01b3d0 100644 --- a/kernel/sched/auto_group.c +++ b/kernel/sched/auto_group.c @@ -111,14 +111,11 @@ bool task_wants_autogroup(struct task_struct *p, struct task_group *tg) { if (tg != &root_task_group) return false; - /* - * We can only assume the task group can't go away on us if - * autogroup_move_group() can see us on ->thread_group list. + * If we race with autogroup_move_group() the caller can use the old + * value of signal->autogroup but in this case sched_move_task() will + * be called again before autogroup_kref_put(). */ - if (p->flags & PF_EXITING) - return false; - return true; } @@ -138,13 +135,17 @@ autogroup_move_group(struct task_struct *p, struct autogroup *ag) } p->signal->autogroup = autogroup_kref_get(ag); - - if (!READ_ONCE(sysctl_sched_autogroup_enabled)) - goto out; - + /* + * We can't avoid sched_move_task() after we changed signal->autogroup, + * this process can already run with task_group() == prev->tg or we can + * race with cgroup code which can read autogroup = prev under rq->lock. + * In the latter case for_each_thread() can not miss a migrating thread, + * cpu_cgroup_attach() must not be possible after cgroup_exit() and it + * can't be removed from thread list, we hold ->siglock. + */ for_each_thread(p, t) sched_move_task(t); -out: + unlock_task_sighand(p, &flags); autogroup_kref_put(prev); } -- 2.7.4