The patch titled

     sched_exit: move the callsite to do_exit()

has been added to the -mm tree.  Its filename is

     sched_exit-move-the-callsite-to-do_exit.patch

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

------------------------------------------------------
Subject: sched_exit: move the callsite to do_exit()
From: Oleg Nesterov <oleg@xxxxxxxxxx>

sched_exit() is called by release_task().  If the task auto-reaps itself,
this call happens a bit too early (the task still uses cpu after that).  If
the task goes to TASK_ZOMBIE, this call is unpredictably delayed.

I think it is better to do sched_exit() right before the last schedule().

We can use rcu_read_lock() instead of write_lock(tasklist).  In that case it
is possible that sched_exit() changes ->time_slice/->sleep_avg of the
already dead ->parent, but I think this is tolerable.

Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxx>
Cc: Nick Piggin <nickpiggin@xxxxxxxxxxxx>
Cc: Con Kolivas <kernel@xxxxxxxxxxx>
Cc: Peter Williams <pwil3058@xxxxxxxxxxxxxx>
Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
---

 kernel/exit.c  |    3 ++-
 kernel/sched.c |    6 +++++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff -puN kernel/exit.c~sched_exit-move-the-callsite-to-do_exit kernel/exit.c
--- a/kernel/exit.c~sched_exit-move-the-callsite-to-do_exit
+++ a/kernel/exit.c
@@ -171,7 +171,6 @@ repeat:
 		zap_leader = (leader->exit_signal == -1);
 	}
 
-	sched_exit(p);
 	write_unlock_irq(&tasklist_lock);
 	spin_unlock(&p->proc_lock);
 	proc_pid_flush(proc_dentry);
@@ -950,6 +949,8 @@ fastcall NORET_TYPE void do_exit(long co
 	if (tsk->splice_pipe)
 		__free_pipe_info(tsk->splice_pipe);
 
+	sched_exit(tsk);
+
 	/* PF_DEAD causes final put_task_struct after we schedule. */
 	preempt_disable();
 	BUG_ON(tsk->flags & PF_DEAD);
diff -puN kernel/sched.c~sched_exit-move-the-callsite-to-do_exit kernel/sched.c
--- a/kernel/sched.c~sched_exit-move-the-callsite-to-do_exit
+++ a/kernel/sched.c
@@ -1606,10 +1606,13 @@ void fastcall wake_up_new_task(task_t *p
  */
 void fastcall sched_exit(task_t *p)
 {
-	task_t *parent = p->parent;
+	task_t *parent;
 	unsigned long flags;
 	runqueue_t *rq;
 
+	rcu_read_lock();
+	parent = p->real_parent;
+
 	/*
 	 * If the child was a (relative-) CPU hog then decrease
 	 * the sleep_avg of the parent as well.
@@ -1625,6 +1628,7 @@ void fastcall sched_exit(task_t *p)
 		(EXIT_WEIGHT + 1) * EXIT_WEIGHT + p->sleep_avg /
 		(EXIT_WEIGHT + 1);
 	task_rq_unlock(rq, &flags);
+	rcu_read_unlock();
 }
 
 /**
_

Patches currently in -mm which might be from oleg@xxxxxxxxxx are

avoid-tasklist_lock-at-getrusage-for-multithreaded-case-too.patch
ptrace-document-the-locking-rules.patch
list-introduce-list_replace-helper.patch
list-use-list_replace_init-instead-of-list_splice_init.patch
when-config_base_samll=1-the-kernel-261611-cascade-in-kernel-timerc-may-enter-the-infinite-loop.patch
when-config_base_samll=1-the-kernel-261611-cascade-in-kernel-timerc-may-enter-the-infinite-loop-use-list_replace_init.patch
sched_exit-fix-parent-time_slice-calculation.patch
sched_exit-move-the-callsite-to-do_exit.patch
proc-remove-tasklist_lock-from-proc_pid_readdir-simply-fix-first_tgid.patch
proc-dont-lock-task_structs-indefinitely.patch
proc-dont-lock-task_structs-indefinitely-task_mmu-small-fixes.patch
simplify-fix-first_tid.patch
cleanup-next_tid.patch
de_thread-fix-lockless-do_each_thread.patch
coredump-optimize-mm-users-traversal.patch
coredump-speedup-sigkill-sending.patch
coredump-kill-ptrace-related-stuff.patch
coredump-kill-ptrace-related-stuff-fix.patch
coredump-dont-take-tasklist_lock.patch
coredump-some-code-relocations.patch
coredump-shutdown-current-process-first.patch
coredump-copy_process-dont-check-signal_group_exit.patch
pidhash-temporary-debug-checks.patch
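
Note on the sched_exit() hunk: the ->sleep_avg update that the new
rcu_read_lock()/rcu_read_unlock() pair now covers is an integer weighted
average in which the parent keeps EXIT_WEIGHT parts and the exiting child
contributes one.  The following is only a userspace sketch of that
arithmetic, not kernel code.  The value of EXIT_WEIGHT, the helper name
blend_sleep_avg(), and the "child < parent" guard are assumptions made for
illustration; the diff shows only the tail of the expression and the comment
about a CPU-hog child.

```c
#include <assert.h>

/* Assumed value for illustration; the patch does not show the definition. */
#define EXIT_WEIGHT 3

/*
 * Hypothetical userspace model of the sleep_avg update in sched_exit().
 * The expression is the one visible in the diff. The guard is inferred
 * from the comment "If the child was a (relative-) CPU hog then decrease
 * the sleep_avg of the parent as well" and is an assumption here.
 */
static unsigned long blend_sleep_avg(unsigned long parent_avg,
				     unsigned long child_avg)
{
	assert(EXIT_WEIGHT > 0);

	if (child_avg < parent_avg)
		parent_avg = parent_avg / (EXIT_WEIGHT + 1) * EXIT_WEIGHT +
			     child_avg / (EXIT_WEIGHT + 1);

	return parent_avg;
}
```

Because each division happens before the multiplication, the result is
truncated slightly downward.  With the race described in the changelog, the
worst case is that this adjustment is applied to a parent that has already
died, which the changelog considers tolerable.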