On Mon, May 05, 2014 at 02:37:06PM +0200, Peter Zijlstra wrote: > On Wed, Apr 16, 2014 at 12:40:01AM -0700, tip-bot for Frederic Weisbecker wrote: > > Commit-ID: 72aacf0259bb7d53b7a3b5b2f7bf982acaa52b61 > > Gitweb: http://git.kernel.org/tip/72aacf0259bb7d53b7a3b5b2f7bf982acaa52b61 > > Author: Frederic Weisbecker <fweisbec@xxxxxxxxx> > > AuthorDate: Tue, 18 Mar 2014 21:12:53 +0100 > > Committer: Frederic Weisbecker <fweisbec@xxxxxxxxx> > > CommitDate: Thu, 3 Apr 2014 18:05:21 +0200 > > > > nohz: Move full nohz kick to its own IPI > > > > Now that we have smp_queue_function_single() which can be used to > > safely queue IPIs when interrupts are disabled and without worrying > > about concurrent callers, lets use it for the full dynticks kick to > > notify a CPU that it's exiting single task mode. > > > > This unbloats a bit the scheduler IPI that the nohz code was abusing > > for its cool "callable anywhere/anytime" properties. > > > > Reviewed-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx> > > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > > Cc: Ingo Molnar <mingo@xxxxxxxxxx> > > Cc: Jens Axboe <axboe@xxxxxx> > > Cc: Kevin Hilman <khilman@xxxxxxxxxx> > > Cc: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx> > > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> > > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > > Signed-off-by: Frederic Weisbecker <fweisbec@xxxxxxxxx> > > So I suspect this is the patch that makes Ingo's machines unhappy, they > appear to get stuck thusly: > > [10513.382910] RIP: 0010:[<ffffffff8112b7da>] [<ffffffff8112b7da>] generic_exec_single+0x9a/0x180 > > [10513.481704] [<ffffffff8112c092>] smp_queue_function_single+0x42/0xa0 > [10513.488251] [<ffffffff81126ce0>] tick_nohz_full_kick_cpu+0x50/0x80 > [10513.494661] [<ffffffff810f4b0e>] enqueue_task_fair+0x59e/0x6c0 > [10513.506469] [<ffffffff810e3d6a>] enqueue_task+0x3a/0x60 > [10513.511836] [<ffffffff810e8ac3>] __migrate_task+0x123/0x150 > [10513.523535] [<ffffffff810e8b0d>] migration_cpu_stop+0x1d/0x30 > [10513.529401] [<ffffffff81143460>] cpu_stopper_thread+0x70/0x120 > > I'm not entirely sure how yet, but this is by far the most likely > candidate. Ingo, if you still have the vmlinuz matching this trace (your > hang2.txt) could you have a peek where that RIP lands? > > If that is indeed the csd_lock() function, then this is it and > something's buggered. On a kernel build from your .config the +0x9a is indeed very close to that wait loop; of course 0x9a isn't even an instruction boundary for me so its all a bit of a guess. -- To unsubscribe from this list: send the line "unsubscribe linux-tip-commits" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html