On Sat, Sep 22, 2012 at 06:16:15PM +0000, Paul Walmsley wrote: > Hi Paul > > On Fri, 21 Sep 2012, Paul E. McKenney wrote: > > > I am wondering if your system somehow figured out how to start a grace > > period that had no RCU callbacks waiting for it. If that happened, > > then a CONFIG_NO_HZ=y system could in theory get into a state where all > > CPUs are in dyntick-idle mode, so that none of them is doing anything > > to force the grace period to complete. > > > > That should be easy to diagnose, anyway. Please see below, which > > includes the earlier diagnostic patch. > > Here you go. > > - Paul > > [ 248.902618] INFO: rcu_sched self-detected stall on CPU > [ 248.905456] 0: (1 ticks this GP) idle=933/1/0 > [ 248.907897] (t=26570 jiffies g=11 c=10 q=0) Bingo!!! (q=0, in case you were wondering. And thank you for testing this!) Strangely enough, I believe that I have inadvertently fixed this in my -rcu tree: git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/next Nevertheless, if you get a chance to try it, I would be interested to hear if my guess is correct. The trick is that a kthread drives the grace period in -rcu, regardless of whether or not there are callbacks. However, the backport would not be something that -stable would be happy with, so I will be putting together a fix for mainline. This thing has been in the kernel since about 2004, not sure why you didn't hit it earlier. Thanx, Paul > [ 248.910339] [<c001bc90>] (unwind_backtrace+0x0/0xf0) from [<c00ad800>] (rcu_check_callbacks+0x220/0x714) > [ 248.915527] [<c00ad800>] (rcu_check_callbacks+0x220/0x714) from [<c00532a0>] (update_process_times+0x38/0x68) > [ 248.920928] [<c00532a0>] (update_process_times+0x38/0x68) from [<c008c9e8>] (tick_sched_timer+0x80/0xec) > [ 248.926116] [<c008c9e8>] (tick_sched_timer+0x80/0xec) from [<c0068ed4>] (__run_hrtimer+0x7c/0x1e0) > [ 248.930999] [<c0068ed4>] (__run_hrtimer+0x7c/0x1e0) from [<c0069cb8>] (hrtimer_interrupt+0x11c/0x2d0) > [ 248.936035] [<c0069cb8>] (hrtimer_interrupt+0x11c/0x2d0) from [<c001a3cc>] (twd_handler+0x30/0x44) > [ 248.940948] [<c001a3cc>] (twd_handler+0x30/0x44) from [<c00a7bd0>] (handle_percpu_devid_irq+0x90/0x13c) > [ 248.946075] [<c00a7bd0>] (handle_percpu_devid_irq+0x90/0x13c) from [<c00a4344>] (generic_handle_irq+0x30/0x48) > [ 248.951538] [<c00a4344>] (generic_handle_irq+0x30/0x48) from [<c0014e38>] (handle_IRQ+0x4c/0xac) > [ 248.956329] [<c0014e38>] (handle_IRQ+0x4c/0xac) from [<c00084cc>] (gic_handle_irq+0x28/0x5c) > [ 248.960937] [<c00084cc>] (gic_handle_irq+0x28/0x5c) from [<c04fb1a4>] (__irq_svc+0x44/0x5c) > [ 248.965484] Exception stack(0xc0729f58 to 0xc0729fa0) > [ 248.968231] 9f40: 0003b832 00000001 > [ 248.972686] 9f60: 00000000 c074a8e8 c0728000 c07c42c8 c05065a0 c074bdc8 00000000 411fc092 > [ 248.977142] 9f80: c074bfe8 00000000 00000001 c0729fa0 0003b833 c0015130 20000113 ffffffff > [ 248.981597] [<c04fb1a4>] (__irq_svc+0x44/0x5c) from [<c0015130>] (default_idle+0x20/0x44) > [ 248.986083] [<c0015130>] (default_idle+0x20/0x44) from [<c001535c>] (cpu_idle+0x9c/0x114) > [ 248.990539] [<c001535c>] (cpu_idle+0x9c/0x114) from [<c06d77b0>] (start_kernel+0x2b4/0x304) > -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html