* Allen Pais | 2014-01-27 13:50:43 [+0530]: >Hi, Hi, >[ 1143.894099] INFO: rcu_preempt self-detected stall on CPU { 36} (t=2100 jiffies g=373 c=372 q=61) >[ 1143.894130] CPU[ 0]: TSTATE[0000009980001602] TPC[000000000048d1ac] TNPC[000000000048d1b0] TASK[ksoftirqd/0:3] >[ 1143.894151] TPC[idle_cpu+0x2c/0x80] O7[cpumask_next_and+0x18/0x80] I7[find_busiest_group+0x21c/0xa40] RPC[load_balance+0xe8/0x880] so you have CPU stall on cpu36 > >[ 1150.135499] BUG: soft lockup - CPU#36 stuck for 23s! [swapper/36:0] >[ 1150.135564] TPC: <rtrap_no_irq_enable+0x0/0xc> >[ 1150.135572] RPC: <tl0_irq15+0x14/0x20> >[ 1150.135581] I7: <tl0_irq15+0x14/0x20> This is your NMI handler where you are right now… >[ 1150.135582] Call Trace: >[ 1150.135584] [0000000000404b54] rtrap_no_irq_enable+0x0/0xc >[ 1150.135586] [0000000000404b54] rtrap_no_irq_enable+0x0/0xc >[ 1150.135588] [0000000000404b54] rtrap_no_irq_enable+0x0/0xc >[ 1150.135590] [0000000000404b54] rtrap_no_irq_enable+0x0/0xc >[ 1150.135592] [0000000000404b54] rtrap_no_irq_enable+0x0/0xc >[ 1150.135594] [0000000000404b54] rtrap_no_irq_enable+0x0/0xc >[ 1150.135595] [0000000000404b54] rtrap_no_irq_enable+0x0/0xc >[ 1150.135597] [0000000000404b54] rtrap_no_irq_enable+0x0/0xc >[ 1150.135599] [0000000000404b54] rtrap_no_irq_enable+0x0/0xc >[ 1150.135600] [0000000000404b54] rtrap_no_irq_enable+0x0/0xc This is the stack where the NMI got injected. Not sure why this is more than once here but then this the sparc64 trace I see :) >[ 1150.135604] [00000000004acc00] in_lock_functions+0x0/0x40 >[ 1150.135608] [000000000080a038] add_preempt_count+0xd8/0x140 >[ 1150.135610] [000000000080617c] __schedule+0x1c/0x500 >[ 1150.135613] [0000000000806b7c] schedule+0x1c/0xc0 >[ 1150.135615] [0000000000806f8c] schedule_preempt_disabled+0xc/0x40 >[ 1150.135617] [000000000049dd10] cpu_startup_entry+0x150/0x300 And this where the CPU was before the NMI. Doesn't look blocking. in_lock_functions() compares a few values no locking involved so the CPU probably was here while the NMI hit and an usec later it might be an instruction later. What I thnig is odd, is that it is exactly at the begin of the function, not an instruction later. >Am yet to debug what went wrong. > >- Allen Sebastian -- To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html