On Tue, Mar 04, 2014 at 05:16:27PM -0800, Florian Fainelli wrote: > 2014-03-04 17:03 GMT-08:00 Florian Fainelli <f.fainelli@xxxxxxxxx>: > > 2014-03-04 16:48 GMT-08:00 Eric Dumazet <eric.dumazet@xxxxxxxxx>: > >> On Tue, 2014-03-04 at 15:55 -0800, Florian Fainelli wrote: > >>> Hi all, > >>> > >>> I am seeing the following RCU stalls messages appearing on an ARMv7 > >>> 4xCPUs system running 3.14-rc4: > >>> > >>> [ 42.974327] INFO: rcu_sched detected stalls on CPUs/tasks: > >>> [ 42.979839] (detected by 0, t=2102 jiffies, g=4294967082, > >>> c=4294967081, q=516) > >>> [ 42.987169] INFO: Stall ended before state dump start > >>> > >>> this is happening under the following conditions: > >>> > >>> - the attached bumper.c binary alters various kernel thread priorities > >>> based on the contents of bumpup.cfg and > >>> - malloc_crazy is running from a NFS share > >>> - malloc_crazy.c is running in a loop allocating chunks of memory but > >>> never freeing it > >>> > >>> when the priorities are altered, instead of getting the OOM killer to > >>> be invoked, the RCU stalls are happening. Taking NFS out of the > >>> equation does not allow me to reproduce the problem even with the > >>> priorities altered. > >>> > >>> This "problem" seems to have been there for quite a while now since I > >>> was able to get 3.8.13 to trigger that bug as well, with a slightly > >>> more detailed RCU debugging trace which points the finger at kswapd0. > >>> > >>> You should be able to get that reproduced under QEMU with the > >>> Versatile Express platform emulating a Cortex A15 CPU and the attached > >>> files. > >>> > >>> Any help or suggestions would be greatly appreciated. Thanks! > >> > >> Do you have a more complete trace, including stack traces ? > > > > Attatched is what I get out of SysRq-t, which is the only thing I have > > (note that the kernel is built with CONFIG_RCU_CPU_STALL_INFO=y): > > QEMU for Versatile Express w/ 2 CPUs yields something slightly > different than the real HW platform this is happening with, but it > does produce the RCU stall anyway: > > [ 125.762946] BUG: soft lockup - CPU#1 stuck for 53s! [malloc_crazy:91] This soft-lockup condition can result in RCU CPU stall warnings. Fix the problem causing the soft lockup, and I bet that your RCU CPU stall warnings go away. Thanx, Paul > [ 125.766841] Modules linked in: > [ 125.768389] > [ 125.769199] CPU: 1 PID: 91 Comm: malloc_crazy Not tainted 3.14.0-rc4 #39 > [ 125.769883] task: edbded00 ti: c089c000 task.ti: c089c000 > [ 125.771743] PC is at load_balance+0x4b0/0x760 > [ 125.772069] LR is at cpumask_next_and+0x44/0x5c > [ 125.772387] pc : [<c004ff58>] lr : [<c01db940>] psr: 60000113 > [ 125.772387] sp : c089db48 ip : 80000113 fp : edfd8cf4 > [ 125.773128] r10: c0de871c r9 : ed893840 r8 : 00000000 > [ 125.773452] r7 : c0de8458 r6 : edfd8840 r5 : edfd8840 r4 : ed89389c > [ 125.773825] r3 : 000012d8 r2 : 80000113 r1 : 00000023 r0 : 00000000 > [ 125.774332] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user > [ 125.774753] Control: 30c7387d Table: 80835d40 DAC: 00000000 > [ 125.775266] CPU: 1 PID: 91 Comm: malloc_test_bcm Not tainted 3.14.0-rc4 #39 > [ 125.776392] [<c0015624>] (unwind_backtrace) from [<c00111a4>] > (show_stack+0x10/0x14) > [ 125.777026] [<c00111a4>] (show_stack) from [<c04c1bd4>] > (dump_stack+0x84/0x94) > [ 125.777429] [<c04c1bd4>] (dump_stack) from [<c007e7f0>] > (watchdog_timer_fn+0x144/0x17c) > [ 125.777865] [<c007e7f0>] (watchdog_timer_fn) from [<c003f58c>] > (__run_hrtimer.isra.32+0x54/0xe4) > [ 125.778333] [<c003f58c>] (__run_hrtimer.isra.32) from [<c003fea4>] > (hrtimer_interrupt+0x11c/0x2d0) > [ 125.778814] [<c003fea4>] (hrtimer_interrupt) from [<c03c4f80>] > (arch_timer_handler_virt+0x28/0x30) > [ 125.779297] [<c03c4f80>] (arch_timer_handler_virt) from > [<c006280c>] (handle_percpu_devid_irq+0x6c/0x84) > [ 125.779734] [<c006280c>] (handle_percpu_devid_irq) from > [<c005edec>] (generic_handle_irq+0x2c/0x3c) > [ 125.780145] [<c005edec>] (generic_handle_irq) from [<c000eb7c>] > (handle_IRQ+0x40/0x90) > [ 125.780513] [<c000eb7c>] (handle_IRQ) from [<c0008568>] > (gic_handle_irq+0x2c/0x5c) > [ 125.780867] [<c0008568>] (gic_handle_irq) from [<c0011d00>] > (__irq_svc+0x40/0x50) > [ 125.781312] Exception stack(0xc089db00 to 0xc089db48) > [ 125.781787] db00: 00000000 00000023 80000113 000012d8 ed89389c > edfd8840 edfd8840 c0de8458 > [ 125.782234] db20: 00000000 ed893840 c0de871c edfd8cf4 80000113 > c089db48 c01db940 c004ff58 > [ 125.782594] db40: 60000113 ffffffff > [ 125.782864] [<c0011d00>] (__irq_svc) from [<c004ff58>] > (load_balance+0x4b0/0x760) > [ 125.783215] [<c004ff58>] (load_balance) from [<c005035c>] > (rebalance_domains+0x154/0x284) > [ 125.783595] [<c005035c>] (rebalance_domains) from [<c00504c0>] > (run_rebalance_domains+0x34/0x164) > [ 125.784000] [<c00504c0>] (run_rebalance_domains) from [<c0025aac>] > (__do_softirq+0x110/0x24c) > [ 125.784388] [<c0025aac>] (__do_softirq) from [<c0025e6c>] > (irq_exit+0xac/0xf4) > [ 125.784726] [<c0025e6c>] (irq_exit) from [<c000eb80>] (handle_IRQ+0x44/0x90) > [ 125.785059] [<c000eb80>] (handle_IRQ) from [<c0008568>] > (gic_handle_irq+0x2c/0x5c) > [ 125.785412] [<c0008568>] (gic_handle_irq) from [<c0011d00>] > (__irq_svc+0x40/0x50) > [ 125.785742] Exception stack(0xc089dcf8 to 0xc089dd40) > [ 125.785983] dce0: > ee4e38c0 00000000 > [ 125.786360] dd00: 000200da 00000001 ee4e38a0 c0de2340 2d201000 > edfe3358 c05c0c18 00000001 > [ 125.786737] dd20: c05c0c2c c0e1e180 00000000 c089dd40 c0086050 > c0086140 40000113 ffffffff > [ 125.787120] [<c0011d00>] (__irq_svc) from [<c0086140>] > (get_page_from_freelist+0x2bc/0x638) > [ 125.787507] [<c0086140>] (get_page_from_freelist) from [<c0087018>] > (__alloc_pages_nodemask+0x114/0x8f4) > [ 125.787949] [<c0087018>] (__alloc_pages_nodemask) from [<c00a20c8>] > (handle_mm_fault+0x9f8/0xcdc) > [ 125.788357] [<c00a20c8>] (handle_mm_fault) from [<c001793c>] > (do_page_fault+0x194/0x27c) > [ 125.788726] [<c001793c>] (do_page_fault) from [<c000844c>] > (do_DataAbort+0x30/0x90) > [ 125.789080] [<c000844c>] (do_DataAbort) from [<c0011e34>] > (__dabt_usr+0x34/0x40) > [ 125.789403] Exception stack(0xc089dfb0 to 0xc089dff8) > [ 125.789652] dfa0: ae55e008 > 11111111 000dc000 ae582000 > [ 125.790029] dfc0: be967d18 00000000 00008390 00000000 00000000 > 00000000 b6f29000 be967d04 > [ 125.790404] dfe0: 11111111 be967ce8 00008550 b6e5ce48 20000010 ffffffff > [ 125.791282] BUG: soft lockup - CPU#0 stuck for 53s! [kworker/0:1:41] > [ 125.791710] Modules linked in: > [ 125.792046] > [ 125.792836] CPU: 0 PID: 41 Comm: kworker/0:1 Not tainted 3.14.0-rc4 #39 > [ 125.793832] task: ed893840 ti: c0826000 task.ti: c0826000 > [ 125.795257] PC is at finish_task_switch+0x40/0xec > [ 125.795562] LR is at __schedule+0x1fc/0x51c > [ 125.795856] pc : [<c0044724>] lr : [<c04c3274>] psr: 600f0013 > [ 125.795856] sp : c0827ec8 ip : 00000000 fp : c0827edc > [ 125.796416] r10: eda56e00 r9 : c0deac28 r8 : 00000000 > [ 125.796698] r7 : c0de871c r6 : c0826020 r5 : ed893840 r4 : 00000001 > [ 125.797028] r3 : eda56e00 r2 : 000012d7 r1 : ed854780 r0 : edfd8840 > [ 125.797488] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM > Segment kernel > [ 125.797866] Control: 30c7387d Table: 808353c0 DAC: fffffffd > [ 125.798321] CPU: 0 PID: 41 Comm: kworker/0:1 Not tainted 3.14.0-rc4 #39 > [ 125.800604] [<c0015624>] (unwind_backtrace) from [<c00111a4>] > (show_stack+0x10/0x14) > [ 125.801205] [<c00111a4>] (show_stack) from [<c04c1bd4>] > (dump_stack+0x84/0x94) > [ 125.801786] [<c04c1bd4>] (dump_stack) from [<c007e7f0>] > (watchdog_timer_fn+0x144/0x17c) > [ 125.802238] [<c007e7f0>] (watchdog_timer_fn) from [<c003f58c>] > (__run_hrtimer.isra.32+0x54/0xe4) > [ 125.802679] [<c003f58c>] (__run_hrtimer.isra.32) from [<c003fea4>] > (hrtimer_interrupt+0x11c/0x2d0) > [ 125.803108] [<c003fea4>] (hrtimer_interrupt) from [<c03c4f80>] > (arch_timer_handler_virt+0x28/0x30) > [ 125.803530] [<c03c4f80>] (arch_timer_handler_virt) from > [<c006280c>] (handle_percpu_devid_irq+0x6c/0x84) > [ 125.803965] [<c006280c>] (handle_percpu_devid_irq) from > [<c005edec>] (generic_handle_irq+0x2c/0x3c) > [ 125.804380] [<c005edec>] (generic_handle_irq) from [<c000eb7c>] > (handle_IRQ+0x40/0x90) > [ 125.804774] [<c000eb7c>] (handle_IRQ) from [<c0008568>] > (gic_handle_irq+0x2c/0x5c) > [ 125.805181] [<c0008568>] (gic_handle_irq) from [<c0011d00>] > (__irq_svc+0x40/0x50) > [ 125.805706] Exception stack(0xc0827e80 to 0xc0827ec8) > [ 125.806185] 7e80: edfd8840 ed854780 000012d7 eda56e00 00000001 > ed893840 c0826020 c0de871c > [ 125.806623] 7ea0: 00000000 c0deac28 eda56e00 c0827edc 00000000 > c0827ec8 c04c3274 c0044724 > [ 125.807032] 7ec0: 600f0013 ffffffff > [ 125.807328] [<c0011d00>] (__irq_svc) from [<c0044724>] > (finish_task_switch+0x40/0xec) > [ 125.807752] [<c0044724>] (finish_task_switch) from [<c04c3274>] > (__schedule+0x1fc/0x51c) > [ 125.808175] [<c04c3274>] (__schedule) from [<c0037590>] > (worker_thread+0x210/0x404) > [ 125.808570] [<c0037590>] (worker_thread) from [<c003cba0>] > (kthread+0xd4/0xec) > [ 125.808952] [<c003cba0>] (kthread) from [<c000e338>] > (ret_from_fork+0x14/0x3c) > [ 246.080014] INFO: rcu_sched detected stalls on CPUs/tasks: > [ 246.080611] (detected by 0, t=6972 jiffies, g=4294967160, > c=4294967159, q=127) > [ 246.081576] INFO: Stall ended before state dump start > [ 246.082510] BUG: soft lockup - CPU#1 stuck for 69s! [grep:93] > [ 246.082849] Modules linked in: > [ 246.083046] > [ 246.083179] CPU: 1 PID: 93 Comm: grep Not tainted 3.14.0-rc4 #39 > [ 246.083548] task: edbdf480 ti: c09e6000 task.ti: c09e6000 > [ 246.083897] PC is at rebalance_domains+0x0/0x284 > [ 246.084154] LR is at run_rebalance_domains+0x34/0x164 > [ 246.084430] pc : [<c0050208>] lr : [<c00504c0>] psr: 20000113 > [ 246.084430] sp : c09e7ef8 ip : c0dde340 fp : 40000005 > [ 246.084992] r10: 00000007 r9 : c0ddf840 r8 : c0ddf840 > [ 246.085267] r7 : edfe0840 r6 : 00000100 r5 : c0de209c r4 : 00000001 > [ 246.085606] r3 : c005048c r2 : 2d201000 r1 : 00000001 r0 : edfe0840 > [ 246.085944] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user > [ 246.086315] Control: 30c7387d Table: 808351c0 DAC: 00000000 > [ 246.086626] CPU: 1 PID: 93 Comm: grep Not tainted 3.14.0-rc4 #39 > [ 246.086992] [<c0015624>] (unwind_backtrace) from [<c00111a4>] > (show_stack+0x10/0x14) > [ 246.087420] [<c00111a4>] (show_stack) from [<c04c1bd4>] > (dump_stack+0x84/0x94) > [ 246.087825] [<c04c1bd4>] (dump_stack) from [<c007e7f0>] > (watchdog_timer_fn+0x144/0x17c) > [ 246.088260] [<c007e7f0>] (watchdog_timer_fn) from [<c003f58c>] > (__run_hrtimer.isra.32+0x54/0xe4) > [ 246.088726] [<c003f58c>] (__run_hrtimer.isra.32) from [<c003fea4>] > (hrtimer_interrupt+0x11c/0x2d0) > [ 246.089190] [<c003fea4>] (hrtimer_interrupt) from [<c03c4f80>] > (arch_timer_handler_virt+0x28/0x30) > [ 246.089670] [<c03c4f80>] (arch_timer_handler_virt) from > [<c006280c>] (handle_percpu_devid_irq+0x6c/0x84) > [ 246.090155] [<c006280c>] (handle_percpu_devid_irq) from > [<c005edec>] (generic_handle_irq+0x2c/0x3c) > [ 246.090822] [<c005edec>] (generic_handle_irq) from [<c000eb7c>] > (handle_IRQ+0x40/0x90) > [ 246.091233] [<c000eb7c>] (handle_IRQ) from [<c0008568>] > (gic_handle_irq+0x2c/0x5c) > [ 246.091631] [<c0008568>] (gic_handle_irq) from [<c0011d00>] > (__irq_svc+0x40/0x50) > [ 246.092007] Exception stack(0xc09e7eb0 to 0xc09e7ef8) > [ 246.092291] 7ea0: edfe0840 > 00000001 2d201000 c005048c > [ 246.092719] 7ec0: 00000001 c0de209c 00000100 edfe0840 c0ddf840 > c0ddf840 00000007 40000005 > [ 246.093142] 7ee0: c0dde340 c09e7ef8 c00504c0 c0050208 20000113 ffffffff > [ 246.093647] [<c0011d00>] (__irq_svc) from [<c0050208>] > (rebalance_domains+0x0/0x284) > [ 246.094422] [<c0050208>] (rebalance_domains) from [<c09e6028>] (0xc09e6028) > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html