Eric Dumazet wrote...
[ commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af ]

> It definitely should help !

Yesterday I experienced issues that look somewhat similar to this, though I'm not entirely sure. Four out of five systems running 4.1.9 stopped working. They did not respond on the network, the keyboard or the serial console. In one case, the stack trace below made it to the loghost.

Two things are quite different. First, the systems had a reasonable uptime of about a week. Second, and this is the scary part: all incidents happened within a short time span of at most three minutes, beginning after 16:41:28 and before 16:41:54 UTC. So at first I assumed a brownout, until I realized that the systems failed at slightly different times and that one of them is at a different location. Meanwhile, other systems running different kernel versions kept operating at both sites.

So I'd be glad for answers to the following:

- Is this the same issue, or should I be even more afraid?
- What might be the reason for this temporal coincidence? I have no plausible idea.

Confused,
Christoph

INFO: rcu_sched self-detected stall on CPU { 3} (t=6000 jiffies g=8932806 c=8932805 q=58491)
rcu_sched kthread starved for 5999 jiffies!
Task dump for CPU 3:
swapper/3 R running task 0 0 1 0x00000008
 ffffffff81e396c0 ffff88042dcc3b20 ffffffff810807da 0000000000000003
 ffffffff81e396c0 ffff88042dcc3b40 ffffffff81083b78 ffff88042dcc3b80
 0000000000000003 ffff88042dcc3b70 ffffffff810a945c ffff88042dcd5740
Call Trace:
 <IRQ> [<ffffffff810807da>] sched_show_task+0xaa/0x110
 [<ffffffff81083b78>] dump_cpu_task+0x38/0x40
 [<ffffffff810a945c>] rcu_dump_cpu_stacks+0x8c/0xc0
 [<ffffffff810abf31>] rcu_check_callbacks+0x3b1/0x680
 [<ffffffff810e7bb7>] ? acct_account_cputime+0x17/0x20
 [<ffffffff8108484e>] ? account_system_time+0x8e/0x180
 [<ffffffff810ae4d3>] update_process_times+0x33/0x60
 [<ffffffff810bcae0>] tick_sched_handle.isra.14+0x30/0x40
 [<ffffffff810bcbd3>] tick_sched_timer+0x43/0x80
 [<ffffffff810aea2a>] __run_hrtimer.isra.32+0x4a/0xd0
 [<ffffffff810af225>] hrtimer_interrupt+0xd5/0x1f0
 [<ffffffff81034d84>] local_apic_timer_interrupt+0x34/0x60
 [<ffffffff8103516c>] smp_apic_timer_interrupt+0x3c/0x60
 [<ffffffff8190db7b>] apic_timer_interrupt+0x6b/0x70
 [<ffffffff8190c8a9>] ? _raw_spin_unlock_irqrestore+0x9/0x10
 [<ffffffff810ade58>] try_to_del_timer_sync+0x48/0x60
 [<ffffffff810adeb2>] ? del_timer_sync+0x42/0x60
 [<ffffffff810adeba>] del_timer_sync+0x4a/0x60
 [<ffffffff8178b7da>] inet_csk_reqsk_queue_drop+0x7a/0x1f0
 [<ffffffff8178ba7f>] reqsk_timer_handler+0x12f/0x290
 [<ffffffff8178b950>] ? inet_csk_reqsk_queue_drop+0x1f0/0x1f0
 [<ffffffff810ad9e6>] call_timer_fn.isra.26+0x26/0x80
 [<ffffffff810ae1ae>] run_timer_softirq+0x18e/0x220
 [<ffffffff81060b1a>] __do_softirq+0xda/0x1f0
 [<ffffffff81060e16>] irq_exit+0x76/0xa0
 [<ffffffff81035175>] smp_apic_timer_interrupt+0x45/0x60
 [<ffffffff8190db7b>] apic_timer_interrupt+0x6b/0x70
 <EOI> [<ffffffff810844be>] ? sched_clock_cpu+0x9e/0xb0
 [<ffffffff8100bc15>] ? amd_e400_idle+0x35/0xd0
 [<ffffffff8100bc13>] ? amd_e400_idle+0x33/0xd0
 [<ffffffff8100c42a>] arch_cpu_idle+0xa/0x10
 [<ffffffff810929e3>] cpu_startup_entry+0x2c3/0x330
 [<ffffffff810332dc>] start_secondary+0x17c/0x1a0