rcu_sched stall in a dual-core dedicated server on a datacenter

miguel.telleria@xxxxxxxxxx · Thu, 04 Jun 2015 17:38:16 +0200

Dear kernelnewbies,

In one of the dedicated servers that our company has for shared web 
hosting we have had a series of log lines:

Jun  3 00:33:53 hostname kernel: INFO: rcu_sched self-detected stall on 
CPU { 0}  (t=15001 jiffies g=66854079 c=66854078 q=686736)
Jun  3 00:33:53 hostname kernel: INFO: rcu_sched detected stalls on 
CPUs/tasks: { 0} (detected by 1, t=15002 jiffies, g=66854079, 
c=66854078, q=686736)

The lines appeared every 3 minutes from 00:33:54 to 05:12:33 with the 
last line being:

Jun  3 05:12:33 ns25 kernel: INFO: rcu_sched self-detected stall on CPU 
{ 0}  (t=4155459 jiffies g=66854079 c=66854078 q=10628601)
Jun  3 05:12:33 ns25 kernel: INFO: rcu_sched detected stalls on 
CPUs/tasks: { 0} (detected by 1, t=4155460 jiffies, g=66854079, 
c=66854078, q=10628601)

     with all of them there are traces of CPU states with the kernel 
function trace, CPU registers, etc.

Some more info about this:

-  The problem ended silently as it started, no server restart took 
place, uptime still shows plenty of time.

-  The value in t=nnnnnn seems to be the time elapsed in this state in 
Jiffies.  4155460 = 16621 sg = 4h 36min,
    roughly the time difference between 05:12 and 00:33

-  Kernel 3.10 from our datacenter.  Dumping /proc/config.gz I obtain 
the following info:

    # RCU Subsystem
    CONFIG_TREE_RCU=y
    # CONFIG_PREEMPT_RCU is not set
    CONFIG_RCU_STALL_COMMON=y
    # CONFIG_RCU_USER_QS is not set
    CONFIG_RCU_FANOUT=64
    CONFIG_RCU_FANOUT_LEAF=16
    # CONFIG_RCU_FANOUT_EXACT is not set
    # CONFIG_RCU_FAST_NO_HZ is not set
    # CONFIG_TREE_RCU_TRACE is not set
    # CONFIG_RCU_NOCB_CPU is not set
    # RCU Debugging
    # CONFIG_SPARSE_RCU_POINTER is not set
    CONFIG_RCU_CPU_STALL_TIMEOUT=60

    CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
    # CONFIG_CGROUP_SCHED is not set
    # CONFIG_SCHED_AUTOGROUP is not set
    CONFIG_DEFAULT_IOSCHED="deadline"
    CONFIG_SCHED_OMIT_FRAME_POINTER=y
    CONFIG_SCHED_SMT=y
    CONFIG_SCHED_MC=y
    CONFIG_SCHED_HRTICK=y
    CONFIG_NET_SCHED=y
    # Queueing/Scheduling
    # CONFIG_USB_EHCI_TT_NEWSCHED is not set

    CONFIG_PREEMPT_NOTIFIERS=y
    CONFIG_PREEMPT_NONE=y
    # CONFIG_PREEMPT_VOLUNTARY is not set
    # CONFIG_PREEMPT is not set

My questions are
----------------

1. Am I right assuming that the whole CPU was stalled for all that 
period?

    We have proof that the machine was still working (log info among it),
    but maybe it was only working with one core.

2. The task that was running on the stalled core, was it trapped
    without running the whole time?  With the log lines above I have 
always had this line:

      kernel: CPU: 0 PID: 1288 Comm: php-cgi Not tainted 
3.10.23-xxxx-std-ipv6-64 #1

3. Can there be other tasks trapped?  AFAIK modern Linux 3.10 does not 
keep separate run-queues for each core, right?

4. What are the meaning of the t=, g= and c= values in the rcu_sched 
log?

    t=4155459 jiffies g=66854079 c=66854078 q=10628601

    AFAICT, t= represents the time elapsed since this situation started. 
g= and c= are constant values and q= is an increasing figure.

5. Any ideas of where can this come from?  Here is the trace if it 
helps:

[<ffffffff8160806c>] __const_udelay+0x2c/0x30
[<ffffffff8106338a>] arch_trigger_all_cpu_backtrace+0x6a/0xa0
  [<ffffffff81115cdd>] rcu_check_callbacks+0x2ed/0x560
  [<ffffffff81117c17>] ? acct_account_cputime+0x17/0x20
  [<ffffffff810e97df>] ? account_system_time+0xcf/0x180
  [<ffffffff810ca2c3>] update_process_times+0x43/0x80
  [<ffffffff810f95c1>] tick_sched_handle.isra.12+0x31/0x40
  [<ffffffff810f9704>] tick_sched_timer+0x44/0x70
  [<ffffffff810df07a>] __run_hrtimer.isra.29+0x4a/0xd0
  [<ffffffff810df9b5>] hrtimer_interrupt+0xf5/0x230
  [<ffffffff810626c4>] smp_apic_timer_interrupt+0x64/0xa0
  [<ffffffff81d421ca>] apic_timer_interrupt+0x6a/0x70
  <EOI>
  [<ffffffff81183e80>] ? switch_names+0xc0/0xc0
  [<ffffffff81d409ee>] ? _raw_spin_lock+0xe/0x30
  [<ffffffff81184eb3>] shrink_dentry_list+0x43/0x1f0
  [<ffffffff811854b7>] prune_dcache_sb+0x137/0x160
  [<ffffffff81173ab8>] prune_super+0x168/0x1a0
  [<ffffffff81131119>] shrink_slab+0x129/0x1e0
  [<ffffffff811331f8>] ? shrink_zone+0xa8/0xd0
  [<ffffffff811337af>] do_try_to_free_pages+0x31f/0x410
  [<ffffffff811339dd>] try_to_free_pages+0x9d/0xc0
  [<ffffffff8112b96e>] __alloc_pages_nodemask+0x4de/0x840
  [<ffffffff81159560>] alloc_pages_vma+0xa0/0x1a0
  [<ffffffff81142b67>] handle_pte_fault+0x7b7/0x9f0
  [<ffffffff81146906>] ? vma_gap_update+0x26/0x30
  [<ffffffff8113e293>] ? anon_vma_interval_tree_insert+0x83/0x90
  [<ffffffff81143047>] handle_mm_fault+0x147/0x1e0
  [<ffffffff8106c391>] __do_page_fault+0x151/0x4b0
  [<ffffffff811478ff>] ? vma_merge+0x1ff/0x320
  [<ffffffff81148a4b>] ? do_brk+0x1bb/0x320
  [<ffffffff8106c719>] do_page_fault+0x9/0x10
  [<ffffffff81d41172>] page_fault+0x22/0x30

Any ideas will be welcome.

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@xxxxxxxxxxxxxxxxx
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies