[no subject]

linux-kernel@xxxxxxxxxxxxxxx,jack@xxxxxxxx,decui@xxxxxxxxxxxxx
Bcc: 
Subject: Re: kernel panics with 4.14.X versions
Reply-To: 
In-Reply-To: <47d114b6-cf57-152a-32ad-07a541b05198@xxxxxxxxx>

FWIW, there have already been reports of similar soft lockups in
fsnotify() on 4.14: https://lkml.org/lkml/2018/3/2/1038

We have also seen similar soft lockups with 4.14.22 here.

On 16 Apr 13:54, Pavlos Parissis wrote:
>
> Hi all,
> 
> We have observed kernel panics on several master kubernetes clusters, where we run
> kubernetes API services and not application workloads.
> 
> Those clusters use kernel version 4.14.14 and 4.14.32, but we switched everything
> to kernel version 4.14.32 as a way to address the issue.
> 
> We have HP and Dell hardware on those clusters, and network cards are also different,
> we have bnx2x and mlx5_core in use.
> 
> We also run kernel version 4.14.32 on different types of workloads, such as software load
> balancing using HAProxy, and we don't have any crashes there.
> 
> Since the crash happens on different hardware, we think it could be a kernel issue,
> but we aren't sure about it. Thus, I am contacting kernel people in order to get some
> hints that can help us figure out what causes this.
> 
> In our kubernetes clusters, we have instructed the kernel to panic upon soft lockup:
> we use 'kernel.softlockup_panic=1', 'kernel.hung_task_panic=1' and 'kernel.watchdog_thresh=10'.
> Thus, we see the stack traces. Today we disabled this; I will explain why later.
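
For anyone comparing hosts, here is a minimal sketch for reading back the three
settings quoted above. It assumes the usual mapping of a sysctl key to a file
under /proc/sys (dots become path separators). The helper names sysctl_path and
read_sysctls are made up for this illustration.

```python
from pathlib import Path

# The three keys named in the report.
KEYS = (
    "kernel.softlockup_panic",
    "kernel.hung_task_panic",
    "kernel.watchdog_thresh",
)


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file, e.g. kernel.x -> /proc/sys/kernel/x."""
    return Path(root, *key.split("."))


def read_sysctls(keys=KEYS, root="/proc/sys"):
    """Return {key: value-as-string}, or None for keys that are not present."""
    values = {}
    for key in keys:
        path = sysctl_path(key, root)
        values[key] = path.read_text().strip() if path.exists() else None
    return values


if __name__ == "__main__":
    for key, value in read_sysctls().items():
        print(f"{key} = {value}")
```

A key that is missing on a given kernel build shows up as None rather than
raising, so the output from different machines can be diffed directly.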
> 
> I believe we have two distinct types of panics: one is triggered upon soft lockup, and another
> where the call trace is about the scheduler ("sched: Unexpected reschedule of offline CPU#8!").
> 
> 
> Let me walk you through the kernel panics and some observations.
> 
> The following series of stack traces occurs when one CPU (CPU 24) is stuck for ~22 seconds.
> watchdog_thresh is set to 10 and, as far as I remember, the soft lockup threshold is (2 * watchdog_thresh),
> so it makes sense to see the kernel crash after ~20 seconds.
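
The arithmetic in the paragraph above, written out as a small sketch. It takes
the 2 * watchdog_thresh relation as stated in the report. The function name is
only for illustration.

```python
def softlockup_threshold(watchdog_thresh):
    """Soft lockup threshold in seconds, assuming it is 2 * watchdog_thresh."""
    return 2 * watchdog_thresh


threshold = softlockup_threshold(10)  # kernel.watchdog_thresh=10 from the report
reported_stuck = 22                   # seconds, from "CPU#24 stuck for 22s!"

# The lockup is reported only once the CPU has been stuck for at least
# the threshold, so a 22 s report is consistent with a 20 s threshold.
print(threshold, reported_stuck >= threshold)
```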
> 
> After the stack trace, we have the output of sar for CPU#24 and we see that just before the
> crash CPU utilization for system level went to 100%. Now let's move to another panic.
> 
> [373782.361064] watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [kube-apiserver:24261]
> [373782.378225] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
> inet_diag unix_diag cfg80211 rfkill dell_rbu 8021q garp mrp xfs libcrc32c loop x86_pkg_temp_thermal
> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
> pcbc aesni_intel vfat fat crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf iTCO_wdt ses
> iTCO_vendor_support mxm_wmi ipmi_si dcdbas enclosure mei_me pcspkr ipmi_devintf lpc_ich sg mei
> ipmi_msghandler mfd_core shpchp wmi acpi_power_meter netconsole nfsd auth_rpcgss nfs_acl lockd grace
> sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
> fb_sys_fops sd_mod ttm crc32c_intel ahci libahci mlx5_core drm mlxfw mpt3sas ptp libata raid_class
> pps_core scsi_transport_sas
> [373782.516807]  dm_mirror dm_region_hash dm_log dm_mod dax
> [373782.531739] CPU: 24 PID: 24261 Comm: kube-apiserver Not tainted 4.14.32-1.el7.x86_64 #1
> [373782.549848] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.4.3 01/17/2017
> [373782.567486] task: ffff882f66d28000 task.stack: ffffc9002120c000
> [373782.583441] RIP: 0010:fsnotify+0x197/0x510
> [373782.597319] RSP: 0018:ffffc9002120fdb8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
> [373782.615308] RAX: 0000000000000000 RBX: ffff882f9ec65c20 RCX: 0000000000000002
> [373782.632950] RDX: 0000000000028700 RSI: 0000000000000002 RDI: ffffffff8269a4e0
> [373782.650616] RBP: ffffc9002120fe98 R08: 0000000000000000 R09: 0000000000000000
> [373782.668287] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [373782.685918] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [373782.703302] FS:  000000c42009f090(0000) GS:ffff882fbf900000(0000) knlGS:0000000000000000
> [373782.721887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [373782.737741] CR2: 00007f82b6539244 CR3: 0000002f3de2a005 CR4: 00000000003606e0
> [373782.755247] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [373782.772722] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [373782.790043] Call Trace:
> [373782.802041]  vfs_write+0x151/0x1b0
> [373782.815081]  ? syscall_trace_enter+0x1cd/0x2b0
> [373782.829175]  SyS_write+0x55/0xc0
> [373782.841870]  do_syscall_64+0x79/0x1b0
> [373782.855073]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [373782.869807] RIP: 0033:0x483084
> [373782.882293] RSP: 002b:000000c4387e57f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [373782.899997] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000483084
> [373782.917177] RDX: 00000000000002b3 RSI: 000000c42e27d800 RDI: 000000000000014b
> [373782.934268] RBP: 000000c4387e5840 R08: 0000000000000000 R09: 0000000000000000
> [373782.951297] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [373782.968208] R13: 00000000000000f2 R14: 0000000000000032 R15: 0000000000000002
> [373782.985003] Code: 0f 84 f6 02 00 00 48 8b 45 a0 4d 85 d2 48 8b 00 48 89 45 a8 48 89 45 a0 0f 85
> ef 02 00 00 48 8b 45 b0 48 89 45 98 48 83 7d a0 00 <0f> 95 c0 48 83 7d 98 00 0f 95 c2 89 d1 08 c1 0f
> 84 fc 02 00 00
> [373783.024208] Kernel panic - not syncing: softlockup: hung tasks
> [373783.039881] CPU: 24 PID: 24261 Comm: kube-apiserver Tainted: G             L
> 4.14.32-1.el7.x86_64 #1
> [373783.059497] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.4.3 01/17/2017
> [373783.077206] Call Trace:
> [373783.089115]  <IRQ>
> [373783.100422]  dump_stack+0x63/0x88
> [373783.113081]  panic+0xe8/0x258
> [373783.125109]  watchdog_timer_fn+0x21a/0x230
> [373783.138546]  ? watchdog+0x30/0x30
> [373783.150870]  __hrtimer_run_queues+0xe7/0x230
> [373783.164081]  hrtimer_interrupt+0xa8/0x1a0
> [373783.176703]  smp_apic_timer_interrupt+0x6b/0x140
> [373783.189788]  apic_timer_interrupt+0x8e/0xa0
> [373783.202198]  </IRQ>
> [373783.211900] RIP: 0010:fsnotify+0x197/0x510
> [373783.223746] RSP: 0018:ffffc9002120fdb8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
> [373783.239434] RAX: 0000000000000000 RBX: ffff882f9ec65c20 RCX: 0000000000000002
> [373783.254599] RDX: 0000000000028700 RSI: 0000000000000002 RDI: ffffffff8269a4e0
> [373783.269673] RBP: ffffc9002120fe98 R08: 0000000000000000 R09: 0000000000000000
> [373783.284629] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [373783.299460] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [373783.314200]  ? fsnotify+0x4bb/0x510
> [373783.324757]  vfs_write+0x151/0x1b0
> [373783.335115]  ? syscall_trace_enter+0x1cd/0x2b0
> [373783.346617]  SyS_write+0x55/0xc0
> [373783.356735]  do_syscall_64+0x79/0x1b0
> [373783.367330]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [373783.379606] RIP: 0033:0x483084
> [373783.389540] RSP: 002b:000000c4387e57f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [373783.404657] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000483084
> [373783.419294] RDX: 00000000000002b3 RSI: 000000c42e27d800 RDI: 000000000000014b
> [373783.433922] RBP: 000000c4387e5840 R08: 0000000000000000 R09: 0000000000000000
> [373783.448565] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [373783.463128] R13: 00000000000000f2 R14: 0000000000000032 R15: 0000000000000002
> [373783.477744] Kernel Offset: disabled
> [373783.492343] ---[ end Kernel panic - not syncing: softlockup: hung tasks
> [373783.506452] ------------[ cut here ]------------
> [373783.518376] WARNING: CPU: 24 PID: 24261 at kernel/sched/core.c:1179 set_task_cpu+0x197/0x1a0
> [373783.534730] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
> inet_diag unix_diag cfg80211 rfkill dell_rbu 8021q garp mrp xfs libcrc32c loop x86_pkg_temp_thermal
> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
> pcbc aesni_intel vfat fat crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf iTCO_wdt ses
> iTCO_vendor_support mxm_wmi ipmi_si dcdbas enclosure mei_me pcspkr ipmi_devintf lpc_ich sg mei
> ipmi_msghandler mfd_core shpchp wmi acpi_power_meter netconsole nfsd auth_rpcgss nfs_acl lockd grace
> sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
> fb_sys_fops sd_mod ttm crc32c_intel ahci libahci mlx5_core drm mlxfw mpt3sas ptp libata raid_class
> pps_core scsi_transport_sas
> [373783.667938]  dm_mirror dm_region_hash dm_log dm_mod dax
> [373783.682082] CPU: 24 PID: 24261 Comm: kube-apiserver Tainted: G             L
> 4.14.32-1.el7.x86_64 #1
> [373783.700753] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.4.3 01/17/2017
> [373783.717501] task: ffff882f66d28000 task.stack: ffffc9002120c000
> [373783.732386] RIP: 0010:set_task_cpu+0x197/0x1a0
> [373783.745458] RSP: 0018:ffff882fbf903b88 EFLAGS: 00010046
> [373783.759432] RAX: 0000000000000200 RBX: ffff885fb3cb45c0 RCX: 0000000000000001
> [373783.775692] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff885fb3cb45c0
> [373783.791999] RBP: ffff882fbf903ba8 R08: 0000000000000000 R09: 0000000000000000
> [373783.808362] R10: 0000000000000000 R11: 0000000000000000 R12: ffff885fb3cb516c
> [373783.824785] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000022ac0
> [373783.841196] FS:  000000c42009f090(0000) GS:ffff882fbf900000(0000) knlGS:0000000000000000
> [373783.858761] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [373783.873710] CR2: 00007f82b6539244 CR3: 0000002f3de2a005 CR4: 00000000003606e0
> [373783.890304] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [373783.906951] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [373783.923503] Call Trace:
> [373783.934742]  <IRQ>
> [373783.945346]  try_to_wake_up+0x16c/0x480
> [373783.957961]  default_wake_function+0x12/0x20
> [373783.971086]  autoremove_wake_function+0x16/0x60
> [373783.984483]  __wake_up_common+0x8f/0x160
> [373783.997154]  __wake_up_common_lock+0x7e/0xc0
> [373784.010293]  __wake_up+0x13/0x20
> [373784.022125]  wake_up_klogd_work_func+0x40/0x60
> [373784.035365]  irq_work_run_list+0x53/0x80
> [373784.048042]  irq_work_run+0x2c/0x30
> [373784.060132]  flush_smp_call_function_queue+0x88/0x110
> [373784.074076]  generic_smp_call_function_single_interrupt+0x13/0x30
> [373784.089312]  smp_call_function_single_interrupt+0x3a/0xe0
> [373784.103788]  call_function_single_interrupt+0x8e/0xa0
> [373784.117820] RIP: 0010:panic+0x206/0x258
> [373784.130402] RSP: 0018:ffff882fbf903e80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> [373784.147325] RAX: 000000000000003b RBX: 0000000000000000 RCX: 0000000000000006
> [373784.163842] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff882fbf9169d0
> [373784.180394] RBP: ffff882fbf903ef0 R08: 0000000000000001 R09: 00000000000006b1
> [373784.197041] R10: 0000000000000001 R11: 0000000000000002 R12: ffffffff81e6be9f
> [373784.213609] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ee6b2800
> [373784.230077]  watchdog_timer_fn+0x21a/0x230
> [373784.243095]  ? watchdog+0x30/0x30
> [373784.255113]  __hrtimer_run_queues+0xe7/0x230
> [373784.267974]  hrtimer_interrupt+0xa8/0x1a0
> [373784.280195]  smp_apic_timer_interrupt+0x6b/0x140
> [373784.292919]  apic_timer_interrupt+0x8e/0xa0
> [373784.304979]  </IRQ>
> [373784.314365] RIP: 0010:fsnotify+0x197/0x510
> [373784.325739] RSP: 0018:ffffc9002120fdb8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
> [373784.340979] RAX: 0000000000000000 RBX: ffff882f9ec65c20 RCX: 0000000000000002
> [373784.355767] RDX: 0000000000028700 RSI: 0000000000000002 RDI: ffffffff8269a4e0
> [373784.370474] RBP: ffffc9002120fe98 R08: 0000000000000000 R09: 0000000000000000
> [373784.385000] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [373784.399438] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [373784.413725]  ? fsnotify+0x4bb/0x510
> [373784.423875]  vfs_write+0x151/0x1b0
> [373784.433861]  ? syscall_trace_enter+0x1cd/0x2b0
> [373784.444973]  SyS_write+0x55/0xc0
> [373784.454738]  do_syscall_64+0x79/0x1b0
> [373784.464901]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [373784.476731] RIP: 0033:0x483084
> [373784.486201] RSP: 002b:000000c4387e57f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [373784.500878] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000483084
> [373784.515015] RDX: 00000000000002b3 RSI: 000000c42e27d800 RDI: 000000000000014b
> [373784.529155] RBP: 000000c4387e5840 R08: 0000000000000000 R09: 0000000000000000
> [373784.543400] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [373784.557490] R13: 00000000000000f2 R14: 0000000000000032 R15: 0000000000000002
> [373784.571578] Code: ff 80 8b ac 08 00 00 04 e9 20 ff ff ff 0f 0b e9 b9 fe ff ff f7 83 84 00 00 00
> fd ff ff ff 0f 84 c3 fe ff ff 0f 0b e9 bc fe ff ff <0f> 0b e9 cb fe ff ff 66 90 0f 1f 44 00 00 55 48
> 89 e5 41 56 49
> [373784.605527] ---[ end trace d3faf76bdc3ca403 ]---
> [373784.617188] sched: Unexpected reschedule of offline CPU#0!
> [373784.629856] ------------[ cut here ]------------
> [373784.641694] WARNING: CPU: 24 PID: 24261 at arch/x86/kernel/smp.c:128
> native_smp_send_reschedule+0x42/0x50
> [373784.659370] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
> inet_diag unix_diag cfg80211 rfkill dell_rbu 8021q garp mrp xfs libcrc32c loop x86_pkg_temp_thermal
> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
> pcbc aesni_intel vfat fat crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf iTCO_wdt ses
> iTCO_vendor_support mxm_wmi ipmi_si dcdbas enclosure mei_me pcspkr ipmi_devintf lpc_ich sg mei
> ipmi_msghandler mfd_core shpchp wmi acpi_power_meter netconsole nfsd auth_rpcgss nfs_acl lockd grace
> sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
> fb_sys_fops sd_mod ttm crc32c_intel ahci libahci mlx5_core drm mlxfw mpt3sas ptp libata raid_class
> pps_core scsi_transport_sas
> [373784.793557]  dm_mirror dm_region_hash dm_log dm_mod dax
> [373784.807848] CPU: 24 PID: 24261 Comm: kube-apiserver Tainted: G        W    L
> 4.14.32-1.el7.x86_64 #1
> [373784.826743] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.4.3 01/17/2017
> [373784.843685] task: ffff882f66d28000 task.stack: ffffc9002120c000
> [373784.858935] RIP: 0010:native_smp_send_reschedule+0x42/0x50
> [373784.873706] RSP: 0018:ffff882fbf903b10 EFLAGS: 00010046
> [373784.888200] RAX: 000000000000002e RBX: 0000000000000000 RCX: 0000000000000006
> [373784.904979] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff882fbf9169d0
> [373784.921626] RBP: ffff882fbf903b10 R08: 0000000000000001 R09: 00000000000006f8
> [373784.938313] R10: 0000000000000001 R11: 0000000000000000 R12: ffff882fbf622ac0
> [373784.955106] R13: ffff885fb3cb45c0 R14: ffff882fbf903bc8 R15: ffff882fbf622ac0
> [373784.971891] FS:  000000c42009f090(0000) GS:ffff882fbf900000(0000) knlGS:0000000000000000
> [373784.989852] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [373785.005204] CR2: 00007f82b6539244 CR3: 0000002f3de2a005 CR4: 00000000003606e0
> [373785.022197] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [373785.039227] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [373785.056132] Call Trace:
> [373785.067623]  <IRQ>
> [373785.078506]  resched_curr+0xae/0xd0
> [373785.091051]  check_preempt_curr+0x79/0xa0
> [373785.104217]  ttwu_do_wakeup+0x1e/0x160
> [373785.117171]  ttwu_do_activate+0x7a/0x90
> [373785.130058]  try_to_wake_up+0x1e7/0x480
> [373785.142959]  default_wake_function+0x12/0x20
> [373785.156411]  autoremove_wake_function+0x16/0x60
> [373785.170119]  __wake_up_common+0x8f/0x160
> [373785.183152]  __wake_up_common_lock+0x7e/0xc0
> [373785.196508]  __wake_up+0x13/0x20
> [373785.208612]  wake_up_klogd_work_func+0x40/0x60
> [373785.222065]  irq_work_run_list+0x53/0x80
> [373785.234885]  irq_work_run+0x2c/0x30
> [373785.247071]  flush_smp_call_function_queue+0x88/0x110
> [373785.261146]  generic_smp_call_function_single_interrupt+0x13/0x30
> [373785.276556]  smp_call_function_single_interrupt+0x3a/0xe0
> [373785.291300]  call_function_single_interrupt+0x8e/0xa0
> [373785.305485] RIP: 0010:panic+0x206/0x258
> [373785.318154] RSP: 0018:ffff882fbf903e80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> [373785.335001] RAX: 000000000000003b RBX: 0000000000000000 RCX: 0000000000000006
> [373785.351418] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff882fbf9169d0
> [373785.367776] RBP: ffff882fbf903ef0 R08: 0000000000000001 R09: 00000000000006b1
> [373785.383990] R10: 0000000000000001 R11: 0000000000000002 R12: ffffffff81e6be9f
> [373785.400019] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ee6b2800
> [373785.415792]  watchdog_timer_fn+0x21a/0x230
> [373785.427910]  ? watchdog+0x30/0x30
> [373785.438891]  __hrtimer_run_queues+0xe7/0x230
> [373785.450736]  hrtimer_interrupt+0xa8/0x1a0
> [373785.462037]  smp_apic_timer_interrupt+0x6b/0x140
> [373785.473814]  apic_timer_interrupt+0x8e/0xa0
> [373785.485054]  </IRQ>
> [373785.493740] RIP: 0010:fsnotify+0x197/0x510
> [373785.504592] RSP: 0018:ffffc9002120fdb8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
> [373785.519343] RAX: 0000000000000000 RBX: ffff882f9ec65c20 RCX: 0000000000000002
> [373785.533627] RDX: 0000000000028700 RSI: 0000000000000002 RDI: ffffffff8269a4e0
> [373785.547934] RBP: ffffc9002120fe98 R08: 0000000000000000 R09: 0000000000000000
> [373785.562192] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [373785.576431] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [373785.590592]  ? fsnotify+0x4bb/0x510
> [373785.600647]  vfs_write+0x151/0x1b0
> [373785.610507]  ? syscall_trace_enter+0x1cd/0x2b0
> [373785.621459]  SyS_write+0x55/0xc0
> [373785.630952]  do_syscall_64+0x79/0x1b0
> [373785.640818]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [373785.652319] RIP: 0033:0x483084
> [373785.661599] RSP: 002b:000000c4387e57f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [373785.676059] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000483084
> [373785.690181] RDX: 00000000000002b3 RSI: 000000c42e27d800 RDI: 000000000000014b
> [373785.704317] RBP: 000000c4387e5840 R08: 0000000000000000 R09: 0000000000000000
> [373785.718448] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [373785.732562] R13: 00000000000000f2 R14: 0000000000000032 R15: 0000000000000002
> [373785.746624] Code: c0 74 1a 48 8b 05 7f 44 ec 00 be fd 00 00 00 48 8b 80 a0 00 00 00 e8 ae 1a 9b
> 00 5d c3 89 fe 48 c7 c7 b8 26 e5 81 e8 21 45 09 00 <0f> 0b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
> 44 00 00 55 48
> [373785.780531] ---[ end trace d3faf76bdc3ca404 ]---
> [373785.792207] sched: Unexpected reschedule of offline CPU#42!
> [373785.804993] ------------[ cut here ]------------
> [373785.816775] WARNING: CPU: 24 PID: 24261 at arch/x86/kernel/smp.c:128
> native_smp_send_reschedule+0x42/0x50
> [373785.834478] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
> inet_diag unix_diag cfg80211 rfkill dell_rbu 8021q garp mrp xfs libcrc32c loop x86_pkg_temp_thermal
> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
> pcbc aesni_intel vfat fat crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf iTCO_wdt ses
> iTCO_vendor_support mxm_wmi ipmi_si dcdbas enclosure mei_me pcspkr ipmi_devintf lpc_ich sg mei
> ipmi_msghandler mfd_core shpchp wmi acpi_power_meter netconsole nfsd auth_rpcgss nfs_acl lockd grace
> sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
> fb_sys_fops sd_mod ttm crc32c_intel ahci libahci mlx5_core drm mlxfw mpt3sas ptp libata raid_class
> pps_core scsi_transport_sas
> [373785.968794]  dm_mirror dm_region_hash dm_log dm_mod dax
> [373785.983020] CPU: 24 PID: 24261 Comm: kube-apiserver Tainted: G        W    L
> 4.14.32-1.el7.x86_64 #1
> [373786.001870] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.4.3 01/17/2017
> [373786.018790] task: ffff882f66d28000 task.stack: ffffc9002120c000
> [373786.034031] RIP: 0010:native_smp_send_reschedule+0x42/0x50
> [373786.048836] RSP: 0018:ffff882fbf9039e0 EFLAGS: 00010046
> [373786.063302] RAX: 000000000000002f RBX: 000000000000002a RCX: 0000000000000006
> [373786.080012] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff882fbf9169d0
> [373786.096647] RBP: ffff882fbf9039e0 R08: 0000000000000001 R09: 0000000000000743
> [373786.113328] R10: 0000000000000001 R11: 0000000000000000 R12: ffff882fbfb62ac0
> [373786.130019] R13: ffff882fb3f61740 R14: ffff882fbf903a98 R15: ffff882fbfb62ac0
> [373786.146724] FS:  000000c42009f090(0000) GS:ffff882fbf900000(0000) knlGS:0000000000000000
> [373786.164613] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [373786.179892] CR2: 00007f82b6539244 CR3: 0000002f3de2a005 CR4: 00000000003606e0
> [373786.196879] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [373786.213858] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [373786.230669] Call Trace:
> [373786.242081]  <IRQ>
> [373786.252989]  resched_curr+0xae/0xd0
> [373786.265510]  check_preempt_curr+0x79/0xa0
> [373786.278628]  ttwu_do_wakeup+0x1e/0x160
> [373786.291544]  ttwu_do_activate+0x7a/0x90
> [373786.304508]  try_to_wake_up+0x1e7/0x480
> [373786.317475]  ? check_preempt_curr+0x79/0xa0
> [373786.330755]  default_wake_function+0x12/0x20
> [373786.344077]  __wake_up_common+0x8f/0x160
> [373786.357105]  __wake_up_locked+0x16/0x20
> [373786.369982]  complete+0x42/0x60
> [373786.381975]  mlx5_cmd_comp_handler+0x28f/0x4b0 [mlx5_core]
> [373786.396534]  mlx5_eq_int+0x1ae/0x550 [mlx5_core]
> [373786.410080]  ? __wake_up_common+0x8f/0x160
> [373786.423054]  __handle_irq_event_percpu+0x42/0x1a0
> [373786.436719]  handle_irq_event_percpu+0x32/0x80
> [373786.450184]  handle_irq_event+0x3b/0x60
> [373786.462935]  handle_edge_irq+0x95/0x1a0
> [373786.475441]  handle_irq+0xb5/0x140
> [373786.487323]  ? irq_work_run+0x2c/0x30
> [373786.499336]  ? flush_smp_call_function_queue+0x88/0x110
> [373786.513191]  do_IRQ+0x48/0xe0
> [373786.524434]  common_interrupt+0x8e/0x8e
> [373786.536517] RIP: 0010:panic+0x206/0x258
> [373786.548351] RSP: 0018:ffff882fbf903e80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff7e
> [373786.564290] RAX: 000000000000003b RBX: 0000000000000000 RCX: 0000000000000006
> [373786.579556] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff882fbf9169d0
> [373786.594559] RBP: ffff882fbf903ef0 R08: 0000000000000001 R09: 00000000000006b1
> [373786.609374] R10: 0000000000000001 R11: 0000000000000002 R12: ffffffff81e6be9f
> [373786.623990] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ee6b2800
> [373786.638331]  watchdog_timer_fn+0x21a/0x230
> [373786.649202]  ? watchdog+0x30/0x30
> [373786.659024]  __hrtimer_run_queues+0xe7/0x230
> [373786.669762]  hrtimer_interrupt+0xa8/0x1a0
> [373786.680120]  smp_apic_timer_interrupt+0x6b/0x140
> [373786.691100]  apic_timer_interrupt+0x8e/0xa0
> [373786.701618]  </IRQ>
> [373786.709633] RIP: 0010:fsnotify+0x197/0x510
> [373786.719960] RSP: 0018:ffffc9002120fdb8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
> [373786.734322] RAX: 0000000000000000 RBX: ffff882f9ec65c20 RCX: 0000000000000002
> [373786.748258] RDX: 0000000000028700 RSI: 0000000000000002 RDI: ffffffff8269a4e0
> [373786.762175] RBP: ffffc9002120fe98 R08: 0000000000000000 R09: 0000000000000000
> [373786.776003] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [373786.789766] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [373786.803354]  ? fsnotify+0x4bb/0x510
> [373786.812823]  vfs_write+0x151/0x1b0
> [373786.822215]  ? syscall_trace_enter+0x1cd/0x2b0
> [373786.832724]  SyS_write+0x55/0xc0
> [373786.841898]  do_syscall_64+0x79/0x1b0
> [373786.851586]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [373786.862893] RIP: 0033:0x483084
> [373786.871921] RSP: 002b:000000c4387e57f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [373786.886319] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000483084
> [373786.900279] RDX: 00000000000002b3 RSI: 000000c42e27d800 RDI: 000000000000014b
> [373786.914247] RBP: 000000c4387e5840 R08: 0000000000000000 R09: 0000000000000000
> [373786.928229] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [373786.942195] R13: 00000000000000f2 R14: 0000000000000032 R15: 0000000000000002
> [373786.956171] Code: c0 74 1a 48 8b 05 7f 44 ec 00 be fd 00 00 00 48 8b 80 a0 00 00 00 e8 ae 1a 9b
> 00 5d c3 89 fe 48 c7 c7 b8 26 e5 81 e8 21 45 09 00 <0f> 0b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
> 44 00 00 55 48
> [373786.989819] ---[ end trace d3faf76bdc3ca405 ]---
> [373787.001313] sched: Unexpected reschedule of offline CPU#36!
> [373787.013940] ------------[ cut here ]------------
> [373787.025482] WARNING: CPU: 24 PID: 24261 at arch/x86/kernel/smp.c:128
> native_smp_send_reschedule+0x42/0x50
> [373787.042884] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
> inet_diag unix_diag cfg80211 rfkill dell_rbu 8021q garp mrp xfs libcrc32c loop x86_pkg_temp_thermal
> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
> pcbc aesni_intel vfat fat crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf iTCO_wdt ses
> iTCO_vendor_support mxm_wmi ipmi_si dcdbas enclosure mei_me pcspkr ipmi_devintf lpc_ich sg mei
> ipmi_msghandler mfd_core shpchp wmi acpi_power_meter netconsole nfsd auth_rpcgss nfs_acl lockd grace
> sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
> fb_sys_fops sd_mod ttm crc32c_intel ahci libahci mlx5_core drm mlxfw mpt3sas ptp libata raid_class
> pps_core scsi_transport_sas
> [373787.175654]  dm_mirror dm_region_hash dm_log dm_mod dax
> [373787.189862] CPU: 24 PID: 24261 Comm: kube-apiserver Tainted: G        W    L
> 4.14.32-1.el7.x86_64 #1
> [373787.208727] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.4.3 01/17/2017
> [373787.225686] task: ffff882f66d28000 task.stack: ffffc9002120c000
> [373787.240916] RIP: 0010:native_smp_send_reschedule+0x42/0x50
> [373787.255668] RSP: 0018:ffff882fbf9039e0 EFLAGS: 00010046
> [373787.270138] RAX: 000000000000002f RBX: 0000000000000024 RCX: 0000000000000006
> [373787.286911] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff882fbf9169d0
> [373787.303602] RBP: ffff882fbf9039e0 R08: 0000000000000001 R09: 0000000000000793
> [373787.320314] R10: 0000000000000001 R11: 0000000000000000 R12: ffff882fbfaa2ac0
> [373787.337037] R13: ffff882fb78bdd00 R14: ffff882fbf903a98 R15: ffff882fbfaa2ac0
> [373787.353793] FS:  000000c42009f090(0000) GS:ffff882fbf900000(0000) knlGS:0000000000000000
> [373787.371708] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [373787.387114] CR2: 00007f82b6539244 CR3: 0000002f3de2a005 CR4: 00000000003606e0
> [373787.404143] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [373787.421146] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [373787.438016] Call Trace:
> [373787.449503]  <IRQ>
> [373787.460353]  resched_curr+0xae/0xd0
> [373787.472913]  check_preempt_curr+0x79/0xa0
> [373787.486064]  ttwu_do_wakeup+0x1e/0x160
> [373787.499014]  ttwu_do_activate+0x7a/0x90
> [373787.511930]  try_to_wake_up+0x1e7/0x480
> [373787.524803]  ? check_preempt_curr+0x79/0xa0
> [373787.538097]  default_wake_function+0x12/0x20
> [373787.551463]  __wake_up_common+0x8f/0x160
> [373787.564411]  __wake_up_locked+0x16/0x20
> [373787.577191]  complete+0x42/0x60
> [373787.589104]  mlx5_cmd_comp_handler+0x28f/0x4b0 [mlx5_core]
> [373787.603704]  mlx5_eq_int+0x1ae/0x550 [mlx5_core]
> [373787.617258]  ? __wake_up_common+0x8f/0x160
> [373787.630170]  __handle_irq_event_percpu+0x42/0x1a0
> [373787.643819]  handle_irq_event_percpu+0x32/0x80
> [373787.657224]  handle_irq_event+0x3b/0x60
> [373787.670045]  handle_edge_irq+0x95/0x1a0
> [373787.682656]  handle_irq+0xb5/0x140
> [373787.694520]  ? irq_work_run+0x2c/0x30
> [373787.706546]  ? flush_smp_call_function_queue+0x88/0x110
> [373787.720372]  do_IRQ+0x48/0xe0
> [373787.731599]  common_interrupt+0x8e/0x8e
> [373787.743630] RIP: 0010:panic+0x206/0x258
> [373787.755405] RSP: 0018:ffff882fbf903e80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff7e
> [373787.771355] RAX: 000000000000003b RBX: 0000000000000000 RCX: 0000000000000006
> [373787.786634] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff882fbf9169d0
> [373787.801646] RBP: ffff882fbf903ef0 R08: 0000000000000001 R09: 00000000000006b1
> [373787.816462] R10: 0000000000000001 R11: 0000000000000002 R12: ffffffff81e6be9f
> [373787.831010] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ee6b2800
> [373787.845323]  watchdog_timer_fn+0x21a/0x230
> [373787.856160]  ? watchdog+0x30/0x30
> [373787.866021]  __hrtimer_run_queues+0xe7/0x230
> [373787.876785]  hrtimer_interrupt+0xa8/0x1a0
> [373787.887167]  smp_apic_timer_interrupt+0x6b/0x140
> [373787.898177]  apic_timer_interrupt+0x8e/0xa0
> [373787.908668]  </IRQ>
> [373787.916761] RIP: 0010:fsnotify+0x197/0x510
> [373787.927091] RSP: 0018:ffffc9002120fdb8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
> [373787.941434] RAX: 0000000000000000 RBX: ffff882f9ec65c20 RCX: 0000000000000002
> [373787.955328] RDX: 0000000000028700 RSI: 0000000000000002 RDI: ffffffff8269a4e0
> [373787.969286] RBP: ffffc9002120fe98 R08: 0000000000000000 R09: 0000000000000000
> [373787.983117] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [373787.996820] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [373788.010389]  ? fsnotify+0x4bb/0x510
> [373788.019908]  vfs_write+0x151/0x1b0
> [373788.029296]  ? syscall_trace_enter+0x1cd/0x2b0
> [373788.039801]  SyS_write+0x55/0xc0
> [373788.048985]  do_syscall_64+0x79/0x1b0
> [373788.058645]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [373788.069978] RIP: 0033:0x483084
> [373788.079028] RSP: 002b:000000c4387e57f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [373788.093401] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000483084
> [373788.107361] RDX: 00000000000002b3 RSI: 000000c42e27d800 RDI: 000000000000014b
> [373788.121337] RBP: 000000c4387e5840 R08: 0000000000000000 R09: 0000000000000000
> [373788.135346] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [373788.149304] R13: 00000000000000f2 R14: 0000000000000032 R15: 0000000000000002
> [373788.163236] Code: c0 74 1a 48 8b 05 7f 44 ec 00 be fd 00 00 00 48 8b 80 a0 00 00 00 e8 ae 1a 9b
> 00 5d c3 89 fe 48 c7 c7 b8 26 e5 81 e8 21 45 09 00 <0f> 0b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
> 44 00 00 55 48
> [373788.196867] ---[ end trace d3faf76bdc3ca406 ]---
> 
> ------[ sar -f ./sa15 -s 20:16:00 -P 24 ]-----------
> Linux 4.14.32-1.el7.x86_64 (foobar)        04/15/2018      _x86_64_        (56 CPU)
> 
> 08:16:00 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
> 08:16:01 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
> 08:16:02 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
> 08:16:03 PM      24      0.99      0.00      0.00      0.00      0.00     99.01
> 08:16:04 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
> 08:16:05 PM      24      1.00      0.00      0.00      0.00      0.00     99.00
> 08:16:06 PM      24      3.00      0.00      0.00      0.00      0.00     97.00
> 08:16:07 PM      24      2.00      0.00      0.00      0.00      0.00     98.00
> 08:16:08 PM      24      1.00      0.00      1.00      0.00      0.00     98.00
> 08:16:09 PM      24      0.99      0.00      0.00      0.00      0.00     99.01
> 08:16:10 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
> 08:16:11 PM      24      1.00      0.00      0.00      0.00      0.00     99.00
> 08:16:12 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
> 08:16:13 PM      24      1.01      0.00      0.00      0.00      0.00     98.99
> 08:16:14 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
> 08:16:15 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
> 08:16:16 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
> 08:16:17 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
> 08:16:18 PM      24      0.00      0.00      0.99      0.00      0.00     99.01
> 08:16:19 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
> 08:16:20 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
> 08:16:21 PM      24      1.00      0.00      0.00      0.00      0.00     99.00
> 08:16:22 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
> 08:16:23 PM      24      1.00      0.00     17.00      0.00      0.00     82.00
> 08:16:24 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
> 08:16:25 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
> 08:16:26 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
> 08:16:27 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
> 08:16:28 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
> 08:16:29 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
> 08:16:30 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
> 08:16:31 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
> 08:16:32 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
> 08:16:33 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
> 08:16:34 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
> 08:16:35 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
> 08:16:36 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
> 08:16:37 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
> 08:16:38 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
> 08:16:39 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
> 08:16:40 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
> 08:16:41 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
> 08:16:42 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
> ------[ sar -f ./sa15 -s 20:16:00 -P 24 ]-----------
> 
> 
> 
> 
> The following panic is from a different server and shows the same symptom: the kernel panics
> due to a soft lockup, and CPU#21 is at 100% system-level utilization. In this panic we also see
> a transmit queue timeout from the network driver (bnx2x). I believe this is a symptom and not
> the cause, as a server with the Mellanox driver had a similar soft lockup.
> 
> 
> 
> [391838.033960] NETDEV WATCHDOG: eth0 (bnx2x): transmit queue 2 timed out
> [391838.065545] ------------[ cut here ]------------
> [391838.088431] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x22b/0x230
> [391838.128800] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
> inet_diag unix_diag cfg80211 rfkill 8021q garp mrp xfs loop vfat fat x86_pkg_temp_thermal
> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
> pcbc aesni_intel crypto_simd glue_helper cryptd intel_cstate iTCO_wdt iTCO_vendor_support
> intel_rapl_perf sg hpilo hpwdt ipmi_si pcspkr lpc_ich ioatdma ipmi_devintf dca mfd_core i2c_i801
> shpchp wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2
> i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm sd_mod bnx2x mdio drm
> libcrc32c crc32c_intel hpsa ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod dax
> [391838.456941] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.32-1.el7.x86_64 #1
> [391838.491589] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/25/2017
> [391838.524202] task: ffffffff82012480 task.stack: ffffffff82000000
> [391838.553322] RIP: 0010:dev_watchdog+0x22b/0x230
> [391838.575252] RSP: 0018:ffff88103fa03e60 EFLAGS: 00010246
> [391838.601054] RAX: 0000000000000039 RBX: 0000000000000002 RCX: 0000000000000000
> [391838.636022] RDX: 0000000000000000 RSI: ffff88103fa169d8 RDI: ffff88103fa169d8
> [391838.671651] RBP: ffff88103fa03e90 R08: 0000000000000000 R09: 00000000000004df
> [391838.707021] R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffff881036674000
> [391838.758515] R13: 000000000000005b R14: ffff88103667f100 R15: 0000000000000000
> [391838.810815] FS:  0000000000000000(0000) GS:ffff88103fa00000(0000) knlGS:0000000000000000
> [391838.867323] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [391838.912602] CR2: 00007f912eb7fff0 CR3: 000000000200a006 CR4: 00000000003606f0
> [391838.964401] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [391839.016170] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [391839.067361] Call Trace:
> [391839.096085]  <IRQ>
> [391839.122674]  ? dev_deactivate_queue.constprop.30+0x60/0x60
> [391839.166424]  call_timer_fn+0x37/0x140
> [391839.201029]  run_timer_softirq+0x1eb/0x450
> [391839.238196]  ? timerqueue_add+0x59/0x90
> [391839.273260]  ? ktime_get+0x3e/0xa0
> [391839.306253]  __do_softirq+0xd2/0x27c
> [391839.340016]  irq_exit+0xd9/0xf0
> [391839.371464]  smp_apic_timer_interrupt+0x75/0x140
> [391839.410012]  apic_timer_interrupt+0x8e/0xa0
> [391839.446764]  </IRQ>
> [391839.472682] RIP: 0010:cpuidle_enter_state+0xdd/0x2b0
> [391839.512914] RSP: 0018:ffffffff82003e00 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
> [391839.565090] RAX: ffff88103fa22ac0 RBX: ffffe8f000200000 RCX: 000000000000001f
> [391839.615998] RDX: 0000000000000000 RSI: fff936788221f82b RDI: 0000000000000000
> [391839.666639] RBP: ffffffff82003e38 R08: 000000000000034d R09: 00000000ffffffff
> [391839.717691] R10: 000000000000037a R11: 0000000000000008 R12: 0000000000000004
> [391839.768401] R13: 0000000000000000 R14: ffffffff8216d980 R15: 0001645fe6c35649
> [391839.819280]  cpuidle_enter+0x17/0x20
> [391839.852911]  call_cpuidle+0x23/0x40
> [391839.885828]  do_idle+0x172/0x1e0
> [391839.916662]  cpu_startup_entry+0x73/0x80
> [391839.950559]  rest_init+0xaa/0xb0
> [391839.981142]  start_kernel+0x4b7/0x4d8
> [391840.013407]  ? set_init_arg+0x5a/0x5a
> [391840.045237]  x86_64_start_reservations+0x2a/0x2c
> [391840.081722]  x86_64_start_kernel+0x72/0x75
> [391840.114722]  secondary_startup_64+0xa5/0xb0
> [391840.149320] Code: 60 04 00 00 eb 89 4c 89 e7 c6 05 77 bb b2 00 01 e8 6b 38 fd ff 89 d9 48 89 c2
> 4c 89 e6 48 c7 c7 98 6a ef 81 31 c0 e8 b8 52 a2 ff <0f> 0b eb b9 90 0f 1f 44 00 00 55 48 89 e5 41 57
> 49 89 d7 41 56
> [391840.265586] ---[ end trace c661065d595325a9 ]---
> [391842.302965] bnx2x: [bnx2x_clean_tx_queue:1205(eth0)]timeout waiting for queue[2]:
> txdata->tx_pkt_prod(11525) != txdata->tx_pkt_cons(11500)
> [391844.388943] bnx2x: [bnx2x_clean_tx_queue:1205(eth0)]timeout waiting for queue[2]:
> txdata->tx_pkt_prod(11525) != txdata->tx_pkt_cons(11500)
> [391850.094964] watchdog: BUG: soft lockup - CPU#21 stuck for 22s! [kube-apiserver:60495]
> [391850.146079] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
> inet_diag unix_diag cfg80211 rfkill 8021q garp mrp xfs loop vfat fat x86_pkg_temp_thermal
> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
> pcbc aesni_intel crypto_simd glue_helper cryptd intel_cstate iTCO_wdt iTCO_vendor_support
> intel_rapl_perf sg hpilo hpwdt ipmi_si pcspkr lpc_ich ioatdma ipmi_devintf dca mfd_core i2c_i801
> shpchp wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2
> i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm sd_mod bnx2x mdio drm
> libcrc32c crc32c_intel hpsa ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod dax
> [391850.573524] CPU: 21 PID: 60495 Comm: kube-apiserver Tainted: G        W
> 4.14.32-1.el7.x86_64 #1
> [391850.634311] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/25/2017
> [391850.682799] task: ffff881022172e80 task.stack: ffffc9000b874000
> [391850.727891] RIP: 0010:fsnotify+0x218/0x510
> [391850.763842] RSP: 0018:ffffc9000b877db8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
> [391850.820076] RAX: ffff882001c77a98 RBX: ffff882001c77a70 RCX: 0000000000000002
> [391850.873470] RDX: 0000000000028400 RSI: 0000000000000002 RDI: ffffffff8269a4e0
> [391850.925414] RBP: ffffc9000b877e98 R08: 0000000000000000 R09: 0000000000000000
> [391850.976777] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [391851.028138] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [391851.079135] FS:  000000c42be02090(0000) GS:ffff88103fd40000(0000) knlGS:0000000000000000
> [391851.135142] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [391851.180107] CR2: 00007f5c3c0690c0 CR3: 0000000fc47c4004 CR4: 00000000003606e0
> [391851.231704] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [391851.283258] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [391851.335898] Call Trace:
> [391851.367161]  vfs_write+0x151/0x1b0
> [391851.401673]  ? syscall_trace_enter+0x1cd/0x2b0
> [391851.440900]  SyS_write+0x55/0xc0
> [391851.474214]  do_syscall_64+0x79/0x1b0
> [391851.510034]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [391851.551320] RIP: 0033:0x483084
> [391851.583001] RSP: 002b:000000c43197d7f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [391851.636289] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000483084
> [391851.688719] RDX: 00000000000002a9 RSI: 000000c424283c00 RDI: 0000000000000040
> [391851.740825] RBP: 000000c43197d840 R08: 0000000000000000 R09: 0000000000000000
> [391851.792257] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [391851.843292] R13: 00000000000000f2 R14: 0000000000000032 R15: 0000000000000002
> [391851.896703] Code: 0f 85 08 02 00 00 48 85 db 41 0f 94 c4 4d 85 ed 0f 94 c1 84 c9 0f 85 ef 02 00
> 00 8b 4d 90 85 c9 74 26 48 85 db 74 0d f6 43 44 01 <75> 07 c7 43 40 00 00 00 00 4d 85 ed 74 0f 41 f6
> 45 44 01 75 08
> [391852.022198] Kernel panic - not syncing: softlockup: hung tasks
> [391852.068204] CPU: 21 PID: 60495 Comm: kube-apiserver Tainted: G        W    L
> 4.14.32-1.el7.x86_64 #1
> [391852.130544] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/25/2017
> [391852.180598] Call Trace:
> [391852.210411]  <IRQ>
> [391852.237477]  dump_stack+0x63/0x88
> [391852.270360]  panic+0xe8/0x258
> [391852.301307]  watchdog_timer_fn+0x21a/0x230
> [391852.337395]  ? watchdog+0x30/0x30
> [391852.368943]  __hrtimer_run_queues+0xe7/0x230
> [391852.405003]  hrtimer_interrupt+0xa8/0x1a0
> [391852.439190]  smp_apic_timer_interrupt+0x6b/0x140
> [391852.476151]  apic_timer_interrupt+0x8e/0xa0
> [391852.511089]  </IRQ>
> [391852.535014] RIP: 0010:fsnotify+0x218/0x510
> [391852.568048] RSP: 0018:ffffc9000b877db8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
> [391852.617533] RAX: ffff882001c77a98 RBX: ffff882001c77a70 RCX: 0000000000000002
> [391852.664520] RDX: 0000000000028400 RSI: 0000000000000002 RDI: ffffffff8269a4e0
> [391852.711835] RBP: ffffc9000b877e98 R08: 0000000000000000 R09: 0000000000000000
> [391852.758813] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [391852.805527] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [391852.851877]  ? fsnotify+0x4bb/0x510
> [391852.880665]  vfs_write+0x151/0x1b0
> [391852.909135]  ? syscall_trace_enter+0x1cd/0x2b0
> [391852.942798]  SyS_write+0x55/0xc0
> [391852.969978]  do_syscall_64+0x79/0x1b0
> [391852.999194]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [391853.035095] RIP: 0033:0x483084
> [391853.061289] RSP: 002b:000000c43197d7f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [391853.109641] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000483084
> [391853.155956] RDX: 00000000000002a9 RSI: 000000c424283c00 RDI: 0000000000000040
> [391853.202552] RBP: 000000c43197d840 R08: 0000000000000000 R09: 0000000000000000
> [391853.248842] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [391853.295051] R13: 00000000000000f2 R14: 0000000000000032 R15: 0000000000000002
> [391853.341016] Kernel Offset: disabled
> [391853.375061] ---[ end Kernel panic - not syncing: softlockup: hung tasks
> [391853.419102] sched: Unexpected reschedule of offline CPU#0!
> [391853.457084] ------------[ cut here ]------------
> [391853.491472] WARNING: CPU: 21 PID: 60495 at arch/x86/kernel/smp.c:128
> native_smp_send_reschedule+0x42/0x50
> [391853.549474] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
> inet_diag unix_diag cfg80211 rfkill 8021q garp mrp xfs loop vfat fat x86_pkg_temp_thermal
> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
> pcbc aesni_intel crypto_simd glue_helper cryptd intel_cstate iTCO_wdt iTCO_vendor_support
> intel_rapl_perf sg hpilo hpwdt ipmi_si pcspkr lpc_ich ioatdma ipmi_devintf dca mfd_core i2c_i801
> shpchp wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2
> i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm sd_mod bnx2x mdio drm
> libcrc32c crc32c_intel hpsa ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod dax
> [391853.967080] CPU: 21 PID: 60495 Comm: kube-apiserver Tainted: G        W    L
> 4.14.32-1.el7.x86_64 #1
> [391854.026457] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/25/2017
> [391854.073417] task: ffff881022172e80 task.stack: ffffc9000b874000
> [391854.116927] RIP: 0010:native_smp_send_reschedule+0x42/0x50
> [391854.158063] RSP: 0018:ffff88103fd43b10 EFLAGS: 00010046
> [391854.197408] RAX: 000000000000002e RBX: 0000000000000000 RCX: 0000000000000000
> [391854.246409] RDX: 0000000000000000 RSI: ffff88103fd569d8 RDI: ffff88103fd569d8
> [391854.295777] RBP: ffff88103fd43b10 R08: 0000000000000000 R09: 0000000000000556
> [391854.345373] R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffff88103fa22ac0
> [391854.395334] R13: ffff880f8be48000 R14: ffff88103fd43bc8 R15: ffff88103fa22ac0
> [391854.444983] FS:  000000c42be02090(0000) GS:ffff88103fd40000(0000) knlGS:0000000000000000
> [391854.498575] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [391854.541675] CR2: 00007f5c3c0690c0 CR3: 0000000fc47c4004 CR4: 00000000003606e0
> [391854.591999] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [391854.642263] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [391854.692678] Call Trace:
> [391854.719793]  <IRQ>
> [391854.744771]  resched_curr+0xae/0xd0
> [391854.776585]  check_preempt_curr+0x79/0xa0
> [391854.811170]  ttwu_do_wakeup+0x1e/0x160
> [391854.844514]  ttwu_do_activate+0x7a/0x90
> [391854.877774]  try_to_wake_up+0x1e7/0x480
> [391854.910892]  default_wake_function+0x12/0x20
> [391854.946665]  autoremove_wake_function+0x16/0x60
> [391854.984069]  __wake_up_common+0x8f/0x160
> [391855.018321]  __wake_up_common_lock+0x7e/0xc0
> [391855.053398]  __wake_up+0x13/0x20
> [391855.083708]  wake_up_klogd_work_func+0x40/0x60
> [391855.119905]  irq_work_run_list+0x53/0x80
> [391855.153377]  irq_work_run+0x2c/0x30
> [391855.184508]  flush_smp_call_function_queue+0x88/0x110
> [391855.223509]  generic_smp_call_function_single_interrupt+0x13/0x30
> [391855.267592]  smp_call_function_single_interrupt+0x3a/0xe0
> [391855.308323]  call_function_single_interrupt+0x8e/0xa0
> [391855.347202] RIP: 0010:panic+0x206/0x258
> [391855.380345] RSP: 0018:ffff88103fd43e80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> [391855.431894] RAX: 000000000000003b RBX: 0000000000000000 RCX: 0000000000000006
> [391855.481301] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88103fd569d0
> [391855.530810] RBP: ffff88103fd43ef0 R08: 0000000000000000 R09: 0000000000000555
> [391855.579985] R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffffffff81e6be9f
> [391855.629525] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ee6b2800
> [391855.677925]  watchdog_timer_fn+0x21a/0x230
> [391855.711211]  ? watchdog+0x30/0x30
> [391855.740236]  __hrtimer_run_queues+0xe7/0x230
> [391855.773231]  hrtimer_interrupt+0xa8/0x1a0
> [391855.804713]  smp_apic_timer_interrupt+0x6b/0x140
> [391855.838740]  apic_timer_interrupt+0x8e/0xa0
> [391855.870671]  </IRQ>
> [391855.892208] RIP: 0010:fsnotify+0x218/0x510
> [391855.922974] RSP: 0018:ffffc9000b877db8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
> [391855.970885] RAX: ffff882001c77a98 RBX: ffff882001c77a70 RCX: 0000000000000002
> [391856.016803] RDX: 0000000000028400 RSI: 0000000000000002 RDI: ffffffff8269a4e0
> [391856.062423] RBP: ffffc9000b877e98 R08: 0000000000000000 R09: 0000000000000000
> [391856.108153] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [391856.153683] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [391856.200197]  ? fsnotify+0x4bb/0x510
> [391856.228102]  vfs_write+0x151/0x1b0
> [391856.256421]  ? syscall_trace_enter+0x1cd/0x2b0
> [391856.288496]  SyS_write+0x55/0xc0
> [391856.314643]  do_syscall_64+0x79/0x1b0
> [391856.342704]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [391856.377545] RIP: 0033:0x483084
> [391856.402822] RSP: 002b:000000c43197d7f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [391856.449735] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000483084
> [391856.494804] RDX: 00000000000002a9 RSI: 000000c424283c00 RDI: 0000000000000040
> [391856.540308] RBP: 000000c43197d840 R08: 0000000000000000 R09: 0000000000000000
> [391856.585743] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [391856.630940] R13: 00000000000000f2 R14: 0000000000000032 R15: 0000000000000002
> [391856.676366] Code: c0 74 1a 48 8b 05 7f 44 ec 00 be fd 00 00 00 48 8b 80 a0 00 00 00 e8 ae 1a 9b
> 00 5d c3 89 fe 48 c7 c7 b8 26 e5 81 e8 21 45 09 00 <0f> 0b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
> 44 00 00 55 48
> [391856.792915] ---[ end trace c661065d595325aa ]---
> [391856.826793] ------------[ cut here ]------------
> [391856.860523] WARNING: CPU: 21 PID: 60495 at kernel/sched/core.c:1179 set_task_cpu+0x197/0x1a0
> [391856.913620] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
> inet_diag unix_diag cfg80211 rfkill 8021q garp mrp xfs loop vfat fat x86_pkg_temp_thermal
> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
> pcbc aesni_intel crypto_simd glue_helper cryptd intel_cstate iTCO_wdt iTCO_vendor_support
> intel_rapl_perf sg hpilo hpwdt ipmi_si pcspkr lpc_ich ioatdma ipmi_devintf dca mfd_core i2c_i801
> shpchp wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2
> i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm sd_mod bnx2x mdio drm
> libcrc32c crc32c_intel hpsa ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod dax
> [391857.333766] CPU: 21 PID: 60495 Comm: kube-apiserver Tainted: G        W    L
> 4.14.32-1.el7.x86_64 #1
> [391857.393681] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/25/2017
> [391857.440546] task: ffff881022172e80 task.stack: ffffc9000b874000
> [391857.484076] RIP: 0010:set_task_cpu+0x197/0x1a0
> [391857.520542] RSP: 0018:ffff88103fd43ae8 EFLAGS: 00010046
> [391857.560948] RAX: 0000000000000200 RBX: ffff881038cb45c0 RCX: 0000000000000001
> [391857.610782] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff881038cb45c0
> [391857.660456] RBP: ffff88103fd43b08 R08: 0000000000000008 R09: 0000000000000000
> [391857.710401] R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffff881038cb516c
> [391857.760003] R13: 0000000000000008 R14: 0000000000000008 R15: 0000000000022ac0
> [391857.809282] FS:  000000c42be02090(0000) GS:ffff88103fd40000(0000) knlGS:0000000000000000
> [391857.863581] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [391857.906806] CR2: 00007f5c3c0690c0 CR3: 0000000fc47c4004 CR4: 00000000003606e0
> [391857.956620] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [391858.007011] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [391858.057596] Call Trace:
> [391858.085525]  <IRQ>
> [391858.110876]  try_to_wake_up+0x16c/0x480
> [391858.145085]  ? resched_curr+0xae/0xd0
> [391858.178173]  default_wake_function+0x12/0x20
> [391858.214468]  __wake_up_common+0x8f/0x160
> [391858.248941]  __wake_up_locked+0x16/0x20
> [391858.283175]  ep_poll_callback+0xd0/0x300
> [391858.316965]  __wake_up_common+0x8f/0x160
> [391858.351271]  __wake_up_common_lock+0x7e/0xc0
> [391858.387289]  __wake_up+0x13/0x20
> [391858.417695]  wake_up_klogd_work_func+0x40/0x60
> [391858.454575]  irq_work_run_list+0x53/0x80
> [391858.488737]  irq_work_run+0x2c/0x30
> [391858.520329]  flush_smp_call_function_queue+0x88/0x110
> [391858.559946]  generic_smp_call_function_single_interrupt+0x13/0x30
> [391858.603988]  smp_call_function_single_interrupt+0x3a/0xe0
> [391858.645713]  call_function_single_interrupt+0x8e/0xa0
> [391858.685706] RIP: 0010:panic+0x206/0x258
> [391858.720431] RSP: 0018:ffff88103fd43e80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> [391858.772695] RAX: 000000000000003b RBX: 0000000000000000 RCX: 0000000000000006
> [391858.822759] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88103fd569d0
> [391858.872167] RBP: ffff88103fd43ef0 R08: 0000000000000000 R09: 0000000000000555
> [391858.921420] R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffffffff81e6be9f
> [391858.971071] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ee6b2800
> [391859.020677]  watchdog_timer_fn+0x21a/0x230
> [391859.054291]  ? watchdog+0x30/0x30
> [391859.083991]  __hrtimer_run_queues+0xe7/0x230
> [391859.118087]  hrtimer_interrupt+0xa8/0x1a0
> [391859.150361]  smp_apic_timer_interrupt+0x6b/0x140
> [391859.185167]  apic_timer_interrupt+0x8e/0xa0
> [391859.217429]  </IRQ>
> [391859.239165] RIP: 0010:fsnotify+0x218/0x510
> [391859.269961] RSP: 0018:ffffc9000b877db8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
> [391859.317370] RAX: ffff882001c77a98 RBX: ffff882001c77a70 RCX: 0000000000000002
> [391859.363263] RDX: 0000000000028400 RSI: 0000000000000002 RDI: ffffffff8269a4e0
> [391859.409279] RBP: ffffc9000b877e98 R08: 0000000000000000 R09: 0000000000000000
> [391859.455080] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [391859.500518] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [391859.546063]  ? fsnotify+0x4bb/0x510
> [391859.574081]  vfs_write+0x151/0x1b0
> [391859.601468]  ? syscall_trace_enter+0x1cd/0x2b0
> [391859.634055]  SyS_write+0x55/0xc0
> [391859.660517]  do_syscall_64+0x79/0x1b0
> [391859.688919]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [391859.723536] RIP: 0033:0x483084
> [391859.748891] RSP: 002b:000000c43197d7f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [391859.796455] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000483084
> [391859.841781] RDX: 00000000000002a9 RSI: 000000c424283c00 RDI: 0000000000000040
> [391859.887303] RBP: 000000c43197d840 R08: 0000000000000000 R09: 0000000000000000
> [391859.932494] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [391859.977838] R13: 00000000000000f2 R14: 0000000000000032 R15: 0000000000000002
> [391860.023361] Code: ff 80 8b ac 08 00 00 04 e9 20 ff ff ff 0f 0b e9 b9 fe ff ff f7 83 84 00 00 00
> fd ff ff ff 0f 84 c3 fe ff ff 0f 0b e9 bc fe ff ff <0f> 0b e9 cb fe ff ff 66 90 0f 1f 44 00 00 55 48
> 89 e5 41 56 49
> [391860.138078] ---[ end trace c661065d595325ab ]---
> [391860.172166] sched: Unexpected reschedule of offline CPU#8!
> [391860.210690] ------------[ cut here ]------------
> [391860.244671] WARNING: CPU: 21 PID: 60495 at arch/x86/kernel/smp.c:128
> native_smp_send_reschedule+0x42/0x50
> [391860.303820] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
> inet_diag unix_diag cfg80211 rfkill 8021q garp mrp xfs loop vfat fat x86_pkg_temp_thermal
> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
> pcbc aesni_intel crypto_simd glue_helper cryptd intel_cstate iTCO_wdt iTCO_vendor_support
> intel_rapl_perf sg hpilo hpwdt ipmi_si pcspkr lpc_ich ioatdma ipmi_devintf dca mfd_core i2c_i801
> shpchp wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2
> i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm sd_mod bnx2x mdio drm
> libcrc32c crc32c_intel hpsa ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod dax
> [391860.726277] CPU: 21 PID: 60495 Comm: kube-apiserver Tainted: G        W    L
> 4.14.32-1.el7.x86_64 #1
> [391860.786402] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/25/2017
> [391860.834206] task: ffff881022172e80 task.stack: ffffc9000b874000
> [391860.878669] RIP: 0010:native_smp_send_reschedule+0x42/0x50
> [391860.920832] RSP: 0018:ffff88103fd43b08 EFLAGS: 00010046
> [391860.961851] RAX: 000000000000002e RBX: ffff881038cb45c0 RCX: 0000000000000006
> [391861.012094] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88103fd569d0
> [391861.062447] RBP: ffff88103fd43b08 R08: 0000000000000000 R09: 00000000000005e8
> [391861.112691] R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffff881038cb516c
> [391861.163322] R13: 0000000000000004 R14: 0000000000000046 R15: 0000000000022ac0
> [391861.213440] FS:  000000c42be02090(0000) GS:ffff88103fd40000(0000) knlGS:0000000000000000
> [391861.268665] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [391861.311928] CR2: 00007f5c3c0690c0 CR3: 0000000fc47c4004 CR4: 00000000003606e0
> [391861.362717] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [391861.414065] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [391861.464505] Call Trace:
> [391861.492319]  <IRQ>
> [391861.517992]  try_to_wake_up+0x405/0x480
> [391861.551956]  default_wake_function+0x12/0x20
> [391861.588252]  __wake_up_common+0x8f/0x160
> [391861.622982]  __wake_up_locked+0x16/0x20
> [391861.657272]  ep_poll_callback+0xd0/0x300
> [391861.691535]  __wake_up_common+0x8f/0x160
> [391861.726097]  __wake_up_common_lock+0x7e/0xc0
> [391861.762240]  __wake_up+0x13/0x20
> [391861.793096]  wake_up_klogd_work_func+0x40/0x60
> [391861.830133]  irq_work_run_list+0x53/0x80
> [391861.864538]  irq_work_run+0x2c/0x30
> [391861.896744]  flush_smp_call_function_queue+0x88/0x110
> [391861.936872]  generic_smp_call_function_single_interrupt+0x13/0x30
> [391861.981074]  smp_call_function_single_interrupt+0x3a/0xe0
> [391862.022733]  call_function_single_interrupt+0x8e/0xa0
> [391862.062300] RIP: 0010:panic+0x206/0x258
> [391862.096123] RSP: 0018:ffff88103fd43e80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> [391862.148335] RAX: 000000000000003b RBX: 0000000000000000 RCX: 0000000000000006
> [391862.197879] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88103fd569d0
> [391862.247474] RBP: ffff88103fd43ef0 R08: 0000000000000000 R09: 0000000000000555
> [391862.296985] R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffffffff81e6be9f
> [391862.346312] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ee6b2800
> [391862.395985]  watchdog_timer_fn+0x21a/0x230
> [391862.430116]  ? watchdog+0x30/0x30
> [391862.460248]  __hrtimer_run_queues+0xe7/0x230
> [391862.494845]  hrtimer_interrupt+0xa8/0x1a0
> [391862.527650]  smp_apic_timer_interrupt+0x6b/0x140
> [391862.563130]  apic_timer_interrupt+0x8e/0xa0
> [391862.596032]  </IRQ>
> [391862.618884] RIP: 0010:fsnotify+0x218/0x510
> [391862.650285] RSP: 0018:ffffc9000b877db8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
> [391862.698849] RAX: ffff882001c77a98 RBX: ffff882001c77a70 RCX: 0000000000000002
> [391862.744636] RDX: 0000000000028400 RSI: 0000000000000002 RDI: ffffffff8269a4e0
> [391862.791246] RBP: ffffc9000b877e98 R08: 0000000000000000 R09: 0000000000000000
> [391862.837248] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [391862.883324] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [391862.928937]  ? fsnotify+0x4bb/0x510
> [391862.957183]  vfs_write+0x151/0x1b0
> [391862.984840]  ? syscall_trace_enter+0x1cd/0x2b0
> [391863.017128]  SyS_write+0x55/0xc0
> [391863.043812]  do_syscall_64+0x79/0x1b0
> [391863.072403]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [391863.107687] RIP: 0033:0x483084
> [391863.133412] RSP: 002b:000000c43197d7f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [391863.180683] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000483084
> [391863.226639] RDX: 00000000000002a9 RSI: 000000c424283c00 RDI: 0000000000000040
> [391863.272308] RBP: 000000c43197d840 R08: 0000000000000000 R09: 0000000000000000
> [391863.317590] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [391863.363056] R13: 00000000000000f2 R14: 0000000000000032 R15: 0000000000000002
> [391863.409871] Code: c0 74 1a 48 8b 05 7f 44 ec 00 be fd 00 00 00 48 8b 80 a0 00 00 00 e8 ae 1a 9b
> 00 5d c3 89 fe 48 c7 c7 b8 26 e5 81 e8 21 45 09 00 <0f> 0b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
> 44 00 00 55 48
> [391863.522945] ---[ end trace c661065d595325ac ]---
> 
> 
> ----[ sar -f ./sa16 -s 04:25:50 -e 05:00:00 -P 21 ]----
> Linux 4.14.32-1.el7.x86_64 (foobar)        04/16/2018      _x86_64_        (32 CPU)
> 
> 04:25:50 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
> 04:25:51 AM      21      0.00      0.00      0.00      0.00      0.00    100.00
> 04:25:52 AM      21      1.00      0.00      1.00      0.00      0.00     98.00
> 04:25:53 AM      21      0.00      0.00      0.00      0.00      0.00    100.00
> 04:25:54 AM      21      1.00      0.00      0.00      0.00      0.00     99.00
> 04:25:55 AM      21      0.00      0.00     70.71      0.00      0.00     29.29
> 04:25:56 AM      21      0.00      0.00    100.00      0.00      0.00      0.00
> 04:25:57 AM      21      0.00      0.00    100.00      0.00      0.00      0.00
> 04:25:58 AM      21      0.00      0.00    100.00      0.00      0.00      0.00
> 04:25:59 AM      21      0.00      0.00    100.00      0.00      0.00      0.00
> 04:26:00 AM      21      0.00      0.00    100.00      0.00      0.00      0.00
> 04:26:01 AM      21      0.00      0.00    100.00      0.00      0.00      0.00
> 04:26:02 AM      21      0.00      0.00    100.00      0.00      0.00      0.00
> 04:26:03 AM      21      0.00      0.00    100.00      0.00      0.00      0.00
> ----[ sar -f ./sa16 -s 04:25:50 -e 05:00:00 -P 21 ]----
> 
> 
> The fact that we see one CPU spinning at 100% utilization in all of the above crashes is a good
> thing, as we can use it as a starting point for our investigation. We just need to find out which
> (kernel/hardware/network driver/userland application) process causes a single CPU to get stuck.
> Thus, we disabled the trigger that panics the kernel when a soft lockup occurs, and we hope
> we can find out the process.
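
In case it helps with triage, here is a minimal Python sketch that scans
"sar -P" output like the excerpts above and reports runs of consecutive
samples where one CPU sits at (near) 100% %system. It assumes one-second
sampling, so the sample count only approximates seconds. The names
busy_runs, threshold and min_samples are made up for this sketch, and the
default of 20 samples simply mirrors the 2 * watchdog_thresh figure recalled
earlier in this thread; treat both as assumptions, not as what the watchdog
actually measures.

```python
import re

# One per-CPU sample line of `sar -P <cpu>` output, optionally still carrying
# the "> " quote prefix of this email, e.g.
# "> 04:25:56 AM      21      0.00      0.00    100.00      0.00      0.00      0.00"
SAMPLE = re.compile(
    r"^(?:>\s*)?(\d{2}:\d{2}:\d{2})\s+(AM|PM)\s+(\d+)\s+"
    r"([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)


def busy_runs(lines, threshold=99.0, min_samples=20):
    """Return (cpu, first_stamp, last_stamp, samples) for every run of
    consecutive samples with %system >= threshold that is at least
    min_samples long. Lines that are not sample lines are skipped."""
    runs, current = [], None  # current = [cpu, first_stamp, last_stamp, count]
    for line in lines:
        m = SAMPLE.match(line.strip())
        if m is None:
            continue  # header, separator or blank line
        stamp = f"{m.group(1)} {m.group(2)}"
        cpu, system = int(m.group(3)), float(m.group(6))
        if system >= threshold and current and current[0] == cpu:
            # Same CPU is still busy: extend the open run.
            current[2] = stamp
            current[3] += 1
            continue
        # The open run (if any) ends here; keep it only if it is long enough.
        if current and current[3] >= min_samples:
            runs.append(tuple(current))
        current = [cpu, stamp, stamp, 1] if system >= threshold else None
    if current and current[3] >= min_samples:
        runs.append(tuple(current))
    return runs
```

Fed the CPU 21 excerpt above with min_samples=5, this reports a single run
from 04:25:56 AM to 04:26:03 AM (8 samples). With the default of 20 it reports
nothing for that excerpt, since sar stops recording shortly before the panic.
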
> 
> The following panic is of the second type I mentioned, where we don't
> observe soft lockups and CPU utilization is close to zero before the crash.
> 
> [123379.816452] perf: interrupt took too long (6243 > 6231), lowering
> kernel.perf_event_max_sample_rate to 32000
> [295349.255065] general protection fault: 0000 [#1] SMP PTI
> [295349.281440] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
> inet_diag unix_diag cfg80211 rfkill 8021q garp mrp xfs x86_pkg_temp_thermal intel_powerclamp loop
> coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel
> crypto_simd glue_helper cryptd iTCO_wdt ipmi_si iTCO_vendor_support intel_cstate intel_rapl_perf
> lpc_ich sg hpilo hpwdt ioatdma pcspkr ipmi_devintf i2c_i801 dca shpchp mfd_core wmi ipmi_msghandler
> nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper
> syscopyarea sysfillrect sysimgblt sd_mod fb_sys_fops ttm bnx2x mdio libcrc32c crc32c_intel serio_raw
> hpsa ptp drm scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod dax
> [295349.615070] CPU: 26 PID: 1384 Comm: thread.rb:70 Not tainted 4.14.32-1.el7.x86_64 #1
> [295349.654011] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/25/2017
> [295349.686931] task: ffff882035430000 task.stack: ffffc90007bb4000
> [295349.716421] RIP: 0010:prefetch_freepointer.isra.63+0x11/0x20
> [295349.744812] RSP: 0018:ffffc90007bb7e08 EFLAGS: 00010202
> [295349.771654] RAX: 0000000000000000 RBX: 6236612d38373234 RCX: 00000000000199bb
> [295349.807690] RDX: 00000000000199ba RSI: 6236612d38373234 RDI: ffff88203ec259a0
> [295349.843664] RBP: ffffc90007bb7e08 R08: 0000000000028060 R09: ffffffff82051cc0
> [295349.879868] R10: 0000000000002000 R11: 0000000000000040 R12: 00000000014000c0
> [295349.916097] R13: ffff88203ec25980 R14: ffff88203ec25980 R15: ffff882000000000
> [295349.951868] FS:  00007f3f439f9700(0000) GS:ffff88203f480000(0000) knlGS:0000000000000000
> [295349.993039] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [295350.021664] CR2: 000000c43069c000 CR3: 000000203943e001 CR4: 00000000003606e0
> [295350.057534] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [295350.093663] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [295350.129254] Call Trace:
> [295350.141644]  kmem_cache_alloc+0x9c/0x1b0
> [295350.161581]  ? fsnotify_add_mark_locked+0x153/0x320
> [295350.186330]  fsnotify_add_mark_locked+0x153/0x320
> [295350.210023]  SyS_inotify_add_watch+0x2d5/0x350
> [295350.232414]  do_syscall_64+0x79/0x1b0
> [295350.250528]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [295350.275482] RIP: 0033:0x7f3f53f409b7
> [295350.293330] RSP: 002b:00007f3f439f70c8 EFLAGS: 00000202 ORIG_RAX: 00000000000000fe
> [295350.330889] RAX: ffffffffffffffda RBX: 00007f3f2c232fc0 RCX: 00007f3f53f409b7
> [295350.365971] RDX: 0000000022000fc6 RSI: 0000000002eaba50 RDI: 0000000000000018
> [295350.400949] RBP: 0000000002677d20 R08: 000000005ad2a563 R09: 0000000009caa9a8
> [295350.436090] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000002677d20
> [295350.471552] R13: 000000000000fd02 R14: 000000000005dc08 R15: 00000000000081a4
> [295350.507348] Code: 31 d2 e8 b3 ea ff ff 5b 41 5c 5d c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00
> 0f 1f 44 00 00 55 48 85 f6 48 89 e5 74 0a 48 63 07 <48> 8b 04 06 0f 18 08 5d c3 66 0f 1f 44 00 00 0f
> 1f 44 00 00 48
> [295350.601490] RIP: prefetch_freepointer.isra.63+0x11/0x20 RSP: ffffc90007bb7e08
> [295350.637891] ---[ end trace 97f09d2dbcdbfe07 ]---
> [295350.666426] Kernel panic - not syncing: Fatal exception
> [295350.692470] Kernel Offset: disabled
> [295350.715267] ---[ end Kernel panic - not syncing: Fatal exception
> [295350.745027] ------------[ cut here ]------------
> [295350.767882] WARNING: CPU: 26 PID: 1384 at kernel/sched/core.c:1179 set_task_cpu+0x197/0x1a0
> [295350.809229] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
> inet_diag unix_diag cfg80211 rfkill 8021q garp mrp xfs x86_pkg_temp_thermal intel_powerclamp loop
> coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel
> crypto_simd glue_helper cryptd iTCO_wdt ipmi_si iTCO_vendor_support intel_cstate intel_rapl_perf
> lpc_ich sg hpilo hpwdt ioatdma pcspkr ipmi_devintf i2c_i801 dca shpchp mfd_core wmi ipmi_msghandler
> nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper
> syscopyarea sysfillrect sysimgblt sd_mod fb_sys_fops ttm bnx2x mdio libcrc32c crc32c_intel serio_raw
> hpsa ptp drm scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod dax
> [295351.141701] CPU: 26 PID: 1384 Comm: thread.rb:70 Tainted: G      D         4.14.32-1.el7.x86_64 #1
> [295351.186528] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/25/2017
> [295351.219763] task: ffff882035430000 task.stack: ffffc90007bb4000
> [295351.249425] RIP: 0010:set_task_cpu+0x197/0x1a0
> [295351.272046] RSP: 0018:ffff88203f483cd8 EFLAGS: 00010046
> [295351.298021] RAX: 0000000000000200 RBX: ffff880fc6730000 RCX: 0000000000000001
> [295351.333003] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff880fc6730000
> [295351.368440] RBP: ffff88203f483cf8 R08: 0000000000000008 R09: 0000000000000000
> [295351.404295] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880fc6730bac
> [295351.440065] R13: 0000000000000008 R14: 0000000000000008 R15: 0000000000022ac0
> [295351.475936] FS:  00007f3f439f9700(0000) GS:ffff88203f480000(0000) knlGS:0000000000000000
> [295351.516850] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [295351.545941] CR2: 000000c43069c000 CR3: 000000203943e001 CR4: 00000000003606e0
> [295351.581551] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [295351.616790] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [295351.652332] Call Trace:
> [295351.664980]  <IRQ>
> [295351.675389]  try_to_wake_up+0x16c/0x480
> [295351.694771]  default_wake_function+0x12/0x20
> [295351.716287]  autoremove_wake_function+0x16/0x60
> [295351.738731]  __wake_up_common+0x8f/0x160
> [295351.758434]  __wake_up_common_lock+0x7e/0xc0
> [295351.780379]  __wake_up+0x13/0x20
> [295351.796700]  wake_up_klogd_work_func+0x40/0x60
> [295351.818797]  irq_work_run_list+0x53/0x80
> [295351.838265]  ? tick_sched_do_timer+0x70/0x70
> [295351.859777]  irq_work_tick+0x40/0x50
> [295351.877976]  update_process_times+0x42/0x60
> [295351.899104]  tick_sched_handle+0x2d/0x60
> [295351.919406]  tick_sched_timer+0x39/0x70
> [295351.938722]  __hrtimer_run_queues+0xe7/0x230
> [295351.960148]  hrtimer_interrupt+0xa8/0x1a0
> [295351.979989]  smp_apic_timer_interrupt+0x6b/0x140
> [295352.003308]  apic_timer_interrupt+0x8e/0xa0
> [295352.024371]  </IRQ>
> [295352.035497] RIP: 0010:panic+0x206/0x258
> [295352.055056] RSP: 0018:ffffc90007bb7c58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
> [295352.092974] RAX: 0000000000000034 RBX: 0000000000000200 RCX: 0000000000000006
> [295352.129345] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff88203f4969d0
> [295352.164888] RBP: ffffc90007bb7cc8 R08: 0000000000000000 R09: 00000000000004bf
> [295352.200268] R10: ffffffff8140e7c0 R11: 00000000000004be R12: ffffffff81e4b096
> [295352.236368] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [295352.272653]  ? vgacon_invert_region+0x80/0x80
> [295352.294690]  ? panic+0x1ff/0x258
> [295352.311125]  oops_end+0xba/0xd0
> [295352.327275]  die+0x42/0x50
> [295352.341034]  do_general_protection+0xd2/0x160
> [295352.362771]  general_protection+0x25/0x50
> [295352.382624] RIP: 0010:prefetch_freepointer.isra.63+0x11/0x20
> [295352.410365] RSP: 0018:ffffc90007bb7e08 EFLAGS: 00010202
> [295352.435958] RAX: 0000000000000000 RBX: 6236612d38373234 RCX: 00000000000199bb
> [295352.471228] RDX: 00000000000199ba RSI: 6236612d38373234 RDI: ffff88203ec259a0
> [295352.506333] RBP: ffffc90007bb7e08 R08: 0000000000028060 R09: ffffffff82051cc0
> [295352.541869] R10: 0000000000002000 R11: 0000000000000040 R12: 00000000014000c0
> [295352.577452] R13: ffff88203ec25980 R14: ffff88203ec25980 R15: ffff882000000000
> [295352.613390]  ? idr_alloc_cmn+0x98/0xe0
> [295352.633360]  kmem_cache_alloc+0x9c/0x1b0
> [295352.653132]  ? fsnotify_add_mark_locked+0x153/0x320
> [295352.677495]  fsnotify_add_mark_locked+0x153/0x320
> [295352.700960]  SyS_inotify_add_watch+0x2d5/0x350
> [295352.723337]  do_syscall_64+0x79/0x1b0
> [295352.741929]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [295352.767022] RIP: 0033:0x7f3f53f409b7
> [295352.785431] RSP: 002b:00007f3f439f70c8 EFLAGS: 00000202 ORIG_RAX: 00000000000000fe
> [295352.823469] RAX: ffffffffffffffda RBX: 00007f3f2c232fc0 RCX: 00007f3f53f409b7
> [295352.859222] RDX: 0000000022000fc6 RSI: 0000000002eaba50 RDI: 0000000000000018
> [295352.901958] RBP: 0000000002677d20 R08: 000000005ad2a563 R09: 0000000009caa9a8
> [295352.937907] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000002677d20
> [295352.974108] R13: 000000000000fd02 R14: 000000000005dc08 R15: 00000000000081a4
> [295353.010354] Code: ff 80 8b ac 08 00 00 04 e9 20 ff ff ff 0f 0b e9 b9 fe ff ff f7 83 84 00 00 00
> fd ff ff ff 0f 84 c3 fe ff ff 0f 0b e9 bc fe ff ff <0f> 0b e9 cb fe ff ff 66 90 0f 1f 44 00 00 55 48
> 89 e5 41 56 49
> [295353.103228] ---[ end trace 97f09d2dbcdbfe08 ]---
> [295353.126793] sched: Unexpected reschedule of offline CPU#8!
> [295353.154571] ------------[ cut here ]------------
> [295353.178193] WARNING: CPU: 26 PID: 1384 at arch/x86/kernel/smp.c:128
> native_smp_send_reschedule+0x42/0x50
> [295353.225115] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
> inet_diag unix_diag cfg80211 rfkill 8021q garp mrp xfs x86_pkg_temp_thermal intel_powerclamp loop
> coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel
> crypto_simd glue_helper cryptd iTCO_wdt ipmi_si iTCO_vendor_support intel_cstate intel_rapl_perf
> lpc_ich sg hpilo hpwdt ioatdma pcspkr ipmi_devintf i2c_i801 dca shpchp mfd_core wmi ipmi_msghandler
> nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper
> syscopyarea sysfillrect sysimgblt sd_mod fb_sys_fops ttm bnx2x mdio libcrc32c crc32c_intel serio_raw
> hpsa ptp drm scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod dax
> [295353.554858] CPU: 26 PID: 1384 Comm: thread.rb:70 Tainted: G      D W       4.14.32-1.el7.x86_64 #1
> [295353.600673] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/25/2017
> [295353.634304] task: ffff882035430000 task.stack: ffffc90007bb4000
> [295353.664086] RIP: 0010:native_smp_send_reschedule+0x42/0x50
> [295353.691429] RSP: 0018:ffff88203f483c60 EFLAGS: 00010046
> [295353.717211] RAX: 000000000000002e RBX: 0000000000000008 RCX: 0000000000000006
> [295353.753162] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff88203f4969d0
> [295353.789028] RBP: ffff88203f483c60 R08: 0000000000000000 R09: 000000000000050a
> [295353.824901] R10: ffffffff8140e7c0 R11: 0000000000000509 R12: ffff88203f222ac0
> [295353.860780] R13: ffff880fc6730000 R14: ffff88203f483d18 R15: ffff88203f222ac0
> [295353.897041] FS:  00007f3f439f9700(0000) GS:ffff88203f480000(0000) knlGS:0000000000000000
> [295353.937015] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [295353.965230] CR2: 000000c43069c000 CR3: 000000203943e001 CR4: 00000000003606e0
> [295354.001263] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [295354.037348] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [295354.073079] Call Trace:
> [295354.085676]  <IRQ>
> [295354.096271]  resched_curr+0xae/0xd0
> [295354.114398]  check_preempt_curr+0x79/0xa0
> [295354.134774]  ttwu_do_wakeup+0x1e/0x160
> [295354.153738]  ttwu_do_activate+0x7a/0x90
> [295354.173017]  try_to_wake_up+0x1e7/0x480
> [295354.192199]  default_wake_function+0x12/0x20
> [295354.213726]  autoremove_wake_function+0x16/0x60
> [295354.236555]  __wake_up_common+0x8f/0x160
> [295354.256636]  __wake_up_common_lock+0x7e/0xc0
> [295354.278570]  __wake_up+0x13/0x20
> [295354.295265]  wake_up_klogd_work_func+0x40/0x60
> [295354.317984]  irq_work_run_list+0x53/0x80
> [295354.337965]  ? tick_sched_do_timer+0x70/0x70
> [295354.359264]  irq_work_tick+0x40/0x50
> [295354.377736]  update_process_times+0x42/0x60
> [295354.399024]  tick_sched_handle+0x2d/0x60
> [295354.418996]  tick_sched_timer+0x39/0x70
> [295354.438406]  __hrtimer_run_queues+0xe7/0x230
> [295354.459586]  hrtimer_interrupt+0xa8/0x1a0
> [295354.479258]  smp_apic_timer_interrupt+0x6b/0x140
> [295354.502194]  apic_timer_interrupt+0x8e/0xa0
> [295354.523081]  </IRQ>
> [295354.533789] RIP: 0010:panic+0x206/0x258
> [295354.553565] RSP: 0018:ffffc90007bb7c58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
> [295354.590890] RAX: 0000000000000034 RBX: 0000000000000200 RCX: 0000000000000006
> [295354.626876] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff88203f4969d0
> [295354.662703] RBP: ffffc90007bb7cc8 R08: 0000000000000000 R09: 00000000000004bf
> [295354.698251] R10: ffffffff8140e7c0 R11: 00000000000004be R12: ffffffff81e4b096
> [295354.733758] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [295354.769850]  ? vgacon_invert_region+0x80/0x80
> [295354.791724]  ? panic+0x1ff/0x258
> [295354.808021]  oops_end+0xba/0xd0
> [295354.823809]  die+0x42/0x50
> [295354.837948]  do_general_protection+0xd2/0x160
> [295354.859636]  general_protection+0x25/0x50
> [295354.880150] RIP: 0010:prefetch_freepointer.isra.63+0x11/0x20
> [295354.908869] RSP: 0018:ffffc90007bb7e08 EFLAGS: 00010202
> [295354.935002] RAX: 0000000000000000 RBX: 6236612d38373234 RCX: 00000000000199bb
> [295354.970812] RDX: 00000000000199ba RSI: 6236612d38373234 RDI: ffff88203ec259a0
> [295355.006560] RBP: ffffc90007bb7e08 R08: 0000000000028060 R09: ffffffff82051cc0
> [295355.042849] R10: 0000000000002000 R11: 0000000000000040 R12: 00000000014000c0
> [295355.077849] R13: ffff88203ec25980 R14: ffff88203ec25980 R15: ffff882000000000
> [295355.113175]  ? idr_alloc_cmn+0x98/0xe0
> [295355.132128]  kmem_cache_alloc+0x9c/0x1b0
> [295355.151819]  ? fsnotify_add_mark_locked+0x153/0x320
> [295355.176264]  fsnotify_add_mark_locked+0x153/0x320
> [295355.199925]  SyS_inotify_add_watch+0x2d5/0x350
> [295355.222164]  do_syscall_64+0x79/0x1b0
> [295355.240555]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [295355.266353] RIP: 0033:0x7f3f53f409b7
> [295355.284573] RSP: 002b:00007f3f439f70c8 EFLAGS: 00000202 ORIG_RAX: 00000000000000fe
> [295355.322272] RAX: ffffffffffffffda RBX: 00007f3f2c232fc0 RCX: 00007f3f53f409b7
> [295355.357920] RDX: 0000000022000fc6 RSI: 0000000002eaba50 RDI: 0000000000000018
> [295355.393626] RBP: 0000000002677d20 R08: 000000005ad2a563 R09: 0000000009caa9a8
> [295355.429391] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000002677d20
> [295355.464726] R13: 000000000000fd02 R14: 000000000005dc08 R15: 00000000000081a4
> [295355.500091] Code: c0 74 1a 48 8b 05 7f 44 ec 00 be fd 00 00 00 48 8b 80 a0 00 00 00 e8 ae 1a 9b
> 00 5d c3 89 fe 48 c7 c7 b8 26 e5 81 e8 21 45 09 00 <0f> 0b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
> 44 00 00 55 48
> [295355.592809] ---[ end trace 97f09d2dbcdbfe09 ]---
> [295355.616249] sched: Unexpected reschedule of offline CPU#0!
> [295355.642901] ------------[ cut here ]------------
> [295355.666243] WARNING: CPU: 26 PID: 1384 at arch/x86/kernel/smp.c:128
> native_smp_send_reschedule+0x42/0x50
> [295355.713782] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
> inet_diag unix_diag cfg80211 rfkill 8021q garp mrp xfs x86_pkg_temp_thermal intel_powerclamp loop
> coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel
> crypto_simd glue_helper cryptd iTCO_wdt ipmi_si iTCO_vendor_support intel_cstate intel_rapl_perf
> lpc_ich sg hpilo hpwdt ioatdma pcspkr ipmi_devintf i2c_i801 dca shpchp mfd_core wmi ipmi_msghandler
> nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper
> syscopyarea sysfillrect sysimgblt sd_mod fb_sys_fops ttm bnx2x mdio libcrc32c crc32c_intel serio_raw
> hpsa ptp drm scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod dax
> [295356.048067] CPU: 26 PID: 1384 Comm: thread.rb:70 Tainted: G      D W       4.14.32-1.el7.x86_64 #1
> [295356.094292] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/25/2017
> [295356.127304] task: ffff882035430000 task.stack: ffffc90007bb4000
> [295356.157937] RIP: 0010:native_smp_send_reschedule+0x42/0x50
> [295356.186118] RSP: 0018:ffff88203f483c58 EFLAGS: 00010046
> [295356.212721] RAX: 000000000000002e RBX: ffff8810391945c0 RCX: 0000000000000006
> [295356.247928] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff88203f4969d0
> [295356.284320] RBP: ffff88203f483c58 R08: 0000000000000000 R09: 0000000000000559
> [295356.320685] R10: ffffffff8140e7c0 R11: 0000000000000558 R12: ffff88103919516c
> [295356.356635] R13: 0000000000000004 R14: 0000000000000046 R15: 0000000000022ac0
> [295356.392135] FS:  00007f3f439f9700(0000) GS:ffff88203f480000(0000) knlGS:0000000000000000
> [295356.432737] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [295356.461522] CR2: 000000c43069c000 CR3: 000000203943e001 CR4: 00000000003606e0
> [295356.497800] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [295356.533485] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [295356.569205] Call Trace:
> [295356.581694]  <IRQ>
> [295356.591921]  try_to_wake_up+0x405/0x480
> [295356.611188]  default_wake_function+0x12/0x20
> [295356.632564]  __wake_up_common+0x8f/0x160
> [295356.652486]  __wake_up_locked+0x16/0x20
> [295356.671808]  ep_poll_callback+0xd0/0x300
> [295356.691565]  __wake_up_common+0x8f/0x160
> [295356.711684]  __wake_up_common_lock+0x7e/0xc0
> [295356.733447]  __wake_up+0x13/0x20
> [295356.749916]  wake_up_klogd_work_func+0x40/0x60
> [295356.772512]  irq_work_run_list+0x53/0x80
> [295356.792701]  ? tick_sched_do_timer+0x70/0x70
> [295356.821294]  irq_work_tick+0x40/0x50
> [295356.839929]  update_process_times+0x42/0x60
> [295356.860941]  tick_sched_handle+0x2d/0x60
> [295356.881072]  tick_sched_timer+0x39/0x70
> [295356.900787]  __hrtimer_run_queues+0xe7/0x230
> [295356.922396]  hrtimer_interrupt+0xa8/0x1a0
> [295356.942760]  smp_apic_timer_interrupt+0x6b/0x140
> [295356.966377]  apic_timer_interrupt+0x8e/0xa0
> [295356.987700]  </IRQ>
> [295356.998764] RIP: 0010:panic+0x206/0x258
> [295357.018139] RSP: 0018:ffffc90007bb7c58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
> [295357.055880] RAX: 0000000000000034 RBX: 0000000000000200 RCX: 0000000000000006
> [295357.092139] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff88203f4969d0
> [295357.127348] RBP: ffffc90007bb7cc8 R08: 0000000000000000 R09: 00000000000004bf
> [295357.163530] R10: ffffffff8140e7c0 R11: 00000000000004be R12: ffffffff81e4b096
> [295357.200334] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [295357.236063]  ? vgacon_invert_region+0x80/0x80
> [295357.257667]  ? panic+0x1ff/0x258
> [295357.274076]  oops_end+0xba/0xd0
> [295357.290155]  die+0x42/0x50
> [295357.303914]  do_general_protection+0xd2/0x160
> [295357.326145]  general_protection+0x25/0x50
> [295357.346126] RIP: 0010:prefetch_freepointer.isra.63+0x11/0x20
> [295357.374233] RSP: 0018:ffffc90007bb7e08 EFLAGS: 00010202
> [295357.400584] RAX: 0000000000000000 RBX: 6236612d38373234 RCX: 00000000000199bb
> [295357.436122] RDX: 00000000000199ba RSI: 6236612d38373234 RDI: ffff88203ec259a0
> [295357.471905] RBP: ffffc90007bb7e08 R08: 0000000000028060 R09: ffffffff82051cc0
> [295357.508220] R10: 0000000000002000 R11: 0000000000000040 R12: 00000000014000c0
> [295357.544201] R13: ffff88203ec25980 R14: ffff88203ec25980 R15: ffff882000000000
> [295357.580063]  ? idr_alloc_cmn+0x98/0xe0
> [295357.598651]  kmem_cache_alloc+0x9c/0x1b0
> [295357.617905]  ? fsnotify_add_mark_locked+0x153/0x320
> [295357.641988]  fsnotify_add_mark_locked+0x153/0x320
> [295357.665286]  SyS_inotify_add_watch+0x2d5/0x350
> [295357.687722]  do_syscall_64+0x79/0x1b0
> [295357.706171]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [295357.731499] RIP: 0033:0x7f3f53f409b7
> [295357.749414] RSP: 002b:00007f3f439f70c8 EFLAGS: 00000202 ORIG_RAX: 00000000000000fe
> [295357.787490] RAX: ffffffffffffffda RBX: 00007f3f2c232fc0 RCX: 00007f3f53f409b7
> [295357.823420] RDX: 0000000022000fc6 RSI: 0000000002eaba50 RDI: 0000000000000018
> [295357.859615] RBP: 0000000002677d20 R08: 000000005ad2a563 R09: 0000000009caa9a8
> [295357.895120] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000002677d20
> [295357.931829] R13: 000000000000fd02 R14: 000000000005dc08 R15: 00000000000081a4
> [295357.967565] Code: c0 74 1a 48 8b 05 7f 44 ec 00 be fd 00 00 00 48 8b 80 a0 00 00 00 e8 ae 1a 9b
> 00 5d c3 89 fe 48 c7 c7 b8 26 e5 81 e8 21 45 09 00 <0f> 0b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
> 44 00 00 55 48
> [295358.060705] ---[ end trace 97f09d2dbcdbfe0a ]---
> 
> 
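
One detail in the general protection fault above may be worth a closer look:
RSI and RBX both hold 6236612d38373234, which looks like text rather than a
pointer. The sketch below decodes a register value as the bytes it would
occupy in little-endian memory and checks whether it is a canonical x86-64
address. The helper names are made up for illustration, and the check
assumes 48-bit virtual addressing (4-level paging).

```python
def register_as_ascii(hex_value):
    """Decode a 64-bit register value as the 8 bytes it occupies in
    little-endian (x86-64) memory; non-printable bytes become '.'."""
    raw = int(hex_value, 16).to_bytes(8, "little")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


def is_canonical(hex_value):
    """True if bits 63..47 are all equal, i.e. the value is a canonical
    address under 48-bit virtual addressing (an assumption of this sketch)."""
    top = int(hex_value, 16) >> 47
    return top == 0 or top == 0x1FFFF


if __name__ == "__main__":
    rsi = "6236612d38373234"  # RSI/RBX from the trace above
    rdi = "ffff88203ec259a0"  # RDI from the same trace
    print(register_as_ascii(rsi))  # 4278-a6b
    print(is_canonical(rsi))       # False
    print(is_canonical(rdi))       # True
```

The value decodes to the ASCII string "4278-a6b" and is not a canonical
address, so dereferencing it would raise a general protection fault rather
than a page fault. Since prefetch_freepointer() reads the slab freelist
pointer, one possible reading is that string data was written over a freed
slab object (use-after-free or an overflow from a neighbouring object), but
that is a guess from a single trace, not a diagnosis.
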
> ---[ sar -f ./sa15 -s 01:05:00 -e 02:00:00 -P 26 ]---
> Linux 4.14.32-1.el7.x86_64 (foomar) 	04/15/2018 	_x86_64_	(32 CPU)
> 
> 01:05:00 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
> 01:05:01 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
> 01:05:02 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
> 01:05:03 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
> 01:05:04 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
> 01:05:05 AM      26      0.99      0.00      0.99      0.00      0.00     98.02
> 01:05:06 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
> 01:05:07 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
> 01:05:08 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
> 01:05:09 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
> 01:05:10 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
> 01:05:11 AM      26      0.99      0.00      0.00      0.00      0.00     99.01
> 01:05:12 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
> 01:05:13 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
> 01:05:14 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
> 01:05:15 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
> 01:05:16 AM      26      2.00      0.00      1.00      0.00      0.00     97.00
> 01:05:17 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
> 01:05:18 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
> ---[ sar -f ./sa15 -s 01:05:00 -e 02:00:00 -P 26 ]---
> 
> 
> Any ideas would be very much appreciated.
> 
> Cheers,
> Pavlos Parissis
> 




-- 
Guillaume Morin <guillaume@xxxxxxxxxxx>



