Hi,
Some of our Infernalis OSD nodes have suddenly gone down with a "kernel:BUG: soft lockup" message. Once that happens, nothing can be done on the node until it is rebooted. When I recover by restarting the down OSDs one by one while also adding new OSDs, the same error occurs again on the same nodes. I'm not sure whether the "ceph-disk activate" run during recovery, the "ceph-disk prepare" run for the new OSDs, both, or neither is related. Maybe the following trace will help someone who can interpret it. Any thoughts?
Best regards,
May 12 17:27:20 storage-b kernel: NMI backtrace for cpu 6
May 12 17:27:20 storage-b kernel: CPU: 6 PID: 1799 Comm: master Tainted: G W OEL ------------ 3.10.0-327.13.1.el7.x86_64 #1
May 12 17:27:20 storage-b kernel: Hardware name: Supermicro SSG-2028R-E1CR24N/X10DRi-T4+, BIOS 1.0b 01/29/2015
May 12 17:27:20 storage-b kernel: task: ffff8820247ab980 ti: ffff8820185d8000 task.ti: ffff8820185d8000
May 12 17:27:20 storage-b kernel: RIP: 0010:[<ffffffff8163d0f7>] [<ffffffff8163d0f7>] _raw_spin_lock+0x37/0x50
May 12 17:27:20 storage-b kernel: RSP: 0018:ffff8820185dbc30 EFLAGS: 00000206
May 12 17:27:20 storage-b kernel: RAX: 0000000000007bb1 RBX: ffff881e8d5dae60 RCX: 0000000000008cfc
May 12 17:27:20 storage-b kernel: RDX: 0000000000008d0c RSI: 0000000000008d0c RDI: ffffffff81943400
May 12 17:27:20 storage-b kernel: RBP: ffff8820185dbc30 R08: ffff881ea4af0510 R09: 000000000000fffa
May 12 17:27:20 storage-b kernel: R10: ffff881ea4af0480 R11: ffff881ea4af09c0 R12: ffff881e8d5daf68
May 12 17:27:20 storage-b kernel: R13: ffffffff8167fd40 R14: ffffffff8167fd40 R15: ffff881e8d5dae60
May 12 17:27:20 storage-b kernel: FS: 00007fcd70e17840(0000) GS:ffff88103fcc0000(0000) knlGS:0000000000000000
May 12 17:27:20 storage-b kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 12 17:27:20 storage-b kernel: CR2: 00007fcd7105f78c CR3: 0000002027d6f000 CR4: 00000000001407e0
May 12 17:27:20 storage-b kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 12 17:27:20 storage-b kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 12 17:27:20 storage-b kernel: Stack:
May 12 17:27:20 storage-b kernel: ffff8820185dbc58 ffffffff811fa10b ffff881e8d5dae60 ffff881e8d5daee8
May 12 17:27:20 storage-b kernel: ffff88202909b800 ffff8820185dbc88 ffffffff811fa985 ffff881ea4af09c0
May 12 17:27:20 storage-b kernel: ffff881ea4af0500 ffff881ea4af04d8 ffff8820185dbcf8 ffff8820185dbce0
May 12 17:27:20 storage-b kernel: Call Trace:
May 12 17:27:20 storage-b kernel: [<ffffffff811fa10b>] evict+0x6b/0x170
May 12 17:27:20 storage-b kernel: [<ffffffff811fa985>] iput+0xf5/0x180
May 12 17:27:20 storage-b kernel: [<ffffffff811f713c>] shrink_dentry_list+0x45c/0x480
May 12 17:27:20 storage-b kernel: [<ffffffff811f7268>] shrink_dcache_parent+0x38/0x90
May 12 17:27:20 storage-b kernel: [<ffffffff8124ec06>] proc_flush_task+0xb6/0x1b0
May 12 17:27:20 storage-b kernel: [<ffffffff8107fdef>] release_task+0x3f/0x470
May 12 17:27:20 storage-b kernel: [<ffffffff810bc630>] ? thread_group_cputime_adjusted+0x50/0x70
May 12 17:27:20 storage-b kernel: [<ffffffff81080b7f>] wait_consider_task+0x95f/0xb80
May 12 17:27:20 storage-b kernel: [<ffffffff81080ea8>] do_wait+0x108/0x260
May 12 17:27:20 storage-b kernel: [<ffffffff810820c0>] SyS_wait4+0x80/0x110
May 12 17:27:20 storage-b kernel: [<ffffffff8107fb40>] ? task_stopped_code+0x60/0x60
May 12 17:27:20 storage-b kernel: [<ffffffff81645ec9>] system_call_fastpath+0x16/0x1b
May 12 17:27:20 storage-b kernel: Code: 02 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 02 5d c3 83 e2 fe 0f b7 f2 b8 00 80 00 00 eb 0c 0f 1f 44 00 00 f3 90 83 e8 01 74 0a <0f> b7 0f 66 39 ca 75 f1 5d c3 0f 1f 80 00 00 00 00 eb da 66 0f
May 12 17:27:20 storage-b kernel: NMI backtrace for cpu 2
May 12 17:27:20 storage-b kernel: CPU: 2 PID: 6182 Comm: ceph-osd Tainted: G W OEL ------------ 3.10.0-327.13.1.el7.x86_64 #1
May 12 17:27:20 storage-b kernel: Hardware name: Supermicro SSG-2028R-E1CR24N/X10DRi-T4+, BIOS 1.0b 01/29/2015
May 12 17:27:20 storage-b kernel: task: ffff882025862e00 ti: ffff882014760000 task.ti: ffff882014760000
May 12 17:27:20 storage-b kernel: RIP: 0010:[<ffffffff8163d0f2>] [<ffffffff8163d0f2>] _raw_spin_lock+0x32/0x50
May 12 17:27:20 storage-b kernel: RSP: 0018:ffff882014763a38 EFLAGS: 00000212
May 12 17:27:20 storage-b kernel: RAX: 0000000000006000 RBX: ffff880f66dca7f8 RCX: 0000000000008cfc
May 12 17:27:20 storage-b kernel: RDX: 0000000000008d12 RSI: 0000000000008d12 RDI: ffffffff81943400
May 12 17:27:20 storage-b kernel: RBP: ffff882014763a38 R08: ffff881e783bc460 R09: ffff8810225d9820
May 12 17:27:20 storage-b kernel: R10: ffffffffa0545ab1 R11: ffffea0079e0ef00 R12: ffff880f66dca7f8
May 12 17:27:20 storage-b kernel: R13: ffff882014763b38 R14: 0000000000000000 R15: ffff881020f34840
May 12 17:27:20 storage-b kernel: FS: 00007f7037c1d700(0000) GS:ffff88103fc40000(0000) knlGS:0000000000000000
May 12 17:27:20 storage-b kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 12 17:27:20 storage-b kernel: CR2: 00007f7057331ea0 CR3: 0000002023052000 CR4: 00000000001407e0
May 12 17:27:20 storage-b kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 12 17:27:20 storage-b kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 12 17:27:20 storage-b kernel: Stack:
May 12 17:27:20 storage-b kernel: ffff882014763a50 ffffffff811f8dc9 ffff880f66dca640 ffff882014763a70
May 12 17:27:20 storage-b kernel: ffffffffa0531b44 ffff881e783bc3a0 0000000000000001 ffff882014763ae0
May 12 17:27:20 storage-b kernel: ffffffffa0533e9d ffff882025862e00 0000000000000000 ffff880f66dca640
May 12 17:27:20 storage-b kernel: Call Trace:
May 12 17:27:20 storage-b kernel: [<ffffffff811f8dc9>] inode_sb_list_add+0x19/0x50
May 12 17:27:20 storage-b kernel: [<ffffffffa0531b44>] xfs_setup_inode+0x34/0x2f0 [xfs]
May 12 17:27:20 storage-b kernel: [<ffffffffa0533e9d>] xfs_ialloc+0x2cd/0x540 [xfs]
May 12 17:27:20 storage-b kernel: [<ffffffffa0534186>] xfs_dir_ialloc+0x76/0x280 [xfs]
May 12 17:27:20 storage-b kernel: [<ffffffffa054421b>] ? xfs_log_reserve+0x15b/0x1b0 [xfs]
May 12 17:27:20 storage-b kernel: [<ffffffff8163a082>] ? down_write+0x12/0x30
May 12 17:27:20 storage-b kernel: [<ffffffffa0534664>] xfs_create+0x284/0x710 [xfs]
May 12 17:27:20 storage-b kernel: [<ffffffffa0530d8b>] xfs_vn_mknod+0xbb/0x250 [xfs]
May 12 17:27:20 storage-b kernel: [<ffffffffa0530f53>] xfs_vn_create+0x13/0x20 [xfs]
May 12 17:27:20 storage-b kernel: [<ffffffff811ead2d>] vfs_create+0xcd/0x130
May 12 17:27:20 storage-b kernel: [<ffffffff811ec3bf>] do_last+0xbef/0x1270
May 12 17:27:20 storage-b kernel: [<ffffffff811ee722>] path_openat+0xc2/0x490
May 12 17:27:20 storage-b kernel: [<ffffffff811efdf2>] ? user_path_at_empty+0x72/0xc0
May 12 17:27:20 storage-b kernel: [<ffffffff811efeeb>] do_filp_open+0x4b/0xb0
May 12 17:27:20 storage-b kernel: [<ffffffff811fca77>] ? __alloc_fd+0xa7/0x130
May 12 17:27:20 storage-b kernel: [<ffffffff811dd893>] do_sys_open+0xf3/0x1f0
May 12 17:27:20 storage-b kernel: [<ffffffff811dd9ae>] SyS_open+0x1e/0x20
May 12 17:27:20 storage-b kernel: [<ffffffff81645ec9>] system_call_fastpath+0x16/0x1b
May 12 17:27:20 storage-b kernel: Code: 89 e5 b8 00 00 02 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 02 5d c3 83 e2 fe 0f b7 f2 b8 00 80 00 00 eb 0c 0f 1f 44 00 00 f3 90 <83> e8 01 74 0a 0f b7 0f 66 39 ca 75 f1 5d c3 0f 1f 80 00 00 00
May 12 17:27:20 storage-b kernel: NMI backtrace for cpu 22
May 12 17:27:20 storage-b kernel: CPU: 22 PID: 0 Comm: swapper/22 Tainted: G W OEL ------------ 3.10.0-327.13.1.el7.x86_64 #1
May 12 17:27:20 storage-b kernel: Hardware name: Supermicro SSG-2028R-E1CR24N/X10DRi-T4+, BIOS 1.0b 01/29/2015
May 12 17:27:20 storage-b kernel: task: ffff8820291f4500 ti: ffff881029278000 task.ti: ffff881029278000
May 12 17:27:20 storage-b kernel: RIP: 0010:[<ffffffff8135df87>] [<ffffffff8135df87>] intel_idle+0xd7/0x160
May 12 17:27:20 storage-b kernel: RSP: 0018:ffff88102927be10 EFLAGS: 00000046
May 12 17:27:20 storage-b kernel: RAX: 0000000000000020 RBX: 0000000000000008 RCX: 0000000000000001
May 12 17:27:20 storage-b kernel: RDX: 0000000000000000 RSI: ffff88102927bfd8 RDI: 000000000194a000
May 12 17:27:20 storage-b kernel: RBP: ffff88102927be40 R08: 000000000000d67f R09: 0000000000000018
May 12 17:27:20 storage-b kernel: R10: 000000000000570c R11: 0000000000000001 R12: ffff88102927bfd8
May 12 17:27:20 storage-b kernel: R13: 0000000000000004 R14: 0000000000000020 R15: ffffffff819fdeb8
May 12 17:27:20 storage-b kernel: FS: 0000000000000000(0000) GS:ffff88103fdc0000(0000) knlGS:0000000000000000
May 12 17:27:20 storage-b kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 12 17:27:20 storage-b kernel: CR2: 00007f7052704200 CR3: 000000000194a000 CR4: 00000000001407e0
May 12 17:27:20 storage-b kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 12 17:27:20 storage-b kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 12 17:27:20 storage-b kernel: Stack:
May 12 17:27:20 storage-b kernel: 000000162927be40 73dc88025ab7a9e3 ffff88103fddea00 ffffffff819fdd40
May 12 17:27:20 storage-b kernel: 00000e6ff44bcfe4 0000000000000004 ffff88102927be78 ffffffff814d46e0
May 12 17:27:20 storage-b kernel: ffff88103fddea00 0000000000000004 0000000000000004 ffffffff819fdd40
May 12 17:27:20 storage-b kernel: Call Trace:
May 12 17:27:20 storage-b kernel: [<ffffffff814d46e0>] cpuidle_enter_state+0x40/0xc0
May 12 17:27:20 storage-b kernel: [<ffffffff814d4839>] cpuidle_idle_call+0xd9/0x210
May 12 17:27:20 storage-b kernel: [<ffffffff8101e4be>] arch_cpu_idle+0xe/0x30
May 12 17:27:20 storage-b kernel: [<ffffffff810d6325>] cpu_startup_entry+0x245/0x290
May 12 17:27:20 storage-b kernel: [<ffffffff810475fa>] start_secondary+0x1ba/0x230
May 12 17:27:20 storage-b kernel: Code: 31 d2 65 48 8b 34 25 b8 b7 00 00 48 89 d1 48 8d 86 38 c0 ff ff 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 f0 0f 01 c9 <65> 48 8b 04 25 b8 b7 00 00 f0 80 a0 3a c0 ff ff 7f 85 1d 9a fd
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com