Re: ceph node crashed with these errors "kernel: ceph: build_snap_context" (maybe now it is urgent?)

On Mon, Dec 2, 2019 at 1:23 PM Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
>
>
>
> I guess this is related? kworker 100%
>
>
> [Mon Dec  2 13:05:27 2019] SysRq : Show backtrace of all active CPUs
> [Mon Dec  2 13:05:27 2019] sending NMI to all CPUs:
> [Mon Dec  2 13:05:27 2019] NMI backtrace for cpu 0 skipped: idling at pc 0xffffffffb0581e94
> [Mon Dec  2 13:05:27 2019] NMI backtrace for cpu 1 skipped: idling at pc 0xffffffffb0581e94
> [Mon Dec  2 13:05:27 2019] NMI backtrace for cpu 2 skipped: idling at pc 0xffffffffb0581e94
> [Mon Dec  2 13:05:27 2019] NMI backtrace for cpu 3 skipped: idling at pc 0xffffffffb0581e94
> [Mon Dec  2 13:05:27 2019] NMI backtrace for cpu 4
> [Mon Dec  2 13:05:27 2019] CPU: 4 PID: 426200 Comm: kworker/4:2 Not tainted 3.10.0-1062.4.3.el7.x86_64 #1
> [Mon Dec  2 13:05:27 2019] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0b 05/27/2014
> [Mon Dec  2 13:05:27 2019] Workqueue: ceph-msgr ceph_con_workfn [libceph]
> [Mon Dec  2 13:05:27 2019] task: ffffa0c8e1240000 ti: ffffa0ccb6364000 task.ti: ffffa0ccb6364000
> [Mon Dec  2 13:05:27 2019] RIP: 0010:[<ffffffffc08d7db9>]  [<ffffffffc08d7db9>] cmpu64_rev+0x19/0x20 [ceph]
> [Mon Dec  2 13:05:27 2019] RSP: 0018:ffffa0ccb6367a20  EFLAGS: 00000202
> [Mon Dec  2 13:05:27 2019] RAX: 0000000000000001 RBX: 0000000000000038 RCX: 0000000000000008
> [Mon Dec  2 13:05:27 2019] RDX: 0000000000025c33 RSI: ffffa0cbbe380050 RDI: ffffa0cbbe380030
> [Mon Dec  2 13:05:27 2019] RBP: ffffa0ccb6367a20 R08: 0000000000000018 R09: 00000000000013ed
> [Mon Dec  2 13:05:27 2019] R10: 0000000000000002 R11: ffffe94994f8e000 R12: ffffa0cbbe380030
> [Mon Dec  2 13:05:27 2019] R13: ffffffffc08d7da0 R14: ffffa0cbbe380018 R15: ffffa0cbbe380050
> [Mon Dec  2 13:05:27 2019] FS:  0000000000000000(0000) GS:ffffa0d2cfb00000(0000) knlGS:0000000000000000
> [Mon Dec  2 13:05:27 2019] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [Mon Dec  2 13:05:27 2019] CR2: 000055a7c413fcb9 CR3: 0000001813010000 CR4: 00000000000607e0
> [Mon Dec  2 13:05:27 2019] Call Trace:
> [Mon Dec  2 13:05:27 2019]  [<ffffffffb019303f>] sort+0x1af/0x260
> [Mon Dec  2 13:05:27 2019]  [<ffffffffb0192e60>] ? u32_swap+0x10/0x10
> [Mon Dec  2 13:05:27 2019]  [<ffffffffc08d807b>] build_snap_context+0x12b/0x290 [ceph]
> [Mon Dec  2 13:05:27 2019]  [<ffffffffc08d820c>] rebuild_snap_realms+0x2c/0x90 [ceph]
> [Mon Dec  2 13:05:27 2019]  [<ffffffffc08d822b>] rebuild_snap_realms+0x4b/0x90 [ceph]
> [Mon Dec  2 13:05:27 2019]  [<ffffffffc08d91fc>] ceph_update_snap_trace+0x3ec/0x530 [ceph]
> [Mon Dec  2 13:05:27 2019]  [<ffffffffc08e2239>] handle_reply+0x359/0xc60 [ceph]
> [Mon Dec  2 13:05:27 2019]  [<ffffffffc08e48ba>] dispatch+0x11a/0xb00 [ceph]
> [Mon Dec  2 13:05:27 2019]  [<ffffffffb042e56a>] ? kernel_recvmsg+0x3a/0x50
> [Mon Dec  2 13:05:27 2019]  [<ffffffffc05fcff4>] try_read+0x544/0x1300 [libceph]
> [Mon Dec  2 13:05:27 2019]  [<ffffffffafee13ce>] ? account_entity_dequeue+0xae/0xd0
> [Mon Dec  2 13:05:27 2019]  [<ffffffffafee4d5c>] ? dequeue_entity+0x11c/0x5e0
> [Mon Dec  2 13:05:27 2019]  [<ffffffffb042e417>] ? kernel_sendmsg+0x37/0x50
> [Mon Dec  2 13:05:27 2019]  [<ffffffffc05fdfb4>] ceph_con_workfn+0xe4/0x1530 [libceph]
> [Mon Dec  2 13:05:27 2019]  [<ffffffffb057f568>] ? __schedule+0x448/0x9c0
> [Mon Dec  2 13:05:27 2019]  [<ffffffffafebe21f>] process_one_work+0x17f/0x440
> [Mon Dec  2 13:05:27 2019]  [<ffffffffafebf336>] worker_thread+0x126/0x3c0
> [Mon Dec  2 13:05:27 2019]  [<ffffffffafebf210>] ? manage_workers.isra.26+0x2a0/0x2a0
> [Mon Dec  2 13:05:27 2019]  [<ffffffffafec61f1>] kthread+0xd1/0xe0
> [Mon Dec  2 13:05:27 2019]  [<ffffffffafec6120>] ? insert_kthread_work+0x40/0x40
> [Mon Dec  2 13:05:27 2019]  [<ffffffffb058cd37>] ret_from_fork_nospec_begin+0x21/0x21
> [Mon Dec  2 13:05:27 2019]  [<ffffffffafec6120>] ? insert_kthread_work+0x40/0x40
> [Mon Dec  2 13:05:27 2019] Code: 87 c8 fc ff ff 5d 0f 94 c0 0f b6 c0 c3 0f 1f 44 00 00 66 66 66 66 90 48 8b 16 48 39 17 b8 01 00 00 00 55 48 89 e5 72 08 0f 97 c0 <0f> b6 c0 f7 d8 5d c3 66 66 66 66 90 55 f6 05 ed 92 02 00 04 48
> [Mon Dec  2 13:05:27 2019] NMI backtrace for cpu 5

Yes, seems related.  The trace shows the kworker spinning in sort()
under build_snap_context(), i.e. rebuilding snapshot contexts after a
snap trace update.  I'm not sure how it relates to an upgrade to
nautilus, but as I mentioned in a different message, with thousands of
snapshots you are in dangerous territory anyway.
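
For reference, here is roughly what that hot path is doing:
build_snap_context() in fs/ceph/snap.c collects a realm's snapshot IDs
and sorts them in descending order (the cmpu64_rev frame in your
trace), and rebuild_snap_realms() repeats that for every realm in the
hierarchy.  What follows is only a userspace sketch of that pattern,
not the kernel code; the snapshot and realm counts are made up for
illustration:

/*
 * Sketch of the hot path in the trace above: per-realm descending
 * sort of 64-bit snapshot IDs, repeated across the realm hierarchy.
 * Userspace approximation only; counts are hypothetical.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Descending comparator for u64 snap IDs, modelled on cmpu64_rev. */
static int cmpu64_rev(const void *a, const void *b)
{
        uint64_t x = *(const uint64_t *)a;
        uint64_t y = *(const uint64_t *)b;

        if (x < y)
                return 1;
        if (x > y)
                return -1;
        return 0;
}

int main(void)
{
        /* Pretend each realm carries a few thousand snap IDs. */
        size_t nsnaps = 5000, nrealms = 100;
        uint64_t *snaps = malloc(nsnaps * sizeof(*snaps));

        if (!snaps)
                return 1;
        for (size_t i = 0; i < nsnaps; i++)
                snaps[i] = (uint64_t)rand();

        /*
         * rebuild_snap_realms() walks the realm hierarchy and rebuilds
         * the snap context of every realm on each snap trace update,
         * so the sort runs once per realm; this loop mimics that.
         */
        for (size_t r = 0; r < nrealms; r++)
                qsort(snaps, nsnaps, sizeof(*snaps), cmpu64_rev);

        printf("sorted %zu snap IDs for %zu realms\n", nsnaps, nrealms);
        free(snaps);
        return 0;
}

With thousands of snapshots per realm, redoing that sort for every
realm on every update is what keeps the ceph-msgr worker at 100%.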

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


