Re: Kworker 100% with ceph-msgr (after upgrade to 14.2.6?)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jan 14, 2020 at 10:31 AM Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
>
>
> I think this is new since I upgraded to 14.2.6. kworker/7:3 100%
>
> [@~]# echo l > /proc/sysrq-trigger
>
> [Tue Jan 14 10:05:08 2020] CPU: 7 PID: 2909400 Comm: kworker/7:0 Not
> tainted 3.10.0-1062.4.3.el7.x86_64 #1
>
> [Tue Jan 14 10:05:08 2020] Workqueue: ceph-msgr ceph_con_workfn
> [libceph]
> [Tue Jan 14 10:05:08 2020] task: ffffa0d2cb9db150 ti: ffffa0d3040f0000
> task.ti: ffffa0d3040f0000
> [Tue Jan 14 10:05:08 2020] RIP: 0010:[<ffffffffb0192e7e>]
> [<ffffffffb0192e7e>] generic_swap+0x1e/0x30
> [Tue Jan 14 10:05:08 2020] RSP: 0018:ffffa0d3040f3a20  EFLAGS: 00000206
> [Tue Jan 14 10:05:08 2020] RAX: 0000000000000072 RBX: 0000000000000060
> RCX: 0000000000000072
> [Tue Jan 14 10:05:08 2020] RDX: 0000000000000006 RSI: ffffa0c788914a7a
> RDI: ffffa0c788914a42
> [Tue Jan 14 10:05:08 2020] RBP: ffffa0d3040f3a20 R08: 0000000000000028
> R09: 0000000000000016
> [Tue Jan 14 10:05:08 2020] R10: 0000000000000036 R11: ffffe94984224500
> R12: ffffa0c788914a40
> [Tue Jan 14 10:05:08 2020] R13: ffffffffc08d7da0 R14: ffffa0c788914a18
> R15: ffffa0c788914a78
> [Tue Jan 14 10:05:08 2020] FS:  0000000000000000(0000)
> GS:ffffa0d2cfbc0000(0000) knlGS:0000000000000000
> [Tue Jan 14 10:05:08 2020] CS:  0010 DS: 0000 ES: 0000 CR0:
> 0000000080050033
> [Tue Jan 14 10:05:08 2020] CR2: 000055f0404d5e40 CR3: 0000001813010000
> CR4: 00000000000627e0
> [Tue Jan 14 10:05:08 2020] Call Trace:
> [Tue Jan 14 10:05:08 2020]  [<ffffffffb0193055>] sort+0x1c5/0x260
> [Tue Jan 14 10:05:08 2020]  [<ffffffffb0192e60>] ? u32_swap+0x10/0x10
> [Tue Jan 14 10:05:08 2020]  [<ffffffffc08d807b>]
> build_snap_context+0x12b/0x290 [ceph]
> [Tue Jan 14 10:05:08 2020]  [<ffffffffc08d820c>]
> rebuild_snap_realms+0x2c/0x90 [ceph]
> [Tue Jan 14 10:05:08 2020]  [<ffffffffc08d822b>]
> rebuild_snap_realms+0x4b/0x90 [ceph]
> [Tue Jan 14 10:05:08 2020]  [<ffffffffc08d91fc>]
> ceph_update_snap_trace+0x3ec/0x530 [ceph]
> [Tue Jan 14 10:05:08 2020]  [<ffffffffc08e2239>]
> handle_reply+0x359/0xc60 [ceph]
> [Tue Jan 14 10:05:08 2020]  [<ffffffffc08e48ba>] dispatch+0x11a/0xb00
> [ceph]
> [Tue Jan 14 10:05:08 2020]  [<ffffffffb042e56a>] ?
> kernel_recvmsg+0x3a/0x50
> [Tue Jan 14 10:05:08 2020]  [<ffffffffc05fcff4>] try_read+0x544/0x1300
> [libceph]
> [Tue Jan 14 10:05:08 2020]  [<ffffffffafee13ce>] ?
> account_entity_dequeue+0xae/0xd0
> [Tue Jan 14 10:05:08 2020]  [<ffffffffafee4d5c>] ?
> dequeue_entity+0x11c/0x5e0
> [Tue Jan 14 10:05:08 2020]  [<ffffffffb042e417>] ?
> kernel_sendmsg+0x37/0x50
> [Tue Jan 14 10:05:08 2020]  [<ffffffffc05fdfb4>]
> ceph_con_workfn+0xe4/0x1530 [libceph]
> [Tue Jan 14 10:05:08 2020]  [<ffffffffb057f568>] ?
> __schedule+0x448/0x9c0
> [Tue Jan 14 10:05:08 2020]  [<ffffffffafebe21f>]
> process_one_work+0x17f/0x440
> [Tue Jan 14 10:05:08 2020]  [<ffffffffafebf336>]
> worker_thread+0x126/0x3c0
> [Tue Jan 14 10:05:08 2020]  [<ffffffffafebf210>] ?
> manage_workers.isra.26+0x2a0/0x2a0
> [Tue Jan 14 10:05:08 2020]  [<ffffffffafec61f1>] kthread+0xd1/0xe0
> [Tue Jan 14 10:05:08 2020]  [<ffffffffafec6120>] ?
> insert_kthread_work+0x40/0x40
> [Tue Jan 14 10:05:08 2020]  [<ffffffffb058cd37>]
> ret_from_fork_nospec_begin+0x21/0x21
> [Tue Jan 14 10:05:08 2020]  [<ffffffffafec6120>] ?
> insert_kthread_work+0x40/0x40
>
>
>
> [Tue Jan 14 10:05:08 2020] CPU: 7 PID: 2909400 Comm: kworker/7:0 Not
> tainted 3.10.0-1062.4.3.el7.x86_64 #1
>
> [Tue Jan 14 10:05:08 2020] Workqueue: ceph-msgr ceph_con_workfn
> [libceph]
> [Tue Jan 14 10:05:08 2020] task: ffffa0d2cb9db150 ti: ffffa0d3040f0000
> task.ti: ffffa0d3040f0000
> [Tue Jan 14 10:05:08 2020] RIP: 0010:[<ffffffffb0192200>]
> [<ffffffffb0192200>] __x86_indirect_thunk_rax+0x0/0x20
> [Tue Jan 14 10:05:08 2020] RSP: 0018:ffffa0d3040f3a28  EFLAGS: 00000286
> [Tue Jan 14 10:05:08 2020] RAX: ffffffffb0192e60 RBX: 0000000000000010
> RCX: 0000000000000010
> [Tue Jan 14 10:05:08 2020] RDX: 0000000000000008 RSI: ffffa0d0e108b828
> RDI: ffffa0d0e108b818
> [Tue Jan 14 10:05:08 2020] RBP: ffffa0d3040f3a98 R08: 0000000000000000
> R09: 0000000000000016
> [Tue Jan 14 10:05:08 2020] R10: 0000000000000036 R11: ffffe949a9842280
> R12: ffffa0d0e108b818
> [Tue Jan 14 10:05:08 2020] R13: ffffffffc08d7da0 R14: ffffa0d0e108b818
> R15: ffffa0d0e108b828
> [Tue Jan 14 10:05:08 2020] FS:  0000000000000000(0000)
> GS:ffffa0d2cfbc0000(0000) knlGS:0000000000000000
> [Tue Jan 14 10:05:08 2020] CS:  0010 DS: 0000 ES: 0000 CR0:
> 0000000080050033
> [Tue Jan 14 10:05:08 2020] CR2: 000055f0404d5e40 CR3: 0000001813010000
> CR4: 00000000000627e0
> [Tue Jan 14 10:05:08 2020] Call Trace:
> [Tue Jan 14 10:05:08 2020]  [<ffffffffb0193055>] ? sort+0x1c5/0x260
> [Tue Jan 14 10:05:08 2020]  [<ffffffffb0192e60>] ? u32_swap+0x10/0x10
> [Tue Jan 14 10:05:08 2020]  [<ffffffffc08d807b>]
> build_snap_context+0x12b/0x290 [ceph]
> [Tue Jan 14 10:05:08 2020]  [<ffffffffc08d820c>]
> rebuild_snap_realms+0x2c/0x90 [ceph]
> [Tue Jan 14 10:05:08 2020]  [<ffffffffc08d822b>]
> rebuild_snap_realms+0x4b/0x90 [ceph]
> [Tue Jan 14 10:05:08 2020]  [<ffffffffc08d91fc>]
> ceph_update_snap_trace+0x3ec/0x530 [ceph]
> [Tue Jan 14 10:05:08 2020]  [<ffffffffc08e2239>]
> handle_reply+0x359/0xc60 [ceph]
> [Tue Jan 14 10:05:08 2020]  [<ffffffffc08e48ba>] dispatch+0x11a/0xb00
> [ceph]
> [Tue Jan 14 10:05:08 2020]  [<ffffffffb042e56a>] ?
> kernel_recvmsg+0x3a/0x50
> [Tue Jan 14 10:05:08 2020]  [<ffffffffc05fcff4>] try_read+0x544/0x1300
> [libceph]
> [Tue Jan 14 10:05:08 2020]  [<ffffffffafee13ce>] ?
> account_entity_dequeue+0xae/0xd0
> [Tue Jan 14 10:05:08 2020]  [<ffffffffafee4d5c>] ?
> dequeue_entity+0x11c/0x5e0
> [Tue Jan 14 10:05:08 2020]  [<ffffffffb042e417>] ?
> kernel_sendmsg+0x37/0x50
> [Tue Jan 14 10:05:08 2020]  [<ffffffffc05fdfb4>]
> ceph_con_workfn+0xe4/0x1530 [libceph]
> [Tue Jan 14 10:05:08 2020]  [<ffffffffb057f568>] ?
> __schedule+0x448/0x9c0
> [Tue Jan 14 10:05:08 2020]  [<ffffffffafebe21f>]
> process_one_work+0x17f/0x440
> [Tue Jan 14 10:05:08 2020]  [<ffffffffafebf336>]
> worker_thread+0x126/0x3c0
> [Tue Jan 14 10:05:08 2020]  [<ffffffffafebf210>] ?
> manage_workers.isra.26+0x2a0/0x2a0
> [Tue Jan 14 10:05:08 2020]  [<ffffffffafec61f1>] kthread+0xd1/0xe0
> [Tue Jan 14 10:05:08 2020]  [<ffffffffafec6120>] ?
> insert_kthread_work+0x40/0x40
> [Tue Jan 14 10:05:08 2020]  [<ffffffffb058cd37>]
> ret_from_fork_nospec_begin+0x21/0x21
> [Tue Jan 14 10:05:08 2020]  [<ffffffffafec6120>] ?
> insert_kthread_work+0x40/0x40

Hi Marc,

It's busy processing snapshot contexts, I think it's more or less
the same issue that you reported in "ceph node crashed with these
errors "kernel: ceph: build_snap_context" (maybe now it is urgent?)".

Did "ceph: build_snap_context 100020c9287 ffff911a9a26bd00 fail -12"
errors go away?

How many snapshots do you have now?

Did you file a tracker ticket?  It's better to have all details in one
place rather than scattered across different threads on the list.

Adding Jeff and Zheng, who can take a closer look.

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux