Hi Samuel,

I am not really fortunate enough to be in an environment that would allow me to trace and resolve incidents on a production Ceph cluster quickly, before clients start complaining. So I am more or less forced to choose the path that is least likely to fail. RBD has been around longer than CephFS, thus it is more tested and less likely to have strange bugs. The CephFS kernel mount just failed on me after a Ceph update (even though I kept the environment as standard as possible, with just one MDS and normal replication), and I do not want to go through that again. Since I currently have no real need for CephFS, I would rather stick to RBD.

> Is there a specific reason for "not to use the cephfs for important
> things"? What are the major concerns?
>
> thanks,
>
> samuel
>
> ________________________________
>
> huxiaoyu@xxxxxxxxxxxx
>
> From: Marc <Marc@xxxxxxxxxxxxxxxxx>
> Date: 2021-09-07 20:37
> To: Frank Schilder <frans@xxxxxx>
> CC: ceph-users <ceph-users@xxxxxxx>
> Subject: Re: Kworker 100% with ceph-msgr (after upgrade to 14.2.6?)
>
> Not really. I am currently using the fuse mount and decided not to
> use cephfs for important things. I remember that I got this after
> some update. With that update there was something you had to
> 'enable', which I did not do straight away. But after enabling it, I
> started having these issues.
>
> > Did you ever get a proper solution for this problem? We are having
> > exactly the same issue: having snapshots on a file system leads to
> > incredible performance degradation. I'm reporting some observations
> > here (latest reply):
> >
> > https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/HKEBXXRMX5WA5Y6JFM34WFPMWTCMPFCG/#6S5GTKGGBI2Y3QE4E5XJJY2KSSLLX64H
> >
> > The problem is almost certainly that the ceph kernel client executes
> > ceph_update_snap_trace over and over again on the exact same data.
> > I see that the execution time of ceph fs IO increases roughly with
> > the number of snapshots present; N snapshots means ~N times slower.
> >
> > I'm testing this on kernel version 5.9.9-1.el7.elrepo.x86_64. It is
> > even worse on older kernels.
> >
> > Best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > ________________________________________
> > From: Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
> > Sent: 16 March 2020 00:35
> > To: idryomov
> > Cc: ceph-users; jlayton
> > Subject: Re: Kworker 100% with ceph-msgr (after upgrade to 14.2.6?)
> >
> > I changed an OSD node to this 5.5 kernel-ml and still had a high
> > load from this kworker. I moved the backup rsync script to a VM, and
> > there too the high load persists.
> >
> > This happens already with one rsync session. Before, on Luminous, I
> > did not have such issues at all with 4 concurrent rsyncs running. So
> > I would conclude it has nothing to do with rsync.
> >
> > -----Original Message-----
> > From: Ilya Dryomov <idryomov@xxxxxxxxx>
> > Sent: 14 January 2020 11:45
> > To: Marc Roos
> > Cc: ceph-users; Jeff Layton; Yan, Zheng
> > Subject: Re: Kworker 100% with ceph-msgr (after upgrade to 14.2.6?)
> >
> > On Tue, Jan 14, 2020 at 10:31 AM Marc Roos <M.Roos@f1-outsourcing.eu> wrote:
> > >
> > > I think this is new since I upgraded to 14.2.6.
> > > kworker/7:3 100%
> > >
> > > [@~]# echo l > /proc/sysrq-trigger
> > >
> > > [Tue Jan 14 10:05:08 2020] CPU: 7 PID: 2909400 Comm: kworker/7:0 Not tainted 3.10.0-1062.4.3.el7.x86_64 #1
> > > [Tue Jan 14 10:05:08 2020] Workqueue: ceph-msgr ceph_con_workfn [libceph]
> > > [Tue Jan 14 10:05:08 2020] task: ffffa0d2cb9db150 ti: ffffa0d3040f0000 task.ti: ffffa0d3040f0000
> > > [Tue Jan 14 10:05:08 2020] RIP: 0010:[<ffffffffb0192e7e>] [<ffffffffb0192e7e>] generic_swap+0x1e/0x30
> > > [Tue Jan 14 10:05:08 2020] RSP: 0018:ffffa0d3040f3a20 EFLAGS: 00000206
> > > [Tue Jan 14 10:05:08 2020] RAX: 0000000000000072 RBX: 0000000000000060 RCX: 0000000000000072
> > > [Tue Jan 14 10:05:08 2020] RDX: 0000000000000006 RSI: ffffa0c788914a7a RDI: ffffa0c788914a42
> > > [Tue Jan 14 10:05:08 2020] RBP: ffffa0d3040f3a20 R08: 0000000000000028 R09: 0000000000000016
> > > [Tue Jan 14 10:05:08 2020] R10: 0000000000000036 R11: ffffe94984224500 R12: ffffa0c788914a40
> > > [Tue Jan 14 10:05:08 2020] R13: ffffffffc08d7da0 R14: ffffa0c788914a18 R15: ffffa0c788914a78
> > > [Tue Jan 14 10:05:08 2020] FS:  0000000000000000(0000) GS:ffffa0d2cfbc0000(0000) knlGS:0000000000000000
> > > [Tue Jan 14 10:05:08 2020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [Tue Jan 14 10:05:08 2020] CR2: 000055f0404d5e40 CR3: 0000001813010000 CR4: 00000000000627e0
> > > [Tue Jan 14 10:05:08 2020] Call Trace:
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffb0193055>] sort+0x1c5/0x260
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffb0192e60>] ? u32_swap+0x10/0x10
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffc08d807b>] build_snap_context+0x12b/0x290 [ceph]
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffc08d820c>] rebuild_snap_realms+0x2c/0x90 [ceph]
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffc08d822b>] rebuild_snap_realms+0x4b/0x90 [ceph]
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffc08d91fc>] ceph_update_snap_trace+0x3ec/0x530 [ceph]
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffc08e2239>] handle_reply+0x359/0xc60 [ceph]
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffc08e48ba>] dispatch+0x11a/0xb00 [ceph]
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffb042e56a>] ? kernel_recvmsg+0x3a/0x50
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffc05fcff4>] try_read+0x544/0x1300 [libceph]
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffafee13ce>] ? account_entity_dequeue+0xae/0xd0
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffafee4d5c>] ? dequeue_entity+0x11c/0x5e0
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffb042e417>] ? kernel_sendmsg+0x37/0x50
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffc05fdfb4>] ceph_con_workfn+0xe4/0x1530 [libceph]
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffb057f568>] ? __schedule+0x448/0x9c0
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffafebe21f>] process_one_work+0x17f/0x440
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffafebf336>] worker_thread+0x126/0x3c0
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffafebf210>] ? manage_workers.isra.26+0x2a0/0x2a0
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffafec61f1>] kthread+0xd1/0xe0
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffafec6120>] ? insert_kthread_work+0x40/0x40
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffb058cd37>] ret_from_fork_nospec_begin+0x21/0x21
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffafec6120>] ? insert_kthread_work+0x40/0x40
> > >
> > > [Tue Jan 14 10:05:08 2020] CPU: 7 PID: 2909400 Comm: kworker/7:0 Not tainted 3.10.0-1062.4.3.el7.x86_64 #1
> > > [Tue Jan 14 10:05:08 2020] Workqueue: ceph-msgr ceph_con_workfn [libceph]
> > > [Tue Jan 14 10:05:08 2020] task: ffffa0d2cb9db150 ti: ffffa0d3040f0000 task.ti: ffffa0d3040f0000
> > > [Tue Jan 14 10:05:08 2020] RIP: 0010:[<ffffffffb0192200>] [<ffffffffb0192200>] __x86_indirect_thunk_rax+0x0/0x20
> > > [Tue Jan 14 10:05:08 2020] RSP: 0018:ffffa0d3040f3a28 EFLAGS: 00000286
> > > [Tue Jan 14 10:05:08 2020] RAX: ffffffffb0192e60 RBX: 0000000000000010 RCX: 0000000000000010
> > > [Tue Jan 14 10:05:08 2020] RDX: 0000000000000008 RSI: ffffa0d0e108b828 RDI: ffffa0d0e108b818
> > > [Tue Jan 14 10:05:08 2020] RBP: ffffa0d3040f3a98 R08: 0000000000000000 R09: 0000000000000016
> > > [Tue Jan 14 10:05:08 2020] R10: 0000000000000036 R11: ffffe949a9842280 R12: ffffa0d0e108b818
> > > [Tue Jan 14 10:05:08 2020] R13: ffffffffc08d7da0 R14: ffffa0d0e108b818 R15: ffffa0d0e108b828
> > > [Tue Jan 14 10:05:08 2020] FS:  0000000000000000(0000) GS:ffffa0d2cfbc0000(0000) knlGS:0000000000000000
> > > [Tue Jan 14 10:05:08 2020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [Tue Jan 14 10:05:08 2020] CR2: 000055f0404d5e40 CR3: 0000001813010000 CR4: 00000000000627e0
> > > [Tue Jan 14 10:05:08 2020] Call Trace:
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffb0193055>] ? sort+0x1c5/0x260
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffb0192e60>] ? u32_swap+0x10/0x10
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffc08d807b>] build_snap_context+0x12b/0x290 [ceph]
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffc08d820c>] rebuild_snap_realms+0x2c/0x90 [ceph]
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffc08d822b>] rebuild_snap_realms+0x4b/0x90 [ceph]
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffc08d91fc>] ceph_update_snap_trace+0x3ec/0x530 [ceph]
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffc08e2239>] handle_reply+0x359/0xc60 [ceph]
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffc08e48ba>] dispatch+0x11a/0xb00 [ceph]
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffb042e56a>] ? kernel_recvmsg+0x3a/0x50
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffc05fcff4>] try_read+0x544/0x1300 [libceph]
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffafee13ce>] ? account_entity_dequeue+0xae/0xd0
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffafee4d5c>] ? dequeue_entity+0x11c/0x5e0
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffb042e417>] ? kernel_sendmsg+0x37/0x50
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffc05fdfb4>] ceph_con_workfn+0xe4/0x1530 [libceph]
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffb057f568>] ? __schedule+0x448/0x9c0
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffafebe21f>] process_one_work+0x17f/0x440
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffafebf336>] worker_thread+0x126/0x3c0
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffafebf210>] ? manage_workers.isra.26+0x2a0/0x2a0
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffafec61f1>] kthread+0xd1/0xe0
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffafec6120>] ? insert_kthread_work+0x40/0x40
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffb058cd37>] ret_from_fork_nospec_begin+0x21/0x21
> > > [Tue Jan 14 10:05:08 2020]  [<ffffffffafec6120>] ? insert_kthread_work+0x40/0x40
> >
> > Hi Marc,
> >
> > It's busy processing snapshot contexts. I think it's more or less the
> > same issue that you reported in "ceph node crashed with these errors
> > "kernel: ceph: build_snap_context" (maybe now it is urgent?)".
> >
> > Did the "ceph: build_snap_context 100020c9287 ffff911a9a26bd00 fail -12"
> > errors go away?
> >
> > How many snapshots do you have now?
> >
> > Did you file a tracker ticket? It's better to have all the details in
> > one place rather than scattered across different threads on the list.
> >
> > Adding Jeff and Zheng, who can take a closer look.
> >
> > Thanks,
> >
> >                 Ilya
> >
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
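
P.S. For reference, the snapshot count Ilya asks about can be checked from any client by listing the virtual .snap directory of a CephFS mount. The following is a minimal sketch, not an official tool; the mount point path is an assumption and should be replaced with your own. Per Frank's observation above, client IO slows roughly linearly with this number.

```shell
# Minimal sketch: count the snapshots visible on a CephFS mount by
# listing its virtual .snap directory. The mount point is passed as an
# argument (e.g. /mnt/cephfs -- an assumption, use your own).
count_snapshots() {
    mnt="$1"
    ls -1 "$mnt/.snap" 2>/dev/null | wc -l
}
```

Usage would be something like: count_snapshots /mnt/cephfs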