14.2.20: Strange monitor problem eating 100% CPU

Rainer Krienke <krienke@xxxxxxxxxxxxxx> · Tue, 4 May 2021 16:10:15 +0200

Hello,

I am playing around with a test ceph 14.2.20 cluster. The cluster 
consists of 4 VMs, each with 2 OSDs. The first three VMs, vceph1, 
vceph2 and vceph3, are monitors; vceph1 is also the mgr.

What I did was quite simple, starting with the cluster in a healthy state:

vceph2: systemctl stop ceph-osd@2
# let ceph repair until ceph -s reports cluster is healthy again

vceph2: systemctl start ceph-osd@2  # @ 15:39:15, for the logs
# ceph -s reports that 8 OSDs are up and in, then the cluster
# starts to rebalance osd.2

vceph2: ceph -s   # hangs forever, also when executed on vceph3 or vceph4
# the mon on vceph1 permanently uses 100% CPU, the other mons ~0% CPU

vceph1: systemctl stop ceph-mon@vceph1 # wait ~30 sec to terminate
vceph1: systemctl start ceph-mon@vceph1 # Everything is OK again
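
The steps above can be sketched as a small script to make the reproduction repeatable. This is only a sketch under assumptions: it is meant to run as root on vceph2, the function names, the 30 second timeout for "ceph -s" and the 60 second pause after restarting the OSD are illustrative choices that are not part of the original procedure, and a hang is detected with the coreutils timeout command (exit status 124).

```shell
#!/bin/sh
# Sketch of the reproduction steps; run as root on vceph2.
# STATUS_TIMEOUT and the sleep intervals are assumed values, not from the post.
OSD_ID=2
STATUS_TIMEOUT=30   # seconds before "ceph -s" is treated as hung

# Poll "ceph health" until it reports HEALTH_OK.
wait_healthy() {
    until ceph health 2>/dev/null | grep -q '^HEALTH_OK'; do
        sleep 10
    done
}

# Return 0 if "ceph -s" does not finish within the given number of seconds.
# coreutils timeout exits with status 124 when it had to kill the command.
status_hangs() {
    rc=0
    timeout "$1" ceph -s >/dev/null 2>&1 || rc=$?
    [ "$rc" -eq 124 ]
}

main() {
    systemctl stop "ceph-osd@${OSD_ID}"
    sleep 30                            # assumed: let the mons notice the OSD is down
    wait_healthy                        # let ceph repair
    systemctl start "ceph-osd@${OSD_ID}"
    date '+osd restarted at %H:%M:%S'   # timestamp to correlate with the mon log
    sleep 60                            # assumed: give the rebalance time to start
    if status_hangs "$STATUS_TIMEOUT"; then
        echo "ceph -s hung: check CPU usage of ceph-mon on vceph1"
    else
        echo "ceph -s returned normally"
    fi
}

# Only run the procedure when explicitly asked to, e.g. "sh repro.sh run".
if [ "${1:-}" = "run" ]; then
    main
fi
```

Saved under a hypothetical name such as repro.sh, it would be started with "sh repro.sh run"; without the argument it only defines the functions.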

I posted the mon-log to: https://cloud.uni-koblenz.de/s/t8tWjWFAobZb5Hy

Strangely enough, if I set "debug mon 20" before starting the experiment, 
this bug does not show up. I also tried the very same procedure on the 
same cluster after updating it to 15.2.11, but I was unable to reproduce 
the bug in that ceph version.
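
For anyone who wants to compare runs with and without the higher debug level, one possible way to toggle it is sketched below. It assumes the setting is changed through the centralized config store ("ceph config"), which is only one of several ways to set it; the post does not say which was used, and the helper names are hypothetical.

```shell
# Raise the monitor debug level before the experiment and remove the
# override afterwards. Assumes the "ceph config" store is used; the post
# does not say how "debug mon 20" was actually set.
set_mon_debug() {
    ceph config set mon debug_mon "$1"
}

reset_mon_debug() {
    ceph config rm mon debug_mon
}
```

Calling "set_mon_debug 20" before the OSD restart and "reset_mon_debug" afterwards gives one run with debugging enabled and returns to the default for the next run.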

Thanks
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse  1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx