Re: 14.2.20: Strange monitor problem eating 100% CPU

Hi,

This sounds a lot like the negative progress bug we just found last
week: https://tracker.ceph.com/issues/50591

That bug sends the mon into a very long loop while rendering a progress
bar whenever the mgr incorrectly reports a negative progress value to
the mon.
Octopus and later don't have this loop, so they don't have this bug.

Could you set debug_mgr = 4/5 then check the mgr log for something like this?

    mgr[progress] Updated progress to -0.333333333333 (Rebalancing after osd... marked in)
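
If grepping by eye is tedious, here is a minimal sketch of a scanner for such lines. It assumes the mgr log entries look like the example above ("Updated progress to <number>"); the function name find_negative_progress and the script name are made up for illustration and are not part of Ceph.

```python
import re
import sys

# Assumption: progress updates appear as "Updated progress to <number>",
# as in the example log line above. Adjust the pattern if your log differs.
PROGRESS_RE = re.compile(r"Updated progress to (-?\d+(?:\.\d+)?)")


def find_negative_progress(lines):
    """Return (line_number, value) for every progress update below zero."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        match = PROGRESS_RE.search(line)
        if match is None:
            continue
        value = float(match.group(1))
        if value < 0:
            hits.append((lineno, value))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 find_negative_progress.py /path/to/your/mgr.log
    with open(sys.argv[1], errors="replace") as log:
        for lineno, value in find_negative_progress(log):
            print(f"line {lineno}: progress {value}")
```

Any output at all would suggest the mgr sent a negative progress value; no output only means the pattern did not match a negative value in that file.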

Cheers, Dan


On Tue, May 4, 2021 at 4:10 PM Rainer Krienke <krienke@xxxxxxxxxxxxxx> wrote:
>
> Hello,
>
> I am playing around with a test ceph 14.2.20 cluster. The cluster
> consists of 4 VMs, each VM has 2 OSDs. The first three VMs vceph1,
> vceph2 and vceph3 are monitors. vceph1 is also mgr.
>
> What I did was quite simple. The cluster is in the state HEALTHY:
>
> vceph2: systemctl stop ceph-osd@2
> # let ceph repair until ceph -s reports cluster is healthy again
>
> vceph2: systemctl start ceph-osd@2  # @ 15:39:15, for the logs
> # ceph -s reports that 8 OSDs are up and in, then the cluster
> # starts rebalancing osd.2
>
> vceph2:  ceph -s   # hangs forever also if executed on vceph3 or 4
> # mon on vceph1 eats 100% CPU permanently, the other mons ~0 %CPU
>
> vceph1: systemctl stop ceph-mon@vceph1 # wait ~30 sec to terminate
> vceph1: systemctl start ceph-mon@vceph1 # Everything is OK again
>
> I posted the mon-log to: https://cloud.uni-koblenz.de/s/t8tWjWFAobZb5Hy
>
> Strangely enough, if I set "debug mon 20" before starting the experiment,
> this bug does not show up. I also tried the very same procedure on the
> same cluster updated to 15.2.11, but I was unable to reproduce this bug
> in that Ceph version.
>
> Thanks
> Rainer
> --
> Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse  1
> 56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
> PGP: http://www.uni-koblenz.de/~krienke/mypgp.html,     Fax: +49261287
> 1001312
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx


