Re: Can't add a ceph-mon to existing large cluster

On Thu, 5 Mar 2020, Dan van der Ster wrote:
> Hi all,
> 
> There's something broken in our env when we try to add new mons to
> existing clusters, confirmed on two clusters running mimic and
> nautilus. It's basically this issue
> https://tracker.ceph.com/issues/42830
> 
> In case something is wrong with our puppet manifests, I'm trying to
> do it manually.
> 
> First we --mkfs the mon and start it, but as soon as the new mon
> starts synchronizing, the existing leader becomes unresponsive and an
> election is triggered.
> 
> Here's exactly what I'm doing:
> 
> # cd /var/lib/ceph/tmp/
> # scp cephmon1:/var/lib/ceph/tmp/keyring.mon.cephmon1 keyring.mon.cephmon4
> # ceph mon getmap -o monmap
> # ceph-mon --mkfs -i cephmon4 --monmap monmap --keyring keyring.mon.cephmon4 --setuser ceph --setgroup ceph
> # vi /etc/ceph/ceph.conf <add the new mon to ceph.conf like this>
> [mon.cephmon4]
> host = cephmon4
> mon addr = a.b.c.d:6790
> # systemctl start ceph-mon@cephmon4
> 
> The log file on the new mon shows it start synchronizing, then
> immediately the CPU usage on the leader goes to 100% and elections
> start happening, and ceph health shows mon slow ops. perf top of the
> ceph-mon with 100% CPU is shown below [1].
> On a small nautilus cluster, the new mon gets added within a minute
> or so (but not cleanly -- the leader is unresponsive for quite a while
> until the new mon joins). debug_mon=20 on the leader doesn't show
> anything very interesting.
> On our large mimic cluster we tried waiting more than 10 minutes --
> suffering through several mon elections and 100% usage bouncing around
> between leaders -- until we gave up.
> 
> I'm pulling my hair out a bit on this -- it's really weird!

Can you try running a rocksdb compaction on the existing mons before 
adding the new one and see if that helps?
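
As a rough sketch of one way to do that: `ceph tell mon.<id> compact` asks
a running mon to compact its store. The MONS list and the CEPH override
below are assumptions for illustration (only cephmon1 is named in this
thread), so adjust them to the real mon IDs before running anything.

```shell
#!/bin/sh
# Sketch: compact the store of each existing mon, one at a time.
# MONS is an assumption; list the IDs of the mons already in quorum.
MONS="${MONS:-cephmon1}"
# CEPH can be overridden (e.g. CEPH=echo) to preview the commands.
CEPH="${CEPH:-ceph}"

compact_mons() {
    for m in $MONS; do
        echo "compacting mon.$m"
        # Stop at the first failure so a struggling mon is noticed.
        $CEPH tell "mon.$m" compact || return 1
    done
}
```

Calling `compact_mons` with CEPH=echo first prints the commands without
touching the cluster; going one mon at a time keeps the others responsive
while each store is being compacted.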

s

> 
> Did anyone add a new mon to an existing large cluster recently, and it
> went smoothly?
> 
> Cheers, Dan
> 
> [1]
> 
>   15.12%  ceph-mon             [.] MonitorDBStore::Transaction::encode
>    8.95%  libceph-common.so.0  [.] ceph::buffer::v14_2_0::ptr::append
>    8.68%  libceph-common.so.0  [.] ceph::buffer::v14_2_0::list::append
>    7.69%  libceph-common.so.0  [.] ceph::buffer::v14_2_0::ptr::release
>    5.86%  libceph-common.so.0  [.] ceph::buffer::v14_2_0::ptr::ptr
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> 
> 