On 3/5/20 3:22 PM, Sage Weil wrote:
> On Thu, 5 Mar 2020, Dan van der Ster wrote:
>> Hi all,
>>
>> There's something broken in our env when we try to add new mons to
>> existing clusters, confirmed on two clusters running mimic and
>> nautilus. It's basically this issue:
>> https://tracker.ceph.com/issues/42830
>>
>> In case something is wrong with our puppet manifests, I'm trying to
>> do it manually.
>>
>> First we --mkfs the mon and start it, but as soon as the new mon
>> starts synchronizing, the existing leader becomes unresponsive and an
>> election is triggered.
>>
>> Here's exactly what I'm doing:
>>
>> # cd /var/lib/ceph/tmp/
>> # scp cephmon1:/var/lib/ceph/tmp/keyring.mon.cephmon1 keyring.mon.cephmon4
>> # ceph mon getmap -o monmap
>> # ceph-mon --mkfs -i cephmon4 --monmap monmap --keyring keyring.mon.cephmon4 --setuser ceph --setgroup ceph
>> # vi /etc/ceph/ceph.conf   <add the new mon to ceph.conf like this>
>> [mon.cephmon4]
>> host = cephmon4
>> mon addr = a.b.c.d:6790
>> # systemctl start ceph-mon@cephmon4
>>
>> The log file on the new mon shows it starting to synchronize, then
>> immediately the CPU usage on the leader goes to 100% and elections
>> start happening, and ceph health shows mon slow ops. A perf top of the
>> ceph-mon using 100% CPU is shown below [1].
>>
>> On a small nautilus cluster, the new mon gets added within a minute
>> or so (but not cleanly -- the leader is unresponsive for quite a while
>> until the new mon joins). debug_mon=20 on the leader doesn't show
>> anything very interesting.
>>
>> On our large mimic cluster we tried waiting more than 10 minutes --
>> suffering through several mon elections and 100% CPU usage bouncing
>> around between leaders -- until we gave up.
>>
>> I'm pulling my hair out a bit on this -- it's really weird!
>
> Can you try running a rocksdb compaction on the existing mons before
> adding the new one and see if that helps?

I can chime in here: I had this happen to a customer as well. A compact
did not work.

Some background: 5 monitors, and the DBs were ~350 MB in size. They
upgraded one MON from 13.2.6 to 13.2.8, and that caused the MON acting
as its sync source to eat 100% CPU. The logs showed that the upgraded
MON (which was restarted) was stuck in the synchronizing state. Because
they had 5 MONs, they still had 3 working ones left, so the cluster
kept running.

I left this for about 5 minutes, but it never synced. I tried a
compact; that didn't work either.

Eventually I stopped one MON, tarballed its database and used that to
bring back the MON which was upgraded to 13.2.8. That worked without
any hiccups; the MON joined again within a few seconds.
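For reference, the procedure was roughly like this (a sketch only: the
mon IDs, hostnames and the default /var/lib/ceph/mon/ceph-<id> paths are
just examples, so adjust them for your cluster, and note that the donor
mon must be stopped while its store.db is copied so the copy is
consistent):

<on a healthy donor mon, e.g. cephmon2>
# systemctl stop ceph-mon@cephmon2
# tar czf /tmp/mon-store.tar.gz -C /var/lib/ceph/mon/ceph-cephmon2 store.db
# systemctl start ceph-mon@cephmon2

<on the mon that is stuck synchronizing, e.g. cephmon3>
# systemctl stop ceph-mon@cephmon3
# mv /var/lib/ceph/mon/ceph-cephmon3/store.db /var/lib/ceph/mon/ceph-cephmon3/store.db.old
# scp cephmon2:/tmp/mon-store.tar.gz /tmp/
# tar xzf /tmp/mon-store.tar.gz -C /var/lib/ceph/mon/ceph-cephmon3
# chown -R ceph:ceph /var/lib/ceph/mon/ceph-cephmon3/store.db
# systemctl start ceph-mon@cephmon3

Only store.db is replaced, so the mon keeps its own keyring (a separate
file in the data dir) and just catches up on whatever maps were
committed after the copy once it rejoins the quorum.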
Wido

>
> s
>
>> Did anyone add a new mon to an existing large cluster recently, and it
>> went smoothly?
>>
>> Cheers, Dan
>>
>> [1]
>>
>>   15.12%  ceph-mon             [.]  MonitorDBStore::Transaction::encode
>>    8.95%  libceph-common.so.0  [.]  ceph::buffer::v14_2_0::ptr::append
>>    8.68%  libceph-common.so.0  [.]  ceph::buffer::v14_2_0::list::append
>>    7.69%  libceph-common.so.0  [.]  ceph::buffer::v14_2_0::ptr::release
>>    5.86%  libceph-common.so.0  [.]  ceph::buffer::v14_2_0::ptr::ptr

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx