Hi all,

Here is our story; perhaps some day it will help someone. Bear in mind that English is not my native language, so sorry for any mistakes.
Our system: Ceph 0.87.2 (Giant), with 5 OSD servers (116 OSDs of 1 TB each in total) and 3 monitors.
After a nightmarish few days, we have provisionally "fixed" our Ceph monitor problems. But first, some additional information and a timeline (dates are in dd/mm/yyyy format).

At the beginning we had 3 working monitors (MON01, MON02 and MON03) and we were happy.

Wednesday 05/06/2019
After a UPS (SAI) outage on the B line, we found that the ceph-mon process on MON03 did not start cleanly: after ceph-mon was launched, ceph-create-keys could not contact the daemon. We kept working with a quorum of 2 monitors and still had access to the Ceph storage.

Thursday 06/06/2019
We had the "good" idea of adding a new monitor to the monitor cluster... this was our first mistake. After the "ceph-deploy mon mon.mon04" command, the new monitor was activated (4 monitors in the cluster) but... only 2 monitors had data (mon01 and mon02), and 2 out of 4 is not a quorum. With no quorum, mon04 could not contact the monitor cluster. We lost the "ceph" commands because no monitor could hold quorum, so no ceph-related command worked.
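A side note on why 2 monitors out of 4 is not enough: as far as we understand it, the monitors need a strict majority of the monitors listed in the monmap to form a quorum. Here is a minimal Python sketch of that arithmetic. The function names are ours and only for illustration; they are not part of any Ceph API.

```python
def quorum_size(monmap_size):
    """Smallest number of monitors that is a strict majority of the monmap."""
    return monmap_size // 2 + 1


def has_quorum(monmap_size, reachable):
    """True if the reachable monitors can form a majority."""
    return reachable >= quorum_size(monmap_size)


# (monitors in monmap, monitors up) for three moments of our story:
# before adding mon04, right after adding it, and at the end (mon02 to mon05 up).
for total, up in [(3, 2), (4, 2), (5, 4)]:
    print(f"monmap={total} up={up} needed={quorum_size(total)} "
          f"quorum={has_quorum(total, up)}")
```

Seen this way, adding a fourth monitor while only two had data raised the number needed from 2 to 3, which is what cost us the quorum.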
Fortunately, the storage kept "working" and the active OpenStack instances were not affected (we do not know why, but they were not). At this point we restarted mon02 and mon04 several times.
I do not remember the order, but our priority was to recover the monitor quorum :( After its restart, mon02 showed the same behaviour as mon03: ceph-create-keys could not contact the daemon. We left the cluster "working" with mon01 in electing state and mon04 waiting to join the cluster.

Friday 07/06/2019
We prepared a new monitor machine (mon05) to add to the monitor cluster. Our idea was: "If we build mon05 and add it to the monitor cluster, this could work, because 3 monitors up will make quorum..." We ran "ceph-mon -i mon05 --mkfs --monmap /root/monmap-mon04-original --keyring /root/keyring" with data extracted from mon04 (keyring and monmap) and started it with "ceph-mon -i mon05 -c /etc/ceph/ceph.conf --cluster ceph"... Yes, it worked. We were very happy because we had recovered the monitor quorum, the ceph commands were back and everything worked... but only for 10 minutes :(

And here the nightmare began. Slow requests started to increase. We did not know why, so initially we restarted the affected OSDs. After 3 hours of restarting OSDs we thought: "This is not normal. What's happening here?" The OSD logs showed "key errors" when contacting other OSDs and the monitors. We were really in trouble, because OpenStack Cinder could not reach its RBD volumes and rbd commands showed a lot of key errors when reading pool volumes. The whole system went down, so nothing could be written to or read from the storage...

We tried restarting the monitors, restarting the OpenStack services, restarting the OSDs (one at a time), checking NTP (no errors there), checking iptables, checking anything that could be checked... with no success. We rebuilt monitors 2 and 3, formatting the ceph-mon data in the same way as we did for mon05, so we had a 5-monitor cluster, but the key errors did not disappear.

And when there was nothing more we could do... we used a Spanish saying: "De perdidos, al rio" (literal translation: "From lost, to the river", i.e. when nothing works and all is lost, you may as well try anything). So... we thought: "The only monitor we have never touched is mon01 (the active monitor), so what if we reset it?"
Said and done. We stopped mon01. The monitor quorum was transferred to mon02, but the slow requests were still there. We restarted ceph-mon on mon01... but again, ceph-create-keys could not contact the daemon. We had lost mon01, so mon02 to mon05 were working in quorum. And, suddenly, the storage began to recover: slow requests decreased, rbd commands worked, the OSD logs showed normal information (no key-related errors) and 10 minutes after mon01 went down, the whole cluster was active and clean.

After this story, we have some "things to keep in mind" that we want to share:

- Always have more than one "initial monitor" defined in Ceph. We have only one, and if it is not active the other monitors do not start (after the storage recovered, we stopped mon05 and it now has status "probing", trying to contact mon01, which is down).
- Keep a copy of the monitors' keyring and monmap. This is the safe way to add monitors to the cluster manually when no ceph-related command works.
- Be careful when adding or removing monitors in an unhealthy monitor cluster: if they lose quorum you will be in trouble.

Now we have some work to do:

- Remove mon01 with "ceph mon destroy mon01": we want to remove it from the monmap, but it is the "initial monitor", so we do not know if this is safe to do.
- Clean and "format" the monitor data for mon01 (as we did on mon02 and mon03), but we have the same question: is it safe to do when it is the "initial mon"?
- Modify the monmap, deleting mon01, and inject it on mon05, but... what happens when we delete the "initial mon" from the monmap? Is it safe?

As you can understand, we now have a working storage but in a critical situation, because any problem with the monitors could make it unstable again... And there are still 15 TB of data inside. If someone has any "safe" idea to share, it will be appreciated.

Regards
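P.S. For the first item in our "things to keep in mind" list, this is a small Python sketch of how one could check how many monitors a ceph.conf names. It is only a sketch under assumptions: we assume that our "initial monitor" corresponds to the "mon initial members" and "mon host" options in the [global] section (spelled with spaces or underscores), the sample configuration is invented for illustration, and the function name is ours.

```python
import configparser

# Invented example configuration, similar in shape to a ceph.conf with one initial monitor.
SAMPLE_CONF = """\
[global]
fsid = 00000000-0000-0000-0000-000000000000
mon initial members = mon01
mon host = 192.0.2.11
"""


def monitor_entries(conf_text, option):
    """Return the entries listed for `option` in the [global] section.

    Option names are compared with underscores treated as spaces, and the
    value may be separated by commas, spaces, or both.
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("global"):
        return []
    wanted = option.replace("_", " ").lower()
    for key, value in parser.items("global"):
        if key.replace("_", " ") == wanted:
            return value.replace(",", " ").split()
    return []


for name in ("mon initial members", "mon host"):
    entries = monitor_entries(SAMPLE_CONF, name)
    if len(entries) < 2:
        print(f"WARNING: '{name}' lists only {len(entries)} monitor(s): {entries}")
```

To check a real file, read it with open("/etc/ceph/ceph.conf").read() and pass the text to monitor_entries.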
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com