Re: [Ceph-community] Monitors not in quorum (1 of 3 live)

Lluis Arasanz i Nonell - Adam <lluis.arasanz@xxxxxxx> · Wed, 12 Jun 2019 09:44:51 +0000

Hi all,

Here is our story. Perhaps some day it will help someone. Bear in mind that English is not my native language, so sorry for any mistakes.

Our system is Ceph 0.87.2 (Giant), with 5 OSD servers (116 OSDs of 1 TB each in total) and 3 monitors.

After a nightmarish time, we have provisionally "corrected" our Ceph monitor problems. But first, some additional info and a timeline (dates are in dd/mm/yyyy format).

At the beginning, we had 3 working monitors (mon01, mon02 and mon03) and we were happy.

Wednesday 05/06/2019:
After a UPS ("SAI" in Spanish) outage on line B, we found that the ceph-mon process on mon03 did not start cleanly: after starting ceph-mon, ceph-create-keys could not contact the daemon. We kept working with a quorum of 2 monitors and still had access to Ceph storage.

Thursday 06/06/2019:
We had the "good" idea of adding a new monitor to the monitor cluster... this was our first mistake. After the "ceph-deploy mon mon.mon04" command, the new monitor activated (4 monitors in the cluster), but only 2 monitors (mon01 and mon02) had data. With 4 monitors in the monmap a quorum needs 3, so 2 working monitors means no quorum. Without quorum, mon04 could not join the monitor cluster. We lost the "ceph" commands because no quorum could be formed, so no Ceph-related command worked.
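To make the arithmetic concrete: Ceph monitors need a strict majority of the monitors in the monmap, floor(n/2) + 1, to form a quorum. A minimal shell sketch; `quorum_size` and `has_quorum` are hypothetical helper names, not Ceph commands:

```shell
#!/bin/sh
# Hypothetical helpers illustrating the majority rule for monitor quorum:
# a quorum needs floor(n/2) + 1 of the n monitors listed in the monmap.

quorum_size() {
    # $1 = number of monitors in the monmap
    echo $(( $1 / 2 + 1 ))
}

has_quorum() {
    # $1 = monitors up and healthy, $2 = monitors in the monmap
    [ "$1" -ge "$(quorum_size "$2")" ]
}

for n in 3 4 5; do
    echo "$n monitors in monmap: quorum needs $(quorum_size "$n")"
done
```

With three monitors and mon03 down, `has_quorum 2 3` succeeds; once mon04 was in the monmap, `has_quorum 2 4` fails, which matches the loss of quorum described above.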

Fortunately, storage still "worked" and active OpenStack instances were not affected (we do not know why it worked, but it did). At this point, we restarted mon02 and mon04 a few times. I do not remember the order, but our priority was to recover the monitor quorum :( After the mon02 restart, it showed the same behaviour as mon03: ceph-create-keys could not contact the daemon.

We left the cluster "working" with mon01 in the electing state and mon04 waiting to join the cluster.

Friday 07/06/2019:
We prepared a new monitor machine (mon05) to join the monitor cluster. Our idea was: "If we deploy mon05 and add it to the monitor cluster, this could work, as 3 monitors up will make a quorum..."

We ran "ceph-mon -i mon05 --mkfs --monmap /root/monmap-mon04-original --keyring /root/keyring" with data extracted from mon04 (keyring and monmap) and started it with "ceph-mon -i mon05 -c /etc/ceph/ceph.conf --cluster ceph"...
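For reference, the manual sequence above can be wrapped in a small script. This is a sketch under assumptions: the source monitor (here mon04) is stopped so that `ceph-mon --extract-monmap` can read its local store, its keyring sits in the default data directory `/var/lib/ceph/mon/ceph-<id>/keyring`, and `seed_monitor` and `run` are hypothetical helper names. Setting `DRY_RUN=1` prints the commands instead of running them, which makes it possible to review the plan before touching a fragile cluster. Like the procedure described in the message, it does not add the new monitor to the monmap.

```shell
#!/bin/sh
# Sketch: build a new monitor store (e.g. mon05) from the monmap and keyring
# of an existing, stopped monitor (e.g. mon04), without needing quorum.
# Assumes the default data directory /var/lib/ceph/mon/ceph-<id>.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

seed_monitor() {
    src=$1    # existing monitor to copy from (must be stopped)
    new=$2    # id of the monitor being created
    work=$3   # scratch directory for the copied files

    # Read the monmap directly from the stopped monitor's local store.
    run ceph-mon -i "$src" --extract-monmap "$work/monmap" || return 1
    # Copy the mon. keyring kept in the monitor's data directory.
    run cp "/var/lib/ceph/mon/ceph-$src/keyring" "$work/keyring" || return 1
    # Create the new monitor's store from that map and keyring.
    run ceph-mon -i "$new" --mkfs --monmap "$work/monmap" --keyring "$work/keyring" || return 1
    # Start the new monitor.
    run ceph-mon -i "$new" -c /etc/ceph/ceph.conf --cluster ceph
}
```

For example, `DRY_RUN=1 seed_monitor mon04 mon05 /root` prints the four commands without running them.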

Yes, it worked. We were very happy because we recovered the monitor quorum, the Ceph-related commands worked again and everything was fine... but only for 10 minutes :(

And here the nightmare began.

Slow requests began to increase. We did not know why, so initially we restarted the affected OSDs. After 3 hours of restarting OSDs we thought: "This is not normal. What's happening here?"

The OSD logs showed "key errors" when contacting other OSDs and monitors. We were in real trouble: OpenStack Cinder could not reach the RBD volumes, and rbd commands showed a lot of key errors when reading the pool's volumes. The whole system was effectively down, with no reads or writes reaching the storage. We tried restarting the monitors, restarting OpenStack services, restarting OSDs (one at a time), checking NTP (no errors there), checking iptables, checking anything that could be checked... with no success.
We rebuilt mon02 and mon03 by reformatting their ceph-mon data the same way we did with mon05, so we had a 5-monitor cluster, but the key errors did not disappear.

And when there was nothing more we could do... we applied a Spanish saying: "De perdidos, al río" (literally "From the lost, to the river", i.e. when nothing works and all is lost, you can try anything you want). So we thought: "The only monitor we never touched is mon01 (the active monitor), so what if we reset it?"

No sooner said than done. We stopped mon01. The monitor quorum moved to mon02, but the slow requests were still there. We restarted ceph-mon on mon01... but again, ceph-create-keys could not contact the daemon. We lost mon01, so mon02 to mon05 were working in quorum.

And suddenly, storage began to recover: slow requests decreased, rbd commands worked, the OSD logs showed normal messages (no key-related errors), and 10 minutes after mon01 went down, the whole cluster was active+clean.

After this story, we have some "things to keep in mind" we want to share:

- Always have more than one initial monitor defined in Ceph ("mon initial members" in ceph.conf). We had only one, and if it is not active, the other monitors do not start (after the storage recovered, we stopped mon05 and it stayed in the "probing" state, trying to contact mon01, which is down).
- Keep a copy of the monitors' keyring and monmap. This is the safe way to add monitors to the cluster manually when no Ceph-related commands work.
- Be careful adding or removing monitors in an unhealthy monitor cluster: if they lose quorum, you will be in trouble.
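To check the first point on an existing node, here is a small sketch that counts the names listed in `mon initial members` (also written `mon_initial_members`) in a ceph.conf file. `count_initial_members` is a hypothetical helper; it assumes the setting is spelled with spaces or underscores, reads only the first matching line, and treats commas and whitespace as separators.

```shell
#!/bin/sh
# Hypothetical helper: print how many monitors are listed in the
# "mon initial members" / "mon_initial_members" line of a ceph.conf file.
# Prints 0 when the setting is absent.

count_initial_members() {
    conf=$1
    # First line that sets the option (spaces or underscores, any case).
    line=$(grep -Ei '^[[:space:]]*mon[ _]initial[ _]members[[:space:]]*=' "$conf" | head -n 1)
    if [ -z "$line" ]; then
        echo 0
        return
    fi
    value=${line#*=}
    # Split on commas, spaces and tabs; count the non-empty names.
    echo "$value" | tr ', \t' '\n\n\n' | grep -c .
}
```

For example, `count_initial_members /etc/ceph/ceph.conf` printing `1` would correspond to the single initial monitor described above.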

Now, we have some work to do:
- Remove mon01 with "ceph mon destroy mon01": we want to remove it from the monmap, but it is the "initial monitor", so we do not know if it is safe to do so.
- Clean and "format" the monitor data for mon01 (as we did on mon02 and mon03), but we have the same question: is it safe to do when it is the "initial mon"?
- Modify the monmap, deleting mon01, and inject it into mon05, but... what happens when we delete the "initial mon" from the monmap? Is it safe?
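When the monitors do have quorum (as now, with mon02 to mon05), the usual documented way to drop a monitor is `ceph mon remove mon01`. The third option above, editing the monmap offline, corresponds to the Ceph documentation's procedure for removing monitors from an unhealthy cluster: stop the surviving monitors, extract the map from one of them, remove the dead monitor with `monmaptool --rm`, inject the edited map into every survivor, then start them again. The sketch below only scripts those steps. `drop_from_monmap` and `run` are hypothetical names, the `mon initial members` line in ceph.conf is not touched, and this is not a statement that the procedure is safe for this particular cluster. `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: remove a dead monitor from the monmap offline and inject the
# edited map into the surviving monitors. All surviving ceph-mon daemons
# must be stopped first. Set DRY_RUN=1 to print the commands instead.

run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

drop_from_monmap() {
    dead=$1   # monitor to remove, e.g. mon01
    map=$2    # path for the extracted monmap
    shift 2   # remaining arguments: ids of the surviving monitors

    # Extract the current map from the first surviving monitor's store.
    run ceph-mon -i "$1" --extract-monmap "$map" || return 1
    # Keep an untouched copy before editing.
    run cp "$map" "$map.orig" || return 1
    # Remove the dead monitor and print the result for review.
    run monmaptool "$map" --rm "$dead" || return 1
    run monmaptool --print "$map" || return 1
    # Inject the edited map into every surviving monitor.
    for id in "$@"; do
        run ceph-mon -i "$id" --inject-monmap "$map" || return 1
    done
}
```

Afterwards, start only the surviving monitors and confirm with `ceph -s` or `ceph quorum_status` that they form a quorum.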

As you can understand, we now have working storage, but in a critical situation, because any problem with the monitors could make it unstable again... And there are still 15 TB of data inside.

If anyone has a "safe" idea to share, it will be appreciated.

Regards

Lluís Arasanz Nonell
Systems Department
Tel: +34 902 902 685

Email: lluis.arasanz@xxxxxxx

www.adam.es

Legal notice:
The information contained in this message and/or attached file(s), sent from OGIC INFORMATICA SLU, is confidential/privileged and is intended to be read only by the person(s) to whom it is addressed. We remind you that your data have been incorporated into the processing system of OGIC INFORMATICA SLU and that, provided the requirements of the regulations are met, you may exercise your rights of access, rectification, restriction of processing, erasure, portability and objection/withdrawal, under the terms established by current data protection regulations, by sending your request to the postal address TRAVESSERA DE GRACIA 342-344 08025, BARCELONA or by email to administracion@xxxxxxx. If you read this message and are not the intended recipient, or the employee or agent responsible for delivering the message to the recipient, or you have received this communication in error, please be advised that any disclosure, distribution or reproduction of this communication is strictly prohibited and may be illegal, and we ask that you notify us immediately and return the original message to the address mentioned above. Thank you.

Do not print unless necessary. Let's protect the environment.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com