Hi Raul,
we had quite a similar issue last year. We removed the two failing
MONs from the monmap and injected the reduced monmap into the
surviving MON so it would have a quorum. After that the other daemons
would start, but we had to deal with large MON stores (around 250 GB,
I believe) during this phase. IIRC we had to prevent the store from
compacting too often during startup (mon_compact_on_start = false) and
also added SSD storage for the MON store to speed up the sync.
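For reference, the monmap surgery looked roughly like this sketch; the
mon IDs (a, b, c) and the /tmp/monmap path are placeholders, not our
actual names, and the mon daemon must be stopped first:

```shell
# Stop the surviving mon before touching its store.
systemctl stop ceph-mon@a

# Extract the current monmap from the surviving mon (id "a" is a placeholder).
ceph-mon -i a --extract-monmap /tmp/monmap

# Remove the two failing mons from the map (names are examples).
monmaptool /tmp/monmap --rm b
monmaptool /tmp/monmap --rm c

# Inject the reduced monmap back into the surviving mon and restart it.
ceph-mon -i a --inject-monmap /tmp/monmap
systemctl start ceph-mon@a
```

With only one mon left in the map, that mon forms a quorum by itself.
The compaction setting mentioned above goes into the [mon] section of
ceph.conf (mon_compact_on_start = false) before starting the daemon.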
Eventually, we brought the cluster up into a healthy state and then
added back the two crashed MONs. The root cause was that /var/ ran out
of disk space. So in our case it definitely was not a bug. ;-)
Hope this helps!
Eugen
Quoting Raul H C Lopes <raul.cardoso.lopes@xxxxxxx>:
Dear CEPH dev team,
I have a Ceph cluster with three MONs, two of which are down. When I
try to start them they crash, and journalctl shows that they crashed
and a core dump was created.
Would that be a bug? Or a corrupt DB?
I then have a third MON that starts fine, but when I get mon_status
through the admin socket I see
"quorum": []
"state": "probing"
Because of that (I believe) I cannot use 'ceph orch' to create new MONs.
So my questions:
- Is there a way that I can use 'ceph orch' to create new MONs?
- Can I just rsync the store.db from this running node to the
crashing MON nodes?
- Or do I have to rebuild the store.db using a script as in
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#mon-store-recovery-using-osds
Regards,
Raul
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx