Hi there!

This isn't a difficult problem to fix. For purposes of clarity, the monmap is just a part of the monitor database. You generally have all the details correct though.

Have you looked at the process described here?
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovering-a-monitor-s-broken-monmap

Please do make sure you are working on the copy of the monitor database with the newest epoch. After removing the other monitors and getting your cluster back online, you can re-add monitors at will.

Also note that a quorum is a strict majority of the monitors: one-half the total number of nodes, rounded down, plus one. In your case, with two monitors, the quorum is both nodes, so taking either one down causes this problem. You need at least three monitors, and preferably an odd number, to be able to take a node down, for instance during a rolling upgrade.

Hope that helps!
Brian

> On Oct 12, 2020, at 3:54 PM, Gaël THEROND <gael.therond@xxxxxxxxxxxx> wrote:
>
> Hi everyone,
>
> Because of unfortunate events, I have a container-based ceph cluster
> (nautilus) in bad shape.
>
> It is one of the lab clusters, made of only 2 nodes as control plane (I
> know it's bad :-)). Each of these nodes runs a mon, a mgr and a rados-gw
> containerized ceph_daemon.
>
> They were installed using ceph-ansible, if that is relevant for anyone.
>
> However, while I was performing an upgrade on the first node, the
> second went down too (electrical power outage).
>
> As soon as I saw that, I stopped all running processes on the node being
> upgraded.
>
> For now, if I try to restart my second node, the cluster isn't available,
> as the quorum is looking for two nodes.
>
> The container starts and the node elects itself as the master, but all ceph
> commands are stuck forever, which is perfectly normal as the quorum is still
> waiting for one member to complete the election process, etc.
>
> So, my question is: as I can't (to my knowledge) extract the monmap in
> this intermediate state, and as my first node will still be considered a
> known mon and will try to join back if started properly, can I just copy
> /etc/ceph.conf and /var/lib/mon/<host>/keyring from the last living node
> (the second one) to the same locations on the first node? My mon keys were
> initially the same for both mons, and if I'm not mistaken, my first node,
> being blank, will try to create a default store, join the existing cluster
> and retrieve the appropriate monmap from the remaining node, right?
>
> If not, is there a process to save/extract the monmap when using
> a container-based ceph? I can perfectly well exec on the remaining node if
> it makes any difference.
>
> Thanks a lot!
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
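
To make the quorum arithmetic from the reply concrete, here is a minimal sketch in Python. It assumes the usual strict-majority rule (half the monitors, rounded down, plus one). The function names quorum_size and tolerable_failures are made up for this illustration and are not part of Ceph or any of its tools.

```python
def quorum_size(monitors: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    if monitors < 1:
        raise ValueError("a cluster needs at least one monitor")
    return monitors // 2 + 1


def tolerable_failures(monitors: int) -> int:
    """How many monitors can be down while a quorum is still possible."""
    return monitors - quorum_size(monitors)


if __name__ == "__main__":
    # Print a small table for clusters of 1 to 5 monitors.
    for n in range(1, 6):
        print(f"monitors={n} quorum={quorum_size(n)} "
              f"can_lose={tolerable_failures(n)}")
```

With two monitors the quorum is 2, so losing either one blocks the cluster, which matches the situation in this thread. Three monitors tolerate one failure, and four tolerate no more than three do, which is why an odd count is the usual recommendation.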