I’m not using Rook, although I think it would probably help a lot with this
recovery, as Rook is container-based too! Thanks a lot!

On Tue, Oct 13, 2020 at 00:19, Brian Topping <brian.topping@xxxxxxxxx>
wrote:

> I see, maybe you want to look at these instructions. I don’t know if you
> are running Rook, but the point about keeping the container alive by using
> `sleep` is important. Then you can get into the container with `exec` and
> do what you need to.
>
> https://rook.io/docs/rook/v1.4/ceph-disaster-recovery.html#restoring-mon-quorum
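A minimal sketch of the keep-alive trick described above, adapted for a
non-Rook, docker-based ceph-ansible deployment. The image tag, the systemd
unit name, and the container name here are assumptions to adjust for your
setup; <mon-id> is usually the node’s short hostname:

    # Stop the managed mon so systemd stops restarting the container
    # (ceph-ansible typically names the unit ceph-mon@<short-hostname>;
    # this is an assumption, check your units):
    systemctl stop ceph-mon@<mon-id>

    # Start a throwaway container from the same image, overriding the
    # entrypoint so it only sleeps, with mon data and config bind-mounted:
    docker run -d --name mon-recovery --entrypoint sleep \
        -v /var/lib/ceph:/var/lib/ceph \
        -v /etc/ceph:/etc/ceph \
        ceph/daemon:latest-nautilus infinity

    # Get a shell inside it; with ceph-mon stopped the store is no longer
    # locked, so the monmap can now be extracted:
    docker exec -it mon-recovery bash
    ceph-mon -i <mon-id> --extract-monmap /tmp/monmap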
> On Oct 12, 2020, at 4:16 PM, Gaël THEROND <gael.therond@xxxxxxxxxxxx>
> wrote:
>
> Hi Brian!
>
> Thanks a lot for your quick answer, that was fast!
>
> Yes, I’ve read this doc, yet I can’t run the appropriate commands as my
> OSDs are up and running.
>
> As my mon is a container, ceph-mon --extract-monmap won’t work while the
> mon process is running, and if I stop the process the container gets
> restarted and I’m kicked out of it.
>
> I can’t retrieve anything from ceph mon getmap either, as the quorum isn’t
> forming.
>
> Yep, I know that I need three nodes, and a third node recently became
> available for this lab. Unfortunately it’s a lab cluster, so one of my
> colleagues just took that third node for testing purposes... I told you, a
> series of unfortunate events :-)
>
> I can’t just scrap the cluster, as I can’t lose the OSD data.
>
> G.
>
> On Tue, Oct 13, 2020 at 00:01, Brian Topping <brian.topping@xxxxxxxxx>
> wrote:
>
>> Hi there!
>>
>> This isn’t a difficult problem to fix. For purposes of clarity, the
>> monmap is just one part of the monitor database. You generally have all
>> the details correct, though.
>>
>> Have you looked at the process in
>> https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovering-a-monitor-s-broken-monmap?
>>
>> Please do make sure you are working on the copy of the monitor database
>> with the newest epoch. After removing the other monitors and getting your
>> cluster back online, you can re-add monitors at will.
>>
>> Also note that a quorum is defined as “one-half the total number of nodes
>> plus one”. In your case, quorum requires both nodes! Taking either one
>> down would cause this problem. You need an odd number of monitor nodes to
>> be able to take one down, for instance during a rolling upgrade.
>>
>> Hope that helps!
>>
>> Brian
>>
>> On Oct 12, 2020, at 3:54 PM, Gaël THEROND <gael.therond@xxxxxxxxxxxx>
>> wrote:
>>
>> Hi everyone,
>>
>> Because of a series of unfortunate events, I have a container-based Ceph
>> cluster (Nautilus) in bad shape.
>>
>> It’s a lab cluster whose control plane is made of only 2 nodes (I know,
>> that’s bad :-)); each of these nodes runs a mon, a mgr, and a rados-gw as
>> containerized ceph_daemon services.
>>
>> They were installed using ceph-ansible, if that’s relevant to anyone.
>>
>> However, while I was performing an upgrade on the first node, the second
>> one went down too (electrical power outage).
>>
>> As soon as I saw that, I stopped every running upgrade process on the
>> first node.
>>
>> For now, if I try to restart my second node, the cluster isn’t available,
>> as the quorum is looking for two nodes.
>>
>> The container starts and the node elects itself as leader, but all ceph
>> commands hang forever, which is perfectly normal, as the quorum is still
>> waiting for the other member to complete the election process.
>>
>> So, my question is: as I can’t (to my knowledge) extract the monmap in
>> this intermediate state, and as my first node will still be considered a
>> known mon and will try to join back if started properly, can I just copy
>> /etc/ceph/ceph.conf and the mon keyring from
>> /var/lib/ceph/mon/<cluster>-<host>/keyring on the last living node (the
>> second one) into their proper places on the first node? My mon keys were
>> initially the same on both mons, and if I’m not mistaken, my first node,
>> being blank, will create a default store, join the existing cluster, and
>> retrieve the appropriate monmap from the remaining node, right?
>>
>> If not, is there a process to save/extract the monmap when running a
>> container-based Ceph? I can perfectly well exec into the container on the
>> remaining node if it makes any difference.
>>
>> Thanks a lot!
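For the monmap question above, a minimal sketch of the surgery described in
the troubleshooting document Brian linked, run inside the same kind of
sleep container on the surviving node while its mon is stopped (mon names
are placeholders, and per Brian’s advice, work on a backed-up copy of the
monitor store):

    # Inspect the extracted map, then remove the dead monitor from it:
    monmaptool --print /tmp/monmap
    monmaptool /tmp/monmap --rm <dead-mon>

    # Inject the reduced map back into the surviving mon’s store; as the
    # only monitor left in the map, it can then form quorum on its own:
    ceph-mon -i <surviving-mon> --inject-monmap /tmp/monmap

    # Leave and remove the recovery container, then restart the managed
    # mon (unit name is an assumption, as above):
    exit
    docker rm -f mon-recovery
    systemctl start ceph-mon@<surviving-mon>

Once the cluster is reachable again, the other monitor can be re-added with
a blank store and will sync the map from the survivor.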