Re: MONs are down, the quorum is unable to resolve.


 



I see, maybe you want to look at these instructions. I don’t know if you are running Rook, but the point about keeping the container alive by using `sleep` is important. Then you can get into the container with `exec` and do whatever you need.

https://rook.io/docs/rook/v1.4/ceph-disaster-recovery.html#restoring-mon-quorum
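If it is Rook, the trick from that page is roughly the following: swap the mon's entrypoint for `sleep` so the pod stays up without `ceph-mon` running, then `exec` in and work on the store directly. A minimal sketch — the namespace `rook-ceph` and deployment name `rook-ceph-mon-a` are examples, substitute your own:

```shell
# Keep the mon container alive without running ceph-mon, so its store can
# be worked on while the daemon is stopped. Namespace/deployment names here
# are examples -- adjust to your cluster.
kubectl -n rook-ceph patch deployment rook-ceph-mon-a --type='json' -p \
  '[{"op": "replace", "path": "/spec/template/spec/containers/0/command", "value": ["sleep", "infinity"]},
    {"op": "remove", "path": "/spec/template/spec/containers/0/args"}]'

# Then open a shell inside the now-idle mon container:
kubectl -n rook-ceph exec -it deploy/rook-ceph-mon-a -- bash
```

Once you are done, revert the patch (or let the operator reconcile the deployment) so the mon starts normally again.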

> On Oct 12, 2020, at 4:16 PM, Gaël THEROND <gael.therond@xxxxxxxxxxxx> wrote:
> 
> Hi Brian!
> 
> Thanks a lot for your quick answer, that was fast!
> 
> Yes, I’ve read this doc, yet I can’t perform appropriate commands as my OSDs are up and running.
> 
> As my mon is a container, if I try to use `ceph-mon --extract-monmap` it won’t work while the mon process is running, and if I stop it the container will be restarted and I’ll be kicked out of it.
> 
> I can’t retrieve anything from `ceph mon getmap` as the quorum isn’t forming.
> 
> Yep, I know that I would need three nodes and I have a third node available since recently for this lab.
> 
> Unfortunately it’s a lab cluster, and one of my colleagues just took the third node for testing purposes... I told you, a series of unfortunate events :-)
> 
> I can’t get rid of the cluster as I can’t lose the OSDs’ data.
> 
> G.
> 
> On Tue, Oct 13, 2020 at 00:01, Brian Topping <brian.topping@xxxxxxxxx> wrote:
> Hi there!
> 
> This isn’t a difficult problem to fix. For purposes of clarity, the monmap is just a part of the monitor database. You generally have all the details correct though.
> 
> Have you looked at the process in https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovering-a-monitor-s-broken-monmap ?
> 
> Please do make sure you are working on the copy of the monitor database with the newest epoch. After removing the other monitors and getting your cluster back online, you can re-add monitors at will.
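> That procedure can be sketched as follows. This is only an outline under assumptions: the surviving mon is `mon-b`, the dead one is `mon-a`, and the daemon is stopped while you run it (inside the container, per the Rook `sleep` trick):
> 
> ```shell
> # Sketch of the documented monmap surgery, run with ceph-mon stopped.
> # Mon IDs (mon-a, mon-b) and /tmp/monmap are example names.
> 
> # 1. Extract the monmap from the surviving monitor's store:
> ceph-mon -i mon-b --extract-monmap /tmp/monmap
> 
> # 2. Inspect it, then remove the dead monitor(s) from the map:
> monmaptool --print /tmp/monmap
> monmaptool --rm mon-a /tmp/monmap
> 
> # 3. Inject the edited map back, then restart the mon:
> ceph-mon -i mon-b --inject-monmap /tmp/monmap
> ```
> 
> With only `mon-b` left in the map, that single monitor is a majority of one and can form quorum on its own.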
> 
> Also note that a quorum is a strict majority: floor(n/2) + 1 monitors must be up. In your case quorum is defined by both nodes, so taking either one down causes exactly this problem. You need an odd number of monitors to be able to take one node down, for instance in a rolling upgrade.
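> To make the arithmetic concrete:
> 
> ```shell
> # Quorum size for n monitors is a strict majority: floor(n/2) + 1.
> quorum() { echo $(( $1 / 2 + 1 )); }
> quorum 2   # prints 2 -- both mons required, zero failure tolerance
> quorum 3   # prints 2 -- one mon may be down
> quorum 5   # prints 3 -- two mons may be down
> ```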
> 
> Hope that helps! 
> 
> Brian
> 
> 
>> On Oct 12, 2020, at 3:54 PM, Gaël THEROND <gael.therond@xxxxxxxxxxxx> wrote:
>> 
> 
> 
>> Hi everyone,
>> 
>> Because of a series of unfortunate events, I have a container-based Ceph
>> cluster (Nautilus) in bad shape.
>> 
>> It’s a lab cluster made of only two control-plane nodes (I know, that’s bad
>> :-)); each node runs a mon, a mgr, and a rados-gw as containerized Ceph
>> daemons.
>> 
>> They were installed using ceph-ansible, in case that’s relevant to anyone.
>> 
>> However, while I was performing an upgrade on the first node, the second
>> went down too (electrical power outage).
>> 
>> As soon as I saw that, I stopped every running process on the node being
>> upgraded.
>> 
>> For now, if I try to restart my second node, the cluster isn’t available
>> because the quorum requires two nodes.
>> 
>> The container starts and the node elects itself as leader, but all ceph
>> commands hang forever, which is perfectly normal as the quorum is still
>> waiting for one more member to complete the election process.
>> 
>> So, my question is: since I can’t (to my knowledge) extract the monmap in
>> this intermediary state, and since my first node will still be considered a
>> known mon and will try to rejoin if started properly, can I just copy
>> /etc/ceph.conf and /var/lib/mon/<host>/keyring from the last living node
>> (the second one) into their places on the first node? My mon keys were
>> initially the same on both mons, and if I’m not mistaken my first node,
>> being blank, will create a default store, join the existing cluster, and
>> retrieve the appropriate monmap from the remaining node, right?
>> 
>> If not, is there a process to save/extract the monmap when running a
>> container-based Ceph? I can perfectly well exec into the remaining node if
>> that makes any difference.
>> 
>> Thanks a lot!
> 

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx





