Re: Emergency, I lost 4 monitors but all OSD disks are safe

Hi,
follow these instructions:
https://docs.ceph.com/en/quincy/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster
As you are using containers, you might need to specify the --mon-data
directory (on the host this is typically /var/lib/ceph/CLUSTER_FSID/mon.MONNAME);
I have never actually done this in an orchestrator environment, though.
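
For a cephadm-managed, containerized setup, a rough and untested sketch of
that procedure could look like this (the host name "host1", the dead mon
names, the FSID, and the in-container data path are placeholders; a plain
docker setup will look different):

  # stop the last surviving mon (cephadm names its systemd units ceph-<FSID>@<daemon>)
  systemctl stop ceph-<FSID>@mon.host1.service

  # open a shell that has this mon's data directory and keyring mounted
  cephadm shell --name mon.host1

  # inside that shell: extract the monmap, drop the dead mons, inject it back
  ceph-mon -i host1 --extract-monmap /tmp/monmap --mon-data /var/lib/ceph/mon/ceph-host1
  monmaptool /tmp/monmap --rm deadmon1
  monmaptool /tmp/monmap --rm deadmon2
  ceph-mon -i host1 --inject-monmap /tmp/monmap --mon-data /var/lib/ceph/mon/ceph-host1

  # exit the shell and start the mon again
  systemctl start ceph-<FSID>@mon.host1.service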

Good luck.


On Thu, Nov 2, 2023 at 12:48, Mohamed LAMDAOUAR <mohamed.lamdaouar@xxxxxxx> wrote:

> Hello Boris,
>
> I have one monitor server up, and two other servers in the cluster are
> also up (these two servers are not monitors).
> Four other servers are down (their boot disks failed), but the OSD data
> disks are safe.
> I reinstalled the OS on a new SSD disk. How can I rebuild my cluster with
> only one mon?
> If you would like, you can join me for a meeting and I will give you more
> information about the cluster.
>
> Thanks for your help. I'm very stuck because the data is present, but I
> don't know how to add the old OSDs back into the cluster to recover the
> data.
>
>
>
> On Thu, Nov 2, 2023 at 11:55, Boris Behrens <bb@xxxxxxxxx> wrote:
>
>> Hi Mohamed,
>> are all mons down, or do you still have at least one that is running?
>>
>> AFAIK the mons store their DB on the normal OS disks, not within the
>> ceph cluster.
>> So if all mons are dead, meaning the disks that contained the mon data
>> are unrecoverably lost, you might need to bootstrap a new cluster and add
>> the OSDs to the new cluster. This will likely involve tinkering with cephx
>> authentication so that you don't wipe the old OSD data.
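>>
>> If it does come to a fresh cluster, re-registering the surviving OSDs could
>> look roughly like this (only a sketch; osd.12 and the keyring path are
>> placeholders, and adopting OSDs from an old cluster takes more than just
>> the auth part):
>>
>>   # bring the existing LVM-based OSDs back up so their data dirs get mounted
>>   ceph-volume lvm activate --all
>>
>>   # register the old OSD's key with the new mons instead of recreating (and wiping) it
>>   ceph auth add osd.12 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-12/keyring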
>>
>> If you still have at least ONE mon alive, you can shut it down, remove
>> all the other mons from the monmap, and start it again. You CAN run a
>> cluster with only one mon.
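>>
>> To double-check the shrunken monmap before injecting it, and the quorum
>> afterwards (assuming the map was written to /tmp/monmap):
>>
>>   monmaptool --print /tmp/monmap   # should now list only the surviving mon
>>   ceph -s                          # after the restart, one mon should be in quorum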
>>
>> Or did your hosts just lose their boot disks, so you only need to bring
>> them back up somehow? Losing 4x2 NVMe disks at the same time sounds a bit
>> strange.
>>
>> On Thu, Nov 2, 2023 at 11:34, Mohamed LAMDAOUAR <mohamed.lamdaouar@xxxxxxx> wrote:
>>
>> > Hello,
>> >
>> > I have 7 machines in a Ceph cluster; the ceph services run in docker
>> > containers.
>> > Each machine has 4 HDDs for data (available) and 2 NVMe SSDs (bricked).
>> > During a reboot, the SSDs bricked on 4 machines. The data is available
>> > on the HDD disks, but the NVMe disks are bricked and the system is not
>> > available. Is it possible to recover the data of the cluster (the data
>> > disks are all available)?
>>
>>
>>
>

-- 
The self-help group "UTF-8-Probleme" will, as an exception, meet in the
large hall this time.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



