Thanks for the info. I was able to get everything up and running.

Just to mention, this particular cluster backed around 50 VMs from OpenStack
(Nova, Cinder, and Glance all using this cluster), with 4 OSD nodes of 10
disks each. There were no stop/start/delete operations during that time. The
cluster ran fine headless, without any mons, for 3+ weeks. It was only when
someone rebooted a critical VM and it refused to come back up that I had to
start the recovery process.

If anyone else faces the same issue in the future, where all the mons die and
cannot be recovered, there is no need to panic: Ceph keeps working fine as
long as nothing is started/stopped/updated on the volume side and no OSDs are
added or removed.

Thanks,

On Thu, Nov 10, 2022 at 5:32 PM Tyler Brekke <tbrekke@xxxxxxxxxxxxxxxx> wrote:

> Hi Shashi, I think you need to have a mgr running to get updated
> reporting, which would explain the incorrect ceph status output.
>
> Since you have a monitor quorum of 1 out of 1, you can start up the OSDs,
> but I would recommend getting all your mons/mgrs back up first.
>
> On Tue, Nov 8, 2022 at 5:56 PM Shashi Dahal <myshashi@xxxxxxxxx> wrote:
>
>> Hi,
>>
>> Unfortunately, all 3 monitors were lost.
>> I followed this ->
>> https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-mon/#mon-store-recovery-using-osds
>> and the cluster is in the following state now:
>>
>>     id:     234c6a96-8101-49d1-b354-1110e759d572
>>     health: HEALTH_WARN
>>             mon is allowing insecure global_id reclaim
>>             no active mgr
>>
>>   services:
>>     mon: 1 daemons, quorum mon1 (age 8m)
>>     mgr: no daemons active
>>     osd: 40 osds: 40 up (since 5M), 40 in (since 5M)
>>
>>   data:
>>     pools:   0 pools, 0 pgs
>>     objects: 0 objects, 0 B
>>     usage:   0 B used, 0 B / 0 B avail
>>     pgs:
>>
>> All OSD daemons are turned off.
>>
>> It says all 40 OSDs are up, but the OSD services are actually down.
>> If I run "ceph osd dump", it shows all the volumes and PG numbers, so it
>> looks like the cluster still knows about them.
>>
>> My question is: is the cluster now in a state where it is safe to start
>> an OSD daemon? Because ceph status shows no pools and no PGs, I have not
>> started anything yet, to make sure no data loss occurs. At what point
>> should I start the OSD daemons?
>>
>> Note:
>> If anyone has done something like this before and can offer (paid)
>> assistance/consultation, that is also welcome.
>>
>> Thanks,
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
> --
> Tyler Brekke
> Senior Engineer I
> tbrekke@xxxxxxxxxxxxxxxx

--
Cheers,
Shashi
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
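
P.S. For anyone landing on this thread later: the "mon store recovery using
OSDs" procedure linked above boils down to roughly the following. This is a
minimal sketch, not a drop-in script; the temporary store location
(/tmp/monstore), the keyring path (/path/to/admin.keyring), and the monitor
id (mon1) are placeholders to adapt, and the full procedure (cephx keyring
caps, collecting maps across multiple OSD hosts, --mon-ids handling) is in
the linked Quincy documentation.

  # With the OSD daemons stopped, collect the cluster maps from every OSD
  # on this host into a temporary mon store (repeat per OSD node, syncing
  # $ms between hosts as the docs describe).
  ms=/tmp/monstore
  mkdir -p "$ms"
  for osd in /var/lib/ceph/osd/ceph-*; do
    ceph-objectstore-tool --data-path "$osd" --no-mon-config \
      --op update-mon-db --mon-store-path "$ms"
  done

  # Rebuild a monitor store from the collected maps. With cephx enabled the
  # keyring must already carry the mon./client.admin/mgr caps (see the docs);
  # /path/to/admin.keyring is a placeholder.
  ceph-monstore-tool "$ms" rebuild -- --keyring /path/to/admin.keyring

  # Back up the old store and move the rebuilt one into place on the
  # monitor host ("ceph-mon1" here matches the quorum name in the thread).
  mv /var/lib/ceph/mon/ceph-mon1/store.db /var/lib/ceph/mon/ceph-mon1/store.db.corrupted
  mv "$ms/store.db" /var/lib/ceph/mon/ceph-mon1/store.db
  chown -R ceph:ceph /var/lib/ceph/mon/ceph-mon1/store.db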
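
And on the "mgr first, then OSDs" advice above: with a plain systemd
deployment the restart order was roughly the following sketch. The host name
and OSD ids are placeholders, and cephadm or Rook deployments use different
unit names.

  # Bring a mgr back first so "ceph -s" reports pools, PGs and usage again.
  systemctl start ceph-mgr@mon1        # placeholder mgr id/host
  ceph -s                              # mgr should now show as active

  # Then start the OSDs (one node at a time is the cautious route) and watch
  # the PGs return to active+clean before moving on.
  systemctl start ceph-osd@0           # repeat per OSD id: ceph-osd@1, ...
  ceph -s
  ceph osd tree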