Re: restoring ceph cluster from osds

Hi,

I still think the best approach would be to rebuild the MON store from the OSDs as described here [2]. Just creating new MONs with the same IDs might not be sufficient because they would be missing all the OSD keyrings etc., so you'd still have to do some work to bring the cluster up. The OSD-based rebuild might be easier, but other users may know a better way; it's been a while since I had to go through that troubleshooting section.
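
In short, the procedure from [2] looks roughly like this (the paths, the
keyring location and the mon name are placeholders, and under rook the OSD
and mon data directories live inside the pods, so treat this as a sketch
rather than exact commands):

# gather the cluster maps from every OSD into a temporary mon store
ms=/tmp/mon-store
mkdir -p $ms
for osd in /var/lib/ceph/osd/ceph-*; do
  ceph-objectstore-tool --data-path "$osd" --no-mon-config \
    --op update-mon-db --mon-store-path "$ms"
done

# rebuild the mon store; the keyring needs the mon. and client.admin keys,
# and --mon-ids lets you name the mons if they differ from the defaults
ceph-monstore-tool "$ms" rebuild -- --keyring /path/to/admin.keyring --mon-ids a b d

# back up the broken store and put the rebuilt one in place
mv /var/lib/ceph/mon/ceph-a/store.db /var/lib/ceph/mon/ceph-a/store.db.bak
cp -r "$ms"/store.db /var/lib/ceph/mon/ceph-a/store.db
chown -R ceph:ceph /var/lib/ceph/mon/ceph-a/store.db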

Regards,
Eugen

[2] https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds

Quoting Ben <ruidong.gao@xxxxxxxxx>:

Hi,

Yes, the old mon daemons are removed. In the first post, the mon daemons
were started with mon data created from scratch. After some code searching,
I suspect that even without the original mon data I could restore the
cluster from all the OSDs, but I may be wrong on this. For now, I think it
would need less configuration if I could start a mon cluster with the exact
same IDs as the original one (something like k, m, o). Any thoughts on this?
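
For example, I could check which mon IDs and addresses the backed-up mon
store expects with something like the following (the backup path is a
placeholder, and I may have the exact invocation slightly wrong):

# dump the monmap from the backed-up mon store and print its members
ceph-monstore-tool /path/to/mon-store-backup get monmap -- --out /tmp/monmap
monmaptool --print /tmp/monmap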

Ben

Eugen Block <eblock@xxxxxx> wrote on Thu, Mar 9, 2023 at 20:56:

Hi,

I'm not familiar with rook, so the required steps may vary. If you try
to reuse the old mon stores you'll have the mentioned mismatch between
the new daemons and the old monmap (which still contains the old mon
daemons). It's not entirely clear what went wrong in the first place
and what exactly you have already tried, so it's hard to tell whether
editing the monmap is the way to go here. I assume the old mon daemons
have been removed, is that correct? In that case it could be worth a
try to edit the current monmap so that it contains only the new mons
and inject it (see [1] for details). If the mons start and form a
quorum you'd have a cluster, but I can't tell whether the OSDs will
register successfully. I think the earlier situation, when the original
mons were up but the OSDs didn't start, would have been more promising
to work from. Anyway, maybe editing the monmap will fix this for you.
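
Roughly, based on [1], that would look like this (the mon IDs and the
address are just examples taken from your log; with rook you'd have to
stop the mon pod and run this against its mon data directory, so this is
only a sketch):

# extract the current monmap from the stopped mon and inspect it
ceph-mon -i a --extract-monmap /tmp/monmap
monmaptool --print /tmp/monmap

# remove the old mons that no longer exist, make sure the new ones are listed
monmaptool --rm k --rm m --rm o /tmp/monmap
monmaptool --add a 169.169.163.25:6789 /tmp/monmap

# inject the edited map and start the mon again
ceph-mon -i a --inject-monmap /tmp/monmap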

[1] https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovering-a-monitor-s-broken-monmap

Quoting Ben <ruidong.gao@xxxxxxxxx>:

> Hi Eugen,
>
> Thank you for your help on this.
>
> Forget the log. A little progress: the monitor store was restored. I
> created a new Ceph cluster to use the restored monitor store. But the
> monitor log complains:
>
> debug 2023-03-09T11:00:31.233+0000 7fe95234f880  0 starting mon.a rank -1 at public addrs [v2:169.169.163.25:3300/0,v1:169.169.163.25:6789/0] at bind addrs [v2:197.166.206.27:3300/0,v1:197.166.206.27:6789/0] mon_data /var/lib/ceph/mon/ceph-a fsid 3f271841-6188-47c1-b3fd-90fd4f978c76
>
> debug 2023-03-09T11:00:31.234+0000 7fe95234f880  1 mon.a@-1(???) e27 preinit fsid 3f271841-6188-47c1-b3fd-90fd4f978c76
>
> debug 2023-03-09T11:00:31.234+0000 7fe95234f880 -1 mon.a@-1(???) e27 not in monmap and have been in a quorum before; must have been removed
>
> debug 2023-03-09T11:00:31.234+0000 7fe95234f880 -1 mon.a@-1(???) e27 commit suicide!
>
> debug 2023-03-09T11:00:31.234+0000 7fe95234f880 -1 failed to initialize
>
>
> The fact is that the original monitor IDs are k, m, o, whereas the new ones
> are a, b, d. The cluster was deployed by rook. Any ideas to make this work?
>
>
> Ben
>
> Eugen Block <eblock@xxxxxx> wrote on Thu, Mar 9, 2023 at 16:00:
>
>> Hi,
>>
>> there's no attachment to your email; please use something like
>> pastebin to provide the OSD logs.
>>
>> Thanks
>> Eugen
>>
>> Quoting Ben <ruidong.gao@xxxxxxxxx>:
>>
>> > Hi,
>> >
>> > I ended up with the whole set of OSDs to get the original Ceph cluster
>> > back. I managed to get the cluster running. However, its status is as
>> > below:
>> >
>> > bash-4.4$ ceph -s
>> >
>> >   cluster:
>> >
>> >     id:     3f271841-6188-47c1-b3fd-90fd4f978c76
>> >
>> >     health: HEALTH_WARN
>> >
>> >             7 daemons have recently crashed
>> >
>> >             4 slow ops, oldest one blocked for 35077 sec, daemons
>> > [mon.a,mon.b] have slow ops.
>> >
>> >
>> >
>> >   services:
>> >
>> >     mon: 3 daemons, quorum a,b,d (age 9h)
>> >
>> >     mgr: b(active, since 14h), standbys: a
>> >
>> >     osd: 4 osds: 0 up, 4 in (since 9h)
>> >
>> >
>> >
>> >   data:
>> >
>> >     pools:   0 pools, 0 pgs
>> >
>> >     objects: 0 objects, 0 B
>> >
>> >     usage:   0 B used, 0 B / 0 B avail
>> >
>> >     pgs:
>> >
>> >
>> > All OSDs are down.
>> >
>> >
>> > I checked the OSD logs and have attached them to this mail.
>> >
>> >
>> > Please help; I wonder if it's possible to get the cluster back. I have
>> > a backup of the monitors' data, but so far I haven't restored it during
>> > this process.
>> >
>> >
>> > Thanks,
>> >
>> > Ben






_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



