Hi,
I'm not familiar with rook so the steps required may vary. If you try
to reuse the old mon stores you'll have the mentioned mismatch between
the new daemons and the old monmap (which still contains the old mon
daemons). It's not entirely clear what went wrong in the first place
and what you already tried exactly, so it's hard to tell if editing
the monmap is the way to go here. I assume the old mon daemons have
already been removed; is that correct? In that case it could be worth a
try to edit the current monmap to contain only the new mons and inject
it (see [1] for details). If the mons start and form a quorum you'd
have a cluster, but I can't tell if the OSDs will register
successfully. I think the earlier situation, when the original mons
were still up and only the OSDs didn't start, would have been more
promising to work from.
Anyway, maybe editing the monmap will fix this for you.
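To give you an idea, the procedure from [1] would look roughly like
this. It's only a sketch: the mon ID, the addresses (mon.a's public
address is taken from your log, b and d are unknown to me) and where
exactly to run this in a rook deployment are assumptions, and the mon
daemons must be stopped while the map is modified and injected.

  # extract the current monmap from one of the new mons' data dirs
  ceph-mon -i a --extract-monmap /tmp/monmap

  # check which mons it currently lists (probably still k, m, o)
  monmaptool --print /tmp/monmap

  # remove the old mons
  monmaptool --rm k /tmp/monmap
  monmaptool --rm m /tmp/monmap
  monmaptool --rm o /tmp/monmap

  # add the new mons with their v2/v1 addresses (repeat for b and d)
  monmaptool --addv a [v2:169.169.163.25:3300,v1:169.169.163.25:6789] /tmp/monmap

  # verify the result, then inject it into each of the new mons
  monmaptool --print /tmp/monmap
  ceph-mon -i a --inject-monmap /tmp/monmap

On releases that don't have --addv, plain --add with a single ip:port
should work as well.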
[1]
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovering-a-monitor-s-broken-monmap
Quoting Ben <ruidong.gao@xxxxxxxxx>:
Hi Eugen,
Thank you for your help on this.
Forget the log for now. A little progress: the monitor stores were
restored. I created a new Ceph cluster to use the restored monitor
stores, but the monitor log complains:
debug 2023-03-09T11:00:31.233+0000 7fe95234f880 0 starting mon.a rank -1
at public addrs [v2:169.169.163.25:3300/0,v1:169.169.163.25:6789/0] at bind
addrs [v2:197.166.206.27:3300/0,v1:197.166.206.27:6789/0] mon_data
/var/lib/ceph/mon/ceph-a fsid 3f271841-6188-47c1-b3fd-90fd4f978c76
debug 2023-03-09T11:00:31.234+0000 7fe95234f880 1 mon.a@-1(???) e27
preinit fsid 3f271841-6188-47c1-b3fd-90fd4f978c76
debug 2023-03-09T11:00:31.234+0000 7fe95234f880 -1 mon.a@-1(???) e27 not in
monmap and have been in a quorum before; must have been removed
debug 2023-03-09T11:00:31.234+0000 7fe95234f880 -1 mon.a@-1(???) e27 commit
suicide!
debug 2023-03-09T11:00:31.234+0000 7fe95234f880 -1 failed to initialize
The fact is that the original monitors' IDs are k, m and o, while the new
ones are a, b and d. The cluster was deployed by rook. Any ideas to make
this work?
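For reference, this is roughly how I understand the monmap inside the
restored store could be printed to see which mon names it actually
contains (the store path is the mon_data path from the log above; the
exact ceph-monstore-tool invocation may need adjusting inside the rook
mon pod):

  ceph-monstore-tool /var/lib/ceph/mon/ceph-a get monmap -- --out /tmp/monmap
  monmaptool --print /tmp/monmap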
Ben
Eugen Block <eblock@xxxxxx> wrote on Thu, Mar 9, 2023, at 16:00:
Hi,
there's no attachment to your email; please use something like
pastebin to provide the OSD logs.
Thanks
Eugen
Quoting Ben <ruidong.gao@xxxxxxxxx>:
> Hi,
>
> I ended up with the whole set of OSDs to get the original Ceph cluster
> back. I managed to get the cluster running. However, its status is as
> below:
>
> bash-4.4$ ceph -s
>   cluster:
>     id:     3f271841-6188-47c1-b3fd-90fd4f978c76
>     health: HEALTH_WARN
>             7 daemons have recently crashed
>             4 slow ops, oldest one blocked for 35077 sec, daemons
>             [mon.a,mon.b] have slow ops.
>
>   services:
>     mon: 3 daemons, quorum a,b,d (age 9h)
>     mgr: b(active, since 14h), standbys: a
>     osd: 4 osds: 0 up, 4 in (since 9h)
>
>   data:
>     pools:   0 pools, 0 pgs
>     objects: 0 objects, 0 B
>     usage:   0 B used, 0 B / 0 B avail
>     pgs:
>
> All OSDs are down.
>
> I checked the OSD logs and attached them to this mail.
>
> Please help; I wonder if it's possible to get the cluster back. I have
> a backup of the monitors' data, which I haven't restored so far.
>
>
> Thanks,
>
> Ben
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx