Hi,
I'm not familiar with rook so the steps required may vary. If you try
to reuse the old mon stores you'll have the mentioned mismatch between
the new daemons and the old monmap (which still contains the old mon
daemons). It's not entirely clear what went wrong in the first place
and what you already tried exactly, so it's hard to tell if editing
the monmap is the way to go here. I assume the old mon daemons have
already been removed; is that correct? In that case it could be worth a
try to edit the current monmap to contain only the new mons and inject
it (see [1] for details). If the mons start and form a quorum you'd
have a cluster, but I can't tell if the OSDs will register
successfully. I think the earlier situation, when the original mons
were still up and only the OSDs didn't start, would have been more
promising to work from.
Anyway, maybe editing the monmap will fix this for you.
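To give you an idea, the procedure from [1] would look roughly like
this. It's only a sketch: the mon ID, the addresses (mon.a's public
address is taken from your log, b and d are unknown to me) and where
exactly to run this in a rook deployment are assumptions, and the mon
daemons must be stopped while the map is modified and injected.

  # extract the current monmap from one of the new mons' data dirs
  ceph-mon -i a --extract-monmap /tmp/monmap

  # check which mons it currently lists (probably still k, m, o)
  monmaptool --print /tmp/monmap

  # remove the old mons
  monmaptool --rm k /tmp/monmap
  monmaptool --rm m /tmp/monmap
  monmaptool --rm o /tmp/monmap

  # add the new mons with their v2/v1 addresses (repeat for b and d)
  monmaptool --addv a [v2:169.169.163.25:3300,v1:169.169.163.25:6789] /tmp/monmap

  # verify the result, then inject it into each of the new mons
  monmaptool --print /tmp/monmap
  ceph-mon -i a --inject-monmap /tmp/monmap

On releases that don't have --addv, plain --add with a single ip:port
should work as well.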
[1]
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovering-a-monitor-s-broken-monmap
Quoting Ben <ruidong.gao@xxxxxxxxx>:
Hi Eugen,
Thank you for your help on this.
Forget the log for now. A little progress: the monitor stores were
restored. I created a new Ceph cluster to use the restored monitor
stores, but the monitor log complains:
debug 2023-03-09T11:00:31.233+0000 7fe95234f880 0 starting mon.a rank -1
at public addrs [v2:169.169.163.25:3300/0,v1:169.169.163.25:6789/0] at bind
addrs [v2:197.166.206.27:3300/0,v1:197.166.206.27:6789/0] mon_data
/var/lib/ceph/mon/ceph-a fsid 3f271841-6188-47c1-b3fd-90fd4f978c76
debug 2023-03-09T11:00:31.234+0000 7fe95234f880 1 mon.a@-1(???) e27
preinit fsid 3f271841-6188-47c1-b3fd-90fd4f978c76
debug 2023-03-09T11:00:31.234+0000 7fe95234f880 -1 mon.a@-1(???) e27 not in
monmap and have been in a quorum before; must have been removed
debug 2023-03-09T11:00:31.234+0000 7fe95234f880 -1 mon.a@-1(???) e27 commit
suicide!
debug 2023-03-09T11:00:31.234+0000 7fe95234f880 -1 failed to initialize
The fact is that the original monitors' IDs are k, m and o, while the new
ones are a, b and d. The cluster was deployed by rook. Any ideas to make
this work?
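For reference, this is roughly how I understand the monmap inside the
restored store could be printed to see which mon names it actually
contains (the store path is the mon_data path from the log above; the
exact ceph-monstore-tool invocation may need adjusting inside the rook
mon pod):

  ceph-monstore-tool /var/lib/ceph/mon/ceph-a get monmap -- --out /tmp/monmap
  monmaptool --print /tmp/monmap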
Ben
Eugen Block <eblock@xxxxxx> wrote on Thu, Mar 9, 2023, at 16:00:
Hi,
there's no attachment to your email; please use something like
pastebin to provide the OSD logs.
Thanks
Eugen
Quoting Ben <ruidong.gao@xxxxxxxxx>:
> Hi,
>
> I ended up with the whole set of OSDs to get the original Ceph cluster
> back. I managed to get the cluster running. However, its status is as
> below:
>
> bash-4.4$ ceph -s
>   cluster:
>     id:     3f271841-6188-47c1-b3fd-90fd4f978c76
>     health: HEALTH_WARN
>             7 daemons have recently crashed
>             4 slow ops, oldest one blocked for 35077 sec, daemons
>             [mon.a,mon.b] have slow ops.
>
>   services:
>     mon: 3 daemons, quorum a,b,d (age 9h)
>     mgr: b(active, since 14h), standbys: a
>     osd: 4 osds: 0 up, 4 in (since 9h)
>
>   data:
>     pools:   0 pools, 0 pgs
>     objects: 0 objects, 0 B
>     usage:   0 B used, 0 B / 0 B avail
>     pgs:
>
> All OSDs are down.
>
> I checked the OSD logs and attached them to this mail.
>
> Please help; I wonder if it's possible to get the cluster back. I have
> a backup of the monitors' data, which I haven't restored so far.
>
>
> Thanks,
>
> Ben
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx