Re: restoring ceph cluster from osds

Hi,

Yes, the old mon daemons are removed. In my first post the mon daemons were
started with fresh mon data. After some searching through the code, I suspect
that even without the original mon data I could restore the cluster from all
the OSDs, but I may be wrong about that. For now, I think it would take less
reconfiguration if I could start a mon cluster whose daemons have exactly the
same IDs as the original ones (something like k, m, o). Any thoughts on this?
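
What I have in mind is roughly the "recovery using OSDs" procedure from the
monitor troubleshooting docs. A rough sketch (untested on my side; the data
path, keyring location and temporary mon store directory below are
placeholders, and under rook this would have to run inside the OSD pods with
the OSDs stopped):

# accumulate cluster map info from every OSD into one temporary mon store
ms=/tmp/mon-store
mkdir -p "$ms"
for osd in /var/lib/ceph/osd/ceph-*; do
  ceph-objectstore-tool --data-path "$osd" --no-mon-config \
    --op update-mon-db --mon-store-path "$ms"
done

# rebuild the mon store; --mon-ids should let me keep the original names k,m,o
ceph-monstore-tool "$ms" rebuild -- --keyring /path/to/admin.keyring --mon-ids k m o

If that works, the rebuilt store would then be copied into each mon's data
directory (e.g. /var/lib/ceph/mon/ceph-k) before starting the mons.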

Ben

On Thu, Mar 9, 2023 at 20:56, Eugen Block <eblock@xxxxxx> wrote:

> Hi,
>
> I'm not familiar with rook so the steps required may vary. If you try
> to reuse the old mon stores you'll have the mentioned mismatch between
> the new daemons and the old monmap (which still contains the old mon
> daemons). It's not entirely clear what went wrong in the first place
> and what you already tried exactly, so it's hard to tell if editing
> the monmap is the way to go here. I guess the old mon daemons are
> removed, is that assumption correct? In that case it could be worth a
> try to edit the current monmap to contain only the new mons and inject
> it (see [1] for details). If the mons start and form a quorum you'd
> have a cluster, but I can't tell if the OSDs will register
> successfully. I think the previous approach when the original mons
> were up but the OSDs didn't start would have been more promising.
> Anyway, maybe editing the monmap will fix this for you.
>
> [1]
>
> https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovering-a-monitor-s-broken-monmap
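
For the monmap edit suggested here, something like the following should be the
rough procedure (untested; the mon IDs and the address are only taken from my
logs further down, and under rook the mon has to be stopped and the commands
run inside its container):

# with the mon stopped, pull the current monmap out of its store
ceph-mon -i a --extract-monmap /tmp/monmap

# drop the stale entries and add the mons that should remain
monmaptool --print /tmp/monmap
monmaptool --rm k /tmp/monmap
monmaptool --rm m /tmp/monmap
monmaptool --rm o /tmp/monmap
monmaptool --add a 169.169.163.25:6789 /tmp/monmap

# write the edited map back and start the mon again (repeat for b and d)
ceph-mon -i a --inject-monmap /tmp/monmap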
>
> Quoting Ben <ruidong.gao@xxxxxxxxx>:
>
> > Hi Eugen,
> >
> > Thank you for help on this.
> >
> > Forget the log. There is some progress: the monitor store was restored. I
> > created a new ceph cluster to use the restored monitor store, but the
> > monitor log complains:
> >
> > debug 2023-03-09T11:00:31.233+0000 7fe95234f880  0 starting mon.a rank -1 at public addrs [v2:169.169.163.25:3300/0,v1:169.169.163.25:6789/0] at bind addrs [v2:197.166.206.27:3300/0,v1:197.166.206.27:6789/0] mon_data /var/lib/ceph/mon/ceph-a fsid 3f271841-6188-47c1-b3fd-90fd4f978c76
> >
> > debug 2023-03-09T11:00:31.234+0000 7fe95234f880  1 mon.a@-1(???) e27 preinit fsid 3f271841-6188-47c1-b3fd-90fd4f978c76
> >
> > debug 2023-03-09T11:00:31.234+0000 7fe95234f880 -1 mon.a@-1(???) e27 not in monmap and have been in a quorum before; must have been removed
> >
> > debug 2023-03-09T11:00:31.234+0000 7fe95234f880 -1 mon.a@-1(???) e27 commit suicide!
> >
> > debug 2023-03-09T11:00:31.234+0000 7fe95234f880 -1 failed to initialize
> >
> >
> > The fact is that the original monitors' IDs are k, m, o, whereas the new
> > ones are a, b, d. The cluster was deployed by rook. Any ideas to make this
> > work?
> >
> >
> > Ben
> >
> On Thu, Mar 9, 2023 at 16:00, Eugen Block <eblock@xxxxxx> wrote:
> >
> >> Hi,
> >>
> >> there's no attachment to your email, please use something like
> >> pastebin to provide OSD logs.
> >>
> >> Thanks
> >> Eugen
> >>
> >> Quoting Ben <ruidong.gao@xxxxxxxxx>:
> >>
> >> > Hi,
> >> >
> >> > I ended up with the whole set of OSDs from the original ceph cluster.
> >> > I managed to get the cluster running; however, its status is as below:
> >> >
> >> > bash-4.4$ ceph -s
> >> >   cluster:
> >> >     id:     3f271841-6188-47c1-b3fd-90fd4f978c76
> >> >     health: HEALTH_WARN
> >> >             7 daemons have recently crashed
> >> >             4 slow ops, oldest one blocked for 35077 sec, daemons [mon.a,mon.b] have slow ops.
> >> >
> >> >   services:
> >> >     mon: 3 daemons, quorum a,b,d (age 9h)
> >> >     mgr: b(active, since 14h), standbys: a
> >> >     osd: 4 osds: 0 up, 4 in (since 9h)
> >> >
> >> >   data:
> >> >     pools:   0 pools, 0 pgs
> >> >     objects: 0 objects, 0 B
> >> >     usage:   0 B used, 0 B / 0 B avail
> >> >     pgs:
> >> >
> >> > All OSDs are down.
> >> >
> >> > I checked the OSD logs and attached them to this mail.
> >> >
> >> >
> >> > Please help; I wonder if it's possible to get the cluster back. I have
> >> > a backup of the monitors' data, but so far I haven't restored it.
> >> >
> >> >
> >> > Thanks,
> >> >
> >> > Ben
>
>
>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



