Re: Is it possible (or meaningful) to revive old OSDs?

ceph-mail@xxxxxxxxxxxxxxxx · Thu, 07 Sep 2023 13:48:48 +0000

Thanks all for the advice, very helpful!

The node also had a mon, which happily slotted right back into the cluster. The node has been up and running for a number of days now, but the systemd OSD processes don't seem to be trying continuously; they never progress or get a newer map.

As mentioned, the cluster is otherwise healthy (the only problem is these OSDs, which are down and out), and I have spare capacity and no issue with min_size. They have also been out for a long time (months), so it's reasonable to guess that most PGs have been touched since.

So, based on the advice, my plan is the following:

  1.  Set norebalance
  2.  One by one, do this for each OSD:
     *   Purge the OSD from the dashboard
     *   cephadm ceph-volume lvm zap
     *   cephadm may automatically find and add the OSD; otherwise I'll add it manually
  3.  Use pgremapper <https://github.com/digitalocean/pgremapper> to keep the new OSDs from being filled all at once
  4.  Unset norebalance
  5.  Let the balancer gently flow data back into the OSDs over the next hours, days, weeks.
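
As a rough sketch, the plan above can be written out as an ordered command list before anything is run. The helper name `rebuild_plan` and the OSD-id-to-device mapping are hypothetical. `ceph osd purge` is assumed here as the CLI counterpart of the dashboard purge, and `pgremapper cancel-backfill --yes` as the pgremapper step. The function only builds the commands and never executes them.

```python
from typing import Dict, List


def rebuild_plan(osd_devices: Dict[int, str]) -> List[List[str]]:
    """Return the commands for the plan, in order, without running them.

    osd_devices maps an OSD id to the block device behind it
    (a hypothetical mapping; fill it in from your own inventory).
    """
    # Step 1: stop data from moving while OSDs are purged and re-created.
    plan = [["ceph", "osd", "set", "norebalance"]]

    # Step 2: one OSD at a time, purge it and wipe its disk.
    for osd_id, device in sorted(osd_devices.items()):
        # Assumed CLI equivalent of purging the OSD in the dashboard.
        plan.append(["ceph", "osd", "purge", str(osd_id), "--yes-i-really-mean-it"])
        # Run on the OSD host; destroys the LVM metadata on the device.
        plan.append(["cephadm", "ceph-volume", "--", "lvm", "zap", "--destroy", device])

    # Step 3: once the new OSDs are up, pin PGs where the data currently is.
    plan.append(["pgremapper", "cancel-backfill", "--yes"])

    # Step 4: allow rebalancing again; the balancer then moves data gradually.
    plan.append(["ceph", "osd", "unset", "norebalance"])
    return plan


if __name__ == "__main__":
    # Example ids and devices are placeholders, not taken from the thread.
    for cmd in rebuild_plan({12: "/dev/sdb", 13: "/dev/sdc"}):
        print(" ".join(cmd))
```

Printing the list first makes it easy to review the order and the device names before running anything by hand. Adding an OSD manually, in case cephadm does not pick up the zapped disk by itself, is left out of the sketch.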

Thanks all!

________________________________
From: Richard Bade 'hitrich at gmail.com' <ceph-mail@xxxxxxxxxxxxxxxx>
Sent: Thursday, September 7, 2023 01:25
To: ceph-mail@xxxxxxxxxxxxxxxx <ceph-mail@xxxxxxxxxxxxxxxx>
Subject: Re:  Re: Is it possible (or meaningful) to revive old OSDs?

Yes, I agree with Anthony. If your cluster is healthy and you don't
*need* to bring them back in, it's going to be less work and time to
just deploy them as new.

I usually set norebalance, purge the OSDs in ceph, remove the VG from
the disks and re-deploy. Then unset norebalance at the end once
everything is peered and happy. This is so that it doesn't start
moving stuff around when you purge.
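
One way to check "peered and happy" before unsetting norebalance is to look at the PG states in the output of `ceph status --format json`. The sketch below is a minimal example under assumptions: the function name `all_pgs_active` is hypothetical, and the field names (`pgmap`, `pgs_by_state`, `state_name`, `count`, `num_pgs`) are assumed from the JSON status output of recent releases, so verify them against your own cluster first.

```python
import json
from typing import Any, Dict


def all_pgs_active(status: Dict[str, Any]) -> bool:
    """Return True if every PG reports a state that includes 'active'.

    `status` is the parsed JSON from `ceph status --format json`
    (field names are assumptions; check them on your release).
    """
    pgmap = status["pgmap"]
    # State names are '+'-joined tokens, e.g. "active+remapped+backfill_wait".
    active = sum(
        entry["count"]
        for entry in pgmap["pgs_by_state"]
        if "active" in entry["state_name"].split("+")
    )
    return active == pgmap["num_pgs"]


if __name__ == "__main__":
    # Made-up sample data: one PG is still peering, so the check fails.
    sample = json.loads(
        '{"pgmap": {"num_pgs": 3, "pgs_by_state": ['
        '{"state_name": "active+clean", "count": 2},'
        '{"state_name": "peering", "count": 1}]}}'
    )
    print(all_pgs_active(sample))
```

Remapped or backfilling PGs still count as active here, which fits this step: the goal is only to confirm that peering has finished, not that all data movement is complete.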

Rich

On Thu, 7 Sept 2023 at 02:21, Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:
>
> Resurrection usually only makes sense if fate or a certain someone resulted in enough overlapping removed OSDs that you can't meet min_size.  I've had to a couple of times :-/
>
> If an OSD is down for more than a short while, backfilling a redeployed OSD will likely be faster than waiting for it to peer and do deltas -- if it can at all.
>
> > On Sep 6, 2023, at 10:16, Malte Stroem <malte.stroem@xxxxxxxxx> wrote:
> >
> > Hi ceph-mail@xxxxxxxxxxxxxxxx,
> >
> > you could squeeze the OSDs back in but it does not make sense.
> >
> > Just clean the disks, with dd for example, and add them as new disks to your cluster.
> >
> > Best,
> > Malte
> >
> > On 04.09.23 at 09:39, ceph-mail@xxxxxxxxxxxxxxxx wrote:
> >> Hello,
> >> I have a ten-node cluster with about 150 OSDs. One node went down several months ago. The OSDs on the node have been marked as down and out since.
> >> I am now in a position to return the node to the cluster, with all the OS and OSD disks. When I boot up the now working node, the OSDs do not start.
> >> Essentially, it seems to complain with "fail[ing] to load OSD map for [various epoch]s, got 0 bytes".
> >> I'm guessing the OSDs' on-disk maps are so old that they can't get back into the cluster?
> >> My questions are whether it's possible or worth it to try to squeeze these OSDs back in or to just replace them. And if I should just replace them, what's the best way? Manually remove [1] and recreate? Replace [2]? Purge in dashboard?
> >> [1] https://docs.ceph.com/en/quincy/rados/operations/add-or-rm-osds/#removing-osds-manual
> >> [2] https://docs.ceph.com/en/quincy/rados/operations/add-or-rm-osds/#replacing-an-osd
> >> Many thanks!
> >> _______________________________________________
> >> ceph-users mailing list -- ceph-users@xxxxxxx
> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx