Hi everybody,

apparently I forgot to report back. The evacuation completed without problems and we are replacing disks at the moment. This procedure worked like a charm (please read the thread to see why we didn't just shut down the OSDs and use recovery for the rebuild):

1.) For all OSDs: ceph osd out ID # just set them out, this is sticky and does what you want
2.) Wait for the rebalance to finish.
3.) Replace the disks.
4.) Deploy OSDs with the same IDs as before, per host.
5.) Start the OSDs and let the data rebalance back.

During the evacuation you might want to consider setting "osd_delete_sleep" to a high value to avoid the issues with PG removal reported in this thread; see the messages by Joshua Baergen.

The only wish I have is that, after setting the OSDs "out", there were an option to let recovery kick in as well to speed up the data movement. Instead of only reading shard by shard from the out-OSDs, shards could also be reconstructed by recovery from all the other OSDs. Our evacuation lasted about 2 weeks. If recovery kicked in as well, this time would go down to 2-3 days.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
Sent: Monday, October 28, 2024 4:41 AM
To: Frank Schilder
Subject: Re: Re: Procedure for temporary evacuation and replacement

Hi Frank,

Finally, what was the best way to do this evacuation replacement? I want to destroy all my OSD nodes one by one in my cluster due to high fragmentation, so I might follow your method.

Thank you
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
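
P.S. for anyone following the procedure above, a minimal shell sketch of the evacuation steps; the OSD IDs and the sleep value are illustrative placeholders, not values from our cluster, so adapt them before use:

```shell
#!/bin/sh
# Step 1: mark the OSDs to be evacuated "out" (sticky); the OSDs stay up
# and serve reads while their data migrates to the rest of the cluster.
for id in 10 11 12; do        # placeholder OSD IDs
    ceph osd out "$id"
done

# Optional: throttle PG removal during the evacuation (value is a guess,
# tune for your cluster; see Joshua Baergen's messages in the thread).
ceph config set osd osd_delete_sleep 30

# Step 2: wait for the rebalance to finish, i.e. until all PGs report
# active+clean before pulling any disks.
ceph -s

# Steps 3-5 (replace disks, redeploy OSDs with the same IDs, start them
# and let data rebalance back) depend on your deployment tooling.
```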