Hi everybody,

apparently I forgot to report back. The evacuation completed without problems and we are replacing disks at the moment. This procedure worked like a charm (please read the thread to see why we didn't just shut down the OSDs and use recovery for the rebuild):

1.) For all OSDs: ceph osd out ID # just set them out, this is sticky and does what you want
2.) Wait for the rebalance to finish.
3.) Replace the disks.
4.) Deploy OSDs with the same IDs as before, per host.
5.) Start the OSDs and let the data rebalance back.

During the evacuation you might want to consider setting "osd_delete_sleep" to a high value to avoid the issues with PG removal reported in this thread; see the messages by Joshua Baergen.

The only wish I have is that, after setting the OSDs "out", there were an option to let recovery kick in as well to speed up the data movement. Instead of only reading shard by shard from the out-OSDs, shards could also be reconstructed by recovery from all the other OSDs. Our evacuation lasted about 2 weeks. If recovery kicked in as well, this time would go down to 2-3 days.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
Sent: Monday, October 28, 2024 4:41 AM
To: Frank Schilder
Subject: Re: Re: Procedure for temporary evacuation and replacement

Hi Frank,

Finally, what was the best way to do this evacuation replacement? I want to destroy all my OSD nodes one by one in my cluster due to high fragmentation, so I might follow your method.

Thank you
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
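
P.S. for anyone following the procedure above, a minimal shell sketch of the evacuation steps; the OSD IDs and the sleep value are illustrative placeholders, not values from our cluster, so adapt them before use:

```shell
#!/bin/sh
# Step 1: mark the OSDs to be evacuated "out" (sticky); the OSDs stay up
# and serve reads while their data migrates to the rest of the cluster.
for id in 10 11 12; do        # placeholder OSD IDs
    ceph osd out "$id"
done

# Optional: throttle PG removal during the evacuation (value is a guess,
# tune for your cluster; see Joshua Baergen's messages in the thread).
ceph config set osd osd_delete_sleep 30

# Step 2: wait for the rebalance to finish, i.e. until all PGs report
# active+clean before pulling any disks.
ceph -s

# Steps 3-5 (replace disks, redeploy OSDs with the same IDs, start them
# and let data rebalance back) depend on your deployment tooling.
```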