Re: Procedure for temporary evacuation and replacement

> 
> We need to replace about 40 disks distributed over all 12 hosts backing a large pool with EC 8+3. We can't do it host by host (replace the disks on one host, let recovery rebuild the data, then move on), as that would take way too long.

<soapbox>This is one of the false economies of HDDs ;) </soapbox>

> Therefore, we would like to evacuate all data from these disks simultaneously and with as little data movement as possible. This is the procedure that seems to do the trick:
> 
> 1.) For all OSDs: ceph osd reweight ID 0  # Note: not "osd crush reweight"

Note that this will run afoul of the balancer module. I *think* it will also result in the data moving to other OSDs on the same host, since "ceph osd reweight" only changes the OSD's override weight and not the host's CRUSH weight, so the host bucket still attracts the same share of PGs.
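
If you do go this route, a rough sketch of step 1 for the whole batch could look like the lines below (untested; OSD_IDS is a placeholder for your ~40 IDs, adjust to taste):

    # Stop the balancer so it does not fight the manual reweights.
    ceph balancer off

    # Drain the affected OSDs: an override weight of 0 marks them OUT
    # while they stay UP and keep serving the PGs they still hold.
    for id in $OSD_IDS; do
        ceph osd reweight "$id" 0
    done

    # Watch the data drain; misplaced objects should go to zero.
    ceph osd df tree
    ceph -s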

> 2.) Wait for rebalance to finish
> 3.) Replace disks and deploy OSDs with the same IDs as before per host
> 4.) Start OSDs and let rebalance back
> 
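Regarding step 3: reusing the old OSD IDs usually means destroying (not purging) the old OSDs and handing the IDs back to ceph-volume. Roughly, per disk (ID 17 and /dev/sdX are placeholders; check the exact syntax on your release):

    # Keep OSD ID 17 reserved while wiping its auth key and state.
    ceph osd destroy 17 --yes-i-really-mean-it

    # Create the replacement OSD on the new disk, reusing the same ID.
    ceph-volume lvm create --osd-id 17 --data /dev/sdX

    # For step 4, the override weight has to go back to 1 before data
    # can rebalance back (I don't recall offhand whether re-creation
    # resets it on its own).
    ceph osd reweight 17 1
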
> I tested step 1 on Octopus with 1 disk and it seems to work. The reason I ask is that step 1 actually marks the OSDs as OUT. However, they are still UP and I see only misplaced objects, not degraded objects. It is a bit counter-intuitive, but it seems that UP+OUT OSDs still participate in IO.
> 
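A quick way to confirm the UP+OUT state and that nothing is degraded (again just a sketch, with example IDs):

    # The drained OSDs should show reweight 0 but still be "up".
    ceph osd tree | grep -E 'osd\.(17|23)'

    # The status should report only misplaced objects, none degraded.
    ceph -s
    ceph health detail
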
> Because it is counter-intuitive, I would like to have a second opinion. I have read before that others reweight to something like 0.001 and hope that this flushes all PGs. I would prefer not to rely on hope, and a reweight to 0 apparently is a valid choice here, leading to a somewhat weird state with UP+OUT OSDs.
> 
> Problems that could arise are timeouts I'm overlooking that would make data chunks on UP+OUT OSDs unavailable after some time. I'm also wondering whether UP+OUT OSDs participate in peering in case there is an OSD restart somewhere in the pool.
> 
> Thanks for your input and best regards!
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



