If you are replacing the OSDs with the same size/weight device, I agree with your reweight approach. I've been doing some similar work myself that does require crush reweighting to 0, so I have been in that headspace. I did a bit of testing around this:

- Even with the lowest possible reweight an OSD would take, 1 PG was left on my up+in OSD: "ceph osd reweight osd.1 0.00002" results in a reweight of 0.00002, whereas "ceph osd reweight osd.1 0.00001" results in a reweight of 0 (out).
- With my OSD up+in and a reweight of 0.00002, I used upmap to move that 1 PG off of the OSD, leaving 0 PGs there.
- I attempted to destroy the OSD in this state, but it complained that it was not down, so I marked it down and set the noup flag.
- With the OSD in a down+in state it could be destroyed. This surprised me; I assumed it would need to be marked (or transition to) out as well.

So in summary: I think you will be left with 1 or more PGs on the OSD with your approach of reweighting to a very low value, and you will then either need to mark it fully out / reweight it to 0 later, or use the upmap approach so that the remaining PG is not degraded when the OSD is marked down. I don't think there is any danger in reweighting to 0 (or marking it out) versus reweighting to a very low value, and now that I have more clarity on what you want to do, that is exactly the approach I would take (mark it out).
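A rough sketch of that test sequence, in case it is useful. osd.1 is from my lab, the PG id and target OSD are placeholders, and pg-upmap-items is just one way to do the upmap step:

  # smallest reweight the OSD will hold while staying "in"
  # (0.00001 rounds down to 0, which marks it out)
  ceph osd reweight osd.1 0.00002

  # list the PG(s) still mapped to osd.1, then upmap them to another OSD
  ceph pg ls-by-osd osd.1
  ceph osd pg-upmap-items <pgid> 1 <target-osd-id>

  # destroy refuses while the OSD is up, so keep it from rejoining and mark it down
  ceph osd set noup
  ceph osd down osd.1

  # the OSD is now down+in with 0 PGs and destroy succeeds
  ceph osd destroy osd.1 --yes-i-really-mean-it
  ceph osd unset noup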
Respectfully,

*Wes Dillingham*
LinkedIn <http://www.linkedin.com/in/wesleydillingham>
wes@xxxxxxxxxxxxxxxxx


On Thu, Oct 10, 2024 at 9:58 AM Frank Schilder <frans@xxxxxx> wrote:

> Thanks Anthony and Wesley for your input.
>
> Let me explain in more detail why I'm interested in the somewhat
> obscure-looking procedure in step 1.
>
> What's the difference between "ceph osd reweight" and "ceph osd crush
> reweight"? The difference is that command 1 only remaps shards within
> the same failure domain (as Anthony noted), while command 2 implies
> global changes to the crush map with redundant data movement. In other
> words, using
>
> ceph osd reweight osd.X 0
>
> will only move the shards from osd.X to other OSDs (in the same failure
> domain), while
>
> ceph osd crush reweight osd.X 0
>
> has a global effect and will move a lot more around. This "a lot more"
> is what I want to avoid. There is necessary data movement, namely the
> data on the OSDs I want to evacuate, and there is redundant data
> movement, which is everything else.
>
> So, for evacuation, the first command is the command of choice if one
> wants to move exactly the shards that need to move.
>
> If one re-creates OSDs with exactly the same IDs and weights that the
> evacuated OSDs had, which is the default when using "ceph osd destroy"
> as it preserves the crush weights, then, after adding the new OSDs, it
> will be exactly the shards that were evacuated in step 1 that move
> back. That's the minimum possible data movement: data moved = data that
> needs to move.
>
> I don't have the balancer or anything else enabled that could interfere
> with that procedure. Please don't bother commenting about things like
> that.
>
> My actual question is, how dangerous is it to use
>
> ceph osd reweight osd.X 0
>
> instead of
>
> ceph osd reweight osd.X 0.001
>
> The first command will mark the OSD OUT while the second won't. The
> second command might leave 1-2 PGs on the OSDs, while the first one
> won't.
>
> Does the OSD being formally UP+OUT make any difference compared with
> UP+IN for evacuation? My initial simplistic test says no, but I would
> like to be a bit more sure than that.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
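A minimal sketch of the evacuate-and-replace sequence described in the quoted message, assuming osd.X and /dev/sdY as placeholders; stopping the daemon depends on how the OSDs are deployed, and exact ceph-volume syntax may vary by release:

  # evacuate: moves only the shards on osd.X, within the same failure domain
  ceph osd reweight osd.X 0

  # wait for backfill to complete, then confirm nothing depends on the OSD
  ceph osd safe-to-destroy osd.X

  # stop the osd.X daemon on its host (deployment-specific), then destroy it;
  # destroy keeps the OSD id and crush weight in place for reuse
  ceph osd destroy osd.X --yes-i-really-mean-it

  # recreate the OSD on the replacement device with the same id so that
  # only the previously evacuated shards move back
  ceph-volume lvm create --osd-id X --data /dev/sdY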