If you are replacing the OSDs with the same size/weight device, I agree with your reweight approach. I've been doing some similar work myself that does require crush reweighting to 0, so I have been in that headspace. I did a bit of testing around this:

- Even with the lowest possible reweight an OSD would take, 1 PG was left on my up+in OSD: "ceph osd reweight osd.1 0.00002" results in a reweight of 0.00002, whereas "ceph osd reweight osd.1 0.00001" results in a reweight of 0 (out).
- With my OSD up+in and a reweight of 0.00002, I used upmap to move that 1 PG off of the OSD, leaving 0 PGs there.
- I attempted to destroy the OSD in this state, but it complained that it was not down, so I marked it down and set the noup flag.
- With the OSD in a down+in state it could be destroyed. This surprised me; I assumed it would need to be marked (or transition to) out as well.

So in summary: I think you will be left with 1 or more PGs on the OSD with your approach of reweighting to a very low value, and you will then either need to mark it fully out / reweight it to 0 later, or use the upmap approach so that the remaining PG is not degraded when the OSD is marked down. I don't think there is any danger in reweighting to 0 (or marking it out) versus reweighting to a very low value, and now that I have more clarity on what you want to do, that is exactly the approach I would take (mark it out).
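A rough sketch of that test sequence, in case it is useful. osd.1 is from my lab, the PG id and target OSD are placeholders, and pg-upmap-items is just one way to do the upmap step:

  # smallest reweight the OSD will hold while staying "in"
  # (0.00001 rounds down to 0, which marks it out)
  ceph osd reweight osd.1 0.00002

  # list the PG(s) still mapped to osd.1, then upmap them to another OSD
  ceph pg ls-by-osd osd.1
  ceph osd pg-upmap-items <pgid> 1 <target-osd-id>

  # destroy refuses while the OSD is up, so keep it from rejoining and mark it down
  ceph osd set noup
  ceph osd down osd.1

  # the OSD is now down+in with 0 PGs and destroy succeeds
  ceph osd destroy osd.1 --yes-i-really-mean-it
  ceph osd unset noup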
Respectfully,

*Wes Dillingham*
LinkedIn <http://www.linkedin.com/in/wesleydillingham>
wes@xxxxxxxxxxxxxxxxx


On Thu, Oct 10, 2024 at 9:58 AM Frank Schilder <frans@xxxxxx> wrote:

> Thanks Anthony and Wesley for your input.
>
> Let me explain in more detail why I'm interested in the somewhat
> obscure-looking procedure in step 1.
>
> What's the difference between "ceph osd reweight" and "ceph osd crush
> reweight"? The difference is that command 1 only remaps shards within
> the same failure domain (as Anthony noted), while command 2 implies
> global changes to the crush map with redundant data movement. In other
> words, using
>
> ceph osd reweight osd.X 0
>
> will only move the shards from osd.X to other OSDs (in the same failure
> domain), while
>
> ceph osd crush reweight osd.X 0
>
> has a global effect and will move a lot more around. This "a lot more"
> is what I want to avoid. There is necessary data movement, namely the
> data on the OSDs I want to evacuate, and there is redundant data
> movement, which is everything else.
>
> So, for evacuation, the first command is the command of choice if one
> wants to move exactly the shards that need to move.
>
> If one re-creates OSDs with exactly the same IDs and weights that the
> evacuated OSDs had, which is the default when using "ceph osd destroy"
> as it preserves the crush weights, then, after adding the new OSDs, it
> will be exactly the shards that were evacuated in step 1 that move
> back. That's the minimum possible data movement: data moved = data that
> needs to move.
>
> I don't have the balancer or anything else enabled that could interfere
> with that procedure. Please don't bother commenting about things like
> that.
>
> My actual question is, how dangerous is it to use
>
> ceph osd reweight osd.X 0
>
> instead of
>
> ceph osd reweight osd.X 0.001
>
> The first command will mark the OSD OUT while the second won't. The
> second command might leave 1-2 PGs on the OSDs, while the first one
> won't.
>
> Does the OSD being formally UP+OUT make any difference compared with
> UP+IN for evacuation? My initial simplistic test says no, but I would
> like to be a bit more sure than that.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
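A minimal sketch of the evacuate-and-replace sequence described in the quoted message, assuming osd.X and /dev/sdY as placeholders; stopping the daemon depends on how the OSDs are deployed, and exact ceph-volume syntax may vary by release:

  # evacuate: moves only the shards on osd.X, within the same failure domain
  ceph osd reweight osd.X 0

  # wait for backfill to complete, then confirm nothing depends on the OSD
  ceph osd safe-to-destroy osd.X

  # stop the osd.X daemon on its host (deployment-specific), then destroy it;
  # destroy keeps the OSD id and crush weight in place for reuse
  ceph osd destroy osd.X --yes-i-really-mean-it

  # recreate the OSD on the replacement device with the same id so that
  # only the previously evacuated shards move back
  ceph-volume lvm create --osd-id X --data /dev/sdY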