I have to say I am reading quite a few interesting strategies in this
thread and I'd like to briefly take the time to compare them:

1) One-by-one OSD adding

- Least amount of PG rebalancing per step
- Will potentially re-rebalance data that has just been distributed
  when the next OSD is phased in
- Limits the impact if you have a bug in the hdd/ssd series

The biggest problem with this approach is that you will rebalance the
same data over and over again, which slows down the process
significantly.

2) Reweighted phase in

- Start slowly by reweighting the new OSD to a small fraction of its
  potential
- Lets you see how the new OSD performs
- Needs manual interaction to grow the weight
- Possibly delays the phase in for "longer than necessary"

We use this approach when phasing in multiple, larger OSDs that are
from a newer / not so well known series of disks.

3) noin / norebalance based phase in

- Interesting approach to delay rebalancing until the "proper/final"
  new storage is in place (see the command sketch at the very end of
  this mail)
- Unclear how much of a difference it makes if you insert the new set
  of OSDs within a short timeframe (i.e. adding the 1st OSD at minute 0,
  the 2nd at minute 1, etc.)

4) All at once / randomly

- Least amount of manual tuning
- In a way something one "would expect" Ceph to do right (but in
  practice it doesn't always)
- Might (likely will) cause short-term re-adjustments
- Might cause client I/O slowdown (see next point)

5) General slowing down

What we actually do at datacenterlight.ch is to slow down phase-ins by
default via the following tunings:

# Restrain recovery operations so that normal cluster I/O is not affected
[osd]
osd max backfills = 1
osd recovery max active = 1
osd recovery op priority = 2

This works well in about 90% of the cases for us.

Quite an interesting thread, thanks everyone for sharing!

Cheers,

Nico

Anthony D'Atri <anthony.datri@xxxxxxxxx> writes:

>> Hi,
>>
>> as far as I understand it,
>>
>> you get no real benefit from doing them one by one, as each OSD add
>> can cause a lot of data to be moved to a different OSD, even though
>> you just rebalanced it.
>
> Less than with older releases, but yeah.
>
> I’ve known someone who advised against doing them in parallel because
> one would — for a time — have PGs with multiple remaps in the acting
> set. The objection may have been paranoia, I’m not sure.
>
> One compromise is to upweight the new OSDs one node at a time, so the
> churn is limited to one failure domain at a time.
>
> — aad

-- 
Sustainable and modern Infrastructures by ungleich.ch
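
P.S.: To make 3) a bit more concrete, here is a rough sketch of how
such a phase in can look on the command line. This is only a sketch:
it relies on the standard "noin" and "norebalance" cluster flags, and
the OSD ids (12 13 14) are placeholders for whatever you are adding.

# Keep newly booting OSDs from being marked "in" and pause rebalancing
ceph osd set noin
ceph osd set norebalance

# Now create/start all the new OSDs (ceph-volume, orchestrator, ...);
# they come up, but stay "out", so no data moves yet.

# Once the final set of OSDs is up, mark them in
ceph osd in 12 13 14

# Let the single, combined rebalance start
ceph osd unset norebalance
ceph osd unset noin

# Watch progress
ceph -s
ceph osd df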