I have to say I am reading quite a few interesting strategies in this
thread and I'd like to briefly take the time to compare them:

1) One-by-one OSD adding

- Least amount of PG rebalancing per step
- Will potentially re-rebalance data that has just been distributed
  when the next OSD is phased in
- Limits the impact if you have a bug in the hdd/ssd series

The biggest problem with this approach is that you will rebalance the
same data over and over again, which slows down the process
significantly.

2) Reweighted phase in

- Start slowly by reweighting the new OSD to a small fraction of its
  potential
- Lets you see how the new OSD performs
- Needs manual interaction to grow the weight
- Possibly delays the phase in for "longer than necessary"

We use this approach when phasing in multiple, larger OSDs that are
from a newer / not so well known series of disks.

3) noin / norebalance based phase in

- Interesting approach to delay rebalancing until the "proper/final"
  new storage is in place (see the command sketch at the very end of
  this mail)
- Unclear how much of a difference it makes if you insert the new set
  of OSDs within a short timeframe (i.e. adding the 1st OSD at minute 0,
  the 2nd at minute 1, etc.)

4) All at once / randomly

- Least amount of manual tuning
- In a way something one "would expect" Ceph to do right (but in
  practice it doesn't always)
- Might (likely will) cause short-term re-adjustments
- Might cause client I/O slowdown (see next point)

5) General slowing down

What we actually do at datacenterlight.ch is to slow down phase-ins by
default via the following tunings:

# Restrain recovery operations so that normal cluster I/O is not affected
[osd]
osd max backfills = 1
osd recovery max active = 1
osd recovery op priority = 2

This works well in about 90% of the cases for us.

Quite an interesting thread, thanks everyone for sharing!

Cheers,

Nico

Anthony D'Atri <anthony.datri@xxxxxxxxx> writes:

>> Hi,
>>
>> as far as I understand it,
>>
>> you get no real benefit from doing them one by one, as each OSD add
>> can cause a lot of data to be moved to a different OSD, even though
>> you just rebalanced it.
>
> Less than with older releases, but yeah.
>
> I’ve known someone who advised against doing them in parallel because
> one would — for a time — have PGs with multiple remaps in the acting
> set. The objection may have been paranoia, I’m not sure.
>
> One compromise is to upweight the new OSDs one node at a time, so the
> churn is limited to one failure domain at a time.
>
> — aad

-- 
Sustainable and modern Infrastructures by ungleich.ch
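
P.S.: To make 3) a bit more concrete, here is a rough sketch of how
such a phase in can look on the command line. This is only a sketch:
it relies on the standard "noin" and "norebalance" cluster flags, and
the OSD ids (12 13 14) are placeholders for whatever you are adding.

# Keep newly booting OSDs from being marked "in" and pause rebalancing
ceph osd set noin
ceph osd set norebalance

# Now create/start all the new OSDs (ceph-volume, orchestrator, ...);
# they come up, but stay "out", so no data moves yet.

# Once the final set of OSDs is up, mark them in
ceph osd in 12 13 14

# Let the single, combined rebalance start
ceph osd unset norebalance
ceph osd unset noin

# Watch progress
ceph -s
ceph osd df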