> Based on our observation of the impact of the balancer on the
> performance of the entire cluster, we have drawn conclusions that we
> would like to discuss with you.
>
> - A newly created pool should be balanced before being handed over
> to the user. This, I believe, is quite evident.

I think this question contains a lot of hidden assumptions, so it is
hard to give a single correct answer. Using rgw means you get some 7,
10 or 13 different pools, depending on whether you use swift, s3, or
both at the same time. In that case only one or a few of those pools
need care before doing bulk work; the rest are quite fine being very
small and .. "unbalanced".

> - When replacing a disk, it is advisable to exchange it directly
> for a new one. As soon as the OSD replacement occurs, the balancer
> should be invoked to realign any improperly placed PGs during the
> disk outage and disk recovery.

Not that I think the default behaviours are optimal in any way, but
the above text seems to describe what actually does happen. Even if
the balancer is not involved, the normal CRUSH "repair" of an
imbalanced cluster will even the data out once the new OSD is in
place.

> Perhaps an even better method is to pause recovery and backfilling
> before removing the disk, remove the disk itself, promptly add a new
> one, and then resume recovery and backfilling. It's essential to
> perform all of this as quickly as possible (using a script).

Here I would just say: set norebalance (and noout if you must stop
the whole OSD host) before removing the old OSD and adding the new
one. Then, once the new OSD is created and started, unset those flags
and let the cluster repair back onto the newly added OSD. A sketch of
that flag dance is at the end of this mail.

> Ad. We are using a community balancer developed by Jonas Jelten
> because the built-in one does not meet our requirements.

We sometimes use the python or go upmap remapper scripts/programs to
have the cluster be less sad while moving a small number of PGs at a
time, but that is more or less just for convenience, and to let
scrubs keep running on the non-moving PGs when the data movements are
expected to take a long calendar time. The second sketch below shows
the primitive those tools are built on.
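
For the disk replacement flow above, a minimal sketch of the flag
dance could look like this. The OSD id 17 and device /dev/sdX are
made-up examples, and whether you also want nobackfill/norecover
depends on how long the swap takes:

  # quiesce data movement before touching anything
  ceph osd set norebalance
  ceph osd set nobackfill    # optional, also pauses backfill
  ceph osd set norecover     # optional, also pauses recovery
  ceph osd set noout         # only if the whole OSD host goes down

  # swap the disk; osd id 17 and /dev/sdX are examples only
  ceph osd destroy 17 --yes-i-really-mean-it
  ceph-volume lvm create --osd-id 17 --data /dev/sdX

  # new OSD is up and in: let the cluster repair onto it
  ceph osd unset noout
  ceph osd unset norecover
  ceph osd unset nobackfill
  ceph osd unset norebalance

Reusing the same OSD id means the CRUSH placement barely changes, so
most of the traffic is plain recovery onto the new disk rather than a
cluster-wide reshuffle.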
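
And for completeness: the upmap remapper tools mentioned above are,
roughly speaking, loops around the pg-upmap-items commands (which
need require-min-compat-client at luminous or newer). A hand-rolled
illustration of the idea, where the pg id 1.2f and osd ids 4 and 9
are invented:

  # CRUSH wants this PG on osd.9, but the data currently sits on
  # osd.4; pinning 9 -> 4 makes the mapping match reality, so no
  # backfill starts for this PG
  ceph osd pg-upmap-items 1.2f 9 4

  # later, release the exceptions a few PGs at a time and let them
  # backfill calmly
  ceph osd rm-pg-upmap-items 1.2f

Removing the exceptions in small batches is what keeps most PGs
active+clean, so scrubs can carry on there while only a handful of
PGs move at any one time.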