[ ed: snag during moderation (somehow a newline was interpolated in the Subject), so I'm sending this on behalf of kasper_steengaard@xxxxxxxxxxx, to whom replies should be sent ]

I'm managing a Ceph cluster with 1K+ OSDs distributed across 56 hosts. Until now the CRUSH rule in use has been the default replicated rule, but I want to change that in order to implement a failure domain at the rack level.

The current plan is to:

- Disable rebalancing:
  - ceph osd set norebalance
- Add racks to the CRUSH map and distribute the hosts accordingly (8 in each) using the built-in commands:
  - ceph osd crush add-bucket rack1 rack root=default
  - ceph osd crush move osd-host1 rack=rack1
- Create the new rack-split rule:
  - ceph osd crush rule create-replicated rack_split default rack
- Set the rule on all my pools:
  - for p in $(ceph osd lspools | cut -d' ' -f 2) ; do echo $p $(ceph osd pool set $p crush_rule rack_split) ; done
- Finally, re-enable rebalancing:
  - ceph osd unset norebalance

However, I'm concerned about the amount of data that will need to be rebalanced, since the cluster holds multiple PB, and I'm looking for a review of and input on this plan, as well as words of advice or experience from anyone who has been in a similar situation.

——

[ ed: You only include commands for one CRUSH `rack` — would you create multiple `rack` CRUSH buckets, at least three of them? Are all of your pools replicated? No EC pools for RGW buckets, CephFS data, etc.? What OSD media and networking does this cluster have? HDDs will be much slower and much more impacted during the process than SSDs. Is your client workload 24x7? Which Ceph release? These factors inform how impactful the grand shuffle will be. Are your mon DBs on SSDs?

A popular strategy is to use upmap-remapped.py to freeze all of the PG mappings before unsetting the norebalance flag; the balancer will then gradually undo the mappings as it moves data to where it now belongs. This process has built-in throttling.
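On the rack count: 56 hosts at 8 per rack means 7 rack buckets, so the add-bucket/move step above gets repeated per rack. A minimal sketch, assuming hosts really are named osd-host1 through osd-host56 (substitute your actual hostnames), run while norebalance is set:

    # create 7 rack buckets under the default root and move 8 hosts into each
    for r in $(seq 1 7); do
        ceph osd crush add-bucket rack${r} rack root=default
        for h in $(seq $(( (r-1)*8 + 1 )) $(( r*8 ))); do
            ceph osd crush move osd-host${h} rack=rack${r}
        done
    done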
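On the replicated-vs-EC question: an erasure-coded pool cannot take the rack_split replicated rule; it needs its own rule derived from its erasure-code profile (for example one built with crush-failure-domain=rack). Before looping over every pool it is worth checking what is actually there, for example:

    # pool type (replicated vs erasure) and current rule; formatting varies by release
    ceph osd pool ls detail

    # or per pool: which CRUSH rule is currently assigned
    for p in $(ceph osd pool ls); do
        printf '%s ' "$p"
        ceph osd pool get $p crush_rule
    done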
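To make the upmap-remapped.py suggestion concrete, a rough sketch of the sequence, assuming the CERN ceph-scripts copy of the script (it prints "ceph osd pg-upmap-items ..." commands on stdout) and clients new enough for upmap (ceph osd set-require-min-compat-client luminous or later):

    ceph osd set norebalance
    # ...create the rack buckets, move the hosts, switch the pools to rack_split...
    ./upmap-remapped.py | sh      # re-run until few or no PGs remain remapped
    ceph osd unset norebalance
    ceph balancer mode upmap
    ceph balancer on              # gradually removes the upmaps, throttled by the
                                  # mgr target_max_misplaced_ratio setting

Keep an eye on misplaced objects in ceph -s throughout; the HDD-vs-SSD and 24x7 answers above determine how aggressively the balancer can be allowed to run. ]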