Hi Sinan,

The safest approach would be to use the upmap-remapped.py tool developed by Dan at CERN, see [1] for details. The idea is to leverage the balancer in upmap mode to progressively migrate the data to the new servers, minimizing the performance impact on the cluster and its clients.

I like to create the OSDs ahead of time on the new nodes, which I initially place under a temporary CRUSH root called ‘closet’. I then:

1. set the norebalance flag (ceph osd set norebalance),
2. disable the balancer (ceph balancer off),
3. move the new nodes, with their OSDs already provisioned, to their final location (rack),
4. run ./upmap-remapped.py to bring all PGs back to the active+clean state,
5. unset the norebalance flag (ceph osd unset norebalance),
6. re-enable the balancer (ceph balancer on),

and then watch the data move progressively as the upmap balancer executes its plans. A rough command sketch follows below your quoted message.

Regards,
Frédéric

[1] https://docs.clyso.com/blog/adding-capacity-with-upmap-remapped/

----- On 17 Mar 25, at 17:51, Sinan Polat sinan86polat@xxxxxxxxx wrote:

> Hello,
>
> I am currently managing a Ceph cluster that consists of 3 racks, each with
> 4 OSD nodes. Each node contains 24 OSDs. I plan to add three new nodes, one
> to each rack, to help alleviate the high OSD utilization.
>
> The current highest OSD utilization is 85%. I am concerned about the
> possibility of any OSD reaching the osd_full_ratio threshold during the
> rebalancing process. This would cause the cluster to enter a read-only
> state, which I want to avoid at all costs.
>
> I am planning to execute the following commands:
>
> ceph orch host add new-node-1
> ceph orch host add new-node-2
> ceph orch host add new-node-3
>
> ceph osd crush move new-node-1 rack=rack-1
> ceph osd crush move new-node-2 rack=rack-2
> ceph osd crush move new-node-3 rack=rack-3
>
> ceph config set osd osd_max_backfills 1
> ceph config set osd osd_recovery_max_active 1
> ceph config set osd osd_recovery_sleep 0.1
>
> ceph orch apply osd --all-available-devices
>
> Before proceeding, I would like to ask if the above steps are safe to
> execute in a cluster with such high utilization. My main concern is whether
> the rebalancing could cause any OSD to exceed the osd_full_ratio and result
> in unexpected failures.
>
> Any insights or advice on how to safely add these nodes without impacting
> cluster stability would be greatly appreciated.
>
> Thanks!
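
PS: if it helps, here is a rough, untested sketch of that sequence using the host and rack names from your message. The ‘closet’ bucket name, the pre-creation of the host buckets so the new OSDs come up under it, and piping the upmap-remapped.py output straight to a shell are assumptions on my side; adapt and verify each step in your own environment.

# Pre-checks: per-OSD utilization and the full/backfillfull/nearfull ratios
ceph osd df tree
ceph osd dump | grep -i ratio

# Temporary CRUSH root that no rule references, so OSDs created under it
# receive no PGs
ceph osd crush add-bucket closet root

# Pre-create the new host buckets under 'closet'; the bucket names must match
# the short hostnames. Verify with 'ceph osd tree' that the new OSDs really
# come up (and stay) under 'closet' before going any further.
ceph osd crush add-bucket new-node-1 host
ceph osd crush add-bucket new-node-2 host
ceph osd crush add-bucket new-node-3 host
ceph osd crush move new-node-1 root=closet
ceph osd crush move new-node-2 root=closet
ceph osd crush move new-node-3 root=closet

# Add the hosts and provision their OSDs
ceph orch host add new-node-1
ceph orch host add new-node-2
ceph orch host add new-node-3
ceph orch apply osd --all-available-devices

# Freeze data movement before changing the topology
ceph osd set norebalance
ceph balancer off

# Move the fully provisioned hosts to their final racks
ceph osd crush move new-node-1 rack=rack-1
ceph osd crush move new-node-2 rack=rack-2
ceph osd crush move new-node-3 rack=rack-3

# Map the remapped PGs back to the OSDs currently holding the data; review
# the commands the script prints before feeding them to a shell
./upmap-remapped.py | sh

# Once everything is active+clean again, let the balancer take over
ceph osd unset norebalance
ceph balancer on
ceph -s
ceph balancer status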
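
This is also what addresses your full-ratio concern: the new OSDs get no PGs while parked under ‘closet’, norebalance blocks backfill while you move them into the racks, and after upmap-remapped.py everything is active+clean again, so data should only move in the small increments the balancer plans. If any OSD still creeps toward the nearfull/backfillfull ratios, you can pause at any time with ‘ceph balancer off’ and resume later.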