Take a look at https://github.com/TheJJ/ceph-balancer
We switched to it after many attempts to make the internal balancer work
as expected, and now we have ~even OSD utilization across the cluster:
# ./placementoptimizer.py -v balance --ensure-optimal-moves --ensure-variance-decrease
[2023-08-03 23:33:27,954] gathering cluster state via ceph api...
[2023-08-03 23:33:36,081] running pg balancer
[2023-08-03 23:33:36,088] current OSD fill rate per crushclasses:
[2023-08-03 23:33:36,089] ssd: average=49.86%, median=50.27%,
without_placement_constraints=53.01%
[2023-08-03 23:33:36,090] cluster variance for crushclasses:
[2023-08-03 23:33:36,090] ssd: 4.163
[2023-08-03 23:33:36,090] min osd.14 44.698%
[2023-08-03 23:33:36,090] max osd.22 51.897%
[2023-08-03 23:33:36,101] in descending full-order, couldn't empty
osd.22, so we're done. if you want to try more often, set
--max-full-move-attempts=$nr, this may unlock more balancing possibilities.
[2023-08-03 23:33:36,101]
--------------------------------------------------------------------------------
[2023-08-03 23:33:36,101] generated 0 remaps.
[2023-08-03 23:33:36,101] total movement size: 0.0B.
[2023-08-03 23:33:36,102]
--------------------------------------------------------------------------------
[2023-08-03 23:33:36,102] old cluster variance per crushclass:
[2023-08-03 23:33:36,102] ssd: 4.163
[2023-08-03 23:33:36,102] old min osd.14 44.698%
[2023-08-03 23:33:36,102] old max osd.22 51.897%
[2023-08-03 23:33:36,102]
--------------------------------------------------------------------------------
[2023-08-03 23:33:36,103] new min osd.14 44.698%
[2023-08-03 23:33:36,103] new max osd.22 51.897%
[2023-08-03 23:33:36,103] new cluster variance:
[2023-08-03 23:33:36,103] ssd: 4.163
[2023-08-03 23:33:36,103]
--------------------------------------------------------------------------------
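If the balancer stops early like above, the log itself suggests raising the retry limit. A sketch of how we re-run it in that case, assuming (as the tool's README describes) that the balance subcommand prints "ceph osd pg-upmap-items" commands on stdout; the value 2 is an arbitrary example:

```shell
# Let the balancer try emptying more OSDs than just the fullest one,
# per the --max-full-move-attempts hint in the log above, and save the
# generated remap commands for review before applying them.
./placementoptimizer.py -v balance --ensure-optimal-moves \
    --ensure-variance-decrease --max-full-move-attempts=2 \
    | tee /tmp/balance-upmaps

# After reviewing /tmp/balance-upmaps, apply the emitted
# "ceph osd pg-upmap-items" commands:
bash /tmp/balance-upmaps
```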
On 03.08.2023 16:38, Spiros Papageorgiou wrote:
On 03-Aug-23 12:11 PM, Eugen Block wrote:
ceph balancer status
I changed the PGs and it started rebalancing (and turned the autoscaler
off), so now it will not report status.
It reports: "optimize_result": "Too many objects (0.088184 > 0.050000)
are misplaced; try again later"
Let's wait a few hours to see what happens...
Thanx!
Sp
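(Regarding the "Too many objects are misplaced" message quoted above: the 0.050000 threshold is the mgr's target_max_misplaced_ratio, which the built-in balancer respects. If you want it to keep optimizing during a larger rebalance, the threshold can be inspected and raised; the 0.10 below is just an example value:)

```shell
# Show the current misplaced-object threshold (default 0.05 = 5%)
ceph config get mgr target_max_misplaced_ratio

# Raise it temporarily so the balancer continues working while more
# objects are misplaced; revert afterwards if desired.
ceph config set mgr target_max_misplaced_ratio 0.10
```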
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx