Take a look at https://github.com/TheJJ/ceph-balancer
We switched to it after many attempts to make the internal balancer work
as expected, and now we have ~even OSD utilization across the cluster:
# ./placementoptimizer.py -v balance --ensure-optimal-moves --ensure-variance-decrease
[2023-08-03 23:33:27,954] gathering cluster state via ceph api...
[2023-08-03 23:33:36,081] running pg balancer
[2023-08-03 23:33:36,088] current OSD fill rate per crushclasses:
[2023-08-03 23:33:36,089] ssd: average=49.86%, median=50.27%,
without_placement_constraints=53.01%
[2023-08-03 23:33:36,090] cluster variance for crushclasses:
[2023-08-03 23:33:36,090] ssd: 4.163
[2023-08-03 23:33:36,090] min osd.14 44.698%
[2023-08-03 23:33:36,090] max osd.22 51.897%
[2023-08-03 23:33:36,101] in descending full-order, couldn't empty
osd.22, so we're done. if you want to try more often, set
--max-full-move-attempts=$nr, this may unlock more balancing possibilities.
[2023-08-03 23:33:36,101]
--------------------------------------------------------------------------------
[2023-08-03 23:33:36,101] generated 0 remaps.
[2023-08-03 23:33:36,101] total movement size: 0.0B.
[2023-08-03 23:33:36,102]
--------------------------------------------------------------------------------
[2023-08-03 23:33:36,102] old cluster variance per crushclass:
[2023-08-03 23:33:36,102] ssd: 4.163
[2023-08-03 23:33:36,102] old min osd.14 44.698%
[2023-08-03 23:33:36,102] old max osd.22 51.897%
[2023-08-03 23:33:36,102]
--------------------------------------------------------------------------------
[2023-08-03 23:33:36,103] new min osd.14 44.698%
[2023-08-03 23:33:36,103] new max osd.22 51.897%
[2023-08-03 23:33:36,103] new cluster variance:
[2023-08-03 23:33:36,103] ssd: 4.163
[2023-08-03 23:33:36,103]
--------------------------------------------------------------------------------
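If the balancer stops early like above, the log itself suggests raising the retry limit. A sketch of how we re-run it in that case, assuming (as the tool's README describes) that the balance subcommand prints "ceph osd pg-upmap-items" commands on stdout; the value 2 is an arbitrary example:

```shell
# Let the balancer try emptying more OSDs than just the fullest one,
# per the --max-full-move-attempts hint in the log above, and save the
# generated remap commands for review before applying them.
./placementoptimizer.py -v balance --ensure-optimal-moves \
    --ensure-variance-decrease --max-full-move-attempts=2 \
    | tee /tmp/balance-upmaps

# After reviewing /tmp/balance-upmaps, apply the emitted
# "ceph osd pg-upmap-items" commands:
bash /tmp/balance-upmaps
```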
On 03.08.2023 16:38, Spiros Papageorgiou wrote:
On 03-Aug-23 12:11 PM, Eugen Block wrote:
ceph balancer status
I changed the PGs and it started rebalancing (and turned the autoscaler
off), so now it will not report status.
It reports: "optimize_result": "Too many objects (0.088184 > 0.050000)
are misplaced; try again later"
Let's wait a few hours to see what happens...
Thanx!
Sp
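(Regarding the "Too many objects are misplaced" message quoted above: the 0.050000 threshold is the mgr's target_max_misplaced_ratio, which the built-in balancer respects. If you want it to keep optimizing during a larger rebalance, the threshold can be inspected and raised; the 0.10 below is just an example value:)

```shell
# Show the current misplaced-object threshold (default 0.05 = 5%)
ceph config get mgr target_max_misplaced_ratio

# Raise it temporarily so the balancer continues working while more
# objects are misplaced; revert afterwards if desired.
ceph config set mgr target_max_misplaced_ratio 0.10
```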
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx