Hello
I am running Ceph 14.2.7 with the balancer in crush-compat mode (needed
because of old clients), but it doesn't seem to be doing anything. It
used to work in the past, and I am not sure what changed. I created a big
pool, ~285 TB stored, and it doesn't look like it ever got balanced:
pool 43 'fs-data-k5m2-hdd' erasure size 7 min_size 6 crush_rule 7
object_hash rjenkins pg_num 2048 pgp_num 2048 autoscale_mode warn
last_change 48647 lfor 0/42080/42102 flags
hashpspool,ec_overwrites,nearfull stripe_width 20480 application cephfs
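If the EC crush rule itself matters, I can paste it too; assuming jq is
available, it should be retrievable from the rule id 7 shown above with
something like:
# ceph osd crush rule dump | jq '.[] | select(.rule_id == 7)'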
OSD utilization varies between ~50% and ~80%, with about 60% raw used
overall. I am using a mixture of 9 TB and 14 TB drives. The number of PGs
per drive varies between 103 and 207.
# ceph osd df | grep hdd | sort -k 17 | (head -n 2; tail -n 2)
160  hdd  12.53519  1.00000   13 TiB  6.0 TiB  5.9 TiB   74 KiB  12 GiB  6.6 TiB  47.74  0.79  120  up
146  hdd  12.53519  1.00000   13 TiB  6.0 TiB  6.0 TiB   51 MiB  13 GiB  6.5 TiB  48.17  0.80  119  up
 79  hdd   8.99799  1.00000  9.0 TiB  7.3 TiB  7.2 TiB   42 KiB  16 GiB  1.7 TiB  80.91  1.34  186  up
 62  hdd   8.99799  1.00000  9.0 TiB  7.3 TiB  7.2 TiB  112 KiB  16 GiB  1.7 TiB  81.44  1.35  189  up
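If it helps, I can also post the balancer's own scoring; as far as I
understand, it can be queried for the whole cluster or for that pool
specifically:
# ceph balancer eval
# ceph balancer eval fs-data-k5m2-hdd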
# ceph balancer status
{
    "last_optimize_duration": "0:00:00.339635",
    "plans": [],
    "mode": "crush-compat",
    "active": true,
    "optimize_result": "Some osds belong to multiple subtrees: {0: ['default', 'default~hdd'], ...
    "last_optimize_started": "Thu Apr 9 11:17:40 2020"
}
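I'm guessing the "multiple subtrees" message is the relevant clue, but I
don't know what to do about it. My plan was to compare the regular and
shadow (device-class) crush trees and check which rules still take the
bare "default" root rather than a class-specific one, roughly like this:
# ceph osd crush tree --show-shadow
# ceph osd crush rule dump | grep -E '"rule_name"|"item_name"'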
Does anybody know how to debug this?
Thanks,
Vlad