Ceph Balancer Limitations

Hello all,

We're running Nautilus 14.2.2 (upgrading soon to 14.2.3) on 29 CentOS OSD servers.

We've got a large variation in disk sizes and host densities, such that
the default CRUSH mappings lead to an unbalanced data and PG
distribution.

We enabled the balancer manager module in pg upmap mode. The balancer
commands frequently hang indefinitely once the balancer has been
enabled and then queried; even issuing 'ceph balancer off' will hang
for hours unless it's run within about a minute of the manager
restarting. But I digress.
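
For anyone trying to reproduce, enabling and then querying the
balancer is just the usual sequence, something like:

  ceph balancer mode upmap
  ceph balancer on
  ceph balancer status    # this is the sort of query that hangs for us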

In upmap mode, it looks like Ceph only remaps PGs between OSDs within
the same host. Is this the case?
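
(As a way to test this, I believe a mapping exception can be injected
manually to see whether a cross-host target sticks; the pg and osd ids
below are placeholders, not from our cluster:

  ceph osd pg-upmap-items <pgid> <from-osd-id> <to-osd-id>

where <to-osd-id> lives on a different host, but still within what the
CRUSH rule's failure domain allows.)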

I bring this up because we've got one disk sitting at 88% utilization,
and I've been unable to bring that down. The next most utilized disks
are at 80%, and even that could probably be reduced further.
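
For anyone who wants to dig in, the PGs sitting on a given OSD can be
listed with something like (osd id is a placeholder):

  ceph pg ls-by-osd osd.<id>    # PGs whose acting set includes this OSD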

If the limitation is that upmap mode cannot remap PGs to OSDs on
different hosts, then that would be worth documenting, as it is a
significant difference from crush-compat.

Another thing to document would be how to move between the two modes.

I think this is what's needed to move between crush-compat and upmap:

  ceph osd crush weight-set rm-compat
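
Spelling that out, I believe the full crush-compat -> upmap switch is
roughly the following, though I'd welcome corrections:

  ceph balancer off
  ceph osd crush weight-set rm-compat              # drop the compat weight-set
  ceph osd set-require-min-compat-client luminous  # upmap needs luminous+ clients
  ceph balancer mode upmap
  ceph balancer on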

I don't know about the reverse, though.
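
My guess (untested) is that the reverse would involve switching the
mode back and clearing the existing upmap exceptions, e.g.:

  ceph balancer off
  ceph balancer mode crush-compat
  ceph osd rm-pg-upmap-items <pgid>    # repeat for each pg listed in [2]
  ceph balancer on

but confirmation from someone who has actually done it would be
appreciated.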

ceph osd df tree output: [1]
pg upmap items dumped from the osdmap: [2]

[1] https://people.cs.ksu.edu/~mozes/ceph_balancer_query/ceph_osd_df_tree.txt
[2] https://people.cs.ksu.edu/~mozes/ceph_balancer_query/pg_upmap_items.txt

--
Adam


