Problems after migrating to straw2 (to enable the balancer)

Massimo Sgaravatto <massimo.sgaravatto@xxxxxxxxx> · Mon, 14 Jan 2019 15:06:37 +0100

I have a Ceph Luminous cluster running on CentOS 7 nodes.
This cluster has 50 OSDs, all of the same size and all with the same weight.

Since I noticed quite "unfair" usage of the OSDs (some at 30 %, some at 70 %), I tried to activate the balancer.
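
For reference, the imbalance can be quantified from the plain-text output of `ceph osd df`. The following is only a minimal sketch: the helper name `osd_use_spread` is made up for illustration, and it assumes that the output has a header line containing a `%USE` column and that every OSD row starts with a numeric ID.

```shell
# Hypothetical helper: reads `ceph osd df` text on stdin and prints the
# lowest and highest %USE among the OSD rows, plus the difference.
# Assumption: the header line names a %USE column. The column is located by
# counting from the right-hand end of the header, so size columns printed
# with or without a space before the unit do not shift it.
osd_use_spread() {
    awk '
        !found {
            # Look for the header line and remember how far %USE is from the end.
            for (i = 1; i <= NF; i++)
                if ($i == "%USE") { back = NF - i; found = 1 }
            next
        }
        $1 ~ /^[0-9]+$/ {
            # OSD rows start with a numeric ID; TOTAL and MIN/MAX lines do not.
            use = $(NF - back) + 0
            if (n == 0 || use < min) min = use
            if (n == 0 || use > max) max = use
            n++
        }
        END {
            if (n > 0)
                printf "osds=%d min=%.2f max=%.2f spread=%.2f\n", n, min, max, max - min
        }
    '
}
```

It would be used as `ceph osd df | osd_use_spread`. If no header with `%USE` is found, the function prints nothing.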

But the balancer doesn't start, I guess because of this problem:

[root@ceph-mon-01 ~]# ceph osd crush weight-set create-compat
Error EPERM: crush map contains one or more bucket(s) that are not straw2

So I issued the command to convert from straw to straw2 (all the clients are running luminous):

[root@ceph-mon-01 ~]# ceph osd crush set-all-straw-buckets-to-straw2 
Error EINVAL: new crush map requires client version hammer but require_min_compat_client is firefly
[root@ceph-mon-01 ~]# ceph osd set-require-min-compat-client jewel 
set require_min_compat_client to jewel
[root@ceph-mon-01 ~]# ceph osd crush set-all-straw-buckets-to-straw2 
[root@ceph-mon-01 ~]# 
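
As a precaution around this step, the CRUSH map can be saved before the conversion with `ceph osd getcrushmap -o <file>`, so that the previous map can be put back with `ceph osd setcrushmap -i <file>` if the change misbehaves. This is only a sketch: the helper name `backup_crushmap` and the file naming scheme are made up for illustration, and it assumes the `ceph` CLI is on PATH with admin credentials. Note that restoring the old map would itself trigger data movement again.

```shell
# Hypothetical helper: saves the current CRUSH map to a timestamped file in
# the current directory and prints the file name on success.
# Assumption: `ceph` is on PATH and can reach the cluster with admin rights.
backup_crushmap() {
    backup="crushmap-$(date +%Y%m%d-%H%M%S).bin"
    # Discard the command's own stdout so only the file name is printed.
    ceph osd getcrushmap -o "$backup" >/dev/null && printf '%s\n' "$backup"
}
```

It would be called as `saved=$(backup_crushmap)` right before `ceph osd crush set-all-straw-buckets-to-straw2`, with `ceph osd setcrushmap -i "$saved"` as the way back.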

After I issued the command, the cluster went into WARNING state because ~12 % of the objects were misplaced.

Is this normal?
I read somewhere that the migration from straw to straw2 should trigger data migration only if the OSDs have different sizes, which is not my case.

The cluster is still recovering, but what worries me is that data appear to be moving to the most used OSDs, and the MAX_AVAIL value is decreasing quite quickly.

I hope that the recovery can finish without causing problems; I will then immediately activate the balancer.

But if some OSDs are getting too full, is it safe to decrease their weights while the cluster is still recovering?

Thanks a lot for your help.
Of course, I can provide more info if needed.

Cheers, Massimo
