Very unbalanced osd data placement with differing sized devices

Mark Kirkwood <mark.kirkwood@xxxxxxxxxxxxxxx> · Thu, 17 Oct 2013 16:15:52 +1300

I stumbled across this today:

4 osds on 4 hosts (names ceph1 -> ceph4). They are KVM guests (this is a 
play setup).

- ceph1 and ceph2 each have a 5G volume for osd data (+ 2G vol for journal)
- ceph3 and ceph4 each have a 10G volume for osd data (+ 2G vol for journal)

I do a standard installation via ceph-deploy (1.2.7) of ceph (0.67.4) on 
each one [1]. The topology looks like:

$ ceph osd tree
# id    weight      type name           up/down  reweight
-1      0.01999     root default
-2      0               host ceph1
0       0                   osd.0       up       1
-3      0               host ceph2
1       0                   osd.1       up       1
-4      0.009995        host ceph3
2       0.009995            osd.2       up       1
-5      0.009995        host ceph4
3       0.009995            osd.3       up       1

So osd.0 and osd.1 (on ceph1,2) have weight 0, and osd.2 and osd.3 (on 
ceph3,4) have weight 0.009995. This suggests that data will flee osd.0,1 
and live only on osd.2,3. Sure enough, putting in a few objects via rados 
put results in:

ceph1 $ df -m
Filesystem     1M-blocks  Used Available Use% Mounted on
/dev/vda1           5038  2508      2275  53% /
udev                 994     1       994   1% /dev
tmpfs                401     1       401   1% /run
none                   5     0         5   0% /run/lock
none                1002     0      1002   0% /run/shm
/dev/vdb1           5109    40      5070   1% /var/lib/ceph/osd/ceph-0

(similarly for ceph2), whereas:

ceph3 $ df -m
Filesystem     1M-blocks  Used Available Use% Mounted on
/dev/vda1           5038  2405      2377  51% /
udev                 994     1       994   1% /dev
tmpfs                401     1       401   1% /run
none                   5     0         5   0% /run/lock
none                1002     0      1002   0% /run/shm
/dev/vdb1          10229  1315      8915  13% /var/lib/ceph/osd/ceph-2

(similarly for ceph4). Obviously I can fix this by reweighting the 
first two osds to something like 0.005, but I'm wondering if there is 
something I've missed - clearly some kind of auto weighting has been 
performed on the basis of the size difference in the data volumes, but 
it looks to be skewing data far too much to the bigger ones. Is there 
perhaps a bug in the smarts for this? Or is it just because I'm using 
small volumes (5G = 0 weight)?
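
The "5G = 0 weight" guess can be checked with some quick arithmetic. The 
sketch below ASSUMES (this is not confirmed anywhere in this post) that the 
initial CRUSH weight is the data volume size in TiB formatted to two decimal 
places, and that CRUSH stores weights as 16.16 fixed-point values. The 
function names weight_from_kb and stored_weight are made up for illustration. 
Under those assumptions a 5G volume comes out as 0.00 and a 10G volume as 
0.01, which would be displayed as 0.009995:

```shell
#!/bin/sh
# Sketch only. ASSUMPTION: initial weight = size in TiB, printed with "%.2f".
weight_from_kb() {
    # $1 = filesystem size in 1K blocks (as reported by `df -P -k`)
    awk -v kb="$1" 'BEGIN { printf "%.2f\n", kb / 1073741824 }'
}

# ASSUMPTION: weights are stored as 16.16 fixed point (1.0 == 65536),
# truncating anything finer than 1/65536.
stored_weight() {
    # $1 = decimal weight, e.g. 0.01
    awk -v w="$1" 'BEGIN { printf "%.6f\n", int(w * 65536) / 65536 }'
}

# Sizes taken from the df -m output above, converted from MiB to KiB.
small_kb=$((5109 * 1024))     # /dev/vdb1 on ceph1 (osd.0)
large_kb=$((10229 * 1024))    # /dev/vdb1 on ceph3 (osd.2)

echo "5G volume:  $(weight_from_kb "$small_kb")"     # prints 0.00
echo "10G volume: $(weight_from_kb "$large_kb")"     # prints 0.01
echo "0.01 as stored: $(stored_weight 0.01)"         # prints 0.009995

# The manual fix mentioned above, to be run against a live cluster:
#   ceph osd crush reweight osd.0 0.005
#   ceph osd crush reweight osd.1 0.005
```

If the assumption holds, any data volume smaller than about 5.12G (0.005 TiB) 
would round down to weight 0, so the skew would come from the small volume 
sizes rather than from the size ratio itself.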

Cheers

Mark

[1] i.e.:

$ ceph-deploy new ceph1
$ ceph-deploy mon create ceph1
$ ceph-deploy gatherkeys ceph1
$ ceph-deploy osd create ceph1:/dev/vdb:/dev/vdc
...
$ ceph-deploy osd create ceph4:/dev/vdb:/dev/vdc