On Thu, 17 Oct 2013, Mark Kirkwood wrote:
> I stumbled across this today:
>
> 4 osds on 4 hosts (names ceph1 -> ceph4). They are KVM guests (this is a
> play setup).
>
> - ceph1 and ceph2 each have a 5G volume for osd data (+ 2G vol for journal)
> - ceph3 and ceph4 each have a 10G volume for osd data (+ 2G vol for journal)
>
> I do a standard installation via ceph-deploy (1.2.7) of ceph (0.67.4) on
> each one [1]. The topology looks like:
>
> $ ceph osd tree
> # id    weight     type name       up/down  reweight
> -1      0.01999    root default
> -2      0          host ceph1
> 0       0              osd.0       up       1
> -3      0          host ceph2
> 1       0              osd.1       up       1
> -4      0.009995   host ceph3
> 2       0.009995       osd.2       up       1
> -5      0.009995   host ceph4
> 3       0.009995       osd.3       up       1
>
> So osd.0 and osd.1 (on ceph1,2) have weight 0, and osd.2 and osd.3 (on
> ceph3,4) have weight 0.009995. This suggests that data will flee osd.0,1
> and live only on osd.2,3. Sure enough, putting in a few objects via
> rados put results in:
>
> ceph1 $ df -m
> Filesystem     1M-blocks  Used  Available  Use%  Mounted on
> /dev/vda1           5038  2508       2275   53%  /
> udev                 994     1        994    1%  /dev
> tmpfs                401     1        401    1%  /run
> none                   5     0          5    0%  /run/lock
> none                1002     0       1002    0%  /run/shm
> /dev/vdb1           5109    40       5070    1%  /var/lib/ceph/osd/ceph-0
>
> (similarly for ceph2), whereas:
>
> ceph3 $ df -m
> Filesystem     1M-blocks  Used  Available  Use%  Mounted on
> /dev/vda1           5038  2405       2377   51%  /
> udev                 994     1        994    1%  /dev
> tmpfs                401     1        401    1%  /run
> none                   5     0          5    0%  /run/lock
> none                1002     0       1002    0%  /run/shm
> /dev/vdb1          10229  1315       8915   13%  /var/lib/ceph/osd/ceph-2
>
> (similarly for ceph4). Obviously I can fix this by reweighting the first
> two osds to something like 0.005, but I'm wondering if there is something
> I've missed - clearly some kind of auto weighting has been performed on
> the basis of the size difference in the data volumes, but it looks to be
> skewing data far too much to the bigger ones. Is there perhaps a bug in
> the smarts for this? Or is it just because I'm using small volumes
> (5G = 0 weight)?

Yeah, I think this is just rounding error. By default a weight of 1.0 ==
1 TB, so these are just very small numbers. Internally, we're storing it
as a fixed-point 32-bit value where 1.0 == 0x10000, and 5 GB is just too
small for those units.

You can disable this autoweighting with

    osd crush update on start = false

in ceph.conf.

sage

> Cheers
>
> Mark
>
> [1] i.e.:
>
> $ ceph-deploy new ceph1
> $ ceph-deploy mon create ceph1
> $ ceph-deploy gatherkeys ceph1
> $ ceph-deploy osd create ceph1:/dev/vdb:/dev/vdc
> ...
> $ ceph-deploy osd create ceph4:/dev/vdb:/dev/vdc
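
For completeness, here is a minimal Python sketch of the arithmetic Sage
describes. It assumes the OSD start-up hook derives the initial CRUSH weight
from the df-reported size in TiB rounded to two decimal places - that
rounding step is an assumption, not something stated in the thread - but it
reproduces both the 0 and the 0.009995 seen in the ceph osd tree output
above.

    # Sketch of the weight arithmetic discussed above.  The two-decimal
    # rounding of the df-reported size is an assumption about the OSD
    # start-up hook; the 16.16 fixed-point storage (1.0 == 0x10000) is
    # what Sage describes.

    def startup_weight(size_mb):
        # device size in TiB (df -m reports 1M-blocks), rounded to 2 decimals
        return float("%.2f" % (size_mb / (1024.0 * 1024.0)))

    def crush_stored_weight(weight):
        # CRUSH stores weights as 16.16 fixed point, so 1.0 == 0x10000
        return int(weight * 0x10000) / float(0x10000)

    for name, size_mb in (("osd.0 (5G volume)", 5109),
                          ("osd.2 (10G volume)", 10229)):
        w = startup_weight(size_mb)
        print("%s: weight %.2f -> stored as %.6f"
              % (name, w, crush_stored_weight(w)))

    # osd.0 (5G volume): weight 0.00 -> stored as 0.000000
    # osd.2 (10G volume): weight 0.01 -> stored as 0.009995

So, in a toy setup like this, either disable the auto-update as Sage
suggests and set the weights by hand (e.g. ceph osd crush reweight osd.0
0.005), or use data volumes large enough not to round down to a weight of
zero.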