I may be wrong, but I always thought that a weight of 0 means "don't put anything there"; all weights > 0 are considered proportionally. See http://ceph.com/docs/master/rados/operations/crush-map/ which recommends weighting by capacity anyway:

    Weighting Bucket Items

    Ceph expresses bucket weights as double integers, which allows for fine
    weighting. A weight is the relative difference between device capacities.
    We recommend using 1.00 as the relative weight for a 1TB storage device.
    In such a scenario, a weight of 0.5 would represent approximately 500GB,
    and a weight of 3.00 would represent approximately 3TB. Higher level
    buckets have a weight that is the sum total of the leaf items aggregated
    by the bucket.

    A bucket item weight is one dimensional, but you may also calculate your
    item weights to reflect the performance of the storage drive. For
    example, if you have many 1TB drives where some have a relatively low
    data transfer rate and others have a relatively high data transfer rate,
    you may weight them differently, even though they have the same capacity
    (e.g., a weight of 0.80 for the first set of drives with lower total
    throughput, and 1.20 for the second set of drives with higher total
    throughput).

David Zafman
Senior Developer
http://www.inktank.com

On Oct 16, 2013, at 8:15 PM, Mark Kirkwood <mark.kirkwood@xxxxxxxxxxxxxxx> wrote:

> I stumbled across this today:
>
> 4 osds on 4 hosts (names ceph1 -> ceph4). They are KVM guests (this is a play setup).
>
> - ceph1 and ceph2 each have a 5G volume for osd data (+ 2G vol for journal)
> - ceph3 and ceph4 each have a 10G volume for osd data (+ 2G vol for journal)
>
> I do a standard installation via ceph-deploy (1.2.7) of ceph (0.67.4) on each one [1].
> The topology looks like:
>
> $ ceph osd tree
> # id    weight      type name       up/down reweight
> -1      0.01999     root default
> -2      0               host ceph1
> 0       0                   osd.0   up      1
> -3      0               host ceph2
> 1       0                   osd.1   up      1
> -4      0.009995        host ceph3
> 2       0.009995            osd.2   up      1
> -5      0.009995        host ceph4
> 3       0.009995            osd.3   up      1
>
> So osd.0 and osd.1 (on ceph1,2) have weight 0, and osd.2 and osd.3 (on ceph3,4) have weight 0.009995. This suggests that data will flee osd.0,1 and live only on osd.2,3. Sure enough, putting in a few objects via rados put results in:
>
> ceph1 $ df -m
> Filesystem     1M-blocks  Used Available Use% Mounted on
> /dev/vda1           5038  2508      2275  53% /
> udev                 994     1       994   1% /dev
> tmpfs                401     1       401   1% /run
> none                   5     0         5   0% /run/lock
> none                1002     0      1002   0% /run/shm
> /dev/vdb1           5109    40      5070   1% /var/lib/ceph/osd/ceph-0
>
> (similarly for ceph2), whereas:
>
> ceph3 $ df -m
> Filesystem     1M-blocks  Used Available Use% Mounted on
> /dev/vda1           5038  2405      2377  51% /
> udev                 994     1       994   1% /dev
> tmpfs                401     1       401   1% /run
> none                   5     0         5   0% /run/lock
> none                1002     0      1002   0% /run/shm
> /dev/vdb1          10229  1315      8915  13% /var/lib/ceph/osd/ceph-2
>
> (similarly for ceph4). Obviously I can fix this by reweighting the first two osds to something like 0.005, but I'm wondering if there is something I've missed - clearly some kind of auto weighting has been performed on the basis of the size difference of the data volumes, but it looks to be skewing data far too much toward the bigger ones. Is there perhaps a bug in the smarts for this? Or is it just because I'm using small volumes (5G = 0 weight)?
>
> Cheers
>
> Mark
>
> [1] i.e:
>
> $ ceph-deploy new ceph1
> $ ceph-deploy mon create ceph1
> $ ceph-deploy gatherkeys ceph1
> $ ceph-deploy osd create ceph1:/dev/vdb:/dev/vdc
> ...
> $ ceph-deploy osd create ceph4:/dev/vdb:/dev/vdc
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
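[Editor's note] The arithmetic behind Mark's "5G = 0 weight" guess can be sketched. ceph-deploy/ceph-disk assigns an initial weight of roughly the data volume's size in TB, and CRUSH stores weights internally as 16.16 fixed point, which is why a nominal 0.01 shows up as 0.009995 in `ceph osd tree`. The two-decimal rounding below is a hypothetical reconstruction, not the actual ceph-disk code:

```python
# Sketch: why a 5G OSD volume can end up with CRUSH weight 0 while a 10G
# volume gets 0.009995.
# Assumption (not the actual ceph-disk source): the initial weight is the
# data volume's size in TiB, rounded to two decimal places.

GIB = 1024 ** 3
TIB = 1024 ** 4

def initial_weight(size_bytes):
    """Hypothetical ceph-disk-style weight: size in TiB, two decimals."""
    return round(size_bytes / TIB, 2)

def crush_stored_weight(weight):
    """CRUSH stores weights as 16.16 fixed point (weight * 0x10000, truncated)."""
    return int(weight * 0x10000) / 0x10000

print(initial_weight(5 * GIB))       # 5G volume  -> 0.0
print(initial_weight(10 * GIB))      # 10G volume -> 0.01
print(crush_stored_weight(0.01))     # -> 0.00999450..., displayed as 0.009995
```

If that is right, the fix Mark mentions follows directly: `ceph osd crush reweight osd.0 0.005` (and likewise osd.1) gives the 5G volumes roughly half the weight of the 10G ones, instead of zero.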