On Mon, 6 Jan 2014, Dietmar Maurer wrote:
> > 'ceph osd crush tunables optimal'
> >
> > or adjust an offline map file via the crushtool command line (more
> > annoying) and retest; I suspect that is the problem.
> >
> > http://ceph.com/docs/master/rados/operations/crush-map/#tunables
>
> That solves the bug with weight 0, thanks.
>
> But I still get the following distribution:
>
> device 0: 423
> device 1: 453
> device 2: 430
> device 3: 455
> device 4: 657
> device 5: 654
>
> Host with only one osd gets too much data.

I think this is just fundamentally a problem with distributing 3 replicas
over only 4 hosts.  Every piece of data in the system needs to include
either host 3 or 4 (and thus device 4 or 5) in order to have 3 replicas
(on separate hosts).  Add more hosts or disks and the distribution will
even out.

sage
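A back-of-the-envelope check of the numbers above makes the effect
concrete (the six device counts sum to 3072, i.e. 1024 mappings x 3
replicas, and the host weights are the ones in the map quoted further
down):

    ideal share, prox-ceph-1 / prox-ceph-2:  7.260/21.780 * 3072 = 1024 each
    ideal share, prox-ceph-3 / prox-ceph-4:  3.630/21.780 * 3072 =  512 each
    actual per-host totals from this run:     876 /  885  /  657 /  654

Hitting the weighted ideal would require prox-ceph-1 and prox-ceph-2 to
appear in every single mapping, i.e. the one host left out of each
3-of-4 selection would always have to be one of the two single-OSD
hosts.  CRUSH's independent, weight-proportional draws cannot skew the
exclusions that far, so prox-ceph-3 and prox-ceph-4 land well above
their 512 ideal and devices 4 and 5 carry the extra data.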
> > On Fri, 3 Jan 2014, Dietmar Maurer wrote:
> > > > In both cases, you only get 2 replicas on the remaining 2 hosts.
> > >
> > > OK, I was able to reproduce this with crushtool.
> > >
> > > > The difference is if you have 4 hosts with 2 osds.  In the choose
> > > > case, you have some fraction of the data that chose the down host in
> > > > the first step (most of the attempts, actually!) and then couldn't
> > > > find a usable osd, leaving you with only 2
> > >
> > > This is also reproducible.
> > >
> > > > replicas.  With chooseleaf that doesn't happen.
> > > >
> > > > The other difference is if you have one of the two OSDs on the host
> > > > marked out.  In the choose case, the remaining OSD will get allocated
> > > > 2x the data; in the chooseleaf case, usage will remain proportional
> > > > with the rest of the cluster and the data from the out OSD will be
> > > > distributed across other OSDs (at least when there are > 3 hosts!).
> > >
> > > I see, but data distribution seems not optimal in that case.
> > >
> > > For example using this crush map:
> > >
> > > # types
> > > type 0 osd
> > > type 1 host
> > > type 2 rack
> > > type 3 row
> > > type 4 room
> > > type 5 datacenter
> > > type 6 root
> > >
> > > # buckets
> > > host prox-ceph-1 {
> > >     id -2           # do not change unnecessarily
> > >     # weight 7.260
> > >     alg straw
> > >     hash 0          # rjenkins1
> > >     item osd.0 weight 3.630
> > >     item osd.1 weight 3.630
> > > }
> > > host prox-ceph-2 {
> > >     id -3           # do not change unnecessarily
> > >     # weight 7.260
> > >     alg straw
> > >     hash 0          # rjenkins1
> > >     item osd.2 weight 3.630
> > >     item osd.3 weight 3.630
> > > }
> > > host prox-ceph-3 {
> > >     id -4           # do not change unnecessarily
> > >     # weight 3.630
> > >     alg straw
> > >     hash 0          # rjenkins1
> > >     item osd.4 weight 3.630
> > > }
> > >
> > > host prox-ceph-4 {
> > >     id -5           # do not change unnecessarily
> > >     # weight 3.630
> > >     alg straw
> > >     hash 0          # rjenkins1
> > >     item osd.5 weight 3.630
> > > }
> > >
> > > root default {
> > >     id -1           # do not change unnecessarily
> > >     # weight 21.780
> > >     alg straw
> > >     hash 0          # rjenkins1
> > >     item prox-ceph-1 weight 7.260   # 2 OSDs
> > >     item prox-ceph-2 weight 7.260   # 2 OSDs
> > >     item prox-ceph-3 weight 3.630   # 1 OSD
> > >     item prox-ceph-4 weight 3.630   # 1 OSD
> > > }
> > >
> > > # rules
> > > rule data {
> > >     ruleset 0
> > >     type replicated
> > >     min_size 1
> > >     max_size 10
> > >     step take default
> > >     step chooseleaf firstn 0 type host
> > >     step emit
> > > }
> > > # end crush map
> > >
> > > crushtool shows the following utilization:
> > >
> > > # crushtool --test -i my.map --rule 0 --num-rep 3 --show-utilization
> > > device 0: 423
> > > device 1: 452
> > > device 2: 429
> > > device 3: 452
> > > device 4: 661
> > > device 5: 655
> > >
> > > Any explanation for that?  Maybe related to the small number of devices?
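The "adjust an offline map file via the crushtool command line and
retest" loop mentioned at the top of the thread looks roughly like the
following sketch (file names are arbitrary examples; only the get/set
steps talk to a live cluster, everything else is offline):

    ceph osd getcrushmap -o crushmap.bin        # export the in-use map
    crushtool -d crushmap.bin -o crushmap.txt   # decompile to editable text
    vi crushmap.txt                             # adjust buckets, rules, tunables
    crushtool -c crushmap.txt -o crushmap.new   # recompile
    crushtool --test -i crushmap.new --rule 0 --num-rep 3 --show-utilization
    ceph osd setcrushmap -i crushmap.new        # inject only once the test looks sane

Running the --test step with different --num-rep values is also a cheap
way to see how the skew changes as the replica count approaches the
number of hosts.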