> In both cases, you only get 2 replicas on the remaining 2 hosts.

OK, I was able to reproduce this with crushtool.

> The difference is if you have 4 hosts with 2 osds. In the choose case, you have
> some fraction of the data that chose the down host in the first step (most of the
> attempts, actually!) and then couldn't find a usable osd, leaving you with only 2
> replicas. With chooseleaf that doesn't happen.

This is also reproducible.

> The other difference is if you have one of the two OSDs on the host marked out.
> In the choose case, the remaining OSD will get allocated 2x the data; in the
> chooseleaf case, usage will remain proportional with the rest of the cluster and
> the data from the out OSD will be distributed across other OSDs (at least when
> there are > 3 hosts!).

I see, but the data distribution does not seem optimal in that case. For example, using this crush map:

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 root

# buckets
host prox-ceph-1 {
        id -2           # do not change unnecessarily
        # weight 7.260
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 3.630
        item osd.1 weight 3.630
}
host prox-ceph-2 {
        id -3           # do not change unnecessarily
        # weight 7.260
        alg straw
        hash 0  # rjenkins1
        item osd.2 weight 3.630
        item osd.3 weight 3.630
}
host prox-ceph-3 {
        id -4           # do not change unnecessarily
        # weight 3.630
        alg straw
        hash 0  # rjenkins1
        item osd.4 weight 3.630
}
host prox-ceph-4 {
        id -5           # do not change unnecessarily
        # weight 3.630
        alg straw
        hash 0  # rjenkins1
        item osd.5 weight 3.630
}
root default {
        id -1           # do not change unnecessarily
        # weight 21.780
        alg straw
        hash 0  # rjenkins1
        item prox-ceph-1 weight 7.260   # 2 OSDs
        item prox-ceph-2 weight 7.260   # 2 OSDs
        item prox-ceph-3 weight 3.630   # 1 OSD
        item prox-ceph-4 weight 3.630   # 1 OSD
}

# rules
rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map

crushtool shows the following utilization:

# crushtool --test -i my.map --rule 0 --num-rep 3 --show-utilization
  device 0: 423
  device 1: 452
  device 2: 429
  device 3: 452
  device 4: 661
  device 5: 655

Any explanation for that? Maybe it is related to the small number of devices?
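
For reference, here is a quick check of what a purely weight-proportional split would look like (a rough sketch in Python, using only the weights from the map and the device counts from the crushtool run above, which sum to 3072 placements, i.e. 1024 PGs x 3 replicas):

# Rough sketch: compare the crushtool utilization above against a purely
# weight-proportional split. Uses only the crush weights from the map and
# the device counts printed by --show-utilization.
weights = {                           # osd -> crush weight from the map above
    "osd.0": 3.630, "osd.1": 3.630,   # prox-ceph-1
    "osd.2": 3.630, "osd.3": 3.630,   # prox-ceph-2
    "osd.4": 3.630,                   # prox-ceph-3
    "osd.5": 3.630,                   # prox-ceph-4
}
observed = {                          # device counts from the crushtool output
    "osd.0": 423, "osd.1": 452, "osd.2": 429,
    "osd.3": 452, "osd.4": 661, "osd.5": 655,
}

total_weight = sum(weights.values())        # 21.78
total_placements = sum(observed.values())   # 3072 = 1024 PGs * 3 replicas

for osd in sorted(weights):
    expected = total_placements * weights[osd] / total_weight   # 512 for each OSD
    print("%s: expected ~%d, observed %d" % (osd, round(expected), observed[osd]))

Since all six OSDs have the same weight, the proportional share would be 512 placements each, so osd.4 and osd.5 (the single-OSD hosts) end up almost 30% above their share, while the OSDs on the two-OSD hosts sit below it. That is the skew I am asking about.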