Run 'ceph osd crush tunables optimal' or adjust an offline map file via
the crushtool command line (more annoying) and retest; I suspect that is
the problem.

http://ceph.com/docs/master/rados/operations/crush-map/#tunables

sage

On Fri, 3 Jan 2014, Dietmar Maurer wrote:
> > In both cases, you only get 2 replicas on the remaining 2 hosts.
>
> OK, I was able to reproduce this with crushtool.
>
> > The difference is if you have 4 hosts with 2 osds. In the choose case, you have
> > some fraction of the data that chose the down host in the first step (most of the
> > attempts, actually!) and then couldn't find a usable osd, leaving you with only 2
> > replicas. With chooseleaf that doesn't happen.
>
> This is also reproducible.
>
> > The other difference is if you have one of the two OSDs on the host marked out.
> > In the choose case, the remaining OSD will get allocated 2x the data; in the
> > chooseleaf case, usage will remain proportional with the rest of the cluster and
> > the data from the out OSD will be distributed across other OSDs (at least when
> > there are > 3 hosts!).
>
> I see, but data distribution seems not optimal in that case.
> For example using this crush map:
>
> # types
> type 0 osd
> type 1 host
> type 2 rack
> type 3 row
> type 4 room
> type 5 datacenter
> type 6 root
>
> # buckets
> host prox-ceph-1 {
>     id -2        # do not change unnecessarily
>     # weight 7.260
>     alg straw
>     hash 0       # rjenkins1
>     item osd.0 weight 3.630
>     item osd.1 weight 3.630
> }
> host prox-ceph-2 {
>     id -3        # do not change unnecessarily
>     # weight 7.260
>     alg straw
>     hash 0       # rjenkins1
>     item osd.2 weight 3.630
>     item osd.3 weight 3.630
> }
> host prox-ceph-3 {
>     id -4        # do not change unnecessarily
>     # weight 3.630
>     alg straw
>     hash 0       # rjenkins1
>     item osd.4 weight 3.630
> }
>
> host prox-ceph-4 {
>     id -5        # do not change unnecessarily
>     # weight 3.630
>     alg straw
>     hash 0       # rjenkins1
>     item osd.5 weight 3.630
> }
>
> root default {
>     id -1        # do not change unnecessarily
>     # weight 21.780
>     alg straw
>     hash 0       # rjenkins1
>     item prox-ceph-1 weight 7.260    # 2 OSDs
>     item prox-ceph-2 weight 7.260    # 2 OSDs
>     item prox-ceph-3 weight 3.630    # 1 OSD
>     item prox-ceph-4 weight 3.630    # 1 OSD
> }
>
> # rules
> rule data {
>     ruleset 0
>     type replicated
>     min_size 1
>     max_size 10
>     step take default
>     step chooseleaf firstn 0 type host
>     step emit
> }
> # end crush map
>
> crushtool shows the following utilization:
>
> # crushtool --test -i my.map --rule 0 --num-rep 3 --show-utilization
>   device 0: 423
>   device 1: 452
>   device 2: 429
>   device 3: 452
>   device 4: 661
>   device 5: 655
>
> Any explanation for that? Maybe related to the small number of devices?

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
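The choose-vs-chooseleaf failure mode quoted above can be sketched with a small simulation. This is not real CRUSH: it uses uniform random selection instead of CRUSH's pseudo-random straw hashing, and the host/OSD layout (4 hosts with 2 OSDs each, one whole host down) is just the scenario from the thread. It only illustrates why `step choose ... type host` silently loses a replica slot when the chosen host has no usable OSD, while `step chooseleaf` retries with a different host:

```python
import random

# 4 hosts with 2 OSDs each; one whole host is down (the scenario above)
HOSTS = {f"host{i}": [f"osd.{2 * i}", f"osd.{2 * i + 1}"] for i in range(4)}
DOWN = set(HOSTS["host3"])

def pick_hosts(n, exclude=()):
    # uniform stand-in for CRUSH's pseudo-random host selection
    return random.sample([h for h in HOSTS if h not in exclude], n)

def place_choose(n=3):
    # 'step choose firstn N type host' + 'choose firstn 1 type osd':
    # if a chosen host has no usable OSD, that replica slot is simply lost
    replicas = []
    for host in pick_hosts(n):
        up = [o for o in HOSTS[host] if o not in DOWN]
        if up:
            replicas.append(random.choice(up))
    return replicas

def place_chooseleaf(n=3):
    # 'step chooseleaf firstn N type host': a host that yields no usable
    # OSD is rejected and a different host is tried instead
    replicas, tried = [], set()
    while len(replicas) < n and len(tried) < len(HOSTS):
        host = pick_hosts(1, exclude=tried)[0]
        tried.add(host)
        up = [o for o in HOSTS[host] if o not in DOWN]
        if up:
            replicas.append(random.choice(up))
    return replicas

random.seed(1)
trials = 10000
short_choose = sum(len(place_choose()) < 3 for _ in range(trials))
short_leaf = sum(len(place_chooseleaf()) < 3 for _ in range(trials))
print(f"choose:     {short_choose}/{trials} placements with only 2 replicas")
print(f"chooseleaf: {short_leaf}/{trials} placements with only 2 replicas")
```

With choose, the down host is among the 3 selected hosts in 3 out of 4 placements ("most of the attempts, actually!"), so roughly 75% of placements end up with only 2 replicas; with chooseleaf, none do.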