Re: crushmap rule issue: choose vs. chooseleaf

On Wed, 2010-06-23 at 15:20 -0600, Sage Weil wrote:
> On Wed, 23 Jun 2010, Jim Schutt wrote:
> > I've been trying to get custom CRUSH maps to work, based on
> > http://ceph.newdream.net/wiki/Custom_data_placement_with_CRUSH
> > 
> > I've not had any success until I dumped the map from
> > a simple 4 device setup.  I noticed that map had a
> > rule using:
> >   step choose firstn 0 type device
> > 
> > whereas all the custom maps I was trying to build used
> > chooseleaf rather than choose.  So I modified those
> > default 4 device map rules to be:
> >   step chooseleaf firstn 0 type device
> 
> Hmm.  It's non-obvious, and should probably work, but chooseleaf on a 
> 'device' (which is the leaf) currently doesn't work.  If you have a 
> hierarchy like
> 
> root
> host
> controller
> disk
> device
> 
> You can either
> 
>          step take root
>          step choose firstn 0 type controller
>          step choose firstn 1 type device
>          step emit
> 
> to get N distinct controllers, and then for each of those, choose 1 
> device.  Or,
> 
>          step take root
>          step chooseleaf firstn 0 type controller
>          step emit
> 
> to choose (a device nested beneath) N distinct controllers.  The 
> difference is the latter will try to pick a nested device for each 
> controller and, if it can't find one, reject the controller choice and 
> continue.  This prevents the situation where a controller has no 
> usable devices beneath it: the first rule picks that controller in the 
> 'choose firstn 0 type controller' step, but then can't find a device 
> under it, and you end up with only (n-1) results.
> 
> The first problem you had was a bug when chooseleaf was given the leaf 
> type (device).  It normally takes an intermediate type in the hierarchy, not 
> the leaf type.  That's now fixed, and should give an identical result to 
> 'choose' in that case.

OK, thanks.
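For anyone following along, here's a toy sketch of the difference described above. This is illustrative Python, not the real CRUSH code: the pseudo-random selection and retry limits are replaced by a simple left-to-right scan, and all the names are made up.

```python
# Toy model of 'choose ... then choose' vs. 'chooseleaf' -- NOT the real
# CRUSH algorithm.  Each controller is modeled as a list of its usable
# devices; an empty list is a controller with no usable devices.

def choose_then_choose(controllers, n):
    """'choose firstn 0 type controller' then 'choose firstn 1 type device':
    commit to n controllers first, then try to find a device under each.
    An empty controller contributes nothing, so fewer than n results
    can come back."""
    picked = controllers[:n]            # stand-in for the choose step
    result = []
    for devices in picked:
        if devices:
            result.append(devices[0])   # 'choose firstn 1 type device'
    return result

def chooseleaf(controllers, n):
    """'chooseleaf firstn 0 type controller': reject any controller whose
    subtree yields no device and keep trying other candidates until
    n devices are found (or the candidates run out)."""
    result = []
    for devices in controllers:
        if devices:                     # otherwise reject and continue
            result.append(devices[0])
        if len(result) == n:
            break
    return result

# controller0 holds a device, controller1 is empty, controller2 holds one
tree = [["dev0"], [], ["dev2"]]
print(choose_then_choose(tree, 2))      # ['dev0'] -- only (n-1) results
print(chooseleaf(tree, 2))              # ['dev0', 'dev2']
```

With an empty controller in the way, the choose/choose rule comes back one result short, while the chooseleaf-style retry still finds n devices on distinct controllers.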

> 
> 
> > Based on that, I reworked some of the test maps with deeper device
> > hierarchies I had been trying, and got them to work
> > (i.e. the file system started) when I avoided chooseleaf rules.
> > 
> > E.g. with a device hierarchy like this
> > (a device here is a partition, as I am still
> > testing on limited hardware):
> > 
> > type 0 device
> > type 1 disk
> > type 2 controller
> > type 3 host
> > type 4 root
> > 
> > a map with rules like this worked:
> > 
> > rule data {
> >         ruleset 0
> >         type replicated
> >         min_size 2
> >         max_size 2
> >         step take root
> >         step choose firstn 0 type host
> >         step choose firstn 0 type controller
> >         step choose firstn 0 type disk
> >         step choose firstn 0 type device
> >         step emit
> > }

Based on your above explanation, I suspect this wasn't
doing what I wanted.

> > 
> > but a map with rules like this didn't:
> > 
> > rule data {
> >         ruleset 0
> >         type replicated
> >         min_size 2
> >         max_size 2
> >         step take root
> >         step chooseleaf firstn 0 type controller
> >         step emit
> > }
> 
> Hmm, this should work (assuming there are actually nodes of type 
> controller in the tree).  Can you send along the actual map you're trying?

Sure.  I've been using multiple partitions
per disk for learning about CRUSH maps, so 
in this map a device is a partition.

Here it is:

# begin crush map

# devices
device 0 device0
device 1 device1
device 2 device2
device 3 device3

# types
type 0 device
type 1 disk
type 2 controller
type 3 host
type 4 root

# buckets
disk disk0 {
	id -1		# do not change unnecessarily
	alg uniform	# do not change bucket size (1) unnecessarily
	hash 0	# rjenkins1
	item device0 weight 1.000 pos 0
}
disk disk1 {
	id -2		# do not change unnecessarily
	alg uniform	# do not change bucket size (1) unnecessarily
	hash 0	# rjenkins1
	item device1 weight 1.000 pos 0
}
disk disk2 {
	id -3		# do not change unnecessarily
	alg uniform	# do not change bucket size (1) unnecessarily
	hash 0	# rjenkins1
	item device2 weight 1.000 pos 0
}
disk disk3 {
	id -4		# do not change unnecessarily
	alg uniform	# do not change bucket size (1) unnecessarily
	hash 0	# rjenkins1
	item device3 weight 1.000 pos 0
}
controller controller0 {
	id -5		# do not change unnecessarily
	alg uniform	# do not change bucket size (2) unnecessarily
	hash 0	# rjenkins1
	item disk0 weight 1.000 pos 0
	item disk1 weight 1.000 pos 1
}
controller controller1 {
	id -6		# do not change unnecessarily
	alg uniform	# do not change bucket size (2) unnecessarily
	hash 0	# rjenkins1
	item disk2 weight 1.000 pos 0
	item disk3 weight 1.000 pos 1
}
host host0 {
	id -7		# do not change unnecessarily
	alg uniform	# do not change bucket size (2) unnecessarily
	hash 0	# rjenkins1
	item controller0 weight 2.000 pos 0
	item controller1 weight 2.000 pos 1
}
root root {
	id -8		# do not change unnecessarily
	alg straw
	hash 0	# rjenkins1
	item host0 weight 4.000
}

# rules
rule data {
	ruleset 0
	type replicated
	min_size 2
	max_size 2
	step take root
	step chooseleaf firstn 0 type controller
	step emit
}
rule metadata {
	ruleset 1
	type replicated
	min_size 2
	max_size 2
	step take root
	step chooseleaf firstn 0 type controller
	step emit
}
rule casdata {
	ruleset 2
	type replicated
	min_size 2
	max_size 2
	step take root
	step chooseleaf firstn 0 type controller
	step emit
}
rule rbd {
	ruleset 3
	type replicated
	min_size 2
	max_size 2
	step take root
	step chooseleaf firstn 0 type controller
	step emit
}

# end crush map

When I try to start a file system built with the above map,
the monitor never accepts connections (from either ceph -w
or the cosd instances).

Thanks for taking a look.

-- Jim

> 
> Thanks-
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

