Re: pool size 1 RBD distribution

On 12/05/2013 10:52 AM, Wolfgang Hennerbichler wrote:
hi ceph,

just for testing (on Emperor, 0.72.1) I created two OSDs on a single server, set the pool's replication factor to one, and created 200 PGs for that pool:

# ceph osd dump
...
pool 4 'rbd' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 200 pgp_num 200 last_change 64 owner 18446744073709551615
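
(Roughly, the pool was reshaped with commands like the following; this is only a sketch and the exact invocations may have differed, but the values match the dump above:)

# ceph osd pool set rbd size 1
# ceph osd pool set rbd min_size 1
# ceph osd pool set rbd pg_num 200
# ceph osd pool set rbd pgp_num 200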

crush_ruleset 0 is - IMHO - saying that data should be distributed at the OSD level (step chooseleaf firstn 0 type osd):

# rules
rule data {
         ruleset 0
         type replicated
         min_size 1
         max_size 10
         step take default
         step chooseleaf firstn 0 type osd
         step emit
}
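
(To double-check which rule the pool actually uses, and that the installed map matches this text, something like the following should do; the file names are just placeholders:)

# ceph osd pool get rbd crush_ruleset
# ceph osd getcrushmap -o crushmap.bin
# crushtool -d crushmap.bin -o crushmap.txt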

Now I do an rbd import of an RBD image (1 GB in size), and I would expect that image to stripe across the two OSDs. Well, that is just not happening; everything sits on osd.2 (osd.0 and osd.1 have been removed in the meantime; they were part of the first test):

Could you run this against your crushmap?

$ ceph osd getcrushmap -o crushmap
$ crushtool --test -i crushmap --num-rep 1 --rule 0 --show-statistics

I tried that locally and it gave me a result like:

rule 0 (data), x = 0..1023, numrep = 1..1
CRUSH rule 0 x 0 [0]
..
..
CRUSH rule 0 x 1019 [1]
CRUSH rule 0 x 1020 [0]
CRUSH rule 0 x 1021 [1]
CRUSH rule 0 x 1022 [1]
CRUSH rule 0 x 1023 [0]
rule 0 (data) num_rep 1 result size == 1:	1024/1024

My plain-text crushmap is attached, and there you can see it works. So I'm curious what output you get.
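
It can also help to check where the imported objects actually land, for example with something along these lines (the object name below is just a placeholder; pick one from the rados listing):

$ rados -p rbd ls | head
$ ceph osd map rbd rb.0.1234.000000000000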

Wido

# df -h
...
/dev/vdc1       2.0G  905M  1.2G  45% /var/lib/ceph/osd/ceph-2
/dev/vdb1       2.0G   37M  2.0G   2% /var/lib/ceph/osd/ceph-3

# ceph -w
     cluster 6db7c956-cfbb-437a-88b6-78e1c9e68c80
      health HEALTH_OK
      monmap e1: 1 mons at {ceph-node1=XXX:6789/0}, election epoch 1, quorum 0 ceph-node1
      osdmap e65: 2 osds: 2 up, 2 in
       pgmap v187: 200 pgs, 5 pools, 868 MB data, 220 objects
             941 MB used, 3132 MB / 4073 MB avail
                  200 active+clean


2013-12-05 09:46:43.210312 mon.0 [INF] pgmap v187: 200 pgs: 200 active+clean; 868 MB data, 941 MB used, 3132 MB / 4073 MB avail

Any hints are more than welcome. This is certainly not a real-life scenario, but it really confuses my understanding of Ceph.
Wolfgang



--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
# begin crush map

# devices
device 0 osd.0
device 1 osd.1

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 root

# buckets
host wido-laptop {
	id -2		# do not change unnecessarily
	# weight 2.000
	alg straw
	hash 0	# rjenkins1
	item osd.0 weight 1.000
	item osd.1 weight 1.000
}
rack unknownrack {
	id -3		# do not change unnecessarily
	# weight 2.000
	alg straw
	hash 0	# rjenkins1
	item wido-laptop weight 2.000
}
root default {
	id -1		# do not change unnecessarily
	# weight 2.000
	alg straw
	hash 0	# rjenkins1
	item unknownrack weight 2.000
}

# rules
rule data {
	ruleset 0
	type replicated
	min_size 1
	max_size 10
	step take default
	step choose firstn 0 type osd
	step emit
}
rule metadata {
	ruleset 1
	type replicated
	min_size 1
	max_size 10
	step take default
	step choose firstn 0 type osd
	step emit
}
rule rbd {
	ruleset 2
	type replicated
	min_size 1
	max_size 10
	step take default
	step choose firstn 0 type osd
	step emit
}

# end crush map
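
(For reference, a plain-text map like the one above can be compiled and loaded back into a cluster roughly as follows; the file names are placeholders:)

$ crushtool -c crushmap.txt -o crushmap.new
$ ceph osd setcrushmap -i crushmap.new
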
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
