All replicas of pg 5.b got placed on the same host - how to correct?

Hello Ceph-users,

After switching to Luminous I was excited about the great crush-device-class feature. We now have 5 servers with one 2 TB NVMe-based OSD each, 3 of them additionally with 4 HDDs per server. (We have only three 400 GB NVMe disks for block.wal and block.db and therefore can't distribute the HDDs evenly across all servers.)

Output from "ceph pg dump" shows that some PGs end up on HDD OSDs on the same host:

ceph pg map 5.b
osdmap e12912 pg 5.b (5.b) -> up [9,7,8] acting [9,7,8]

(osd.7, osd.8 and osd.9 all belong to host daniel; on rebooting this host I had 4 stale PGs.)

I've written a small Perl script to append the hostname after each OSD number (a rough CLI equivalent is sketched after the list) and found many PGs where Ceph placed two replicas on the same host:

5.1e7: 8 - daniel 9 - daniel 11 - udo
5.1eb: 10 - udo 7 - daniel 9 - daniel
5.1ec: 10 - udo 11 - udo 7 - daniel
5.1ed: 13 - felix 16 - felix 5 - udo
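
For anyone who wants to reproduce this without my script, roughly the same information should be available from the ceph CLI alone (a sketch from memory, not the exact script I used):

ceph pg map 5.1e7        # up/acting OSD sets for one PG
ceph osd find 8          # JSON output should list the host under "crush_location"
ceph osd tree            # full host -> OSD hierarchy in one view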


Is there any way I can correct this?


Please see the crushmap below. Thanks for any help!

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 device1
device 2 osd.2 class ssd
device 3 device3
device 4 device4
device 5 osd.5 class hdd
device 6 device6
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 device15
device 16 osd.16 class hdd
device 17 device17
device 18 device18
device 19 device19
device 20 device20
device 21 device21
device 22 device22
device 23 device23
device 24 osd.24 class hdd
device 25 device25
device 26 osd.26 class hdd
device 27 osd.27 class hdd
device 28 osd.28 class hdd
device 29 osd.29 class hdd
device 30 osd.30 class ssd
device 31 osd.31 class ssd
device 32 osd.32 class ssd
device 33 osd.33 class ssd

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 root

# buckets
host daniel {
	id -4		# do not change unnecessarily
	id -2 class hdd		# do not change unnecessarily
	id -9 class ssd		# do not change unnecessarily
	# weight 3.459
	alg straw2
	hash 0	# rjenkins1
	item osd.31 weight 1.819
	item osd.7 weight 0.547
	item osd.8 weight 0.547
	item osd.9 weight 0.547
}
host felix {
	id -5		# do not change unnecessarily
	id -3 class hdd		# do not change unnecessarily
	id -10 class ssd		# do not change unnecessarily
	# weight 3.653
	alg straw2
	hash 0	# rjenkins1
	item osd.33 weight 1.819
	item osd.13 weight 0.547
	item osd.14 weight 0.467
	item osd.16 weight 0.547
	item osd.0 weight 0.274
}
host udo {
	id -6		# do not change unnecessarily
	id -7 class hdd		# do not change unnecessarily
	id -11 class ssd		# do not change unnecessarily
	# weight 4.006
	alg straw2
	hash 0	# rjenkins1
	item osd.32 weight 1.819
	item osd.5 weight 0.547
	item osd.10 weight 0.547
	item osd.11 weight 0.547
	item osd.12 weight 0.547
}
host moritz {
	id -13		# do not change unnecessarily
	id -14 class hdd		# do not change unnecessarily
	id -15 class ssd		# do not change unnecessarily
	# weight 1.819
	alg straw2
	hash 0	# rjenkins1
	item osd.30 weight 1.819
}
host bruno {
	id -16		# do not change unnecessarily
	id -17 class hdd		# do not change unnecessarily
	id -18 class ssd		# do not change unnecessarily
	# weight 3.183
	alg straw2
	hash 0	# rjenkins1
	item osd.24 weight 0.273
	item osd.26 weight 0.273
	item osd.27 weight 0.273
	item osd.28 weight 0.273
	item osd.29 weight 0.273
	item osd.2 weight 1.819
}
root default {
	id -1		# do not change unnecessarily
	id -8 class hdd		# do not change unnecessarily
	id -12 class ssd		# do not change unnecessarily
	# weight 16.121
	alg straw2
	hash 0	# rjenkins1
	item daniel weight 3.459
	item felix weight 3.653
	item udo weight 4.006
	item moritz weight 1.819
	item bruno weight 3.183
}

# rules
rule ssd {
	id 0
	type replicated
	min_size 1
	max_size 10
	step take default class ssd
	step choose firstn 0 type osd
	step emit
}
rule hdd {
	id 1
	type replicated
	min_size 1
	max_size 10
	step take default class hdd
	step choose firstn 0 type osd
	step emit
}

# end crush map
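
If I understand the device-class docs correctly, both rules above pick "type osd" directly, so CRUSH never gets a chance to separate replicas by host. I assume a host-level rule would have to look roughly like this (an untested sketch, not something I have applied yet):

rule hdd {
	id 1
	type replicated
	min_size 1
	max_size 10
	step take default class hdd
	step chooseleaf firstn 0 type host
	step emit
}

But I'm not sure whether simply swapping the choose step is enough, hence my question above.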

--

Kind regards

Konrad Riedel

--

Berufsförderungswerk Dresden gemeinnützige GmbH
SG1
IT/Infrastruktur
Hellerhofstraße 35
D-01129 Dresden
Phone (03 51) 85 48 - 115
Fax (03 51) 85 48 - 507
E-Mail   it@xxxxxxxxxxxxxx

--------------------------------------------------------------------
Chair of the Administrative Board: Dr. Ina Ueberschär
Managing Director: Henry Köhler
Registered office: Dresden
Commercial register: Amtsgericht Dresden HRB 2380
Certified to ISO 9001:2015 and AZAV
--------------------------------------------------------------------



