Hi,
I have a situation with a cluster which was recently upgraded to
Luminous and now has a PG with two of its replicas mapped to OSDs on the
same host.
root@man:~# ceph pg map 1.41
osdmap e21543 pg 1.41 (1.41) -> up [15,7,4] acting [15,7,4]
root@man:~#
root@man:~# ceph osd find 15|jq -r '.crush_location.host'
n02
root@man:~# ceph osd find 7|jq -r '.crush_location.host'
n01
root@man:~# ceph osd find 4|jq -r '.crush_location.host'
n02
root@man:~#
As you can see, OSDs 15 and 4 are both on host 'n02'.
This PG went inactive when the machine hosting both OSDs went down for
maintenance.
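To see whether more PGs are affected I can walk all PGs and compare the
hosts of their acting OSDs, with something like this rough sketch (it
assumes the first column of 'ceph pg dump pgs_brief' is the PG id; the
exact column layout can differ between releases):

# report every PG whose acting set contains two OSDs on the same host
ceph pg dump pgs_brief 2>/dev/null | awk 'NR>1 {print $1}' | while read pgid; do
  acting=$(ceph pg map "$pgid" | sed 's/.*acting \[\(.*\)\]/\1/' | tr ',' ' ')
  hosts=$(for osd in $acting; do ceph osd find "$osd" | jq -r '.crush_location.host'; done)
  dups=$(echo "$hosts" | sort | uniq -d)
  [ -n "$dups" ] && echo "$pgid: replicas share host(s): $dups"
done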
My first suspects were the CRUSH map and its rules, but those look fine:
rule replicated_ruleset {
        id 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
This is the only rule in the CRUSH map.
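For reference, this rule comes from decompiling the map the usual way
(the file names are arbitrary):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt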
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 19.50325 root default
-2 2.78618 host n01
5 ssd 0.92999 osd.5 up 1.00000 1.00000
7 ssd 0.92619 osd.7 up 1.00000 1.00000
14 ssd 0.92999 osd.14 up 1.00000 1.00000
-3 2.78618 host n02
4 ssd 0.92999 osd.4 up 1.00000 1.00000
8 ssd 0.92619 osd.8 up 1.00000 1.00000
15 ssd 0.92999 osd.15 up 1.00000 1.00000
-4 2.78618 host n03
3 ssd 0.92999 osd.3 up 0.94577 1.00000
9 ssd 0.92619 osd.9 up 0.82001 1.00000
16 ssd 0.92999 osd.16 up 0.84885 1.00000
-5 2.78618 host n04
2 ssd 0.92999 osd.2 up 0.93501 1.00000
10 ssd 0.92619 osd.10 up 0.76031 1.00000
17 ssd 0.92999 osd.17 up 0.82883 1.00000
-6 2.78618 host n05
6 ssd 0.92999 osd.6 up 0.84470 1.00000
11 ssd 0.92619 osd.11 up 0.80530 1.00000
18 ssd 0.92999 osd.18 up 0.86501 1.00000
-7 2.78618 host n06
1 ssd 0.92999 osd.1 up 0.88353 1.00000
12 ssd 0.92619 osd.12 up 0.79602 1.00000
19 ssd 0.92999 osd.19 up 0.83171 1.00000
-8 2.78618 host n07
0 ssd 0.92999 osd.0 up 1.00000 1.00000
13 ssd 0.92619 osd.13 up 0.86043 1.00000
20 ssd 0.92999 osd.20 up 0.77153 1.00000
Here you see osd.15 and osd.4 on the same host 'n02'.
This cluster was upgraded from Hammer to Jewel and now to Luminous, and
it doesn't have the latest tunables yet, but should that matter? I have
never encountered this before.
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
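These are the values from the decompiled map; the cluster reports the
same live, together with the profile it thinks they correspond to:

ceph osd crush show-tunables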
I don't want to touch this yet in case this is a bug or a glitch in the
matrix somewhere.
I hope it's just an admin mistake, but so far I'm not able to find a
clue pointing to that.
root@man:~# ceph osd dump|head -n 12
epoch 21545
fsid 0b6fb388-6233-4eeb-a55c-476ed12bdf0a
created 2015-04-28 14:43:53.950159
modified 2018-02-22 17:56:42.497849
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 22
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client luminous
min_compat_client luminous
require_osd_release luminous
root@man:~#
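Since pg 1.41 lives in pool 1, the pool's replica count and crush_rule
can be cross-checked against the rule above:

# confirm which crush_rule and size pool 1 uses
ceph osd dump | grep '^pool 1 '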
I also downloaded the CRUSH map and ran crushtool with --test and
--show-mappings, but that didn't show any PG mapped to two OSDs on the
same host.
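The test was roughly this, against the downloaded map (assuming rule 0
and 3 replicas); --show-bad-mappings would additionally flag any result
where CRUSH produces fewer OSDs than requested:

crushtool -i crushmap.bin --test --rule 0 --num-rep 3 --show-mappings
crushtool -i crushmap.bin --test --rule 0 --num-rep 3 --show-bad-mappings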
Any ideas on what might be going on here?
Wido