Ok, so after creating our setup in the lab and adding the pools, our
hybrid pool cannot even be created properly, with around 1/3 of the
PGs stuck in various states:
  cluster:
    id:     e07f568d-056c-4e01-9292-732c64ab4f8e
    health: HEALTH_WARN
            Reduced data availability: 1070 pgs inactive, 204 pgs peering
            Degraded data redundancy: 1087 pgs unclean, 69 pgs degraded, 69 pgs undersized
            too many PGs per OSD (215 > max 200)

  services:
    mon: 3 daemons, quorum s11,s12,s13
    mgr: s12(active), standbys: s11, s13
    osd: 51 osds: 51 up, 51 in

  data:
    pools:   3 pools, 4608 pgs
    objects: 0 objects, 0 bytes
    usage:   56598 MB used, 706 GB / 761 GB avail
    pgs:     17.643% pgs unknown
             5.577% pgs not active
             3521 active+clean
             813  unknown
             204  creating+peering
             46   undersized+degraded+peered
             17   active+undersized+degraded
             6    creating+activating+undersized+degraded
             1    creating+activating
It is stuck like this, and I can't query the problematic PGs:
# ceph pg 2.7cf query
Error ENOENT: i don't have pgid 2.7cf
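The ENOENT presumably just means the OSD that would be primary doesn't
know about the PG because it was never instantiated. Two read-only
commands that should at least show where CRUSH wants to put it and
which PGs are wedged (standard Luminous CLI; 2.7cf is the same example
pgid as above):
# ceph pg map 2.7cf
# ceph pg dump_stuck inactive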
So, so far: great success :). Now I only have to learn how to fix
it. Any ideas, anyone?
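One guess, assuming Luminous defaults: the "too many PGs per OSD
(215 > max 200)" warning comes from mon_max_pg_per_osd, and when that
limit is exceeded the cluster can also refuse to create the remaining
PGs, which would fit the "unknown" and "creating" states above. It
might be worth raising it temporarily (300 is an arbitrary example
value) to see if creation resumes:
# ceph tell mon.* injectargs '--mon_max_pg_per_osd=300'
(and set mon_max_pg_per_osd = 300 in ceph.conf so it survives a mon restart)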
On 2018-01-26 at 12:59, Peter Linder wrote:
Well, we do, but our problem is with our hybrid setup (1 nvme
and 2 hdds). The other two (that we rarely use) are nvme-only
and hdd-only; as far as I can tell they work, and their "take"
step uses the device class to select only the relevant OSDs.
I'll just paste our entire crush map dump here. The map starts
working when the 1.7 weight is changed to 1.0 (a sketch of how we
apply that edit follows the map)... crushtool --test doesn't show
any errors in either case; all PGs seem to be properly assigned
to OSDs.
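For reference, this is roughly the test I'm running (rule id 1 is the
hybrid rule in the dump below; 3 is just an example replica count and
the filenames are arbitrary):
# crushtool -c crushmap.txt -o crushmap.bin
# crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-bad-mappings
--show-bad-mappings stays silent when every input maps to a full set
of OSDs, which is what we see with both weights.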
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class nvme
device 1 osd.1 class nvme
device 2 osd.2 class nvme
device 3 osd.3 class nvme
device 4 osd.4 class nvme
device 5 osd.5 class nvme
device 6 osd.6 class nvme
device 7 osd.7 class nvme
device 8 osd.8 class nvme
device 9 osd.9 class nvme
device 10 osd.10 class nvme
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd
device 19 osd.19 class hdd
device 20 osd.20 class hdd
device 21 osd.21 class hdd
device 22 osd.22 class hdd
device 23 osd.23 class hdd
device 24 osd.24 class nvme
device 25 osd.25 class nvme
device 26 osd.26 class nvme
device 27 osd.27 class nvme
device 36 osd.36 class hdd
device 37 osd.37 class hdd
device 38 osd.38 class hdd
device 39 osd.39 class hdd
device 40 osd.40 class hdd
device 41 osd.41 class hdd
device 42 osd.42 class hdd
device 43 osd.43 class hdd
device 44 osd.44 class hdd
device 45 osd.45 class hdd
device 46 osd.46 class hdd
device 47 osd.47 class hdd
device 48 osd.48 class hdd
device 49 osd.49 class hdd
device 50 osd.50 class hdd
device 51 osd.51 class hdd
device 52 osd.52 class hdd
device 53 osd.53 class hdd
device 54 osd.54 class hdd
device 55 osd.55 class hdd
device 56 osd.56 class hdd
device 57 osd.57 class hdd
device 58 osd.58 class hdd
device 59 osd.59 class hdd
# types
type 0 osd
type 1 host
type 2 hostgroup
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
# buckets
host storage11 {
id -5 # do not change unnecessarily
id -6 class nvme # do not change unnecessarily
id -10 class hdd # do not change unnecessarily
# weight 4.612
alg straw2
hash 0 # rjenkins1
item osd.0 weight 0.728
item osd.3 weight 0.728
item osd.6 weight 0.728
item osd.7 weight 0.728
item osd.10 weight 1.700
}
host storage21 {
id -13 # do not change unnecessarily
id -14 class nvme # do not change unnecessarily
id -15 class hdd # do not change unnecessarily
# weight 65.496
alg straw2
hash 0 # rjenkins1
item osd.12 weight 5.458
item osd.13 weight 5.458
item osd.14 weight 5.458
item osd.15 weight 5.458
item osd.16 weight 5.458
item osd.17 weight 5.458
item osd.18 weight 5.458
item osd.19 weight 5.458
item osd.20 weight 5.458
item osd.21 weight 5.458
item osd.22 weight 5.458
item osd.23 weight 5.458
}
datacenter HORN79 {
id -19 # do not change unnecessarily
id -26 class nvme # do not change unnecessarily
id -27 class hdd # do not change unnecessarily
# weight 70.108
alg straw2
hash 0 # rjenkins1
item storage11 weight 4.612
item storage21 weight 65.496
}
host storage13 {
id -7 # do not change unnecessarily
id -8 class nvme # do not change unnecessarily
id -11 class hdd # do not change unnecessarily
# weight 4.612
alg straw2
hash 0 # rjenkins1
item osd.24 weight 0.728
item osd.25 weight 0.728
item osd.26 weight 0.728
item osd.27 weight 0.728
item osd.8 weight 1.700
}
host storage23 {
id -16 # do not change unnecessarily
id -17 class nvme # do not change unnecessarily
id -18 class hdd # do not change unnecessarily
# weight 65.784
alg straw2
hash 0 # rjenkins1
item osd.36 weight 5.482
item osd.37 weight 5.482
item osd.38 weight 5.482
item osd.39 weight 5.482
item osd.40 weight 5.482
item osd.41 weight 5.482
item osd.42 weight 5.482
item osd.43 weight 5.482
item osd.44 weight 5.482
item osd.45 weight 5.482
item osd.58 weight 5.482
item osd.59 weight 5.482
}
datacenter WAR {
id -20 # do not change unnecessarily
id -24 class nvme # do not change unnecessarily
id -25 class hdd # do not change unnecessarily
# weight 70.401
alg straw2
hash 0 # rjenkins1
item storage13 weight 4.612
item storage23 weight 65.789
}
host storage12 {
id -3 # do not change unnecessarily
id -4 class nvme # do not change unnecessarily
id -9 class hdd # do not change unnecessarily
# weight 4.612
alg straw2
hash 0 # rjenkins1
item osd.1 weight 0.728
item osd.2 weight 0.728
item osd.4 weight 0.728
item osd.5 weight 0.728
item osd.9 weight 1.700
}
host storage22 {
id -67 # do not change unnecessarily
id -68 class nvme # do not change unnecessarily
id -69 class hdd # do not change unnecessarily
# weight 65.736
alg straw2
hash 0 # rjenkins1
item osd.46 weight 5.458
item osd.47 weight 5.458
item osd.48 weight 5.482
item osd.49 weight 5.482
item osd.50 weight 5.482
item osd.51 weight 5.482
item osd.52 weight 5.482
item osd.53 weight 5.482
item osd.54 weight 5.482
item osd.55 weight 5.482
item osd.56 weight 5.482
item osd.57 weight 5.482
}
datacenter TEG4 {
id -21 # do not change unnecessarily
id -22 class nvme # do not change unnecessarily
id -23 class hdd # do not change unnecessarily
# weight 70.352
alg straw2
hash 0 # rjenkins1
item storage12 weight 4.612
item storage22 weight 65.740
}
root default {
id -1 # do not change unnecessarily
id -2 class nvme # do not change unnecessarily
id -12 class hdd # do not change unnecessarily
# weight 210.861
alg straw2
hash 0 # rjenkins1
item HORN79 weight 70.108
item WAR weight 70.401
item TEG4 weight 70.352
}
hostgroup hg1-1 {
id -30 # do not change unnecessarily
# id -28 class nvme # do not change unnecessarily
# id -54 class hdd # do not change unnecessarily
# weight 1.700
alg straw2
hash 0 # rjenkins1
item storage11 weight 100.000
}
hostgroup hg1-2 {
id -31 # do not change unnecessarily
# id -29 class nvme # do not change unnecessarily
# id -55 class hdd # do not change unnecessarily
# weight 10.000
alg straw2
hash 0 # rjenkins1
item storage22 weight 100.000
}
hostgroup hg1-3 {
id -32 # do not change unnecessarily
# id -43 class nvme # do not change unnecessarily
# id -56 class hdd # do not change unnecessarily
# weight 10.000
alg straw2
hash 0 # rjenkins1
item storage23 weight 100.000
}
hostgroup hg2-1 {
id -33 # do not change unnecessarily
# id -45 class nvme # do not change unnecessarily
# id -58 class hdd # do not change unnecessarily
# weight 10.000
alg straw2
hash 0 # rjenkins1
item storage12 weight 100.000
}
hostgroup hg2-2 {
id -34 # do not change unnecessarily
# id -46 class nvme # do not change unnecessarily
# id -59 class hdd # do not change unnecessarily
# weight 10.000
alg straw2
hash 0 # rjenkins1
item storage21 weight 100.000
}
hostgroup hg2-3 {
id -35 # do not change unnecessarily
# id -47 class nvme # do not change unnecessarily
# id -60 class hdd # do not change unnecessarily
# weight 10.000
alg straw2
hash 0 # rjenkins1
item storage23 weight 100.000
}
hostgroup hg3-1 {
id -36 # do not change unnecessarily
# id -49 class nvme # do not change unnecessarily
# id -62 class hdd # do not change unnecessarily
# weight 10.000
alg straw2
hash 0 # rjenkins1
item storage13 weight 100.000
}
hostgroup hg3-2 {
id -37 # do not change unnecessarily
# id -50 class nvme # do not change unnecessarily
# id -63 class hdd # do not change unnecessarily
# weight 10.000
alg straw2
hash 0 # rjenkins1
item storage21 weight 100.000
}
hostgroup hg3-3 {
id -38 # do not change unnecessarily
# id -51 class nvme # do not change unnecessarily
# id -64 class hdd # do not change unnecessarily
# weight 10.000
alg straw2
hash 0 # rjenkins1
item storage22 weight 100.000
}
datacenter ldc1 {
id -39 # do not change unnecessarily
# id -44 class nvme # do not change unnecessarily
# id -57 class hdd # do not change unnecessarily
# weight 30.000
alg straw2
hash 0 # rjenkins1
item hg1-1 weight 100.000
item hg1-2 weight 100.000
item hg1-3 weight 100.000
}
datacenter ldc2 {
id -40 # do not change unnecessarily
# id -48 class nvme # do not change unnecessarily
# id -61 class hdd # do not change unnecessarily
# weight 30.000
alg straw2
hash 0 # rjenkins1
item hg2-1 weight 100.000
item hg2-2 weight 100.000
item hg2-3 weight 100.000
}
datacenter ldc3 {
id -41 # do not change unnecessarily
# id -52 class nvme # do not change unnecessarily
# id -65 class hdd # do not change unnecessarily
# weight 30.000
alg straw2
hash 0 # rjenkins1
item hg3-1 weight 100.000
item hg3-2 weight 100.000
item hg3-3 weight 100.000
}
root ldc {
id -42 # do not change unnecessarily
# id -53 class nvme # do not change unnecessarily
# id -66 class hdd # do not change unnecessarily
# weight 90.000
alg straw2
hash 0 # rjenkins1
item ldc1 weight 300.000
item ldc2 weight 300.000
item ldc3 weight 300.000
}
# rules
rule hybrid {
id 1
type replicated
min_size 1
max_size 10
step take ldc
step choose indep 1 type datacenter
step chooseleaf indep 0 type hostgroup
step emit
}
rule hdd {
id 2
type replicated
min_size 1
max_size 3
step take default class hdd
step chooseleaf firstn 0 type datacenter
step emit
}
rule nvme {
id 3
type replicated
min_size 1
max_size 3
step take default class nvme
step chooseleaf firstn 0 type datacenter
step emit
}
# end crush map
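For anyone wanting to reproduce the weight experiment: the edit cycle
is the usual decompile/recompile round trip (filenames are arbitrary;
the 1.700 items are osd.8, osd.9 and osd.10 in the map above):
# ceph osd getcrushmap -o crushmap.bin
# crushtool -d crushmap.bin -o crushmap.txt
(edit, e.g. change "item osd.10 weight 1.700" to "weight 1.000")
# crushtool -c crushmap.txt -o crushmap.new
# ceph osd setcrushmap -i crushmap.new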
On 2018-01-26 at 11:22, Thomas Bennett wrote:
Hi Peter,
Just to check whether your problem is similar to mine:
- Do you have any pools that follow a crush rule to use only
OSDs that are backed by hdds (i.e. not nvmes)?
- Do these pools obey that rule, i.e. do they maybe have
PGs that are on nvmes?
Regards,
Tom
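A quick way to answer Tom's second question on Luminous, assuming a
pool named "hddpool" (a placeholder name; substitute your own): list
the OSD ids in each device class and compare them against the
up/acting sets of that pool's PGs:
# ceph osd crush class ls-osd nvme
# ceph pg ls-by-pool hddpool
If any PG of the hdd pool lists an OSD id from the nvme class in its
acting set, the rule is being violated.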