Hi,
I recently installed a 3-node Ceph cluster, v10.2.3. It has
3 mons and 12 OSDs. I removed the default pool and created
the following one:
pool 7 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 126 flags hashpspool stripe_width 0
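If it helps, this is roughly how I confirm the pool settings (assuming the pool is still named 'data'):

# check replication and PG settings on the pool
ceph osd pool get data size
ceph osd pool get data min_size
ceph osd pool get data pg_num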
The cluster is healthy when all OSDs are up; however, if I
stop any of the OSDs, some PGs become stuck and undersized
and the cluster does not rebuild:
    cluster *****
     health HEALTH_WARN
            166 pgs degraded
            108 pgs stuck unclean
            166 pgs undersized
            recovery 67261/827220 objects degraded (8.131%)
            1/12 in osds are down
     monmap e3: 3 mons at {**osd01=***.144:6789/0,***osd02=***.145:6789/0,**osd03=*****.146:6789/0}
            election epoch 14, quorum 0,1,2 **osd01,**osd02,**osd03
     osdmap e161: 12 osds: 11 up, 12 in; 166 remapped pgs
            flags sortbitwise
      pgmap v307710: 1024 pgs, 1 pools, 1230 GB data, 403 kobjects
            2452 GB used, 42231 GB / 44684 GB avail
            67261/827220 objects degraded (8.131%)
                 858 active+clean
                 166 active+undersized+degraded
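In case it is useful, I have been looking at the stuck PGs roughly like this (output not pasted here):

# show which PGs are degraded/undersized and where the OSDs sit in the tree
ceph health detail
ceph pg dump_stuck unclean
ceph osd tree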
Replica size is 2 and I use the following crushmap:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable straw_calc_version 1
# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
# buckets
host osd01 {
    id -2    # do not change unnecessarily
    # weight 14.546
    alg straw
    hash 0    # rjenkins1
    item osd.0 weight 3.636
    item osd.1 weight 3.636
    item osd.2 weight 3.636
    item osd.3 weight 3.636
}
host osd02 {
    id -3    # do not change unnecessarily
    # weight 14.546
    alg straw
    hash 0    # rjenkins1
    item osd.4 weight 3.636
    item osd.5 weight 3.636
    item osd.6 weight 3.636
    item osd.7 weight 3.636
}
host osd03 {
    id -4    # do not change unnecessarily
    # weight 14.546
    alg straw
    hash 0    # rjenkins1
    item osd.8 weight 3.636
    item osd.9 weight 3.636
    item osd.10 weight 3.636
    item osd.11 weight 3.636
}
root default {
    id -1    # do not change unnecessarily
    # weight 43.637
    alg straw
    hash 0    # rjenkins1
    item osd01 weight 14.546
    item osd02 weight 14.546
    item osd03 weight 14.546
}
# rules
rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
# end crush map
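To double-check the rule itself, I believe the mapping can be exercised offline with crushtool, along these lines (crushmap.bin/crushmap.txt are just placeholder file names):

# extract and decompile the current CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# simulate rule 0 with 2 replicas and report any bad (incomplete) mappings
crushtool -i crushmap.bin --test --rule 0 --num-rep 2 --show-mappings
crushtool -i crushmap.bin --test --rule 0 --num-rep 2 --show-bad-mappings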
I am not sure what the reason for the undersized state is.
All OSD disks are the same size and the replica size is 2.
Also, data is only replicated on a per-host basis and I
have 3 separate hosts. Maybe the number of PGs is
incorrect? Is 1024 too big? Or maybe there is some
misconfiguration in the crushmap?
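For what it is worth, my rough PG estimate followed the usual rule of thumb, so 1024 did not seem far off:

# target ~100 PGs per OSD: (12 OSDs * 100) / replica size 2 = 600 -> nearest powers of two are 512 or 1024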
Kind regards,
Piotr Dzionek