All OSDs and monitors are up, as far as I can see.
I read through the PG troubleshooting section of the Ceph documentation and concluded that nothing there would help me, so I didn't try anything except restarting/rebooting the OSDs and monitors.
How do I recover from this? It looks to me like the data itself should be safe for now, but why is it not recovering?
My guess is that the problem is the CRUSH map.
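For reference, the map the cluster is actually using can be dumped and decompiled with the usual getcrushmap/crushtool combination, roughly like this (the file names are just examples, not anything special):
# ceph osd getcrushmap -o /tmp/crushmap.bin
# crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt
# ceph osd tree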
Here are some outputs:
# ceph health detail
HEALTH_WARN 475 pgs degraded; 640 pgs stale; 475 pgs stuck degraded; 640 pgs stuck stale; 640 pgs stuck unclean; 475 pgs stuck undersized; 475 pgs undersized; recovery 104812/279550 objects degraded (37.493%); recovery 69926/279550 objects misplaced (25.014%)
pg 3.ec is stuck unclean for 3326815.935321, current state stale+active+remapped, last acting [7,6]
pg 3.ed is stuck unclean for 3288818.682456, current state stale+active+remapped, last acting [6,7]
pg 3.ee is stuck unclean for 409973.052061, current state stale+active+undersized+degraded, last acting [7]
pg 3.ef is stuck unclean for 3357894.554762, current state stale+active+undersized+degraded, last acting [7]
pg 3.e8 is stuck unclean for 384815.518837, current state stale+active+undersized+degraded, last acting [6]
pg 3.e9 is stuck unclean for 3274554.591000, current state stale+active+remapped, last acting [6,7]
......
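I can also dig into individual PGs if that helps; for one of the PGs from the list above the current mapping and peering state can be checked with something like this (3.ec is just an example, and the query may not return while the PG is stale because its primary is not reporting):
# ceph pg map 3.ec
# ceph pg 3.ec query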
################################################################################
This is the CRUSH map I created, intended to use, and thought I had been using for the past two months:
- pvestorage1-ssd and pvestorage1-platter are the same physical host; it seems this is not possible, but I never noticed
- likewise for pvestorage2
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1
# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
# buckets
host pvestorage1-ssd {
id -2 # do not change unnecessarily
# weight 1.740
alg straw
hash 0 # rjenkins1
item osd.0 weight 0.870
item osd.1 weight 0.870
}
host pvestorage2-ssd {
id -3 # do not change unnecessarily
# weight 1.740
alg straw
hash 0 # rjenkins1
item osd.2 weight 0.870
item osd.3 weight 0.870
}
host pvestorage1-platter {
id -4 # do not change unnecessarily
# weight 4
alg straw
hash 0 # rjenkins1
item osd.4 weight 2.000
item osd.5 weight 2.000
}
host pvestorage2-platter {
id -5 # do not change unnecessarily
# weight 4
alg straw
hash 0 # rjenkins1
item osd.6 weight 2.000
item osd.7 weight 2.000
}
root ssd {
id -1 # do not change unnecessarily
# weight 3.480
alg straw
hash 0 # rjenkins1
item pvestorage1-ssd weight 1.740
item pvestorage2-ssd weight 1.740
}
root platter {
id -6 # do not change unnecessarily
# weight 8
alg straw
hash 0 # rjenkins1
item pvestorage1-platter weight 4.000
item pvestorage2-platter weight 4.000
}
# rules
rule ssd {
ruleset 0
type replicated
min_size 1
max_size 10
step take ssd
step chooseleaf firstn 0 type host
step emit
}
rule platter {
ruleset 1
type replicated
min_size 1
max_size 10
step take platter
step chooseleaf firstn 0 type host
step emit
}
# end crush map
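The stuck PGs shown above are all in pool 3; which of the two rules that pool actually uses (and its size/min_size) can be double-checked from the OSD map, e.g.:
# ceph osd dump | grep '^pool'
The pool lines there should show the crush_ruleset along with size and min_size.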
################################################################################
This is what Ceph made of that CRUSH map, and the one that is actually in use right now; I never looked -_- :
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1
# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
# buckets
host pvestorage1-ssd {
id -2 # do not change unnecessarily
# weight 0.000
alg straw
hash 0 # rjenkins1
}
host pvestorage2-ssd {
id -3 # do not change unnecessarily
# weight 0.000
alg straw
hash 0 # rjenkins1
}
root ssd {
id -1 # do not change unnecessarily
# weight 0.000
alg straw
hash 0 # rjenkins1
item pvestorage1-ssd weight 0.000
item pvestorage2-ssd weight 0.000
}
host pvestorage1-platter {
id -4 # do not change unnecessarily
# weight 0.000
alg straw
hash 0 # rjenkins1
}
host pvestorage2-platter {
id -5 # do not change unnecessarily
# weight 0.000
alg straw
hash 0 # rjenkins1
}
root platter {
id -6 # do not change unnecessarily
# weight 0.000
alg straw
hash 0 # rjenkins1
item pvestorage1-platter weight 0.000
item pvestorage2-platter weight 0.000
}
host pvestorage1 {
id -7 # do not change unnecessarily
# weight 5.740
alg straw
hash 0 # rjenkins1
item osd.5 weight 2.000
item osd.4 weight 2.000
item osd.1 weight 0.870
item osd.0 weight 0.870
}
host pvestorage2 {
id -9 # do not change unnecessarily
# weight 5.740
alg straw
hash 0 # rjenkins1
item osd.3 weight 0.870
item osd.2 weight 0.870
item osd.6 weight 2.000
item osd.7 weight 2.000
}
root default {
id -8 # do not change unnecessarily
# weight 11.480
alg straw
hash 0 # rjenkins1
item pvestorage1 weight 5.740
item pvestorage2 weight 5.740
}
# rules
rule ssd {
ruleset 0
type replicated
min_size 1
max_size 10
step take ssd
step chooseleaf firstn 0 type host
step emit
}
rule platter {
ruleset 1
type replicated
min_size 1
max_size 10
step take platter
step chooseleaf firstn 0 type host
step emit
}
# end crush map
################################################################################
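My guess about what happened: the OSDs re-register themselves under host=<hostname> / root=default when they start (the "osd crush update on start" behaviour, which I understand is on by default), and that flattened my split ssd/platter buckets after a restart. If that is right, I assume the way back would be something along these lines, but I have not dared to try it yet (file names are again just examples):
In ceph.conf on both nodes, so the OSDs stop moving themselves on the next restart:
[osd]
osd crush update on start = false
Then recompile my intended map and inject it:
# crushtool -c /tmp/crushmap-intended.txt -o /tmp/crushmap-intended.bin
# ceph osd setcrushmap -i /tmp/crushmap-intended.bin
If I understand it correctly, the PGs should then map back onto the ssd/platter roots and recover, but I would rather have someone confirm that before I inject anything.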
How do I recover from this?
Best Regards
Jonas