Health_Warn recovery stuck / crushmap problem?

All OSDs and monitors are up, as far as I can see.
I read through the PG troubleshooting section in the Ceph documentation and came to the conclusion that nothing there would help me, so I didn't try anything except restarting / rebooting the OSDs and monitors.

How do I recover from this? It looks to me like the data itself should be safe for now, but why is it not recovering?
My guess is that the problem is the crushmap.
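
(In case it matters: the two decompiled maps further down were pulled with roughly the following commands; the file names are just what I used locally.)

# ceph osd getcrushmap -o crushmap.bin
# crushtool -d crushmap.bin -o crushmap.txt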

Here are some outputs:

#ceph health detail

HEALTH_WARN 475 pgs degraded; 640 pgs stale; 475 pgs stuck degraded; 640 pgs stuck stale; 640 pgs stuck unclean; 475 pgs stuck undersized; 475 pgs undersized; recovery 104812/279550 objects degraded (37.493%); recovery 69926/279550 objects misplaced (25.014%)
pg 3.ec is stuck unclean for 3326815.935321, current state stale+active+remapped, last acting [7,6]
pg 3.ed is stuck unclean for 3288818.682456, current state stale+active+remapped, last acting [6,7]
pg 3.ee is stuck unclean for 409973.052061, current state stale+active+undersized+degraded, last acting [7]
pg 3.ef is stuck unclean for 3357894.554762, current state stale+active+undersized+degraded, last acting [7]
pg 3.e8 is stuck unclean for 384815.518837, current state stale+active+undersized+degraded, last acting [6]
pg 3.e9 is stuck unclean for 3274554.591000, current state stale+active+remapped, last acting [6,7]
......
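
I am not sure which ruleset each pool is actually using at the moment; I assume something along these lines would show it, plus which hosts/roots the OSDs currently sit under (pool 3 is the one the stuck PGs above belong to):

# ceph osd dump | grep pool
# ceph osd tree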

################################################################################

This is the crushmap I created, intended to use, and thought I had been using for the past 2 months (a crushtool sanity check follows right after the map):
- pvestorage1-ssd and pvestorage1-platter are actually the same host; it seems that is not possible, but I never noticed
- the same goes for pvestorage2-ssd and pvestorage2-platter

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host pvestorage1-ssd {
        id -2   # do not change unnecessarily
        # weight 1.740
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 0.870
        item osd.1 weight 0.870
}
host pvestorage2-ssd {
        id -3   # do not change unnecessarily
        # weight 1.740
        alg straw
        hash 0  # rjenkins1
        item osd.2 weight 0.870
        item osd.3 weight 0.870
}
host pvestorage1-platter {
        id -4           # do not change unnecessarily
        # weight 4
        alg straw
        hash 0  # rjenkins1
        item osd.4 weight 2.000
        item osd.5 weight 2.000
}
host pvestorage2-platter {
        id -5           # do not change unnecessarily
        # weight 4
        alg straw
        hash 0  # rjenkins1
        item osd.6 weight 2.000
        item osd.7 weight 2.000
}

root ssd {
        id -1   # do not change unnecessarily
        # weight 3.480
        alg straw
        hash 0  # rjenkins1
        item pvestorage1-ssd weight 1.740
        item pvestorage2-ssd weight 1.740
}

root platter {
        id -6           # do not change unnecessarily
        # weight 8
        alg straw
        hash 0  # rjenkins1
        item pvestorage1-platter weight 4.000
        item pvestorage2-platter weight 4.000
}

# rules
rule ssd {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take ssd
        step chooseleaf firstn 0 type host
        step emit
}

rule platter {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take platter
        step chooseleaf firstn 0 type host
        step emit
}
# end crush map
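
(A compile / mapping sanity check that I assume should work for the map above; the rule numbers and the replica count of 2 are just taken from the rules and the pool size as I understand them.)

# crushtool -c crushmap-intended.txt -o crushmap-intended.bin
# crushtool -i crushmap-intended.bin --test --rule 0 --num-rep 2 --show-mappings | head
# crushtool -i crushmap-intended.bin --test --rule 1 --num-rep 2 --show-mappings | head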
################################################################################

This is what Ceph made of that crushmap, and it is the one that is actually in use right now; I never looked -_- :

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host pvestorage1-ssd {
        id -2   # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
}
host pvestorage2-ssd {
        id -3   # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
}
root ssd {
        id -1   # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item pvestorage1-ssd weight 0.000
        item pvestorage2-ssd weight 0.000
}
host pvestorage1-platter {
        id -4   # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
}
host pvestorage2-platter {
        id -5   # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
}
root platter {
        id -6   # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item pvestorage1-platter weight 0.000
        item pvestorage2-platter weight 0.000
}
host pvestorage1 {
        id -7   # do not change unnecessarily
        # weight 5.740
        alg straw
        hash 0  # rjenkins1
        item osd.5 weight 2.000
        item osd.4 weight 2.000
        item osd.1 weight 0.870
        item osd.0 weight 0.870
}
host pvestorage2 {
        id -9   # do not change unnecessarily
        # weight 5.740
        alg straw
        hash 0  # rjenkins1
        item osd.3 weight 0.870
        item osd.2 weight 0.870
        item osd.6 weight 2.000
        item osd.7 weight 2.000
}
root default {
        id -8   # do not change unnecessarily
        # weight 11.480
        alg straw
        hash 0  # rjenkins1
        item pvestorage1 weight 5.740
        item pvestorage2 weight 5.740
}

# rules
rule ssd {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take ssd
        step chooseleaf firstn 0 type host
        step emit
}
rule platter {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take platter
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map
################################################################################

How do I recover from this?
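
My own guess at the recovery would be to decompile the map that is currently in use, put the OSDs back under the -ssd / -platter hosts and roots with the weights from the intended map, and inject it again, roughly like this (taken from the documentation as I understand it, so please correct me if this would make things worse):

# ceph osd getcrushmap -o crush-current.bin
# crushtool -d crush-current.bin -o crush-current.txt
  (edit crush-current.txt so the OSDs are back under the intended hosts and roots)
# crushtool -c crush-current.txt -o crush-new.bin
# ceph osd setcrushmap -i crush-new.bin

Is that the right direction, or is there a safer way?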

Best Regards
Jonas