'pgs stuck unclean' problem

Dear all, 
Ceph 0.72.2 is deployed on three hosts, but the cluster's health is HEALTH_WARN. The status is as follows:

 # ceph -s
    cluster e25909ed-25d9-42fd-8c97-0ed31eec6194
     health HEALTH_WARN 768 pgs degraded; 768 pgs stuck unclean; recovery 2/3 objects degraded (66.667%)
     monmap e3: 3 mons at {ceph-node1=192.168.57.101:6789/0,ceph-node2=192.168.57.102:6789/0,ceph-node3=192.168.57.103:6789/0}, election epoch 34, quorum 0,1,2 ceph-node1,ceph-node2,ceph-node3
     osdmap e170: 9 osds: 9 up, 9 in
      pgmap v1741: 768 pgs, 7 pools, 36 bytes data, 1 objects
            367 MB used, 45612 MB / 45980 MB avail
            2/3 objects degraded (66.667%)
                 768 active+degraded
Only 3 pools were created, but the ceph status above reports 7 pools.
# ceph osd lspools
5 data,6 metadata,7 rbd,
The object in pool 'data' has only one replica, even though the pool's replication size is set to 3.
 # ceph osd map data object1
osdmap e170 pool 'data' (5) object 'object1' -> pg 5.bac5debc (5.bc) -> up [6] acting [6]
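If more detail on a single PG would help, I can also post a PG query for the placement group from the mapping above (5.bc is just that example PG):

# ceph pg 5.bc query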
 
# ceph osd dump|more
epoch 170
fsid e25909ed-25d9-42fd-8c97-0ed31eec6194
created 2015-03-16 11:23:28.805286
modified 2015-03-19 15:45:39.451077
flags
pool 5 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 155 owner 0
pool 6 'metadata' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 161 owner 0
pool 7 'rbd' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 163 owner 0

Other relevant information is shown below.
# ceph osd tree
# id    weight  type name       up/down reweight
-1      0       root default
-7      0               rack rack03
-4      0                       host ceph-node3
6       0                               osd.6   up      1
7       0                               osd.7   up      1
8       0                               osd.8   up      1
-6      0               rack rack02
-3      0                       host ceph-node2
3       0                               osd.3   up      1
4       0                               osd.4   up      1
5       0                               osd.5   up      1
-5      0               rack rack01
-2      0                       host ceph-node1
0       0                               osd.0   up      1
1       0                               osd.1   up      1
2       0                               osd.2   up      1
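
One thing I notice is that every weight in the tree above is 0. Could that prevent CRUSH from choosing more than one OSD per PG? As a test I could raise the weights to a small non-zero value, for example (0.05 is just an arbitrary test value, not a recommendation):

# for i in 0 1 2 3 4 5 6 7 8; do ceph osd crush reweight osd.$i 0.05; done

But I am not sure whether this is the right direction.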

The crushmap is :
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 root
# buckets
host ceph-node3 {
        id -4           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item osd.6 weight 0.000
        item osd.7 weight 0.000
        item osd.8 weight 0.000
}
rack rack03 {
        id -7           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item ceph-node3 weight 0.000
}
host ceph-node2 {
        id -3           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item osd.3 weight 0.000
        item osd.4 weight 0.000
        item osd.5 weight 0.000
}
rack rack02 {
        id -6           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item ceph-node2 weight 0.000
}
host ceph-node1 {
        id -2           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 0.000
        item osd.1 weight 0.000
        item osd.2 weight 0.000
}
rack rack01 {
        id -5           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item ceph-node1 weight 0.000
}
root default {
        id -1           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item rack03 weight 0.000
        item rack02 weight 0.000
        item rack01 weight 0.000
}
# rules
rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
# end crush map
 
# ceph health detail |more
HEALTH_WARN 768 pgs degraded; 768 pgs stuck unclean; recovery 2/3 objects degraded (66.667%)
pg 5.17 is stuck unclean since forever, current state active+degraded, last acting [6]
pg 6.14 is stuck unclean since forever, current state active+degraded, last acting [6]
pg 7.15 is stuck unclean since forever, current state active+degraded, last acting [6]
pg 5.14 is stuck unclean since forever, current state active+degraded, last acting [6]
pg 6.17 is stuck unclean since forever, current state active+degraded, last acting [6]
pg 7.16 is stuck unclean since forever, current state active+degraded, last acting [6]
pg 5.15 is stuck unclean since forever, current state active+degraded, last acting [6]
pg 6.16 is stuck unclean since forever, current state active+degraded, last acting [6]
pg 7.17 is stuck unclean since forever, current state active+degraded, last acting [6]
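The health detail above is only the first page; I can post the full list, or the output of the stuck-PG dump below, if that would be useful:

# ceph pg dump_stuck unclean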
 
 
I have been researching this problem for a week but have not found a solution.
Can anyone tell me how to fix it? Thanks!
 
 
Regards,
Guanghua
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
