Hi!
I had a very similar issue a few days ago.
For me it wasn't too much of a problem, since the cluster was new
and held no data, so I could force-recreate the PGs. I really hope that in
your case it won't be necessary to do the same thing.
As a first step, try reducing min_size from 2 to 1 on the
.rgw.buckets pool, as suggested in the health detail output, and see if
that brings your cluster back to health.
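For reference, that change is a single command (pool name taken from your health detail output below); you can revert it the same way once the down OSD is recovered or replaced:

```shell
# Allow PGs in .rgw.buckets to go active with only one surviving replica
ceph osd pool set .rgw.buckets min_size 1

# After osd.0 is back (or replaced) and recovery has finished, restore it:
# ceph osd pool set .rgw.buckets min_size 2
```

Note that running with min_size 1 means writes are acknowledged with a single copy, so it should only be a temporary measure while recovery completes.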
Regards,
George
On Mon, 01 Dec 2014 17:09:31 +0300, Butkeev Stas wrote:
Hi all,
I have a Ceph cluster plus RGW. I now have a problem with one of my OSDs:
it's down. I checked the ceph status and see the following:
[root@node-1 ceph-0]# ceph -s
cluster fc8c3ecc-ccb8-4065-876c-dc9fc992d62d
health HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs
stuck unclean
monmap e1: 3 mons at
{a=10.29.226.39:6789/0,b=10.29.226.29:6789/0,c=10.29.226.40:6789/0},
election epoch 294, quorum 0,1,2 b,a,c
osdmap e418: 6 osds: 5 up, 5 in
pgmap v23588: 312 pgs, 16 pools, 141 kB data, 594 objects
5241 MB used, 494 GB / 499 GB avail
308 active+clean
4 incomplete
Why do I have 4 incomplete PGs in the .rgw.buckets pool if it has
replicated size 2 and min_size 2?
My OSD tree:
[root@node-1 ceph-0]# ceph osd tree
# id weight type name up/down reweight
-1 4 root croc
-2 4 region ru
-4 3 datacenter vol-5
-5 1 host node-1
0 1 osd.0 down 0
-6 1 host node-2
1 1 osd.1 up 1
-7 1 host node-3
2 1 osd.2 up 1
-3 1 datacenter comp
-8 1 host node-4
3 1 osd.3 up 1
-9 1 host node-5
4 1 osd.4 up 1
-10 1 host node-6
5 1 osd.5 up 1
Additional information:
[root@node-1 ceph-0]# ceph health detail
HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck
unclean
pg 13.6 is stuck inactive for 1547.665758, current state incomplete,
last acting [1,3]
pg 13.4 is stuck inactive for 1547.652111, current state incomplete,
last acting [1,2]
pg 13.5 is stuck inactive for 4502.009928, current state incomplete,
last acting [1,3]
pg 13.2 is stuck inactive for 4501.979770, current state incomplete,
last acting [1,3]
pg 13.6 is stuck unclean for 4501.969914, current state incomplete,
last acting [1,3]
pg 13.4 is stuck unclean for 4502.001114, current state incomplete,
last acting [1,2]
pg 13.5 is stuck unclean for 4502.009942, current state incomplete,
last acting [1,3]
pg 13.2 is stuck unclean for 4501.979784, current state incomplete,
last acting [1,3]
pg 13.2 is incomplete, acting [1,3] (reducing pool .rgw.buckets
min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 13.6 is incomplete, acting [1,3] (reducing pool .rgw.buckets
min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 13.4 is incomplete, acting [1,2] (reducing pool .rgw.buckets
min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 13.5 is incomplete, acting [1,3] (reducing pool .rgw.buckets
min_size from 2 may help; search ceph.com/docs for 'incomplete')
[root@node-1 ceph-0]# ceph osd dump | grep 'pool'
pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool
stripe_width 0
pool 1 '.rgw.root' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 34 owner
18446744073709551615 flags hashpspool stripe_width 0
pool 2 '.rgw.control' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 36 owner
18446744073709551615 flags hashpspool stripe_width 0
pool 3 '.rgw' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 38 owner
18446744073709551615 flags hashpspool stripe_width 0
pool 4 '.rgw.gc' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 39 flags
hashpspool stripe_width 0
pool 5 '.users.uid' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 40 owner
18446744073709551615 flags hashpspool stripe_width 0
pool 6 '.log' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 42 owner
18446744073709551615 flags hashpspool stripe_width 0
pool 7 '.users' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 44 flags
hashpspool stripe_width 0
pool 8 '.users.swift' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 46 flags
hashpspool stripe_width 0
pool 9 '.usage' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 48 flags
hashpspool stripe_width 0
pool 10 'test' replicated size 2 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 136 pgp_num 136 last_change 68 flags
hashpspool stripe_width 0
pool 11 '.rgw.buckets.index' replicated size 3 min_size 2
crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change
70
owner 18446744073709551615 flags hashpspool stripe_width 0
pool 12 '.rgw.buckets.extra' replicated size 3 min_size 2
crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change
72
owner 18446744073709551615 flags hashpspool stripe_width 0
pool 13 '.rgw.buckets' replicated size 2 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 383 owner
18446744073709551615 flags hashpspool stripe_width 0
pool 14 '.intent-log' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 213 flags
hashpspool stripe_width 0
pool 15 '' replicated size 3 min_size 2 crush_ruleset 0 object_hash
rjenkins pg_num 8 pgp_num 8 last_change 238 flags hashpspool
stripe_width 0
--
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com