pgs stuck inactive

Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> · Thu, 9 Mar 2017 14:53:10 +0200

Hello,

After a major network outage, our Ceph cluster ended up with one inactive PG:

# ceph health detail
HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs stuck unclean; 1 requests are blocked > 32 sec; 1 osds have slow requests
pg 3.367 is stuck inactive for 912263.766607, current state incomplete, last acting [28,35,2]
pg 3.367 is stuck unclean for 912263.766688, current state incomplete, last acting [28,35,2]
pg 3.367 is incomplete, acting [28,35,2]
1 ops are blocked > 268435 sec
1 ops are blocked > 268435 sec on osd.28
1 osds have slow requests

# ceph -s
    cluster 6713d1b8-83da-11e6-aa79-525400d98c5a
     health HEALTH_WARN
            1 pgs incomplete
            1 pgs stuck inactive
            1 pgs stuck unclean
            1 requests are blocked > 32 sec
     monmap e3: 3 mons at {tv-dl360-1=10.12.193.73:6789/0,tv-dl360-2=10.12.193.74:6789/0,tv-dl360-3=10.12.193.75:6789/0}
            election epoch 72, quorum 0,1,2 tv-dl360-1,tv-dl360-2,tv-dl360-3
     osdmap e60609: 72 osds: 72 up, 72 in
      pgmap v3670252: 4864 pgs, 11 pools, 134 GB data, 23778 objects
            490 GB used, 130 TB / 130 TB avail
                4863 active+clean
                   1 incomplete
  client io 0 B/s rd, 38465 B/s wr, 2 op/s

Running ceph pg repair on the PG does not change anything. What should I try next to recover it?
Attached is the output of ceph pg query for the problem PG.
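
In case it helps with triage, here is a minimal sketch in Python that pulls the PG id, state, acting set, and stuck duration out of the "is stuck" lines of the ceph health detail output above. The function name parse_stuck_pgs and the regular expression are illustrative assumptions based only on the lines pasted in this message, not an official Ceph format or tool.

```python
import re

# Sample lines copied from the `ceph health detail` output in this post.
HEALTH_DETAIL = """\
pg 3.367 is stuck inactive for 912263.766607, current state incomplete, last acting [28,35,2]
pg 3.367 is stuck unclean for 912263.766688, current state incomplete, last acting [28,35,2]
pg 3.367 is incomplete, acting [28,35,2]
"""

# Assumed line shape (inferred from the sample above, not from Ceph docs):
# pg <id> is stuck <what> for <seconds>, current state <state>, last acting [a,b,c]
STUCK_RE = re.compile(
    r"^pg (?P<pgid>\S+) is stuck (?P<what>\w+) for (?P<secs>[\d.]+), "
    r"current state (?P<state>\S+), last acting \[(?P<acting>[\d,]*)\]$"
)


def parse_stuck_pgs(text):
    """Return one dict per 'is stuck' line; lines of any other shape are ignored."""
    rows = []
    for line in text.splitlines():
        m = STUCK_RE.match(line.strip())
        if not m:
            continue
        rows.append({
            "pgid": m.group("pgid"),
            "stuck": m.group("what"),
            # Convert the reported seconds into days for readability.
            "days": float(m.group("secs")) / 86400.0,
            "state": m.group("state"),
            "acting": [int(x) for x in m.group("acting").split(",") if x],
        })
    return rows


if __name__ == "__main__":
    for row in parse_stuck_pgs(HEALTH_DETAIL):
        print(f"pg {row['pgid']} stuck {row['stuck']} for {row['days']:.1f} days, "
              f"state={row['state']}, acting={row['acting']}")
```

Applied to the lines above, this reports pg 3.367 as stuck for roughly 10.6 days with acting set [28, 35, 2], which matches the time since the outage.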

Thank you,
Laszlo
Attachment: pg_3.367_query.gz (application/gzip)