Having an issue with: 7 pgs stuck inactive; 7 pgs stuck unclean; 71 requests are blocked > 32

Hello fellow ceph users,

 

I ran into a major issue where two KVM hosts will not start due to problems with my Ceph cluster.

 

Here are some details:

 

Running Ceph version 0.87. There are 10 hosts with 6 drives each, for 60 OSDs in total.

 

# ceph -s

    cluster 1431e336-faa2-4b13-b50d-c1d375b4e64b

     health HEALTH_WARN 7 pgs incomplete; 7 pgs stuck inactive; 7 pgs stuck unclean; 71 requests are blocked > 32 sec; pool rbd-b has too few pgs

     monmap e1: 3 mons at {xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx}, election epoch 92, quorum 0,1,2 ceph-b01,ceph-b02,ceph-b03

     mdsmap e49: 1/1/1 up {0=pmceph-b06=up:active}, 1 up:standby

     osdmap e10023: 60 osds: 60 up, 60 in

      pgmap v19851672: 45056 pgs, 22 pools, 13318 GB data, 3922 kobjects

            39863 GB used, 178 TB / 217 TB avail

               45049 active+clean

                   7 incomplete

  client io 954 kB/s rd, 386 kB/s wr, 78 op/s

 

# ceph health detail

HEALTH_WARN 7 pgs incomplete; 7 pgs stuck inactive; 7 pgs stuck unclean; 69 requests are blocked > 32 sec; 5 osds have slow requests; pool rbd-b has too few pgs

pg 3.38b is stuck inactive since forever, current state incomplete, last acting [48,35,2]

pg 1.541 is stuck inactive since forever, current state incomplete, last acting [48,20,2]

pg 3.57d is stuck inactive for 15676.967208, current state incomplete, last acting [55,48,2]

pg 3.5c9 is stuck inactive since forever, current state incomplete, last acting [48,2,15]

pg 3.540 is stuck inactive for 15676.959093, current state incomplete, last acting [57,48,2]

pg 3.5a5 is stuck inactive since forever, current state incomplete, last acting [2,48,57]

pg 3.305 is stuck inactive for 15676.855987, current state incomplete, last acting [39,2,48]

pg 3.38b is stuck unclean since forever, current state incomplete, last acting [48,35,2]

pg 1.541 is stuck unclean since forever, current state incomplete, last acting [48,20,2]

pg 3.57d is stuck unclean for 15676.971318, current state incomplete, last acting [55,48,2]

pg 3.5c9 is stuck unclean since forever, current state incomplete, last acting [48,2,15]

pg 3.540 is stuck unclean for 15676.963204, current state incomplete, last acting [57,48,2]

pg 3.5a5 is stuck unclean since forever, current state incomplete, last acting [2,48,57]

pg 3.305 is stuck unclean for 15676.860098, current state incomplete, last acting [39,2,48]

pg 3.5c9 is incomplete, acting [48,2,15] (reducing pool rbd-b min_size from 2 may help; search ceph.com/docs for 'incomplete')

pg 3.5a5 is incomplete, acting [2,48,57] (reducing pool rbd-b min_size from 2 may help; search ceph.com/docs for 'incomplete')

pg 3.57d is incomplete, acting [55,48,2] (reducing pool rbd-b min_size from 2 may help; search ceph.com/docs for 'incomplete')

pg 3.540 is incomplete, acting [57,48,2] (reducing pool rbd-b min_size from 2 may help; search ceph.com/docs for 'incomplete')

pg 1.541 is incomplete, acting [48,20,2] (reducing pool metadata min_size from 2 may help; search ceph.com/docs for 'incomplete')

pg 3.38b is incomplete, acting [48,35,2] (reducing pool rbd-b min_size from 2 may help; search ceph.com/docs for 'incomplete')

pg 3.305 is incomplete, acting [39,2,48] (reducing pool rbd-b min_size from 2 may help; search ceph.com/docs for 'incomplete')

20 ops are blocked > 2097.15 sec

49 ops are blocked > 1048.58 sec

13 ops are blocked > 2097.15 sec on osd.2

7 ops are blocked > 2097.15 sec on osd.39

3 ops are blocked > 1048.58 sec on osd.39

41 ops are blocked > 1048.58 sec on osd.48

4 ops are blocked > 1048.58 sec on osd.55

1 ops are blocked > 1048.58 sec on osd.57

5 osds have slow requests

pool rbd-b objects per pg (1084) is more than 12.1798 times cluster average (89)
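
To get more detail on why these PGs are incomplete, I was going to query one of them directly and dump the stuck list (3.38b picked as an example from the list above; I'm hoping the peering section of the query output shows which OSDs they are waiting on):

# ceph pg 3.38b query

# ceph pg dump_stuck inactive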

 

I ran the following, but it did not help:

 

# ceph health detail | grep ^pg | cut -c4-9 | while read i; do ceph pg repair ${i} ; done

instructing pg 3.38b on osd.48 to repair

instructing pg 1.541 on osd.48 to repair

instructing pg 3.57d on osd.55 to repair

instructing pg 3.5c9 on osd.48 to repair

instructing pg 3.540 on osd.57 to repair

instructing pg 3.5a5 on osd.2 to repair

instructing pg 3.305 on osd.39 to repair

instructing pg 3.38b on osd.48 to repair

instructing pg 1.541 on osd.48 to repair

instructing pg 3.57d on osd.55 to repair

instructing pg 3.5c9 on osd.48 to repair

instructing pg 3.540 on osd.57 to repair

instructing pg 3.5a5 on osd.2 to repair

instructing pg 3.305 on osd.39 to repair

instructing pg 3.5c9 on osd.48 to repair

instructing pg 3.5a5 on osd.2 to repair

instructing pg 3.57d on osd.55 to repair

instructing pg 3.540 on osd.57 to repair

instructing pg 1.541 on osd.48 to repair

instructing pg 3.38b on osd.48 to repair

instructing pg 3.305 on osd.39 to repair
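
The health detail above also suggests reducing min_size on the rbd-b pool. I have not tried that yet; for reference, this is what I was planning to check first (my understanding is that min_size 1 would only be a temporary workaround, not a fix for the incomplete state):

# ceph osd pool get rbd-b size

# ceph osd pool get rbd-b min_size

and then, if that looks safe:

# ceph osd pool set rbd-b min_size 1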

 

Also, if I run the following command, it just hangs:

 

rbd -p rbd-b info vm-50193-disk-1    <-- hangs until I press Ctrl-C...
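
To see what the blocked requests are actually waiting on, I was planning to dump the in-flight ops on one of the slow OSDs via its admin socket (osd.48 as an example, run on the host carrying that OSD; the socket path assumes the default location):

# ceph --admin-daemon /var/run/ceph/ceph-osd.48.asok dump_ops_in_flight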

 
Any help would be greatly appreciated!

 

Glen Aidukas

Manager IT Infrastructure

t: 610.813.2815

 


BehaviorMatrix, LLC | 676 Dekalb Pike, Suite 200, Blue Bell, PA, 19422

www.behaviormatrix.com

 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
