Hi Sam,

> Incomplete usually means the pgs do not have any complete copies. Did
> you previously have more osds?

No. But could OSDs quitting after hitting assert(0 == "we got a bad state machine event"), or their interaction with kernel 3.14 clients, have caused the incomplete copies?

How can I probe the fate of one of the incomplete PGs? For example, pg 4.152 is incomplete, acting [1,11].

Also, how can I investigate why one OSD has a blocked request? The hardware appears normal, and the OSD is serving other requests, such as scrubs, without problems. From its log:

2014-11-05 00:57:26.870867 7f7686331700 0 log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 61440.449534 secs
2014-11-05 00:57:26.870873 7f7686331700 0 log [WRN] : slow request 61440.449534 seconds old, received at 2014-11-04 07:53:26.421301: osd_op(client.11334078.1:592 rb.0.206609.238e1f29.0000000752e8 [read 512~512] 4.17df39a7 RETRY=1 retry+read e115304) v4 currently reached pg
2014-11-05 00:57:31.816534 7f7665e4a700 0 -- 192.168.164.187:6800/7831 >> 192.168.164.191:6806/30336 pipe(0x44a98780 sd=89 :6800 s=0 pgs=0 cs=0 l=0 c=0x42f482c0).accept connect_seq 14 vs existing 13 state standby
2014-11-05 00:59:10.749429 7f7666e5a700 0 -- 192.168.164.187:6800/7831 >> 192.168.164.191:6800/20375 pipe(0x44a99900 sd=169 :6800 s=2 pgs=443 cs=29 l=0 c=0x42528b00).fault with nothing to send, going to standby
2014-11-05 01:02:09.746857 7f7664d39700 0 -- 192.168.164.187:6800/7831 >> 192.168.164.192:6802/9779 pipe(0x44a98280 sd=63 :6800 s=0 pgs=0 cs=0 l=0 c=0x42f48c60).accept connect_seq 26 vs existing 25 state standby

Greg, I attempted to copy/paste your 'ceph scrub' output. Did I get the relevant bits?

Thanks,
Chad.
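The slow-request warning in the OSD log packs the op's age, arrival time, client, object and operation into one line. When many such lines must be sifted, a small parser helps. What follows is a minimal sketch, assuming the warning always has the exact shape of the single line quoted in this message. The regex, the function `parse_slow_request`, and the field names are hypothetical illustrations, not part of any Ceph tool.

```python
import re
from datetime import datetime, timedelta

# Assumption: this pattern is modeled only on the one "slow request" line
# quoted in this message; other Ceph versions may word the warning differently.
SLOW_RE = re.compile(
    r"slow request (?P<age>[\d.]+) seconds old, "
    r"received at (?P<received>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"osd_op\((?P<client>\S+) (?P<object>\S+) \[(?P<op>[^\]]+)\] (?P<pool_hash>\S+)"
)


def parse_slow_request(line):
    """Return a dict describing one slow-request warning, or None if no match."""
    m = SLOW_RE.search(line)
    if m is None:
        return None
    age = float(m.group("age"))
    received = datetime.strptime(m.group("received"), "%Y-%m-%d %H:%M:%S.%f")
    return {
        "client": m.group("client"),        # client id and request tid
        "object": m.group("object"),        # object the op is blocked on
        "op": m.group("op"),                # e.g. "read 512~512"
        "pool_hash": m.group("pool_hash"),  # pool id and hash as printed by osd_op (assumed)
        "age_hours": age / 3600,            # how long the op has been blocked
        # Arrival time plus age should land close to the log line's own timestamp.
        "logged_at": received + timedelta(seconds=age),
    }
```

Applied to the warning quoted above, it shows that the single 512-byte read on `rb.0.206609.238e1f29.0000000752e8` had been blocked for roughly 17 hours when the warning was logged.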