Sounds like you needed osd 20. You can mark osd 20 lost.
-Sam

On Wed, Nov 5, 2014 at 9:41 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> On Wed, Nov 5, 2014 at 7:24 AM, Chad Seys <cwseys@xxxxxxxxxxxxxxxx> wrote:
>> Hi Sam,
>>
>>> Incomplete usually means the pgs do not have any complete copies. Did
>>> you previously have more osds?
>>
>> No. But could OSDs quitting after hitting assert(0 == "we got a bad
>> state machine event"), or interactions with kernel 3.14 clients, have
>> caused the incomplete copies?
>>
>> How can I probe the fate of one of the incomplete PGs? e.g.
>> pg 4.152 is incomplete, acting [1,11]
>>
>> Also, how can I investigate why one OSD has a blocked request? The
>> hardware appears normal and the OSD is performing other requests like
>> scrubs without problems. From its log:
>>
>> 2014-11-05 00:57:26.870867 7f7686331700 0 log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 61440.449534 secs
>> 2014-11-05 00:57:26.870873 7f7686331700 0 log [WRN] : slow request 61440.449534 seconds old, received at 2014-11-04 07:53:26.421301: osd_op(client.11334078.1:592 rb.0.206609.238e1f29.0000000752e8 [read 512~512] 4.17df39a7 RETRY=1 retry+read e115304) v4 currently reached pg
>> 2014-11-05 00:57:31.816534 7f7665e4a700 0 -- 192.168.164.187:6800/7831 >> 192.168.164.191:6806/30336 pipe(0x44a98780 sd=89 :6800 s=0 pgs=0 cs=0 l=0 c=0x42f482c0).accept connect_seq 14 vs existing 13 state standby
>> 2014-11-05 00:59:10.749429 7f7666e5a700 0 -- 192.168.164.187:6800/7831 >> 192.168.164.191:6800/20375 pipe(0x44a99900 sd=169 :6800 s=2 pgs=443 cs=29 l=0 c=0x42528b00).fault with nothing to send, going to standby
>> 2014-11-05 01:02:09.746857 7f7664d39700 0 -- 192.168.164.187:6800/7831 >> 192.168.164.192:6802/9779 pipe(0x44a98280 sd=63 :6800 s=0 pgs=0 cs=0 l=0 c=0x42f48c60).accept connect_seq 26 vs existing 25 state standby
>>
>> Greg, I attempted to copy/paste the 'ceph scrub' output for you. Did I
>> get the relevant bits?
>
> Looks like you provided the monitor log, which is actually distinct
> from the central log. I don't think it matters, though — I was looking
> for a very specific type of corruption that would have put them into a
> HEALTH_WARN or HEALTH_FAIL state if they detected it. At this point
> Sam is going to be a lot more help than I am. :)
> -Greg

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
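
A minimal sketch of Sam's suggestion, assuming osd.20 is the down OSD that
held the missing copies and that its data is accepted as unrecoverable:

    # Confirm osd.20 really is down/out before declaring it lost.
    ceph osd tree | grep osd.20

    # Tell the cluster the OSD is permanently gone so peering can proceed
    # without it. Objects whose only surviving copy was on osd.20 are lost.
    ceph osd lost 20 --yes-i-really-mean-it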
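
To probe the fate of an incomplete PG such as 4.152, a sketch of the usual
commands (the recovery_state field names are typical of releases from that
era and may vary):

    # Show which PGs are unhealthy and why.
    ceph health detail

    # Dump the PG's full peering state. In the recovery_state section, look
    # for entries like "down_osds_we_would_probe" and "peering_blocked_by"
    # to see which OSDs the PG still needs to hear from.
    ceph pg 4.152 query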
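
For the blocked request, one way to dig in is through the OSD's admin socket
on the host running the affected daemon (osd.N below is a placeholder;
substitute the real id):

    # List requests currently in flight, with each op's current state
    # (the slow op quoted above is stuck at "reached pg").
    ceph daemon osd.N dump_ops_in_flight

    # Show recently completed slow ops and the time spent at each step.
    ceph daemon osd.N dump_historic_ops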