PGs incomplete after OSD failure


Hi All,

We've had a string of very unfortunate failures and need a hand fixing
the incomplete PGs that we're now left with. We're configured with 3
replicas on different hosts, out of 5 hosts in total.

The timeline was:
-1 week  :: A full server went offline with a failed backplane. It is
still not working.
-1 day  ::  osd.190 failed.
-1 day + 3 minutes :: osd.121, in a different server, failed, taking
out several PGs and blocking IO.
Today  :: The first failed OSD (osd.190) was cloned to a good drive
with xfs_dump | xfs_restore and now boots fine. The last failed OSD
(osd.121) is completely unrecoverable and was marked as lost.

What we're left with now is 2 incomplete PGs that are preventing RBD
images from booting.

# ceph pg dump_stuck inactive
ok
pg_stat    objects    mip    degr    misp    unf    bytes    log    disklog    state    state_stamp    v    reported    up    up_primary    acting    acting_primary    last_scrub    scrub_stamp    last_deep_scrub    deep_scrub_stamp
8.ca    2440    0    0    0    0    10219748864    9205    9205    incomplete    2014-11-11 10:29:04.910512    160435'959618    161358:6071679    [190,111]    190    [190,111]    190    86417'207324    2013-09-09 12:58:10.749001    86229'196887    2013-09-02 12:57:58.162789
8.6ae    0    0    0    0    0    0    3176    3176    incomplete    2014-11-11 10:24:07.000373    160931'1935986    161358:267    [117,190]    117    [117,190]    117    86424'389748    2013-09-09 16:52:58.796650    86424'389748    2013-09-09 16:52:58.796650
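To pull the useful columns out of that output, here is a minimal Python sketch. It assumes the real output is tab-separated (the timestamps contain spaces, so splitting on arbitrary whitespace would break them apart); the sample rows are the two from the dump above. On this data the helper `osds_in_every_pg` (a hypothetical name, not part of any Ceph tool) shows that osd.190, the cloned OSD, sits in the acting set of both incomplete PGs.

```python
from typing import Dict, List, Optional, Set

# Rows copied from the `ceph pg dump_stuck inactive` output above,
# re-joined with tabs (ASSUMED to be the real column separator).
HEADER = ["pg_stat", "objects", "mip", "degr", "misp", "unf", "bytes", "log",
          "disklog", "state", "state_stamp", "v", "reported", "up", "up_primary",
          "acting", "acting_primary", "last_scrub", "scrub_stamp",
          "last_deep_scrub", "deep_scrub_stamp"]
ROWS = [
    ["8.ca", "2440", "0", "0", "0", "0", "10219748864", "9205", "9205",
     "incomplete", "2014-11-11 10:29:04.910512", "160435'959618",
     "161358:6071679", "[190,111]", "190", "[190,111]", "190", "86417'207324",
     "2013-09-09 12:58:10.749001", "86229'196887", "2013-09-02 12:57:58.162789"],
    ["8.6ae", "0", "0", "0", "0", "0", "0", "3176", "3176", "incomplete",
     "2014-11-11 10:24:07.000373", "160931'1935986", "161358:267", "[117,190]",
     "117", "[117,190]", "117", "86424'389748", "2013-09-09 16:52:58.796650",
     "86424'389748", "2013-09-09 16:52:58.796650"],
]
SAMPLE = "\n".join(["ok", "\t".join(HEADER)] + ["\t".join(r) for r in ROWS])


def parse_dump_stuck(text: str) -> List[Dict[str, str]]:
    """Turn tab-separated dump_stuck output into one dict per PG row."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header_idx: Optional[int] = next(
        (i for i, ln in enumerate(lines) if ln.startswith("pg_stat")), None)
    if header_idx is None:
        return []  # no header row, nothing to parse
    header = lines[header_idx].split("\t")
    rows = []
    for ln in lines[header_idx + 1:]:
        fields = ln.split("\t")
        if len(fields) == len(header):  # skip anything that is not a full row
            rows.append(dict(zip(header, fields)))
    return rows


def parse_osd_list(field: str) -> List[int]:
    """Convert an up/acting field such as '[190,111]' into [190, 111]."""
    inner = field.strip().strip("[]")
    return [int(x) for x in inner.split(",") if x]


def osds_in_every_pg(rows: List[Dict[str, str]]) -> Set[int]:
    """OSDs that appear in the acting set of every listed PG."""
    sets = [set(parse_osd_list(r["acting"])) for r in rows]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    rows = parse_dump_stuck(SAMPLE)
    for row in rows:
        print(row["pg_stat"], row["state"], parse_osd_list(row["acting"]), row["v"])
    print("OSDs in every stuck PG:", sorted(osds_in_every_pg(rows)))
```

Feeding it the live output instead of `SAMPLE` (for example via `subprocess.run(["ceph", "pg", "dump_stuck", "inactive"], capture_output=True, text=True).stdout`) should work the same way, provided the columns really are tab-separated on your release.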

We've tried a pg revert, but it just reports 'no missing objects' and
does nothing. I've also done the usual scrub, deep-scrub, and pg and
OSD repairs, but so far nothing has helped.

I think it could be a similar situation to this post [
http://www.spinics.net/lists/ceph-users/msg11461.html ], where one of
the OSDs is holding a slightly newer but incomplete version of the PG
that needs to be removed. Is anyone able to shed some light on how I
might use the objectstore tool to check whether this is the case?
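One way to frame that check is to collect the `last_update` each OSD holds for the PG and see which copy is ahead. The following Python sketch assumes you have already pulled those values out (for example from the objectstore tool's PG info dump; the exact invocation differs between releases, so check the tool's help output on yours). It relies only on Ceph's `epoch'version` notation, which is ordered by epoch first and version second. The OSD ids and versions in the example are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_eversion(ev: str) -> Tuple[int, int]:
    """Split an epoch'version string such as "160435'959618" into (160435, 959618)."""
    epoch, version = ev.strip().split("'")
    return int(epoch), int(version)


def rank_copies(last_update_by_osd: Dict[int, str]) -> List[Tuple[int, str]]:
    """Order OSDs from newest to oldest PG copy by last_update.

    Python tuples compare element by element, so the epoch decides first and
    the version only breaks ties within the same epoch.
    """
    return sorted(last_update_by_osd.items(),
                  key=lambda item: parse_eversion(item[1]),
                  reverse=True)


if __name__ == "__main__":
    # HYPOTHETICAL values for illustration only; substitute the last_update
    # each OSD actually reports for the PG in question (e.g. 8.ca).
    observed = {190: "160435'959618", 111: "160430'959720"}
    for osd, ev in rank_copies(observed):
        print(f"osd.{osd}: last_update {ev}")
```

In the example osd.190 ranks first even though its version number is lower, because its epoch is higher. A copy that ranks newest but holds fewer objects than its peers would fit the situation described in the linked thread.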

If anyone has any suggestions, they would be greatly appreciated.
Likewise, if you need any more information about the problem, just let
me know.

Thanks all
-Matt
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



