I've done a bit more work tonight and managed to get some more data back. Osd.121, which was previously completely dead, has made it through an XFS repair with a more fault-tolerant HBA firmware, and I was able to export both of the placement groups required using ceph_objectstore_tool. The osd would probably boot if I hadn't already marked it as lost :(

I've basically got it down to two options.

The first is to import the exported data from osd.121 into osd.190, which would complete the PG, but this fails with a filestore feature mismatch because the sharded objects feature is missing on the target osd:

Export has incompatible features set compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo object,3=object locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded objects,12=transaction hints}

The second would be to run ceph pg force_create_pg on each of the problem PGs to reset them back to empty and then import the data using ceph_objectstore_tool import-rados. Unfortunately this has failed as well: when I tested ceph pg force_create_pg on an incomplete PG in another pool, the PG gets set to creating but then goes back to incomplete after a few minutes.

I've trawled the mailing list for solutions but have come up empty; neither problem appears to have been resolved before.
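To make the two options concrete, this is roughly what I'm attempting. The OSD paths and export file name below are from my setup, the pool name is assumed, and the import-rados syntax is from memory, so treat it as a sketch rather than exact commands:

    # Option 1: with osd.190 stopped, import the PG exported from osd.121
    ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-190 \
        --journal-path /var/lib/ceph/osd/ceph-190/journal \
        --op import --file 8.6ae.export

    # Option 2: recreate the PG as empty, then push the exported objects
    # back in through librados
    ceph pg force_create_pg 8.6ae
    ceph_objectstore_tool import-rados rbd 8.6ae.export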
On Tue, Nov 11, 2014 at 5:54 PM, Matthew Anderson <manderson8787@xxxxxxxxx> wrote:
> Thanks for your reply Sage!
>
> I've tested with 8.6ae and no luck I'm afraid. Steps taken were -
> Stop osd.117
> Export 8.6ae from osd.117
> Remove 8.6ae from osd.117
> Start osd.117
> Restart osd.190 (afterwards still showing incomplete)
>
> After this the PG was still showing incomplete and ceph pg dump_stuck
> inactive shows -
> pg_stat objects mip degr misp unf bytes log disklog state state_stamp
> v reported up up_primary acting acting_primary last_scrub scrub_stamp
> last_deep_scrub deep_scrub_stamp
> 8.6ae 0 0 0 0 0 0 0 0 incomplete 2014-11-11 17:34:27.168078 0'0
> 161425:40 [117,190] 117 [117,190] 117 86424'389748 2013-09-09
> 16:52:58.796650 86424'389748 2013-09-09 16:52:58.796650
>
> I then tried an export from OSD 190 to OSD 117 by doing -
> Stop osd.190 and osd.117
> Export pg 8.6ae from osd.190
> Import the file generated in the previous step into osd.117
> Boot both osd.190 and osd.117
>
> When osd.117 attempts to start it generates a failed assert; the full log
> is here http://pastebin.com/S4CXrTAL
> -1> 2014-11-11 17:25:15.130509 7f9f44512900 0 osd.117 161404 load_pgs
> 0> 2014-11-11 17:25:18.604696 7f9f44512900 -1 osd/OSD.h: In
> function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f9f44512900
> time 2014-11-11 17:25:18.602626
> osd/OSD.h: 715: FAILED assert(ret)
>
> ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x8b) [0xb8231b]
> 2: (OSDService::get_map(unsigned int)+0x3f) [0x6eea2f]
> 3: (OSD::load_pgs()+0x1b78) [0x6aae18]
> 4: (OSD::init()+0x71f) [0x6abf5f]
> 5: (main()+0x252c) [0x638cfc]
> 6: (__libc_start_main()+0xf5) [0x7f9f41650ec5]
> 7: /usr/bin/ceph-osd() [0x651027]
>
> I also attempted the same steps with 8.ca and got the same results.
> The below is the current state of the pg with it removed from osd.111 -
> pg_stat objects mip degr misp unf bytes log disklog state state_stamp
> v reported up up_primary acting acting_primary last_scrub scrub_stamp
> last_deep_scrub deep_scrub_stamp
> 8.ca 2440 0 0 0 0 10219748864 9205 9205 incomplete 2014-11-11
> 17:39:28.570675 160435'959618 161425:6071759 [190,111] 190 [190,111]
> 190 86417'207324 2013-09-09 12:58:10.749001 86229'196887 2013-09-02
> 12:57:58.162789
>
> Any idea of where I can go from here?
> One thought I had was setting osd.111 and osd.117 out of the cluster;
> once the data has moved I can shut them down and mark them as lost,
> which would make osd.190 the only replica available for those PGs.
>
> Thanks again
>
> On Tue, Nov 11, 2014 at 1:10 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>> On Tue, 11 Nov 2014, Matthew Anderson wrote:
>>> Just an update, it appears that no data actually exists for those PGs
>>> on osd.117 and osd.111 but it's showing as incomplete anyway.
>>>
>>> So for the 8.ca PG, osd.111 has only an empty directory but osd.190 is
>>> filled with data.
>>> For 8.6ae, osd.117 has no data in the pg directory and osd.190 is
>>> filled with data as before.
>>>
>>> Since all of the required data is on osd.190, would there be a way to
>>> make osd.111 and osd.117 forget they have ever seen the two incomplete
>>> PGs and therefore restart backfilling?
>>
>> Ah, that's good news. You should know that the copy on osd.190 is
>> slightly out of date, but it is much better than losing the entire
>> contents of the PG. More specifically, for 8.6ae the latest version was
>> 1935986 but the copy on osd.190 is at 1935747, about 200 writes in the
>> past. You'll need to fsck the RBD images after this is all done.
>>
>> I don't think we've tested this recovery scenario, but I think you'll be
>> able to recover with ceph_objectstore_tool, which has an import/export
>> function and a delete function. First, try removing the newer version of
>> the pg on osd.117. First export it for good measure (even tho it's
>> empty):
>>
>> stop the osd
>>
>> ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117 \
>>     --journal-path /var/lib/ceph/osd/ceph-117/journal \
>>     --op export --pgid 8.6ae --file osd.117.8.7ae
>>
>> ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117 \
>>     --journal-path /var/lib/ceph/osd/ceph-117/journal \
>>     --op remove --pgid 8.6ae
>>
>> and restart. If that doesn't peer, you can also try exporting the pg from
>> osd.190 and importing it into osd.117. I think just removing the
>> newer empty pg on osd.117 will do the trick, though...
>>
>> sage
>>
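For completeness, the export-from-osd.190 / import-into-osd.117 path Sage mentions here is what I described trying above (and is what triggered the load_pgs assert). With both OSDs stopped, it was roughly the following; the paths and export file name are just what I used:

    ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-190 \
        --journal-path /var/lib/ceph/osd/ceph-190/journal \
        --op export --pgid 8.6ae --file 8.6ae.export

    ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117 \
        --journal-path /var/lib/ceph/osd/ceph-117/journal \
        --op import --file 8.6ae.export

Both OSDs were then started again.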
>>>
>>> On Tue, Nov 11, 2014 at 10:37 AM, Matthew Anderson
>>> <manderson8787@xxxxxxxxx> wrote:
>>> > Hi All,
>>> >
>>> > We've had a string of very unfortunate failures and need a hand fixing
>>> > the incomplete PGs that we're now left with. We're configured with 3
>>> > replicas over different hosts, with 5 hosts in total.
>>> >
>>> > The timeline goes -
>>> > -1 week :: A full server goes offline with a failed backplane. Still
>>> > not working
>>> > -1 day :: OSD 190 fails
>>> > -1 day + 3 minutes :: OSD 121, in a different server, fails, taking
>>> > out several PGs and blocking IO
>>> > Today :: The first failed osd (osd.190) was cloned to a good drive
>>> > with xfs_dump | xfs_restore and now boots fine. The last failed osd
>>> > (osd.121) is completely unrecoverable and was marked as lost.
>>> >
>>> > What we're left with now is 2 incomplete PGs that are preventing RBD
>>> > images from booting.
>>> >
>>> > # ceph pg dump_stuck inactive
>>> > ok
>>> > pg_stat objects mip degr misp unf bytes log
>>> > disklog state state_stamp v reported up up_primary
>>> > acting acting_primary last_scrub scrub_stamp
>>> > last_deep_scrub deep_scrub_stamp
>>> > 8.ca 2440 0 0 0 0 10219748864 9205 9205
>>> > incomplete 2014-11-11 10:29:04.910512 160435'959618
>>> > 161358:6071679 [190,111] 190 [190,111] 190 86417'207324
>>> > 2013-09-09 12:58:10.749001 86229'196887 2013-09-02
>>> > 12:57:58.162789
>>> > 8.6ae 0 0 0 0 0 0 3176 3176 incomplete
>>> > 2014-11-11 10:24:07.000373 160931'1935986 161358:267
>>> > [117,190] 117 [117,190] 117 86424'389748 2013-09-09
>>> > 16:52:58.796650 86424'389748 2013-09-09 16:52:58.796650
>>> >
>>> > We've tried doing a pg revert but it says 'no missing objects'
>>> > and then does nothing. I've also done the usual scrub,
>>> > deep-scrub, pg and osd repairs... so far nothing has helped.
>>> >
>>> > I think it could be a similar situation to this post [
>>> > http://www.spinics.net/lists/ceph-users/msg11461.html ] where one of
>>> > the osds is holding a slightly newer but incomplete version of the PG
>>> > which needs to be removed. Is anyone able to shed some light on how I
>>> > might be able to use the objectstore tool to check if this is the
>>> > case?
>>> >
>>> > If anyone has any suggestions it would be greatly appreciated.
>>> > Likewise, if you need any more information about my problem just let me
>>> > know.
>>> >
>>> > Thanks all
>>> > -Matt
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com