Re: PG's incomplete after OSD failure

Thanks for your reply, Sage!

I've tested with 8.6ae and no luck, I'm afraid. Steps taken were -
Stop osd.117
Export 8.6ae from osd.117
Remove 8.6ae from osd.117
Start osd.117
Restart osd.190 after the PG still showed incomplete
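For reference, the export/remove steps used ceph_objectstore_tool along
the lines of your commands below (paths assume a default install):

ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117 \
--journal-path /var/lib/ceph/osd/ceph-117/journal \
--op export --pgid 8.6ae --file osd.117.8.6ae

ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117 \
--journal-path /var/lib/ceph/osd/ceph-117/journal \
--op remove --pgid 8.6ae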

After this the PG was still showing incomplete, and ceph pg dump_stuck
inactive showed -
pg_stat objects mip degr misp unf bytes log disklog state state_stamp
v reported up up_primary acting acting_primary last_scrub scrub_stamp
last_deep_scrub deep_scrub_stamp
8.6ae 0 0 0 0 0 0 0 0 incomplete 2014-11-11 17:34:27.168078 0'0
161425:40 [117,190] 117 [117,190] 117 86424'389748 2013-09-09
16:52:58.796650 86424'389748 2013-09-09 16:52:58.796650

I then tried an export from OSD 190 to OSD 117 by doing -
Stop osd.190 and osd.117
Export pg 8.6ae from osd.190
Import the file generated in the previous step into osd.117
Boot both osd.190 and osd.117
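For reference, the export/import was along these lines (again assuming
default paths, with both osds stopped):

ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-190 \
--journal-path /var/lib/ceph/osd/ceph-190/journal \
--op export --pgid 8.6ae --file osd.190.8.6ae

ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117 \
--journal-path /var/lib/ceph/osd/ceph-117/journal \
--op import --file osd.190.8.6ae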

When osd.117 attempts to start it generates a failed assert; the full
log is here http://pastebin.com/S4CXrTAL
-1> 2014-11-11 17:25:15.130509 7f9f44512900  0 osd.117 161404 load_pgs
     0> 2014-11-11 17:25:18.604696 7f9f44512900 -1 osd/OSD.h: In
function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f9f44512900
time 2014-11-11 17:25:18.602626
osd/OSD.h: 715: FAILED assert(ret)

 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x8b) [0xb8231b]
 2: (OSDService::get_map(unsigned int)+0x3f) [0x6eea2f]
 3: (OSD::load_pgs()+0x1b78) [0x6aae18]
 4: (OSD::init()+0x71f) [0x6abf5f]
 5: (main()+0x252c) [0x638cfc]
 6: (__libc_start_main()+0xf5) [0x7f9f41650ec5]
 7: /usr/bin/ceph-osd() [0x651027]

I also attempted the same steps with 8.ca and got the same results.
Below is the current state of the PG with it removed from osd.111
-
pg_stat objects mip degr misp unf bytes log disklog state state_stamp
v reported up up_primary acting acting_primary last_scrub scrub_stamp
last_deep_scrub deep_scrub_stamp
8.ca 2440 0 0 0 0 10219748864 9205 9205 incomplete 2014-11-11
17:39:28.570675 160435'959618 161425:6071759 [190,111] 190 [190,111]
190 86417'207324 2013-09-09 12:58:10.749001 86229'196887 2013-09-02
12:57:58.162789

Any idea where I can go from here?
One thought I had was setting osd.111 and osd.117 out of the cluster;
once the data has moved I can shut them down and mark them as lost,
which would make osd.190 the only replica available for those PGs.
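If I go down that path I assume it would look something like this,
waiting for backfill to finish before marking anything lost:

ceph osd out 111
ceph osd out 117
# once recovery completes, stop both daemons, then
ceph osd lost 111 --yes-i-really-mean-it
ceph osd lost 117 --yes-i-really-mean-it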

Thanks again

On Tue, Nov 11, 2014 at 1:10 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> On Tue, 11 Nov 2014, Matthew Anderson wrote:
>> Just an update, it appears that no data actually exists for those PGs
>> on osd.117 and osd.111, but they're showing as incomplete anyway.
>>
>> So for the 8.ca PG, osd.111 has only an empty directory but osd.190 is
>> filled with data.
>> For 8.6ae, osd.117 has no data in the pg directory and osd.190 is
>> filled with data as before.
>>
>> Since all of the required data is on osd.190, would there be a way to
>> make osd.111 and osd.117 forget they have ever seen the two incomplete
>> PGs and therefore restart backfilling?
>
> Ah, that's good news.  You should know that the copy on osd.190 is
> slightly out of date, but it is much better than losing the entire
> contents of the PG.  More specifically, for 8.6ae the latest version was
> 1935986 but osd.190 is at 1935747, about 200 writes in the past.  You'll
> need to fsck the RBD images after this is all done.
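> (Something along these lines once the cluster is healthy again; the
> pool/image names are placeholders, and use xfs_repair -n instead of
> fsck for XFS-formatted images:
>
> rbd map rbd/myimage          # map the image to a local block device
> fsck -n /dev/rbd0            # read-only check first
> rbd unmap /dev/rbd0
> )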
>
> I don't think we've tested this recovery scenario, but I think you'll be
> able to recover with ceph_objectstore_tool, which has an import/export
> function and a delete function.  First, try removing the newer version of
> the pg on osd.117; export it first for good measure (even though it's
> empty):
>
> stop the osd
>
> ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117  \
> --journal-path /var/lib/ceph/osd/ceph-117/journal \
> --op export --pgid 8.6ae --file osd.117.8.6ae
>
> ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117  \
> --journal-path /var/lib/ceph/osd/ceph-117/journal \
> --op remove --pgid 8.6ae
>
> and restart.  If that doesn't peer, you can also try exporting the pg from
> osd.190 and importing it into osd.117.  I think just removing the
> newer empty pg on osd.117 will do the trick, though...
>
> sage
>
>
>
>>
>>
>> On Tue, Nov 11, 2014 at 10:37 AM, Matthew Anderson
>> <manderson8787@xxxxxxxxx> wrote:
>> > Hi All,
>> >
>> > We've had a string of very unfortunate failures and need a hand fixing
>> > the incomplete PGs that we're now left with. We're configured with 3
>> > replicas over different hosts, with 5 hosts in total.
>> >
>> > The timeline goes -
>> > -1 week  :: A full server goes offline with a failed backplane. It is
>> > still not working
>> > -1 day  ::  OSD 190 fails
>> > -1 day + 3 minutes :: OSD 121 in a different server fails, taking
>> > out several PGs and blocking IO
>> > Today  :: The first failed osd (osd.190) was cloned to a good drive
>> > with xfsdump | xfsrestore and now boots fine. The last failed osd
>> > (osd.121) is completely unrecoverable and was marked as lost.
>> >
>> > What we're left with now is 2 incomplete PGs that are preventing RBD
>> > images from booting.
>> >
>> > # ceph pg dump_stuck inactive
>> > ok
>> > pg_stat    objects    mip    degr    misp    unf    bytes    log
>> > disklog    state    state_stamp    v    reported    up    up_primary
>> >  acting    acting_primary    last_scrub    scrub_stamp
>> > last_deep_scrub    deep_scrub_stamp
>> > 8.ca    2440    0    0    0    0    10219748864    9205    9205
>> > incomplete    2014-11-11 10:29:04.910512    160435'959618
>> > 161358:6071679    [190,111]    190    [190,111]    190    86417'207324
>> >    2013-09-09 12:58:10.749001    86229'196887    2013-09-02
>> > 12:57:58.162789
>> > 8.6ae    0    0    0    0    0    0    3176    3176    incomplete
>> > 2014-11-11 10:24:07.000373    160931'1935986    161358:267
>> > [117,190]    117    [117,190]    117    86424'389748    2013-09-09
>> > 16:52:58.796650    86424'389748    2013-09-09 16:52:58.796650
>> >
>> > We've tried doing a pg revert but it reports 'no missing objects'
>> > and then does nothing. I've also done the usual scrub,
>> > deep-scrub, pg and osd repairs... so far nothing has helped.
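>> > (For reference, those were along the lines of:
>> >
>> > ceph pg 8.6ae mark_unfound_lost revert
>> > ceph pg scrub 8.6ae
>> > ceph pg deep-scrub 8.6ae
>> > ceph pg repair 8.6ae
>> > ceph osd repair 190
>> > )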
>> >
>> > I think it could be a similar situation to this post [
>> > http://www.spinics.net/lists/ceph-users/msg11461.html ] where one of
>> > the OSDs is holding a slightly newer but incomplete version of the PG
>> > which needs to be removed. Is anyone able to shed some light on how I
>> > might be able to use the objectstore tool to check whether this is the
>> > case?
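>> > (I'm guessing something like this, run with the osd stopped, would
>> > show each replica's last_update for the PG; paths assume a default
>> > install:
>> >
>> > ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117 \
>> > --journal-path /var/lib/ceph/osd/ceph-117/journal \
>> > --op info --pgid 8.6ae
>> > )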
>> >
>> > If anyone has any suggestions it would be greatly appreciated.
>> > Likewise, if you need any more information about my problem just let
>> > me know.
>> >
>> > Thanks all
>> > -Matt
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



