Re: OSD corruption and down PGs

Hi Paul

I was able to successfully mount both OSDs I need data from using
"ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-92 --op fuse
--mountpoint /osd92/"

I can see the missing PG shards in the mounted folder ("41.b3s3_head",
"41.ccs5_head", etc.), and copying data from inside the mounted folder
works fine.
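
For example, copying a whole shard directory out of the fuse mount works
roughly like this (the destination path is just an illustration):

# cp -a /osd92/41.b3s3_head /root/pg-backup/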

But when I try to export, it fails, and I get the same error when trying
to list:

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-92 --op list
--debug
Output @ https://pastebin.com/nXScEL6L
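
For reference, the export/import workflow I'm attempting (following Paul's
suggestion below) is roughly the following, run with the respective OSD
daemon stopped; the PG id, target OSD and file path are just example
placeholders:

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-92 \
    --pgid 41.b3s3 --op export --file /root/41.b3s3.export
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<target id> \
    --op import --file /root/41.b3s3.export

The export is the step that fails with the error above.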

Any ideas?

On Tue, May 12, 2020 at 12:17 PM Paul Emmerich <paul.emmerich@xxxxxxxx>
wrote:

> First thing I'd try is to use objectstore-tool to scrape the
> inactive/broken PGs from the dead OSDs using its PG export feature.
> Then import these PGs into any other OSD, which will automatically
> recover them.
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
>
> On Tue, May 12, 2020 at 2:07 PM Kári Bertilsson <karibertils@xxxxxxxxx>
> wrote:
>
>> Yes
>> Output of ceph osd df tree and ceph -s: https://pastebin.com/By6b1ps1
>>
>> On Tue, May 12, 2020 at 10:39 AM Eugen Block <eblock@xxxxxx> wrote:
>>
>> > Can you share your osd tree and the current ceph status?
>> >
>> >
>> > Quoting Kári Bertilsson <karibertils@xxxxxxxxx>:
>> >
>> > > Hello
>> > >
>> > > I had an incident where 3 OSDs crashed completely at once and won't
>> > > power up, and during recovery 3 OSDs in another host have somehow
>> > > become corrupted. I am running erasure coding with an 8+2 setup and a
>> > > CRUSH map that places 2 OSDs per host, and after losing the other 2
>> > > OSDs I have a few PGs down. Unfortunately these PGs seem to overlap
>> > > almost all of the data on the pool, so I believe the entire pool is
>> > > mostly lost even though only these 2% of PGs are down.
>> > >
>> > > I am running ceph 14.2.9.
>> > >
>> > > OSD 92 log https://pastebin.com/5aq8SyCW
>> > > OSD 97 log https://pastebin.com/uJELZxwr
>> > >
>> > > ceph-bluestore-tool repair without --deep reported "success", but the
>> > > OSDs still fail with the log above.
>> > >
>> > > Log from ceph-bluestore-tool repair --deep, which is still running; I'm
>> > > not sure it will actually fix anything, and the log looks pretty bad:
>> > > https://pastebin.com/gkqTZpY3
>> > >
>> > > Running "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-97
>> > > --op list" gave me an input/output error, but everything in SMART looks
>> > > OK and I see no indication of hardware read errors in any logs. The
>> > > same goes for both OSDs.
>> > >
>> > > The corrupted OSDs have absolutely no bad sectors and likely have only
>> > > minor corruption, but in important locations.
>> > >
>> > > Any ideas on how to recover from this kind of scenario? Any tips would
>> > > be highly appreciated.
>> > >
>> > > Best regards,
>> > > Kári Bertilsson
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



