Re: OSD corruption and down PGs

The first thing I'd try is to use ceph-objectstore-tool to scrape the
inactive/broken PGs from the dead OSDs using its PG export feature,
then import those PGs into any other OSD; the cluster will recover
them automatically from there.
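
Roughly like this (a sketch only: the PG ID and target OSD below are
placeholders, the data path for OSD 97 is taken from your message, and
the affected OSD daemons must be stopped while you run this):

  # on the host with the dead/corrupted OSD, export one of the down PGs
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-97 \
      --pgid <pgid> --op export --file /tmp/<pgid>.export

  # copy the file to a host with a healthy OSD, stop that OSD, then
  # import the PG into it
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<target> \
      --op import --file /tmp/<pgid>.export

  # start the target OSD again; peering picks the PG up and recovery
  # proceeds from there

Repeat for each down PG. Note that since --op list already returns
input/output errors on OSD 97, the export may fail the same way; in
that case the bluestore-level corruption would have to be repaired
first.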

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Tue, May 12, 2020 at 2:07 PM Kári Bertilsson <karibertils@xxxxxxxxx>
wrote:

> Yes,
> ceph osd df tree and ceph -s output are at https://pastebin.com/By6b1ps1
>
> On Tue, May 12, 2020 at 10:39 AM Eugen Block <eblock@xxxxxx> wrote:
>
> > Can you share your osd tree and the current ceph status?
> >
> >
> > Quoting Kári Bertilsson <karibertils@xxxxxxxxx>:
> >
> > > Hello
> > >
> > > I had an incident where 3 OSDs crashed completely at once and won't
> > > power up. During recovery, 3 OSDs in another host have somehow become
> > > corrupted. I am running an 8+2 erasure-coded setup with a CRUSH map
> > > that takes 2 OSDs per host, and after losing the other 2 OSDs I have
> > > a few PGs down. Unfortunately these PGs seem to span almost all data
> > > on the pool, so I believe the entire pool is mostly lost with only
> > > these 2% of PGs down.
> > >
> > > I am running ceph 14.2.9.
> > >
> > > OSD 92 log https://pastebin.com/5aq8SyCW
> > > OSD 97 log https://pastebin.com/uJELZxwr
> > >
> > > ceph-bluestore-tool repair without --deep reported "success", but
> > > the OSDs still fail with the log above.
> > >
> > > Log from running ceph-bluestore-tool repair --deep, which is still
> > > running; I'm not sure if it will actually fix anything, and the log
> > > looks pretty bad: https://pastebin.com/gkqTZpY3
> > >
> > > Trying "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-97
> > > --op list" gave me an input/output error. But everything in SMART
> > > looks OK, and I see no indication of a hardware read error in any
> > > logs. Same for both OSDs.
> > >
> > > The corrupted OSDs have absolutely no bad sectors and likely have
> > > only minor corruption, but at important locations.
> > >
> > > Any ideas on how to recover from this kind of scenario? Any tips
> > > would be highly appreciated.
> > >
> > > Best regards,
> > > Kári Bertilsson
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



