Re: OSD corruption and down PGs

Do you have access to another Ceph cluster with enough available space to
create RBDs that you can dd these failing disks into? That's what I'm doing
right now with some failing disks, and I've recovered 2 of the 6 OSDs that
failed this way. I would recommend against using the same cluster for this,
but a staging cluster or something similar would be great.
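
A rough sketch of that approach, in case it helps (pool name, image size and
device paths below are placeholders, not taken from this thread):

# on the spare cluster: create and map an rbd at least as big as the disk
rbd create scratch/osd92-copy --size 10T
rbd map scratch/osd92-copy        # prints the mapped device, e.g. /dev/rbd0

# copy the raw failing disk, continuing past unreadable sectors
dd if=/dev/sdX of=/dev/rbd0 bs=4M conv=sync,noerror status=progress

That gives you a stable copy to run the recovery tooling against instead of
hammering the failing disk any further.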

On Tue, May 12, 2020, 7:36 PM Kári Bertilsson <karibertils@xxxxxxxxx> wrote:

> Hi Paul
>
> I was able to successfully mount both OSDs I need data from, using
> "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-92 --op fuse
> --mountpoint /osd92/".
>
> In the mounted folder I can see the PG shards that are missing
> ("41.b3s3_head", "41.ccs5_head", etc.), and I can copy any data from
> inside the mounted folder just fine.
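>
> Since plain reads through the fuse mount work, as a crude fallback the
> affected shard directories can also simply be copied out of /osd92/ with
> normal tools (the destination below is only an example):
>
> # mkdir -p /root/pg-backup
> # cp -a /osd92/41.b3s3_head /root/pg-backup/
> # cp -a /osd92/41.ccs5_head /root/pg-backup/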
>
> But when I try to export, it fails. I get the same error when trying to
> list:
>
> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-92 --op list
> --debug
> Output @ https://pastebin.com/nXScEL6L
>
> Any ideas?
>
> On Tue, May 12, 2020 at 12:17 PM Paul Emmerich <paul.emmerich@xxxxxxxx>
> wrote:
>
> > The first thing I'd try is to use ceph-objectstore-tool to scrape the
> > inactive/broken PGs from the dead OSDs using its PG export feature, then
> > import these PGs into any other OSD, which will recover them
> > automatically.
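> >
> > A minimal sketch of that workflow (OSD ids, PG id and file path are only
> > placeholders; the OSD being exported from or imported into must be
> > stopped while the tool runs):
> >
> > # export the PG shard from the broken OSD
> > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-92 \
> >     --pgid 41.b3s3 --op export --file /root/41.b3s3.export
> >
> > # import it into some other, stopped OSD and then start that OSD again
> > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-123 \
> >     --op import --file /root/41.b3s3.export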
> >
> > Paul
> >
> > --
> > Paul Emmerich
> >
> > Looking for help with your Ceph cluster? Contact us at https://croit.io
> >
> > croit GmbH
> > Freseniusstr. 31h
> > 81247 München
> > www.croit.io
> > Tel: +49 89 1896585 90
> >
> >
> > On Tue, May 12, 2020 at 2:07 PM Kári Bertilsson <karibertils@xxxxxxxxx>
> > wrote:
> >
> >> Yes, the output of "ceph osd df tree" and "ceph -s" is at
> >> https://pastebin.com/By6b1ps1
> >>
> >> On Tue, May 12, 2020 at 10:39 AM Eugen Block <eblock@xxxxxx> wrote:
> >>
> >> > Can you share your osd tree and the current ceph status?
> >> >
> >> >
> >> > Quoting Kári Bertilsson <karibertils@xxxxxxxxx>:
> >> >
> >> > > Hello
> >> > >
> >> > > I had an incident where 3 OSDs crashed completely at once and won't
> >> > > power up, and during recovery 3 OSDs in another host have somehow
> >> > > become corrupted. I am running an 8+2 erasure coding setup with a
> >> > > crush map that takes 2 OSDs per host, and after losing the other 2
> >> > > OSDs I have a few PGs down. Unfortunately these PGs seem to overlap
> >> > > almost all data on the pool, so I believe the entire pool is mostly
> >> > > lost with only these 2% of PGs down.
> >> > >
> >> > > I am running ceph 14.2.9.
> >> > >
> >> > > OSD 92 log https://pastebin.com/5aq8SyCW
> >> > > OSD 97 log https://pastebin.com/uJELZxwr
> >> > >
> >> > > ceph-bluestore-tool repair without --deep reported "success", but the
> >> > > OSDs still fail with the log above.
> >> > >
> >> > > Log from trying ceph-bluestore-tool repair --deep, which is still
> >> > > running. I'm not sure it will actually fix anything, and the log
> >> > > looks pretty bad: https://pastebin.com/gkqTZpY3
> >> > >
> >> > > Trying "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-97
> >> > > --op list" gave me an input/output error, but everything in SMART
> >> > > looks OK and I see no indication of a hardware read error in any
> >> > > logs. The same goes for both OSDs.
> >> > >
> >> > > The corrupted OSDs have absolutely no bad sectors and likely have
> >> > > only minor corruption, but in important locations.
> >> > >
> >> > > Any ideas on how to recover from this kind of scenario? Any tips
> >> > > would be highly appreciated.
> >> > >
> >> > > Best regards,
> >> > > Kári Bertilsson
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



