First thing I'd try is to use ceph-objectstore-tool to scrape the inactive/broken PGs from the dead OSDs using its PG export feature. Then import these PGs into any other OSD, which will automatically recover them.

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Tue, May 12, 2020 at 2:07 PM Kári Bertilsson <karibertils@xxxxxxxxx> wrote:

> Yes
> ceph osd df tree and ceph -s are at https://pastebin.com/By6b1ps1
>
> On Tue, May 12, 2020 at 10:39 AM Eugen Block <eblock@xxxxxx> wrote:
>
> > Can you share your osd tree and the current ceph status?
> >
> >
> > Quoting Kári Bertilsson <karibertils@xxxxxxxxx>:
> >
> > > Hello
> > >
> > > I had an incident where 3 OSDs crashed at once completely and won't power up. During recovery, 3 OSDs in another host somehow became corrupted. I am running erasure coding with an 8+2 setup, using a CRUSH map that takes 2 OSDs per host, and after losing the other 2 OSDs I have a few PGs down. Unfortunately, these PGs seem to overlap almost all data in the pool, so I believe the entire pool is mostly lost with only these 2% of PGs down.
> > >
> > > I am running ceph 14.2.9.
> > >
> > > OSD 92 log https://pastebin.com/5aq8SyCW
> > > OSD 97 log https://pastebin.com/uJELZxwr
> > >
> > > ceph-bluestore-tool repair without --deep showed "success", but the OSDs still fail with the logs above.
> > >
> > > Log from trying ceph-bluestore-tool repair --deep, which is still running; I'm not sure it will actually fix anything, and the log looks pretty bad:
> > > https://pastebin.com/gkqTZpY3
> > >
> > > Trying "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-97 --op list" gave me an input/output error, but everything in SMART looks OK and I see no indication of hardware read errors in any logs. Same for both OSDs.
> > >
> > > The corrupted OSDs have absolutely no bad sectors and likely have only minor corruption, but at important locations.
> > >
> > > Any ideas on how to recover from this kind of scenario? Any tips would be highly appreciated.
> > >
> > > Best regards,
> > > Kári Bertilsson
> > > _______________________________________________
> > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
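Paul's suggestion at the top of the thread (export the down PG shards from the failed OSDs with ceph-objectstore-tool, then import them into a healthy OSD) might look roughly like the following bash sketch. It is not a tested recovery procedure. Assumptions that need checking against the actual cluster: the OSD data directories use the default /var/lib/ceph/osd/ceph-<id> layout, OSDs run as systemd units named ceph-osd@<id>, both OSDs involved are stopped while the tool runs, and the export file path /root/pg-export-<pgid>.bin is a hypothetical choice. The helper names (run, export_pg, import_pg, list_pgs) and the DRY_RUN switch are made up for illustration. For an erasure-coded pool such as the 8+2 pool here, the PG ID normally carries a shard suffix (for example 2.1as0); `--op list-pgs` on the source OSD shows which shards it holds. Given the input/output errors from `--op list` on OSDs 92 and 97, the export step may fail on those OSDs as well.

```bash
#!/usr/bin/env bash
# Sketch only: export one PG shard from a stopped (damaged) OSD and import it
# into another stopped OSD with ceph-objectstore-tool.
# ASSUMPTIONS (hypothetical, adjust to your cluster):
#   - OSD data lives under /var/lib/ceph/osd/ceph-<id>
#   - OSDs are systemd units named ceph-osd@<id>
#   - the source OSD is already down (it crashes on start anyway)
# Set DRY_RUN=1 to print the commands instead of executing them.
set -euo pipefail

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Default BlueStore data directory for an OSD id (assumed layout).
osd_path() { echo "/var/lib/ceph/osd/ceph-$1"; }

# List the PGs (for EC pools: PG shards such as 2.1as0) held by an OSD.
list_pgs() {  # args: osd_id
  run ceph-objectstore-tool --data-path "$(osd_path "$1")" --op list-pgs
}

# Export one PG shard from the source OSD into a file.
export_pg() {  # args: src_osd pgid file
  run ceph-objectstore-tool --data-path "$(osd_path "$1")" \
    --pgid "$2" --op export --file "$3"
}

# Import a previously exported PG shard into the target OSD.
import_pg() {  # args: dst_osd file
  run ceph-objectstore-tool --data-path "$(osd_path "$1")" \
    --op import --file "$2"
}

main() {  # args: src_osd dst_osd pgid
  local src=$1 dst=$2 pgid=$3
  local file="/root/pg-export-${pgid}.bin"  # hypothetical location
  run systemctl stop "ceph-osd@${dst}"      # the tool needs exclusive access
  export_pg "$src" "$pgid" "$file"
  import_pg "$dst" "$file"
  run systemctl start "ceph-osd@${dst}"     # peering/recovery should follow
}

# Only act when called as: <script> <src_osd> <dst_osd> <pgid>
if [[ $# -eq 3 ]]; then
  main "$@"
fi
```

For example, `DRY_RUN=1 ./pg-export-import.sh 97 12 2.1as0` only prints the four commands it would run; the script name, target OSD 12 and the PG ID are placeholders. Running it in dry-run mode first and reviewing the printed commands is a cheap safeguard before touching damaged OSDs.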