> On 19 April 2016 at 19:15, Mike Dawson <mike.dawson@xxxxxxxxxxxx> wrote:
>
>
> All,
>
> I was called in to assist in a failed Ceph environment with the cluster
> in an inoperable state. No rbd volumes are mountable/exportable due to
> missing PGs.
>
> The previous operator was using a replica count of 2. The cluster
> suffered a power outage and various non-catastrophic hardware issues as
> they were starting it back up. At some point during recovery, drives
> were removed from the cluster leaving several PGs missing.
>
> Efforts to restore the missing PGs from the data on the removed drives
> failed using the process detailed in a Red Hat Customer Support blog
> post [0]. Upon starting the OSDs with recovered PGs, a segfault halts
> progress. The original operator isn't clear on when, but there may have
> been a software upgrade applied after the drives were pulled.
>
> I believe the cluster may be irrecoverable at this point.
>

That's not good to hear!

> My recovery assistance has focused on a plan to:
>
> 1) Scrape all objects for several key rbd volumes from live OSDs and the
> removed former OSD drives.
>
> 2) Compare and deduplicate the two copies of each object.
>
> 3) Recombine the objects for each volume into a raw image.
>
> I have completed steps 1 and 2 with apparent success. My initial stab at
> step 3 yielded a raw image that could be mounted and had signs of a
> filesystem, but it could not be read. Could anyone assist me with the
> following questions?
>
> 1) Are the rbd objects in order by filename? If not, what is the method
> to determine their order?
>

You might want to try my blog post:
http://blog.widodh.nl/2014/04/calculating-rados-objects-for-rbd-image/

> 2) How should objects smaller than the default 4MB chunk size be
> handled? Should they be padded somehow?
>

Yes, with zeroes. But it depends on the offset. I don't know that for sure.
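
On the ordering question, here is a minimal sketch of turning a scraped object name into a byte offset in the image. It is not taken from the blog post. It assumes names end in a zero-padded hexadecimal object index (12 digits after an "rb.0..." prefix for format 1 images, 16 digits after an "rbd_data..." prefix for format 2), optionally followed by the "__head_..." decoration that filestore adds to on-disk file names, and it assumes 4 MB objects. The function name and the sample names are made up for illustration.

```python
import re

OBJECT_SIZE = 4 * 1024 * 1024  # assumption: default 4 MB objects

# Assumed layout: "<prefix>.<hex index>", optionally followed by the
# "__head_<hash>__<pool>" decoration seen on filestore file names.
_NAME_RE = re.compile(r"^(?P<prefix>.+)\.(?P<index>[0-9a-f]{12,16})(?:__.*)?$")


def object_offset(name, object_size=OBJECT_SIZE):
    """Return (prefix, byte offset within the image) for one object name."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError("unrecognised object name: %r" % name)
    index = int(match.group("index"), 16)  # the suffix is hexadecimal
    return match.group("prefix"), index * object_size


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for name in ("rb.0.1234.74b0dc51.00000000000a",
                 "rbd_data.10226b8b4567.0000000000000002__head_6F2B5E1C__3"):
        print(name, object_offset(name))
```

If those assumptions hold, a plain filename sort within one prefix gives the right order because the index is fixed width, but the offset should still come from the index rather than from the position in the sorted list, since missing objects leave gaps.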
> 3) If any objects were completely missing and therefore unavailable to
> this process, how should they be handled? I assume we need to offset/pad
> to compensate.

If they are missing just add 4MB of zeroes.

You might want to try importing the RBD objects into a fresh RBD cluster
using the RADOS API. Just make sure you have an RBD header object with the
proper object prefix and size in there.

Through librbd you then might be able to recover the data.

This way you use the RBD logic instead of having to script it yourself.

Good luck!

Wido

> --
> Thanks,
>
> Mike Dawson
> Co-Founder & Director of Cloud Architecture
> Cloudapt LLC
> 6330 East 75th Street, Suite 170
> Indianapolis, IN 46250
> M: 317-490-3018
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
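
If the reassembly is scripted by hand instead, the zero padding for short and missing objects could look roughly like the sketch below. It rests on assumptions that are not confirmed in this thread: every object occupies a fixed 4 MB slot at index * object size, a short object holds data from the start of its slot (so the padding is trailing), and the image size is known, for example from the RBD header. The function name, the index-to-path mapping and the parameters are hypothetical.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumption: default 4 MB objects


def assemble_image(objects, image_path, image_size, object_size=OBJECT_SIZE):
    """Write recovered objects into a raw image file.

    objects: dict mapping object index (int) to the path of the recovered
    object file. Indexes absent from the dict are treated as missing.
    """
    with open(image_path, "wb") as image:
        for index in sorted(objects):
            with open(objects[index], "rb") as src:
                data = src.read()
            if len(data) > object_size:
                raise ValueError("object %d is larger than object_size" % index)
            # Seek to the object's slot. Slots skipped here (missing
            # objects) and the tail of a short object are never written.
            image.seek(index * object_size)
            image.write(data)
        # Set the final length. Unwritten ranges read back as zeroes on
        # common platforms.
        image.truncate(image_size)
```

Called as assemble_image({0: "obj0", 2: "obj2"}, "volume.raw", image_size), this leaves the slot for object 1 as zeroes. On filesystems that support sparse files the gaps take up no disk space.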