Dear Cephers,

A few days ago disaster struck the (erasure-coded) Ceph cluster I administer: the UPS power was pulled from the cluster, causing a power outage. After rebooting the system, 6 OSDs (spread over 5 OSD nodes) were lost as they could no longer mount, and several others were damaged. This was more than the host failure domain was set up to handle, so auto-recovery failed and OSDs started going down in a cascading manner.

When the dust settled, 8 PGs (of 2048) were inactive and a bunch of OSDs were down. I managed to recover 5 PGs, mainly with ceph-objectstore-tool export/import/repair commands, but I am now left with 3 PGs that are inactive and incomplete.

One of the PGs seems unsalvageable, as I cannot get it to become active at all (repair/import/export/lowering min_size). The two others I can get active if I export/import one of the PG shards and restart the OSD. Rebuilding then starts, but after a while one of the OSDs holding the PG goes down with a "FAILED ceph_assert(clone_size.count(clone))" message in the log. If I set noout and nodown, I can see that only rather few objects fail to be remapped, e.g. 161 in a PG of >100000.

Since most of the objects in the two PGs seem intact, it would be sad to delete the whole PG (force-create-pg) and lose all that data. Is there a way to show and delete the failing objects?

I have thought of a recovery plan and want to share it with you, so you can comment on whether it sounds doable:

* Stop OSDs from recovering: ceph osd set norecover
* Bring the PGs back to active: ceph-objectstore-tool export/import and restart the OSD
* Find the files in the PGs: cephfs-data-scan pg_files <path> <pg id>
* Pull out as many of those files as possible to another location
* Recreate the PGs: ceph osd force-create-pg <pgid>
* Restart recovery: ceph osd unset norecover
* Copy the recovered files back in

Would that work, or do you have a better suggestion?
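To make the order of the steps above concrete, here is a rough dry-run sketch for one PG. It only prints the commands and runs nothing. Several things in it are assumptions that are not in my description above: that the OSD data directories are /var/lib/ceph/osd/ceph-<id>, that the OSDs are systemd units named ceph-osd@<id> (a containerized deployment would differ), that the shard is addressed as <pgid>s<shard>, and that both OSDs are stopped while ceph-objectstore-tool touches them. The function name print_plan and all argument values are placeholders.

```shell
#!/bin/sh
# Dry-run sketch of the recovery plan for a single PG: prints commands only.
# Assumed, not confirmed for this cluster: data path layout, systemd unit
# names, and the example values passed at the bottom.

print_plan() {
    pgid=$1      # PG id, e.g. 7.1a
    shard=$2     # erasure-coded shard to export, e.g. 2
    src_osd=$3   # id of the OSD holding a usable copy of that shard
    dst_osd=$4   # id of the OSD that should receive the shard
    fs_path=$5   # CephFS path to scan for affected files
    dump=$6      # file that will hold the exported shard

    echo "ceph osd set norecover"
    # ceph-objectstore-tool needs the OSDs it works on to be stopped
    echo "systemctl stop ceph-osd@${src_osd} ceph-osd@${dst_osd}"
    echo "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${src_osd} --pgid ${pgid}s${shard} --op export --file ${dump}"
    echo "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${dst_osd} --op import --file ${dump}"
    echo "systemctl start ceph-osd@${src_osd} ceph-osd@${dst_osd}"
    echo "cephfs-data-scan pg_files ${fs_path} ${pgid} > ${dump}.files"
    echo "# copy the files listed in ${dump}.files to a location outside the cluster"
    echo "ceph osd force-create-pg ${pgid}"
    echo "ceph osd unset norecover"
    echo "# copy the saved files back into ${fs_path}"
}

# Example with placeholder values
print_plan 7.1a 2 12 34 /mnt/cephfs /root/pg-7.1a-s2.export
```

Printing instead of executing is deliberate: force-create-pg throws away whatever is left in the PG, so I would want to read each line and check it against the real cluster state before running anything.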
Cheers,
Jesper

--------------------------
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Gustav Wieds Vej 10
8000 Aarhus C

E-mail: jelka@xxxxxxxxx
Tlf: +45 50906203