Hi,
Our production cluster has been sabotaged. Someone ran
# rm -rf /var/lib/ceph/*
on all our cluster servers :(
We got the command killed, but it had already deleted lots of files from
one OSD on each node. On some nodes it finished the first OSD and
started on the second.
All in all, we had about 50 OSDs left with anywhere from no data to
partial data.
We replaced all the affected disks with new disks, in the hope that we
can recover some data from the old ones, and we let the cluster recover.
All disks use the XFS file system. We have both replicated and
erasure-coded pools.
Almost all the affected disks are missing the "omap" dir. Lots of PGs on
those disks have correct file counts and overall size, but we are unable
to export them because of the missing omap.
We had 43 replicated PGs in the "stale" state. We recreated them, and we
plan to extract the objects from the old disks and do "rados put". I am
not sure whether that is going to be flawless or not: I have only tried
this in the past on failed disks with a few missing objects, never on an
empty PG.
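
A rough sketch of how that extract-and-put step could look, in Python.
It only prints the "rados put" commands (a dry run) so they can be
reviewed first. Everything about the on-disk naming is an assumption
here: that object files are named
<name>_<key>_<snap>_<hash>_<namespace>_<pool>, and that the backslash
escapes are the ones listed in _ESCAPES. The function names, the pool
name and the paths are hypothetical. Files with long (hashed) names,
snapshots, keys or namespaces are skipped rather than guessed at.

```python
import os
import shlex

# Assumed escape sequences in filestore file names (not confirmed here).
_ESCAPES = {"\\": "\\", "s": "/", "u": "_", ".": ".", "n": "\0", "d": "DIR_"}


def unescape(field):
    """Undo the assumed backslash escapes in one file-name field."""
    out = []
    i = 0
    while i < len(field):
        if field[i] == "\\" and i + 1 < len(field):
            # Unknown escapes are kept verbatim rather than guessed at.
            out.append(_ESCAPES.get(field[i + 1], field[i:i + 2]))
            i += 2
        else:
            out.append(field[i])
            i += 1
    return "".join(out)


def parse_object_file(fname):
    """Split an assumed <name>_<key>_<snap>_<hash>_<namespace>_<pool> name.

    Returns a dict, or None for anything that does not fit the layout.
    """
    parts = fname.split("_")
    if len(parts) != 6:
        return None
    name, key, snap, hash_hex, nspace, pool = parts
    if not name:
        return None
    try:
        int(hash_hex, 16)        # hash must be hexadecimal
        pool_id = int(pool, 16)  # pool id is assumed to be hexadecimal
    except ValueError:
        return None
    return {
        "name": unescape(name),
        "key": unescape(key),
        "snap": snap,
        "namespace": unescape(nspace),
        "pool_id": pool_id,
    }


def put_command(pool_name, info, path):
    """Build a 'rados put' argv for a plain head object, else None."""
    if info["snap"] != "head" or info["key"] or info["namespace"]:
        return None
    return ["rados", "-p", pool_name, "put", info["name"], path]


def plan_puts(pg_dir, pool_name):
    """Walk one extracted PG directory and yield 'rados put' argvs."""
    for root, _dirs, files in os.walk(pg_dir):
        for fname in files:
            info = parse_object_file(fname)
            if info is None:
                continue
            cmd = put_command(pool_name, info, os.path.join(root, fname))
            if cmd is not None:
                yield cmd


if __name__ == "__main__":
    # Hypothetical mount point, PG directory and pool name.
    for cmd in plan_puts("/mnt/old-osd/current/2.1f_head", "mypool"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Note that this only copies object data. It does not restore xattrs or
omap entries, and it says nothing about the erasure-coded shards, which
presumably cannot be put back this way.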
We also have 156 incomplete PGs; those are erasure-coded PGs. Most of
them have the missing shards intact on the extracted disks, but I have
not managed to find a proper way to put them back onto the new OSDs.
Using recovery tools, we were unable to recover the omap files.
Unfortunately, as I understand it, XFS is not the friendliest file
system when it comes to undelete.
Any ideas are appreciated.
I was also wondering: can we recreate those omaps, at least for the PGs
we need to export? Or put them in some form that the objectstore tool
can import?
Thanks