Re: Restore OSD disks damaged by deployment misconfiguration

Hi Phil,


On 27.09.21 at 10:06, Phil Merricks wrote:
Hey folks,

A recovery scenario I'm looking at right now is this:

1: In a clean 3-node Ceph cluster (pacific, deployed with cephadm), the OS
disk is lost from all nodes.
2: Trying to be helpful, a self-healing deployment system reinstalls the OS
on each node and rebuilds the Ceph services.
3: Somewhere in the deployment system are 'sensible defaults' that assume
there are no stateful workloads, so the superblock is wiped from the other
block devices attached to these nodes to prevent stale metadata conflicts.
4: The Ceph rebuild has no knowledge of prior state and is expected to
simply restore based on discovery of existing devices.
5: Out of 5 block devices, 3 had their superblock wiped, 1 suffered
mechanical failure upon reboot, and 1 is completely intact, with
'ceph-volume lvm list' returning the correct information.

As soon as you manage to restore the OSDs, there is a command to re-create
the OSD containers:
https://docs.ceph.com/en/latest/cephadm/osd/#activate-existing-osds
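
For what it's worth, the page linked above describes a cephadm wrapper for
exactly this case. A rough sketch, assuming the reinstalled host has already
been re-added to the cluster and its surviving OSD data devices are intact
(<host> is a placeholder for the node's hostname):

  # Re-detect existing OSDs on the reinstalled node and re-create
  # their containers / systemd units
  ceph cephadm osd activate <host>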

Is there a way to restore the 3 devices with wiped superblocks?  Some basic
attempts with fsck to find the superblocks on the disks that were affected
yielded nothing.

Thanks for your time reading this message.

Cheers

Phil
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

