made a huge mistake, seeking recovery advice (osd zapped)

hey folks,

I was deploying a new set of NVMe cards into my cluster, and while getting
the new devices ready the device names got mixed up, and I managed to run
"sgdisk --zap-all" and "dd if=/dev/zero of=/dev/sdX bs=1M count=100" on some
of the active devices.
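
In case it helps with diagnosis: since the dd only clobbered the first
100MB, this is how I've been checking whether the BlueStore label at the
start of each device survived (the device name is just a placeholder, not
my actual one):

  # check whether the BlueStore label is still readable;
  # /dev/sdX is a placeholder for one of the affected devices
  ceph-bluestore-tool show-label --dev /dev/sdX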

I was adding the new cards so I could migrate off the k=2, m=2 erasure-coded
setup to a more redundant config, but in the mix-up I ran the commands above
on 3 of the 4 devices before the ceph status changed and I noticed the
mistake.

I managed to restore the LVM partition table from backup, but that alone
doesn't seem to be enough to restart the OSD... Since the k=2, m=2 pool can
reconstruct data from any 2 of the 4 shards, I just need to recover one of
the 3 wiped drives (plus the surviving one) to save the filesystem backing
all of my VMs and Docker volumes.
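
For reference, this is roughly how I restored the LVM metadata; the PV
UUID, archive file, and VG name below are placeholders for my actual ones:

  # recreate the PV header from the archived metadata, reusing the old PV UUID
  # (UUID, archive path, and VG name are placeholders)
  pvcreate --uuid "<old-pv-uuid>" \
    --restorefile /etc/lvm/archive/<vg-name>_00001.vg /dev/sdX

  # restore the VG metadata and reactivate the volume group
  vgcfgrestore <vg-name>
  vgchange -ay <vg-name>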

I'm running on Kubernetes with Rook. After restoring the partition table the
OSD seems to start up OK, but then it hits a stack trace and the container
goes into the Error state: https://pastebin.com/5wk1bKy9
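
For completeness, here's how I'm pulling the logs out of the failing pod
(the namespace and labels are the Rook defaults, and OSD id 3 is just an
example; Rook creates one deployment per OSD named rook-ceph-osd-<id>):

  # list the OSD pods and tail the crashing one's previous container logs
  kubectl -n rook-ceph get pods -l app=rook-ceph-osd
  kubectl -n rook-ceph logs deploy/rook-ceph-osd-3 --previous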

Any ideas on how to fix this, or on how to extract the data and piece it
back together some other way?


-- 
Cheers,
Peter Sarossy
Technical Program Manager
Data Center Data Security - Google LLC.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


