Hello community,
hello ceph developers,
My name is Ralf, and I work as an IT consultant. In this particular case I am supporting a German customer running a 2-node Ceph cluster.
This customer is struggling with a disastrous situation: a full pool of RBD data (about 12 TB of valid production data) is lost. Details follow below (A: The facts; B: Things already done).
I urgently need answers to the following questions, being well aware that any procedure (if it works out) will take time and money.
We will solve this problem once there is light showing the right way. So if you can point out any path in that direction, I'd love to hear from you.
For the community, I am willing and keen to document the outcome for anyone unlucky enough to face a comparable situation in the future.
That said:
- Is there any realistic chance to reconstruct the data?
- A filesystem data-recovery tool (here: for XFS) is able to restore lost+found folders/objects from the involved OSDs.
Is ceph-objectstore-tool a valid tool to export -> import these folders into a new pool? In case there is no way to get them back into a cluster as a well-defined structure, I became aware of the tool rbd_restore:
http://ceph.com/planet/ceph-recover-a-rbd-image-from-a-dead-cluster/#more-6738
Is this a viable path to reconstruct an RBD image from the recovered objects (all present as filesystem objects in subpaths of the recovery disk)?
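To make that question more concrete, this is roughly the kind of reassembly I have in mind (a minimal sketch only, assuming a format-2 image with the default 4 MB object size, and assuming the recovered object files have been renamed/normalized to <prefix>.<hex-index>; the image id, paths and file names below are placeholders, not our real ones):

PREFIX=rbd_data.1234567890ab      # placeholder image id
OBJ_SIZE=$((4 * 1024 * 1024))     # default RBD object size (order 22)
OUT=/mnt/recovery/restored.img    # placeholder output path

for f in /mnt/recovery/objects/${PREFIX}.*; do
    idx_hex=${f##*.}              # last dot-separated field = object index (hex)
    idx=$((16#$idx_hex))          # hex -> decimal
    # write each recovered object at its offset inside the raw image
    dd if="$f" of="$OUT" bs=$OBJ_SIZE seek=$idx conv=notrunc 2>/dev/null
done

The result would be a raw image that could be loop-mounted for a sanity check and, if intact, pushed into a new pool with "rbd import".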
Again, any help is appreciated very much
best regards
Ralf
PS: I will be in IRC on #ceph (dwsadmin)
A) The facts
The cluster: ceph (v10.2.3), state: healthy
State of the rbd pool in question: gone, all PGs have been deleted on the underlying OSDs
Cluster structure:
- 3 server nodes (64 GB RAM, Opteron CPUs)
- 2 servers acting as monitor and OSD node, 1 server acting as monitor
- 2 OSD nodes (15 OSDs each, spinning disks), journals: partly on SSD partitions, partly on SATA partitions
- used only for RBD
- crushmap: takes care that the rbd pool's data is stored in the storage buckets (pool size: 2); storage host1 and host2 hold the replicas
B) Things already done
We analysed the situation and tried to make sure not to lose any bits on the underlying OSD disks:
- Set cluster flags: ceph osd set noout, nodeep-scrub, noscrub (exact commands are sketched at the end of this section);
  the cluster state then changed, as expected, to HEALTH_WARN
- Shut down all involved OSDs (as seen from the crushmap) via: systemctl stop ceph-osd@<osd-id>
- Got and installed a professional data-recovery tool that handles XFS filesystems (on that node the 3Ware controller does not support JBOD, so the disks run in RAID0 mode)
- Dropped in new physical disks (node1: 2x 8 TB SATA) to copy out the lost+found objects from the OSDs
- Made a backup of all other objects of the Ceph cluster
Of course, since we are talking about roughly 12 TB of data chunks, backup and recovery take an awfully long time ...
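For reference, the freeze was done with commands along these lines (generic sketch; <osd-id> stands for each involved OSD id):

# prevent rebalancing and scrubbing while we work on the disks
ceph osd set noout
ceph osd set noscrub
ceph osd set nodeep-scrub

# stop the involved OSD daemons, repeated per OSD id
systemctl stop ceph-osd@<osd-id>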
C) References found
- Incomplete PGs — OH MY! -> https://ceph.com/community/incomplete-pgs-oh-my/#comments
- Recovering incomplete PGs -> http://ceph-users.ceph.narkive.com/lwDkR2fZ/recovering-incomplete-pgs-with-ceph-objectstore-tool
- ceph-users: Recover unfound objects from crashed OSD's underlying filesystem -> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007637.html
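Based on these references, the export/import pattern I would consider looks roughly like this (a sketch only, assuming the PG directories still exist or can be restored on the OSD's filestore; <id>, <pgid> and the paths are placeholders):

# on the source OSD (daemon stopped): export one PG
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
    --journal-path /var/lib/ceph/osd/ceph-<id>/journal \
    --pgid <pgid> --op export --file /mnt/recovery/<pgid>.export

# on the target OSD (daemon stopped): import it again
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
    --journal-path /var/lib/ceph/osd/ceph-<id>/journal \
    --op import --file /mnt/recovery/<pgid>.export

My open question remains whether this helps at all when the PG directories themselves are gone and only lost+found objects are left.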
Reference
=========
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             1
NUMA node(s):          2
Vendor ID:             AuthenticAMD
CPU family:            21
Model:                 1
Model name:            AMD Opteron(TM) Processor 6272
Stepping:              2
CPU MHz:               1400.000
CPU max MHz:           2100.0000
CPU min MHz:           1400.0000
BogoMIPS:              4199.99
Virtualization:        AMD-V
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15

# free
              total        used        free      shared  buff/cache   available
Mem:       65956972      751600      315672        1528    64889700    64383492
Swap:      16777212           0    16777212

# tw-cli show
Ctl   Model    (V)Ports  Drives  Units  NotOpt  RRate  VRate  BBU
------------------------------------------------------------------------
c2    9750-4i  16        16      16     1       1      1      OK

Enclosure  Slots  Drives  Fans  TSUnits  PSUnits  Alarms
--------------------------------------------------------------
/c2/e0     16     16      5     1        2        1

# ceph --version
ceph version 10.2.3-247-g0c83eb3 (0c83eb355e989fb6ed38a3b82f9705fd5d700e89)
# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-12        0 host xxxsrv1
 -1       xx room server-room
 -2       xx rack rack-daywalker
 -4 29.16936 storage data
 -6 14.29945     host xxxsrv1-data
  9  1.70000         osd.9        down  1.00000  1.00000
 18  1.79990         osd.18       down  1.00000  1.00000
 19  1.79990         osd.19       down  1.00000  1.00000
 22  1.79990         osd.22       down  1.00000  1.00000
  1  1.79990         osd.1        down  1.00000  1.00000
  0  1.79990         osd.0        down  1.00000  1.00000
 12  1.79999         osd.12       down  1.00000  1.00000
 25  1.79999         osd.25       down  1.00000  1.00000
 -7 14.86990     host xxxsrv2-data
  3  1.79999         osd.3        up    1.00000  1.00000
 11  1.79999         osd.11       up    1.00000  1.00000
 13  1.79999         osd.13       up    1.00000  1.00000
  4  1.79999         osd.4        up    1.00000  1.00000
 20  1.79999         osd.20       up    1.00000  1.00000
 21  1.79999         osd.21       up    1.00000  1.00000
 23  2.26999         osd.23       up    1.00000  1.00000
 24  1.79999         osd.24       up    1.00000  1.00000
 -5 14.49991 storage archive
 -8  8.99994     host xxxsrv1-archive
  7  0.89998         osd.7        down  1.00000  1.00000
  8  0.89998         osd.8        down  1.00000  1.00000
 10  3.59999         osd.10       down  1.00000  1.00000
 26  3.59999         osd.26       down  1.00000  1.00000
 -9  5.49997     host xxxsrv2-archive
  5  0.89999         osd.5        up    1.00000  1.00000
  2  3.50000         osd.2        up    1.00000  1.00000
  6  0.89998         osd.6        up    1.00000  1.00000
 17  0.20000         osd.17       up    1.00000  1.00000

# ceph osd crush rule dump vdi-data
{
    "rule_id": 3,
    "rule_name": "vdi-data",
    "ruleset": 3,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -4,
            "item_name": "data"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}