Hi friends, we are an SME that set up a ceph storage system
several months ago as a proof of concept. As we liked it, we started
to use it for production applications and as a corporate filesystem,
while postponing the measures needed for a properly deployed ceph
system (3 servers instead of 2, 3 object replicas instead of 2, 3
monitors instead of 1...). Disaster struck before we got to that,
and we are desperately asking for your help to find out whether we
can recover the system or at least the data.
In short, the boot disk of the server running the only monitor has
failed, and that disk also held the monitor daemon data (monitor map...).
We will appreciate any help you can offer before we break anything
that could still be recoverable by trying non-expert solutions.
The details follow. Thank you very much in advance:
* system overview:
2 commodity servers, 4 HDs each; 6 HDs used for ceph osds
2 replicas; only 1 monitor
server 1: 1 mon, 1 mgr, 1 mds, 3 osds
server 2: 1 mgr, 1 mds, 3 osds
ceph octopus 15.2.11, containerized docker daemons; cephadm deployed
used for libvirt VM rbd images and 1 cephfs
* the problems:
--> HD 1.i failed, so server 1 is down: no monitors, server 2 osds
unable to start, ceph down
--> client.admin keyring lost
* hard disk structure details:
- server 1: MODEL SERIAL WWN
1.i) /dev/sda 1.8T WDC_WD2002FYPS-0 WD-WCAVY7030179 0x50014ee205e40c09
--> server 1 boot disk, root, and ceph daemons data (/var/lib/ceph, etc)
--> FAILED
1.ii) /dev/sdc 7.3T WDC_WD80EFAX-68L 7HKG3MEF 0x5000cca257f0b152
--> Osd.2
1.iii) /dev/sdb 7.3T WDC_WD80EFAX-68L 7HKG6H3F 0x5000cca257f0bc0f
--> Osd.1
1.iv) /dev/sdd 1.8T WDC_WD2002FYPS-0 WD-WCAVY6926130 0x50014ee25b180bf3
--> Osd.0
- server 2: MODEL SERIAL WWN
2.i) /dev/sda 223.6G INTEL_SSDSC2KB24 BTYF90350ENF240AGN 0x55cd2e4150390704
--> server 2 boot disk, root, and ceph daemons data (/var/lib/ceph, etc)
2.ii) /dev/sdb 7.3T HGST_HUS728T8TAL VAGUR01L 0x5000cca099cbafde
--> Osd.3
2.iii) /dev/sdc 7.3T HGST_HUS728T8TAL VGG2G7LG 0x5000cca0bec11e37
--> Osd.4
2.iv) /dev/sdd 1.8T WDC_WD2002FYPS-0 WD-WCAVY7261411 0x50014ee2064414f2
--> Osd.5
Ignacio G,
Live-Med Iberia