Hi,
In a production Ceph cluster (with some pending housekeeping, as you'll
see) we have had a disaster: the boot disk of the server that ran our
only monitor has failed, and it also held the monitor daemon's data. We
would appreciate any help you can offer before we break anything still
recoverable by trying non-expert solutions.
Here are the details:
* system overview:
- 2 commodity servers, 4 HDs each; 6 of the 8 HDs used for Ceph OSDs
- replica size 2; only 1 monitor
- server 1: 1 mon, 1 mgr, 1 mds, 3 osds
- server 2: 1 mgr, 1 mds, 3 osds
- Ceph Octopus 15.2.11, containerized Docker daemons, deployed with cephadm
- used for libvirt VM RBD images and 1 CephFS
* hard disk structure details:
- server 1: running 1 mon, 1 mgr, 1 mds, 3 osds
/dev/sda 2TB --> server 1 boot disk, root, and Ceph daemon data
(/var/lib/ceph, etc) --> FAILED
/dev/sdb 8TB --> osd.1
/dev/sdc 8TB --> osd.2
/dev/sdd 2TB --> osd.0
- server 2: running 1 mgr, 1 mds, 3 osds
/dev/sda 240GB (SSD) --> server 2 boot disk, root, and Ceph
daemon data (/var/lib/ceph, etc)
/dev/sdb 8TB --> osd.3
/dev/sdc 8TB --> osd.4
/dev/sdd 2TB --> osd.5
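We assume the OSD data on the surviving disks is still intact; as far as
we understand, on server 2 the OSD volumes can at least be listed
without starting any daemon (cephadm's ceph-volume wrapper), e.g.:

```shell
# On server 2 (cephadm/Docker deployment): list the LVM-based OSDs
# that ceph-volume knows about, without starting any Ceph daemon.
cephadm ceph-volume lvm list
```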
* the problems:
--> server 1's /dev/sda failed, so server 1 is down: no monitors are
left, server 2's OSDs cannot start, and the cluster is down
--> the client.admin keyring is lost
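For reference, the closest thing we have found is the "recovery using
OSDs" monitor-store rebuild from the Ceph troubleshooting docs. We have
NOT run any of it: the paths below are examples, a new keyring would
replace the lost one, and we are unsure how the steps map onto a
containerized cephadm deployment (where the OSD data paths differ and
the tools live inside the containers):

```shell
# UNTESTED sketch of the documented mon-store rebuild from OSDs.
# Paths are examples; on a cephadm deployment these commands would
# have to be run inside the OSD containers (or via `cephadm shell`).
ms=/root/mon-store
mkdir -p "$ms"

# 1. Collect cluster map info from every OSD on this host.
for osd in /var/lib/ceph/osd/ceph-*; do
  ceph-objectstore-tool --data-path "$osd" --no-mon-config \
      --op update-mon-db --mon-store-path "$ms"
done

# 2. Create a fresh keyring (our old client.admin keyring is lost).
ceph-authtool /root/keyring --create-keyring \
    --gen-key -n mon. --cap mon 'allow *'
ceph-authtool /root/keyring --gen-key -n client.admin \
    --cap mon 'allow *' --cap osd 'allow *' \
    --cap mds 'allow *' --cap mgr 'allow *'

# 3. Rebuild the monitor store using that keyring.
ceph-monstore-tool "$ms" rebuild -- --keyring /root/keyring
```

As we understand it, the rebuilt store would then have to be copied into
a new monitor's data directory before starting it, but we would rather
hear from someone who has done this before touching anything.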
Is there any way to recover the cluster? Thank you very much in
advance.
Miguel Garcia
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx