hard disk failure, only monitor down: ceph down, please help

Hi,

In a production Ceph system (with pending tasks, as you will see) we have had a disaster: the boot disk of the server where our only monitor was running has failed, and that disk also held the monitor daemon's data. We would appreciate any help you can offer before we break something that may still be recoverable by trying non-expert solutions.

Following are the details:


* system overview:

- 2 commodity servers with 4 HDDs each; 6 of the HDDs are used for Ceph OSDs
- replica size 2; only 1 monitor
- server 1: 1 mon, 1 mgr, 1 mds, 3 osds
- server 2: 1 mgr, 1 mds, 3 osds
- Ceph Octopus 15.2.11, daemons containerized with Docker, deployed with cephadm
- used for libvirt VM RBD images and 1 CephFS


* hard disk structure details:

- server 1: running 1 mon, 1 mgr, 1 mds, 3 osds

 /dev/sda    2TB --> server 1 boot disk, root, and Ceph daemons' data (/var/lib/ceph, etc.) --> FAILED
 /dev/sdb    8TB --> osd.1
 /dev/sdc    8TB --> osd.2
 /dev/sdd    2TB --> osd.0

- server 2: running 1 mgr, 1 mds, 3 osds

 /dev/sda    240GB (SSD) --> server 2 boot disk, root, and Ceph daemons' data (/var/lib/ceph, etc.)
 /dev/sdb    8TB --> osd.3
 /dev/sdc    8TB --> osd.4
 /dev/sdd    2TB --> osd.5


* the problems:

--> server 1's /dev/sda failed, so server 1 is down: no monitors, server 2's OSDs are unable to start, and the cluster is down
--> the client.admin keyring has also been lost
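
In case it helps to frame the question: would rebuilding the monitor store from the surviving OSDs (the ceph-objectstore-tool --op update-mon-db / ceph-monstore-tool rebuild procedure described in the Ceph documentation) be the right direction here? Below is only a rough sketch of our understanding of those steps; every path, OSD data directory and keyring location in it is an assumption, and nothing has been run against the cluster yet.

#!/usr/bin/env python3
# Rough sketch ONLY, not verified: our reading of the "recover the monitor
# store using OSDs" procedure from the Ceph documentation. All paths, OSD
# data directories and keyring locations below are guesses for illustration;
# on a cephadm/containerized deployment the OSD data paths will differ and
# the tools would have to be run inside the OSD containers / cephadm shell.
import pathlib
import subprocess

MON_STORE = "/root/mon-store"        # scratch dir for the rebuilt mon store (assumption)
OSD_DATA_DIRS = [                    # surviving OSDs on server 2 (paths are assumptions)
    "/var/lib/ceph/osd/ceph-3",
    "/var/lib/ceph/osd/ceph-4",
    "/var/lib/ceph/osd/ceph-5",
]
KEYRING = "/root/admin.keyring"      # new keyring to replace the lost client.admin (assumption)


def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


pathlib.Path(MON_STORE).mkdir(parents=True, exist_ok=True)

# 1. With the OSDs stopped, pull the cluster maps out of every surviving OSD
#    and accumulate them into the scratch mon store.
for osd_dir in OSD_DATA_DIRS:
    run(["ceph-objectstore-tool", "--data-path", osd_dir, "--no-mon-config",
         "--op", "update-mon-db", "--mon-store-path", MON_STORE])

# 2. Create a fresh keyring with mon. and client.admin keys and full caps,
#    since the original client.admin keyring was lost with the boot disk.
run(["ceph-authtool", "--create-keyring", KEYRING, "--gen-key",
     "-n", "mon.", "--cap", "mon", "allow *"])
run(["ceph-authtool", KEYRING, "--gen-key", "-n", "client.admin",
     "--cap", "mon", "allow *", "--cap", "osd", "allow *",
     "--cap", "mds", "allow *", "--cap", "mgr", "allow *"])

# 3. Rebuild the monitor store from the collected maps; the result would then
#    be copied into a newly provisioned monitor's data directory.
run(["ceph-monstore-tool", MON_STORE, "rebuild", "--", "--keyring", KEYRING])

We are especially unsure how to run these tools against the containerized (cephadm/Docker) OSDs on server 2 without making things worse, so please correct us if this is the wrong approach.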


Is there any way to recover the system? Thank you very much in advance.

Miguel Garcia
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



