hard disk failure monitor issue: ceph down, please help


 



Hi friends, we are an SME that set up a ceph storage system several months ago as a proof of concept. Because we liked it, we started using it for production applications and as a corporate filesystem, while postponing the measures needed for a properly deployed ceph system (3 servers instead of 2, 3 object replicas instead of 2, 3 monitors instead of 1...). Disaster struck before we got to that, and we are desperately asking for your help to find out whether we can recover the system, or at least the data.

In short, the boot disk of the server running the only monitor has failed, and that disk also held the monitor daemon data (monitor map...). We would appreciate any help you can offer before we break anything that might still be recoverable by trying non-expert solutions.

The details follow. Thank you very much in advance:


* system overview:


2 commodity servers, 4 HD each, 6 HDs for ceph osds

2 replicas; only 1 monitor

server 1: 1 mon, 1 mgr, 1 mds, 3 osds

server 2: 1 mgr, 1 mds, 3 osds

ceph octopus 15.2.11, containerized docker daemons; deployed with cephadm

used for libvirt VMs rbd images, and 1 cephfs


* the problems:


--> HD 1.i failed, so server 1 is down: no monitors, server 2 osds unable to start, ceph down

--> client.admin keyring lost


* hard disk structure details:


- server 1:

        DEVICE    SIZE  MODEL             SERIAL           WWN

1.i)    /dev/sda  1.8T  WDC_WD2002FYPS-0  WD-WCAVY7030179  0x50014ee205e40c09

--> server 1 boot disk, root, and ceph daemon data (/var/lib/ceph, etc.) --> FAILED

1.ii)   /dev/sdc  7.3T  WDC_WD80EFAX-68L  7HKG3MEF         0x5000cca257f0b152

--> Osd.2

1.iii)  /dev/sdb  7.3T  WDC_WD80EFAX-68L  7HKG6H3F         0x5000cca257f0bc0f

--> Osd.1

1.iv)   /dev/sdd  1.8T  WDC_WD2002FYPS-0  WD-WCAVY6926130  0x50014ee25b180bf3

--> Osd.0



- server 2:

        DEVICE    SIZE    MODEL             SERIAL              WWN

2.i)    /dev/sda  223.6G  INTEL_SSDSC2KB24  BTYF90350ENF240AGN  0x55cd2e4150390704

--> server 2 boot disk, root, and ceph daemon data (/var/lib/ceph, etc.)

2.ii)   /dev/sdb  7.3T    HGST_HUS728T8TAL  VAGUR01L            0x5000cca099cbafde

--> Osd.3

2.iii)  /dev/sdc  7.3T    HGST_HUS728T8TAL  VGG2G7LG            0x5000cca0bec11e37

--> Osd.4

2.iv)   /dev/sdd  1.8T    WDC_WD2002FYPS-0  WD-WCAVY7261411     0x50014ee2064414f2

--> Osd.5


Ignacio G,

Live-Med Iberia




