Re: hard disk failure, unique monitor down: ceph down, please help

Thanks for your answer, Greg.

In our case, only 1 of the 2 servers is still alive, so the first part of the script

ms=/root/mon-store
mkdir $ms

# collect the cluster map from stopped OSDs
for host in $hosts; do
  rsync -avz $ms/. user@$host:$ms.remote
  rm -rf $ms
  ssh user@$host <<EOF
    for osd in /var/lib/ceph/osd/ceph-*; do
      ceph-objectstore-tool --data-path \$osd --no-mon-config --op update-mon-db --mon-store-path $ms.remote
    done
EOF
  rsync -avz user@$host:$ms.remote/. $ms
done
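
(For reference, the part of the procedure that follows in the docs rebuilds the monitor store from what was collected; we quote it here from memory, with the keyring path only a placeholder, so please check it against the linked page. We have not reached that step yet:

ceph-monstore-tool /root/mon-store rebuild -- --keyring /path/to/admin.keyring
)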

could, in our case, reduce to:

ceph-objectstore-tool --data-path /var/lib/ceph/ab7a7632-4388-11eb-ad9d-83fe4e551178/osd.3 --no-mon-config --op update-mon-db --mon-store-path /root/mon-store
ceph-objectstore-tool --data-path /var/lib/ceph/ab7a7632-4388-11eb-ad9d-83fe4e551178/osd.4 --no-mon-config --op update-mon-db --mon-store-path /root/mon-store
ceph-objectstore-tool --data-path /var/lib/ceph/ab7a7632-4388-11eb-ad9d-83fe4e551178/osd.5 --no-mon-config --op update-mon-db --mon-store-path /root/mon-store
but we get this error:

did not load config file, using default settings.
Mount failed with '(2) No such file or directory'


What is the right value to use for the --data-path option?
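
In the meantime, this is the read-only sanity check we plan to run before trying anything else; the paths are just our assumption of where cephadm keeps each OSD's data dir, built from the fsid used above:

# Confirm each OSD data dir exists and that its bluestore 'block' symlink
# resolves to an existing LV (paths assumed from our cephadm layout).
fsid=ab7a7632-4388-11eb-ad9d-83fe4e551178
for id in 3 4 5; do
  ls -ld /var/lib/ceph/$fsid/osd.$id
  ls -lL /var/lib/ceph/$fsid/osd.$id/block
done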


In case it matters, the LVM structure the OSDs are running on is:

# pvdisplay -C
  PV         VG                                         Fmt  Attr PSize   PFree
  /dev/sdb   ceph-a228cec5-405a-458d-9a41-201db41ea392  lvm2 a--  <7,28t      0   --> Osd.3
  /dev/sdc   ceph-c51f4b74-daba-4417-abd4-a977e94a8126  lvm2 a--  <7,28t      0   --> Osd.4
  /dev/sdd   ceph-5a34e8e4-c7c7-42fd-83ee-e2409563fe54  lvm2 a--  <1,82t      0   --> Osd.5

# lvdisplay -C
  LV                                              VG                                         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  osd-block-26df91f7-60f1-4127-aceb-0c0974b0ea0c  ceph-5a34e8e4-c7c7-42fd-83ee-e2409563fe54  -wi-a----- <1,82t  --> Osd.5
  osd-block-e10e5ca5-f301-46a9-86ea-98a042125a4b  ceph-a228cec5-405a-458d-9a41-201db41ea392  -wi-a----- <7,28t  --> Osd.3
  osd-block-25263b2f-a727-4a1b-a5f7-248b6fd83d90  ceph-c51f4b74-daba-4417-abd4-a977e94a8126  -wi-a----- <7,28t  --> Osd.4
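
If it helps, we can also send ceph-volume's own view of these devices; this is how we would collect it (assuming the ceph-volume bundled with cephadm still runs while the monitor is down, which we have not verified):

# Map each LV to its OSD id and recorded data path, per ceph-volume's metadata.
cephadm ceph-volume lvm list
# or, if ceph-volume is installed directly on the host:
ceph-volume lvm list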


Miguel


On 20/8/21 at 0:15, Gregory Farnum wrote:
On Thu, Aug 19, 2021 at 2:52 PM Ignacio García
<igarcia@xxxxxxxxxxxxxxxxx> wrote:
Hi,

In a production ceph system (with pending tasks, as you'll see) we have
had a disaster: the boot disk of the server where the only monitor was
running has failed, and it also contained the monitor daemon's data. We would
appreciate any help you can offer before we break anything still
recoverable by trying non-expert solutions.

Following are the details:


* system overview:

- 2 commodity servers, 4 HDs each; 6 of the HDs are used for ceph osds
- replica size 2; only 1 monitor
- server 1: 1 mon, 1 mgr, 1 mds, 3 osds
- server 2: 1 mgr, 1 mds, 3 osds
- ceph octopus 15.2.11 containerized docker daemons; cephadm deployed
- used for libvirt VMs rbd images, and 1 cephfs


* hard disk structure details:

- server 1: running 1 mon, 1 mgr, 1 mds, 3 osds

   /dev/sda    2TB --> server 1 boot disk, root, and ceph daemons data
(/var/lib/ceph, etc) --> FAILED
   /dev/sdc    8TB --> Osd.2
   /dev/sdb    8TB --> Osd.1
   /dev/sdd    2TB --> Osd.0

- server 2: running 1 mgr, 1 mds, 3 osds

   /dev/sda    240GB (SSD)  --> server 2 boot disk, root, and ceph
daemons data (/var/lib/ceph, etc)
   /dev/sdb    8T --> Osd.3
   /dev/sdc    8T --> Osd.4
   /dev/sdd    2T --> Osd.5


* the problems:

--> server 1's /dev/sda HD failed, so server 1 is down: no monitors,
server 2 osds unable to start, ceph down
--> client.admin keyring lost


Is there any way to recover the system? Thank you very much in
advance.
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds

Note the caveats about needing to re-create the filesystem metadata
separately. I think that's described somewhere in the mailing list
archive, but I've not done it myself.
-Greg

Miguel Garcia
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



