Re: Recovering from no quorum (2/3 monitors down) via 1 good monitor

easy:

1. make sure that none of the mons are running (including the good one)
2. extract the monmap from the good mon
3. use monmaptool to remove the two broken mons from that monmap
4. inject the edited monmap back into the good mon
5. start the good mon
6. you now have a running cluster with only one mon; add two new ones
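
Steps 1 to 5 above might look roughly like the following when run on the host of the surviving monitor. This is a minimal sketch under stated assumptions, not a tested procedure: "mon-good" is a hypothetical ID for the good monitor (the thread does not name it), /tmp/monmap is the path used in the original post, and the systemd unit name ceph-mon@<id> is an assumption about how the mons are managed. By default it only prints the commands (DRY_RUN=1) so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
# "mon-good" is a HYPOTHETICAL monitor ID; mail1 and mail2 are the two broken
# monitors named in the original post.
DRY_RUN="${DRY_RUN:-1}"
MONMAP="${MONMAP:-/tmp/monmap}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: recover_single_mon <good-mon-id> <broken-mon-id>...
recover_single_mon() {
    good="$1"; shift

    # 1. Stop the good mon. The broken mons live on other hosts and must be
    #    confirmed stopped there; this script does not reach them.
    #    (Unit name ceph-mon@<id> is an assumption.)
    run systemctl stop "ceph-mon@$good"

    # 2. Extract the monmap from the good mon's store.
    run ceph-mon -i "$good" --extract-monmap "$MONMAP"

    # 3. Remove each broken mon from the extracted map, then show the result.
    for id in "$@"; do
        run monmaptool "$MONMAP" --rm "$id"
    done
    run monmaptool --print "$MONMAP"

    # 4. Inject the edited map back into the good mon.
    run ceph-mon -i "$good" --inject-monmap "$MONMAP"

    # 5. Start the good mon; it should now form a quorum of one.
    run systemctl start "ceph-mon@$good"
}

recover_single_mon "${GOOD_MON:-mon-good}" mail1 mail2
```

To actually run it, set DRY_RUN=0 and GOOD_MON to the real monitor ID. Copying the good mon's data directory aside beforehand is a sensible precaution, since injecting a monmap modifies its store. Step 6 (adding two new mons) is left out of this sketch.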


  Paul


2018-07-10 5:50 GMT+02:00 Syahrul Sazli Shaharir <sazli@xxxxxxxxxx>:
Hi,

I am running Proxmox VE 5.1 with Ceph Luminous 12.2.4 as storage. The
cluster had 3 monitors until an abrupt power outage left 2 of them
down and unable to start; the remaining monitor is up but has no
quorum.

I tried extracting the monmap from the good monitor and injecting it
into the other two, but got a different error from each:

1. mon.mail1

# ceph-mon -i mail1 --inject-monmap /tmp/monmap
2018-07-10 11:29:03.562840 7f7d82845f80 -1 abort: Corruption: Bad
table magic number
*** Caught signal (Aborted) **
 in thread 7f7d82845f80 thread_name:ceph-mon

 ceph version 12.2.4 (4832b6f0acade977670a37c20ff5dbe69e727416)
luminous (stable)
 1: (()+0x9439e4) [0x5652655669e4]
 2: (()+0x110c0) [0x7f7d81bfe0c0]
 3: (gsignal()+0xcf) [0x7f7d7ee12fff]
 4: (abort()+0x16a) [0x7f7d7ee1442a]
 5: (RocksDBStore::get(std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const&,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&, ceph::buffer::list*)+0x2f9)
[0x5652650a2eb9]
 6: (main()+0x1377) [0x565264ec3c57]
 7: (__libc_start_main()+0xf1) [0x7f7d7ee002e1]
 8: (_start()+0x2a) [0x565264f5954a]
2018-07-10 11:29:03.563721 7f7d82845f80 -1 *** Caught signal (Aborted) **
 in thread 7f7d82845f80 thread_name:ceph-mon

2. mon.mail2

# ceph-mon -i mail2 --inject-monmap /tmp/monmap
2018-07-10 11:18:07.536097 7f161e2e3f80 -1 rocksdb: Corruption: Can't
access /065339.sst: IO error:
/var/lib/ceph/mon/ceph-mail2/store.db/065339.sst: No such file or
directory
Can't access /065337.sst: IO error:
/var/lib/ceph/mon/ceph-mail2/store.db/065337.sst: No such file or
directory

2018-07-10 11:18:07.536106 7f161e2e3f80 -1 error opening mon data
directory at '/var/lib/ceph/mon/ceph-mail2': (22) Invalid argument

Is there any way to recover other than rebuilding the monitor store
from the OSDs?

Thanks.

--
--sazli
Syahrul Sazli Shaharir <sazli@xxxxxxxxxx>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90