Hi all,

I have one big problem. My Ceph cluster ran for only one month with this configuration:

CentOS 7 64-bit
ceph-admin 10.0.34.10
OSD01 10.0.34.21
OSD02 10.0.34.22
OSD04 10.0.34.24

After one month, the RAID on the server hosting Mon1 crashed and Mon1 was lost. The cluster kept working for one day without Mon1. I then tried to recover Mon1 from the OSDs, and created a new Mon1 with the same IP.

On all 4 OSDs:

    systemctl stop ceph-osd.target
    mkdir -p /tmp/mon-store
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-[1-2-3-4]/ --op update-mon-db --mon-store-path /tmp/mon-store/

Then I rsynced all the data to the new MON1:

    rsync -avz root@CLNODE0[1-2-3-4]:/tmp/mon-store/ /tmp/mon-store/
    mkdir -p /var/lib/ceph/mon/ceph-MON1/
    cp -r /tmp/mon-store/* /var/lib/ceph/mon/ceph-MON1/

I copied keyring-mon to /var/lib/ceph/mon/ceph-MON1/, then ran:

    chown ceph:ceph -R /var/lib/ceph

and

    ceph-monstore-tool /tmp/mon-store rebuild -- --keyring /etc/ceph/ceph.client.admin.keyring

After that I copied everything to /var/lib/ceph/mon1 (done, keyring, store.db, systemd): I created the done and systemd files with touch, copied keyring-mon to keyring, and started the monitor:

    systemctl start ceph-mon@MON1.service

But after that I get:

> :/1811249608 >> 10.0.34.11:6789/0 pipe(0x7f71d805c8c0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f71d805db80).fault

mon/AuthMonitor.cc: In function 'virtual void AuthMonitor::update_from_paxos(bool*)' thread 7f0db1661600 time 2017-12-25 00:31:44.810236
mon/AuthMonitor.cc: 160: FAILED assert(ret == 0)
 ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x5645fa9e5895]
 2: (AuthMonitor::update_from_paxos(bool*)+0x1953) [0x5645fa77af83]
 3: (PaxosService::refresh(bool*)+0x1a5) [0x5645fa68dc05]
 4: (Monitor::refresh_from_paxos(bool*)+0x15b) [0x5645fa6248eb]
 5: (Monitor::init_paxos()+0x95) [0x5645fa624d85]
 6: (Monitor::preinit()+0x949) [0x5645fa6378f9]
 7: (main()+0x242d) [0x5645fa5c266d]
 8: (__libc_start_main()+0xf5) [0x7f0dae9d5c05]
 9: (()+0x25ec3f) [0x5645fa615c3f]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

2017-12-25 00:31:44.812415 7f0db1661600 -1 mon/AuthMonitor.cc: In function 'virtual void AuthMonitor::update_from_paxos(bool*)' thread 7f0db1661600 time 2017-12-25 00:31:44.810236
mon/AuthMonitor.cc: 160: FAILED assert(ret == 0)
 ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
 1: (()+0x509c2a) [0x5645fa8c0c2a]
 2: (()+0xf5e0) [0x7f0db01eb5e0]
 3: (gsignal()+0x37) [0x7f0dae9e91f7]
 4: (abort()+0x148) [0x7f0dae9ea8e8]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x5645fa9e5a77]
 6: (AuthMonitor::update_from_paxos(bool*)+0x1953) [0x5645fa77af83]
 7: (PaxosService::refresh(bool*)+0x1a5) [0x5645fa68dc05]
 8: (Monitor::refresh_from_paxos(bool*)+0x15b) [0x5645fa6248eb]
 9: (Monitor::init_paxos()+0x95) [0x5645fa624d85]
 10: (Monitor::preinit()+0x949) [0x5645fa6378f9]
 11: (main()+0x242d) [0x5645fa5c266d]
 12: (__libc_start_main()+0xf5) [0x7f0dae9d5c05]
 13: (()+0x25ec3f) [0x5645fa615c3f]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Aborted

What did I do wrong? Maybe someone has a FAQ on how to recover a lost MON from the OSDs?

Thank you.

Best regards,
Alex.
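
For comparison, here is a minimal sketch of how the steps above might be chained, following the general shape of the upstream "recovery using OSDs" procedure as far as I know it. It only prints the commands and runs nothing. Two things in it are assumptions rather than part of what was done above: the mon store is passed from host to host so that each OSD updates the same accumulating store (instead of building four separate stores and merging them with rsync), and the keyring given to `rebuild` is first given `mon.` and `client.admin` caps with `ceph-authtool`. Whether either difference explains the AuthMonitor assert is not established here. The names `plan_mon_rebuild`, `MS`, `OSD_HOSTS` and `KEYRING` are hypothetical.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the recovery commands, executes none of them.
# MS, OSD_HOSTS, KEYRING and plan_mon_rebuild are illustrative names.

MS="${MS:-/tmp/mon-store}"
OSD_HOSTS="${OSD_HOSTS:-CLNODE01 CLNODE02 CLNODE03 CLNODE04}"
KEYRING="${KEYRING:-/etc/ceph/ceph.client.admin.keyring}"

plan_mon_rebuild() {
  echo "mkdir -p $MS"
  for host in $OSD_HOSTS; do
    # Assumption: push the store collected so far, let every stopped OSD
    # on this host add its maps to it, then pull the updated store back.
    echo "rsync -avz $MS/ root@$host:$MS/"
    echo "ssh root@$host 'for osd in /var/lib/ceph/osd/ceph-*; do ceph-objectstore-tool --data-path \$osd --op update-mon-db --mon-store-path $MS; done'"
    echo "rsync -avz root@$host:$MS/ $MS/"
  done
  # Assumption: with cephx enabled, the keyring used for the rebuild
  # must already contain a mon. key; these calls only set the caps.
  echo "ceph-authtool $KEYRING -n mon. --cap mon 'allow *'"
  echo "ceph-authtool $KEYRING -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'"
  echo "ceph-monstore-tool $MS rebuild -- --keyring $KEYRING"
}

plan_mon_rebuild
```

Reviewing the printed commands before running any of them by hand seems safer than scripting the recovery directly, since the OSDs must be stopped and the old store.db should be backed up first.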