Hi there,
I have a Ceph cluster running version 12.2.2 on CentOS 7. After a few days of operation, one of the three mons stopped (it is out of quorum) and I can't start it anymore.
I checked the mon service log and the output shows this error:
"""
mon.XXXXXX@-1(probing) e4 preinit clean up potentially inconsistent store state
rocksdb: submit_transaction_sync error: Corruption: block checksum mismatch
code = 2 Rocksdb transaction:
0> 2018-02-16 17:37:07.041812 7f45a1e52e40 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/mon/MonitorDBStore.h: In function 'void MonitorDBStore::clear(std::set<std::basic_string<char> >&)' thread 7f45a1e52e40 time 2018-02-16 17:37:07.040846
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/mon/MonitorDBStore.h: 581: FAILED assert(r >= 0)
"""
"""
mon.XXXXXX@-1(probing) e4 preinit clean up potentially inconsistent store state
rocksdb: submit_transaction_sync error: Corruption: block checksum mismatch
code = 2 Rocksdb transaction:
0> 2018-02-16 17:37:07.041812 7f45a1e52e40 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUI
LD/ceph-12.2.2/src/mon/MonitorDBStore.h: In function 'void MonitorDBStore::clear(std::set<std::basic_string<char> >&)' thread 7f45a1e52e40 time 2018-02-16 17:37:07.040846
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/mon/MonitorDBStore.h: 581: FAILE
D assert(r >= 0)
"""
The only solution I found was to remove this mon from the quorum, delete all of its mon data, and re-add it to the quorum, after which the cluster returned to a healthy status. But now, after a few days, the same mon has stopped again and I'm facing the same problem.
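Concretely, the recovery I did was roughly the following (the mon ID "mon1" here is a placeholder for my actual mon name):
"""
# stop the failed mon and drop it from the monmap
systemctl stop ceph-mon@mon1
ceph mon remove mon1

# wipe its corrupted store
rm -rf /var/lib/ceph/mon/ceph-mon1

# rebuild the store from the current monmap and mon keyring, then rejoin
ceph auth get mon. -o /tmp/mon-keyring
ceph mon getmap -o /tmp/monmap
sudo -u ceph ceph-mon -i mon1 --mkfs --monmap /tmp/monmap --keyring /tmp/mon-keyring
systemctl start ceph-mon@mon1
"""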
My cluster setup is:
4 osd hosts
total 8 osds
3 mons
1 rgw
The cluster was set up with ceph-volume lvm, with the WAL/DB separated onto their own logical volumes.
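For reference, each OSD was created with something like this (device and LV names are placeholders for my actual values):
"""
# one bluestore OSD per data disk, with db and wal on separate LVs
ceph-volume lvm create --bluestore \
    --data /dev/sdb \
    --block.db ceph-db/db-sdb \
    --block.wal ceph-wal/wal-sdb
"""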
Best regards,
Behnam Loghmani