Hi,

we upgraded from 12.2.8 to 13.2.4 (Ubuntu 16.04). About two hours after the upgrade, the replay MDS kept crashing, so we tried to restart all MDS daemons. After that the filesystem was in 'failed' state and no MDS was in "active" state. We then tried to downgrade the MDS to 13.2.1, but had no luck:

ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable), process ceph-mds, pid 6843
...
log_channel(cluster) log [ERR] : corrupt sessionmap values: buffer::malformed_input: void session_info_t::decode(ceph::buffer::list::iterator&) no longer understand old encoding version 6 < 7

So we "fixed" the filesystem by following the manual (http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/):

systemctl stop ceph-mds@`hostname -s`.service
cephfs-journal-tool journal export backup.bin
cephfs-journal-tool --help
cephfs-journal-tool --rank=cephfs:all journal export backup.bin
cephfs-journal-tool --rank=cephfs:all event recover_dentries summary
cephfs-journal-tool --rank=cephfs:all journal reset
cephfs-table-tool all reset session
systemctl restart ceph-mds@mds05.service
systemctl stop ceph-mds@`hostname -s`.service
ceph fs set cephfs down true
ceph fs set cephfs down false
cephfs-table-tool all reset session
cephfs-table-tool all reset inode
cephfs-table-tool all reset snap
systemctl restart ceph-mds@mds05.service
less /var/log/ceph/ceph-mds.mds05.log
systemctl stop ceph-mds@`hostname -s`.service
ceph fs reset cephfs --yes-i-really-mean-it
systemctl restart ceph-mds@mds05.service
less /var/log/ceph/ceph-mds.mds05.log

After the reset we were able to use the cephfs, but we still have some errors:

...
log [ERR] : unmatched rstat rbytes on single dirfrag 0x10002253f4e, inode has n(v9 rc2019-01-28 14:42:47.371612 b1158 71=8+63), dirfrag has n(v9 rc2019-01-28 14:42:47.371612 b1004 65=7+58)
log [ERR] : unmatched fragstat size on single dirfrag 0x10002253db6, inode has f(v0 m2019-01-28 14:46:47.983292 59=0+59), dirfrag has f(v0 m2019-01-28 14:46:47.983292 58=0+58)
log [ERR] : unmatched rstat rbytes on single dirfrag 0x10002253db6, inode has n(v11 rc2019-01-28 14:46:47.983292 b1478 71=11+60), dirfrag has n(v11 rc2019-01-28 14:46:47.983292 b1347 68=10+58)
...

Any help is welcome,
Ansgar

On Tue, 29 Jan 2019 at 12:32, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>
> Upgraded from which version? Have you tried downgrading ceph-mds to the old version?
>
>
> On Mon, Jan 28, 2019 at 9:20 PM Ansgar Jazdzewski
> <a.jazdzewski@xxxxxxxxxxxxxx> wrote:
> >
> > hi folks we need some help with our cephfs, all mds keep crashing
> >
> > starting mds.mds02 at -
> > terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
> > what(): buffer::bad_alloc
> > *** Caught signal (Aborted) **
> > in thread 7f542d825700 thread_name:md_log_replay
> > ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
> > 1: /usr/bin/ceph-mds() [0x7cc8a0]
> > 2: (()+0x11390) [0x7f543cf29390]
> > 3: (gsignal()+0x38) [0x7f543c676428]
> > 4: (abort()+0x16a) [0x7f543c67802a]
> > 5: (__gnu_cxx::__verbose_terminate_handler()+0x135) [0x7f543dae6e65]
> > 6: (__cxxabiv1::__terminate(void (*)())+0x6) [0x7f543dadae46]
> > 7: (()+0x734e91) [0x7f543dadae91]
> > 8: (()+0x7410a4) [0x7f543dae70a4]
> > 9: (ceph::buffer::create_aligned_in_mempool(unsigned int, unsigned int, int)+0x258) [0x7f543d63b348]
> > 10: (ceph::buffer::list::iterator_impl<false>::copy_shallow(unsigned int, ceph::buffer::ptr&)+0xa2) [0x7f543d640ee2]
> > 11: (compact_map_base<std::__cxx11::basic_string<char, std::char_traits<char>, mempool::pool_allocator<(mempool::pool_index_t)18, char> >, ceph::buffer::ptr, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, mempool::pool_allocator<(mempool::pool_index_t)18, char> >, ceph::buffer::ptr, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, mempool::pool_allocator<(mempool::pool_index_t)18, char> > >, mempool::pool_allocator<(mempool::pool_index_t)18, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, mempool::pool_allocator<(mempool::pool_index_t)18, char> > const, ceph::buffer::ptr> > > >::decode(ceph::buffer::list::iterator&)+0x122) [0x66b202]
> > 12: (EMetaBlob::fullbit::decode(ceph::buffer::list::iterator&)+0xe3) [0x7aa633]
> > 13: /usr/bin/ceph-mds() [0x7aeae6]
> > 14: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x3d36) [0x7b4fa6]
> > 15: (EImportStart::replay(MDSRank*)+0x5b) [0x7bbb1b]
> > 16: (MDLog::_replay_thread()+0x864) [0x760024]
> > 17: (MDLog::ReplayThread::entry()+0xd) [0x4f487d]
> > 18: (()+0x76ba) [0x7f543cf1f6ba]
> > 19: (clone()+0x6d) [0x7f543c74841d]
> > 2019-01-28 13:10:02.202 7f542d825700 -1 *** Caught signal (Aborted) **
> > in thread 7f542d825700 thread_name:md_log_replay
> >
> > ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
> > 1: /usr/bin/ceph-mds() [0x7cc8a0]
> > 2: (()+0x11390) [0x7f543cf29390]
> > 3: (gsignal()+0x38) [0x7f543c676428]
> > 4: (abort()+0x16a) [0x7f543c67802a]
> > 5: (__gnu_cxx::__verbose_terminate_handler()+0x135) [0x7f543dae6e65]
> > 6: (__cxxabiv1::__terminate(void (*)())+0x6) [0x7f543dadae46]
> > 7: (()+0x734e91) [0x7f543dadae91]
> > 8: (()+0x7410a4) [0x7f543dae70a4]
> > 9: (ceph::buffer::create_aligned_in_mempool(unsigned int, unsigned int, int)+0x258) [0x7f543d63b348]
> > 10: (ceph::buffer::list::iterator_impl<false>::copy_shallow(unsigned int, ceph::buffer::ptr&)+0xa2) [0x7f543d640ee2]
> > 11: (compact_map_base<std::__cxx11::basic_string<char, std::char_traits<char>, mempool::pool_allocator<(mempool::pool_index_t)18, char> >, ceph::buffer::ptr, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, mempool::pool_allocator<(mempool::pool_index_t)18, char> >, ceph::buffer::ptr, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, mempool::pool_allocator<(mempool::pool_index_t)18, char> > >, mempool::pool_allocator<(mempool::pool_index_t)18, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, mempool::pool_allocator<(mempool::pool_index_t)18, char> > const, ceph::buffer::ptr> > > >::decode(ceph::buffer::list::iterator&)+0x122) [0x66b202]
> > 12: (EMetaBlob::fullbit::decode(ceph::buffer::list::iterator&)+0xe3) [0x7aa633]
> > 13: /usr/bin/ceph-mds() [0x7aeae6]
> > 14: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x3d36) [0x7b4fa6]
> > 15: (EImportStart::replay(MDSRank*)+0x5b) [0x7bbb1b]
> > 16: (MDLog::_replay_thread()+0x864) [0x760024]
> > 17: (MDLog::ReplayThread::entry()+0xd) [0x4f487d]
> > 18: (()+0x76ba) [0x7f543cf1f6ba]
> > 19: (clone()+0x6d) [0x7f543c74841d]
> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> >
> > 0> 2019-01-28 13:10:02.202 7f542d825700 -1 *** Caught signal (Aborted) **
> > in thread 7f542d825700 thread_name:md_log_replay
> >
> > ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
> > 1: /usr/bin/ceph-mds() [0x7cc8a0]
> > 2: (()+0x11390) [0x7f543cf29390]
> > 3: (gsignal()+0x38) [0x7f543c676428]
> > 4: (abort()+0x16a) [0x7f543c67802a]
> > 5: (__gnu_cxx::__verbose_terminate_handler()+0x135) [0x7f543dae6e65]
> > 6: (__cxxabiv1::__terminate(void (*)())+0x6) [0x7f543dadae46]
> > 7: (()+0x734e91) [0x7f543dadae91]
> > 8: (()+0x7410a4) [0x7f543dae70a4]
> > 9: (ceph::buffer::create_aligned_in_mempool(unsigned int, unsigned int, int)+0x258) [0x7f543d63b348]
> > 10: (ceph::buffer::list::iterator_impl<false>::copy_shallow(unsigned int, ceph::buffer::ptr&)+0xa2) [0x7f543d640ee2]
> > 11: (compact_map_base<std::__cxx11::basic_string<char, std::char_traits<char>, mempool::pool_allocator<(mempool::pool_index_t)18, char> >, ceph::buffer::ptr, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, mempool::pool_allocator<(mempool::pool_index_t)18, char> >, ceph::buffer::ptr, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, mempool::pool_allocator<(mempool::pool_index_t)18, char> > >, mempool::pool_allocator<(mempool::pool_index_t)18, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, mempool::pool_allocator<(mempool::pool_index_t)18, char> > const, ceph::buffer::ptr> > > >::decode(ceph::buffer::list::iterator&)+0x122) [0x66b202]
> > 12: (EMetaBlob::fullbit::decode(ceph::buffer::list::iterator&)+0xe3) [0x7aa633]
> > 13: /usr/bin/ceph-mds() [0x7aeae6]
> > 14: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x3d36) [0x7b4fa6]
> > 15: (EImportStart::replay(MDSRank*)+0x5b) [0x7bbb1b]
> > 16: (MDLog::_replay_thread()+0x864) [0x760024]
> > 17: (MDLog::ReplayThread::entry()+0xd) [0x4f487d]
> > 18: (()+0x76ba) [0x7f543cf1f6ba]
> > 19: (clone()+0x6d) [0x7f543c74841d]
> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> >
> > Aborted
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
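
For triaging the "unmatched rstat" / "unmatched fragstat" errors reported in this message, a minimal Python sketch like the following could extract the inode and dirfrag counters from each log line and show how far apart they are. The field layout is an assumption read off the log lines themselves (`b<N>` taken as recursive bytes, `<total>=<files>+<subdirs>` as entry counts); the helper names `parse_mismatch` and `deltas` are hypothetical and not part of any Ceph tool.

```python
import re

# Assumed layout of the counters printed in the MDS log lines:
#   n(v<version> rc<rctime> b<rbytes> <total>=<files>+<subdirs>)   rstat
#   f(v<version> m<mtime> <total>=<files>+<subdirs>)               fragstat
STAT_RE = re.compile(
    r"[nf]\(v\d+ "
    r"(?:rc|m)\S+ \S+ "
    r"(?:b(?P<bytes>\d+) )?"
    r"(?P<total>\d+)=(?P<files>\d+)\+(?P<subdirs>\d+)\)"
)
DIRFRAG_RE = re.compile(r"dirfrag (0x[0-9a-f]+)")


def parse_mismatch(line):
    """Return (dirfrag id, inode counters, dirfrag counters) for one log line."""
    frag = DIRFRAG_RE.search(line).group(1)
    stats = [
        {k: int(v) for k, v in m.groupdict().items() if v is not None}
        for m in STAT_RE.finditer(line)
    ]
    # Each error line carries the inode's view first, then the dirfrag's view.
    inode, dirfrag = stats
    return frag, inode, dirfrag


def deltas(line):
    """Return (dirfrag id, inode counter minus dirfrag counter per field)."""
    frag, inode, dirfrag = parse_mismatch(line)
    return frag, {k: inode[k] - dirfrag[k] for k in inode}


if __name__ == "__main__":
    sample = (
        "log [ERR] : unmatched rstat rbytes on single dirfrag 0x10002253f4e, "
        "inode has n(v9 rc2019-01-28 14:42:47.371612 b1158 71=8+63), "
        "dirfrag has n(v9 rc2019-01-28 14:42:47.371612 b1004 65=7+58)"
    )
    print(deltas(sample))
```

Running this over the MDS log would give a per-dirfrag summary of the differences, which may help decide which directories to look at first; it only reads log text and does not touch the cluster.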