Re: ceph-fs crashed after upgrade to 13.2.4

Hi,

We upgraded from 12.2.8 (Luminous) to 13.2.4 (Mimic) on Ubuntu 16.04.

- about two hours after the upgrade, the MDS in replay kept crashing,
so we tried to restart all MDS daemons; after that the filesystem was
in the 'failed' state and no MDS reached the 'active' state (see the
status commands below)
- we then tried to downgrade the MDS to 13.2.1, but had no luck:
    ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77)
mimic (stable), process ceph-mds, pid 6843
    ...
    log_channel(cluster) log [ERR] : corrupt sessionmap values:
buffer::malformed_input: void
session_info_t::decode(ceph::buffer::list::iterator&) no longer
understand old encoding version 6 < 7
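
For reference, the 'failed' / 'active' states above are what the
standard status commands report, e.g.:

ceph fs status
ceph mds stat
ceph health detail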

so we "fixed" the filsystem by following the manual
(http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/)

systemctl stop ceph-mds@`hostname -s`.service
cephfs-journal-tool journal export backup.bin
cephfs-journal-tool --help
cephfs-journal-tool --rank=cephfs:all journal export backup.bin
cephfs-journal-tool --rank=cephfs:all event recover_dentries summary
cephfs-journal-tool --rank=cephfs:all journal reset
cephfs-table-tool all reset session
systemctl restart ceph-mds@mds05.service
systemctl stop ceph-mds@`hostname -s`.service
ceph fs set cephfs down true
ceph fs set cephfs down false
cephfs-table-tool all reset session
cephfs-table-tool all reset inode
cephfs-table-tool all reset snap
systemctl restart ceph-mds@mds05.service
less /var/log/ceph/ceph-mds.mds05.log
systemctl stop ceph-mds@`hostname -s`.service
ceph fs reset cephfs --yes-i-really-mean-it
systemctl restart ceph-mds@mds05.service
less /var/log/ceph/ceph-mds.mds05.log
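
To sanity-check the result (a rough sketch, assuming a single rank 0
and the same MDS name mds05 as above):

# the journal header should be readable again after the reset
cephfs-journal-tool --rank=cephfs:0 journal inspect
# and at least one MDS should come back up and reach 'active'
ceph fs status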

After the reset we were able to use the filesystem again, but we still see some errors:

...
log [ERR] : unmatched rstat rbytes on single dirfrag 0x10002253f4e,
inode has n(v9 rc2019-01-28 14:42:47.371612 b1158 71=8+63), dirfrag
has n(v9 rc2019-01-28 14:42:47.371612 b1004 65=7+58)
log [ERR] : unmatched fragstat size on single dirfrag 0x10002253db6,
inode has f(v0 m2019-01-28 14:46:47.983292 59=0+59), dirfrag has f(v0
m2019-01-28 14:46:47.983292 58=0+58)
log [ERR] : unmatched rstat rbytes on single dirfrag 0x10002253db6,
inode has n(v11 rc2019-01-28 14:46:47.983292 b1478 71=11+60), dirfrag
has n(v11 rc2019-01-28 14:46:47.983292 b1347 68=10+58)
...
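
As far as I understand, such rstat/fragstat mismatches can be repaired
by a recursive scrub on the active MDS; a rough sketch, assuming mds05
is the active MDS and that the scrub_path admin-socket command accepts
these options on 13.2.x:

# walk the tree and repair recursive stats where they do not match
ceph daemon mds.mds05 scrub_path / recursive repair
# then flush the MDS journal so the repaired values are written back
ceph daemon mds.mds05 flush journal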

any help is welcome,
Ansgar

On Tue, 29 Jan 2019 at 12:32, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>
> Upgraded from which version? Have you tried downgrading ceph-mds to the old version?
>
>
> On Mon, Jan 28, 2019 at 9:20 PM Ansgar Jazdzewski
> <a.jazdzewski@xxxxxxxxxxxxxx> wrote:
> >
> > Hi folks, we need some help with our CephFS; all MDS daemons keep crashing:
> >
> > starting mds.mds02 at -
> > terminate called after throwing an instance of
> > 'ceph::buffer::bad_alloc'
> >  what():  buffer::bad_alloc
> > *** Caught signal (Aborted) **
> > in thread 7f542d825700 thread_name:md_log_replay
> > ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
> > 1: /usr/bin/ceph-mds() [0x7cc8a0]
> > 2: (()+0x11390) [0x7f543cf29390]
> > 3: (gsignal()+0x38) [0x7f543c676428]
> > 4: (abort()+0x16a) [0x7f543c67802a]
> > 5: (__gnu_cxx::__verbose_terminate_handler()+0x135) [0x7f543dae6e65]
> > 6: (__cxxabiv1::__terminate(void (*)())+0x6) [0x7f543dadae46]
> > 7: (()+0x734e91) [0x7f543dadae91]
> > 8: (()+0x7410a4) [0x7f543dae70a4]
> > 9: (ceph::buffer::create_aligned_in_mempool(unsigned int, unsigned
> > int, int)+0x258) [0x7f543d63b348]
> > 10: (ceph::buffer::list::iterator_impl<false>::copy_shallow(unsigned
> > int, ceph::buffer::ptr&)+0xa2) [0x7f543d640ee2]
> > 11: (compact_map_base<std::__cxx11::basic_string<char,
> > std::char_traits<char>,
> > mempool::pool_allocator<(mempool::pool_index_t)18, char> >,
> > ceph::buffer::ptr, std::map<std::__cxx11::basic_string<char,
> > std::char_traits<char>,
> > mempool::pool_allocator<(mempool::pool_index_t)18, char> >,
> > ceph::buffer::ptr, std::less<std::__cxx11::basic_string<char,
> > std::char_traits<char>, mempool::pool_allocator<(mempool::pool_index_t)18, char> > >,
> > mempool::pool_allocator<(mempool::pool_index_t)18,
> > std::pair<std::__cxx11::basic_string<char, std::char_traits<char>,
> > mempool::pool_allocator<(mempool::pool_index_t)18, char> > const,
> > ceph::buffer::ptr> > > >::decode(ceph::buffer::list::iterator&)+0x122)
> > [0x66b202]
> > 12: (EMetaBlob::fullbit::decode(ceph::buffer::list::iterator&)+0xe3) [0x7aa633]
> > 13: /usr/bin/ceph-mds() [0x7aeae6]
> > 14: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x3d36) [0x7b4fa6]
> > 15: (EImportStart::replay(MDSRank*)+0x5b) [0x7bbb1b]
> > 16: (MDLog::_replay_thread()+0x864) [0x760024]
> > 17: (MDLog::ReplayThread::entry()+0xd) [0x4f487d]
> > 18: (()+0x76ba) [0x7f543cf1f6ba]
> > 19: (clone()+0x6d) [0x7f543c74841d]
> > 2019-01-28 13:10:02.202 7f542d825700 -1 *** Caught signal (Aborted) **
> > in thread 7f542d825700 thread_name:md_log_replay
> >
> > [the same backtrace is then repeated twice more in the log dump]
> >
> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> > needed to interpret this.
> >
> > Aborted
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


