Re: ceph-fs crashed after upgrade to 13.2.4

On Tue, Jan 29, 2019 at 8:30 PM Ansgar Jazdzewski
<a.jazdzewski@xxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> we upgraded from 12.2.8 to 13.2.4 (ubuntu 16.04)
>
> - about 2 hours after the upgrade the replay MDS kept crashing, so we
> tried to restart all MDS daemons; then the filesystem went into the
> 'failed' state and no MDS was in the "active" state
> - we then tried to downgrade the MDS to 13.2.1 but had no luck:
>     ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77)
> mimic (stable), process ceph-mds, pid 6843
>     ...
>     log_channel(cluster) log [ERR] : corrupt sessionmap values:
> buffer::malformed_input: void
> session_info_t::decode(ceph::buffer::list::iterator&) no longer
> understand old encoding version 6 < 7
>
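A note on the decode error quoted above: its wording suggests a struct-version compatibility check, where each encoded blob carries the version it was written with plus the oldest decoder version that can still read it. The Python sketch below is a toy model of such a check, not Ceph's actual code; the two-byte header layout and the names encode_blob, decode_blob and MalformedInput are assumptions made for illustration. Under that reading, a 13.2.1 daemon that understands up to version 6 would refuse a sessionmap rewritten by 13.2.4 with a compat version of 7, which would be consistent with the downgrade not helping.

```python
import struct


class MalformedInput(Exception):
    """Raised when this decoder cannot read a blob (hypothetical name)."""


def encode_blob(struct_v: int, struct_compat: int, payload: bytes) -> bytes:
    # Hypothetical layout: 1 byte "written with" version, 1 byte oldest
    # decoder version that can still read the payload, then the payload.
    return struct.pack("<BB", struct_v, struct_compat) + payload


def decode_blob(blob: bytes, supported_v: int) -> bytes:
    struct_v, struct_compat = struct.unpack_from("<BB", blob, 0)
    if supported_v < struct_compat:
        # The decoder is older than the minimum the writer requires.
        raise MalformedInput(
            f"decoder version {supported_v} < required {struct_compat}"
        )
    return blob[2:]


# A blob as a newer daemon might write it (version numbers from the log).
new_blob = encode_blob(7, 7, b"session data")

print(decode_blob(new_blob, supported_v=7))  # newer decoder reads it

try:
    decode_blob(new_blob, supported_v=6)  # older decoder refuses
except MalformedInput as err:
    print("rejected:", err)
```

If this model matches what happened, the data on disk is not necessarily corrupt; it is only unreadable by the older binary.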
> so we "fixed" the filesystem by following the manual:
>
> systemctl stop ceph-mds@`hostname -s`.service
> cephfs-journal-tool journal export backup.bin
> cephfs-journal-tool --help
> cephfs-journal-tool --rank=cephfs:all journal export backup.bin
> cephfs-journal-tool --rank=cephfs:all event recover_dentries summary
> cephfs-journal-tool --rank=cephfs:all journal reset
> cephfs-table-tool all reset session
> systemctl restart ceph-mds@mds05.service
> systemctl stop ceph-mds@`hostname -s`.service
> ceph fs set cephfs down true
> ceph fs set cephfs down false
> cephfs-table-tool all reset session
> cephfs-table-tool all reset inode
> cephfs-table-tool all reset snap

You have reset the inode table, which will cause further damage. You
should stop using your fs (unmount it on your clients) and immediately
run 'ceph tell mds.xx scrub start / recursive repair'.


> systemctl restart ceph-mds@mds05.service
> less /var/log/ceph/ceph-mds.mds05.log
> systemctl stop ceph-mds@`hostname -s`.service
> ceph fs reset cephfs --yes-i-really-mean-it
> systemctl restart ceph-mds@mds05.service
> less /var/log/ceph/ceph-mds.mds05.log
>
> after the reset we were able to use the cephfs, but we still have some errors:
>
> ...
> log [ERR] : unmatched rstat rbytes on single dirfrag 0x10002253f4e,
> inode has n(v9 rc2019-01-28 14:42:47.371612 b1158 71=8+63), dirfrag
> has n(v9 rc2019-01-28 14:42:47.371612 b1004 65=7+58)
> log [ERR] : unmatched fragstat size on single dirfrag 0x10002253db6,
> inode has f(v0 m2019-01-28 14:46:47.983292 59=0+59), dirfrag has f(v0
> m2019-01-28 14:46:47.983292 58=0+58)
> log [ERR] : unmatched rstat rbytes on single dirfrag 0x10002253db6,
> inode has n(v11 rc2019-01-28 14:46:47.983292 b1478 71=11+60), dirfrag
> has n(v11 rc2019-01-28 14:46:47.983292 b1347 68=10+58)
> ...
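To judge how far the inode and dirfrag accounting have drifted apart, the "unmatched rstat" lines above can be compared field by field. The Python sketch below assumes the n(...) summary reads as version, rctime, recursive bytes (b...), then total=files+subdirs; that interpretation and the helper name rstat_delta are assumptions, not something stated in this thread.

```python
import re

# Matches the n(...) recursive-stat summary as it appears in the log lines.
RSTAT = re.compile(r"n\(v(\d+) rc\S+ \S+ b(\d+) (\d+)=(\d+)\+(\d+)\)")


def rstat_delta(line: str) -> dict:
    """Return inode-minus-dirfrag differences for one 'unmatched rstat' line."""
    found = RSTAT.findall(line)
    if len(found) != 2:
        raise ValueError("expected one inode and one dirfrag rstat")
    inode, dirfrag = ([int(x) for x in m] for m in found)
    keys = ("version", "rbytes", "total", "files", "subdirs")
    return {k: i - d for k, i, d in zip(keys, inode, dirfrag)}


# First rstat error from the log above, with the mail line wraps joined.
line = (
    "log [ERR] : unmatched rstat rbytes on single dirfrag 0x10002253f4e, "
    "inode has n(v9 rc2019-01-28 14:42:47.371612 b1158 71=8+63), dirfrag "
    "has n(v9 rc2019-01-28 14:42:47.371612 b1004 65=7+58)"
)
print(rstat_delta(line))
```

For this line the inode claims 154 bytes, 1 file and 5 subdirectories more than the dirfrag records, which is the kind of mismatch a recursive repair scrub is meant to reconcile.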
>
> any help is welcome,
> Ansgar
>
> On Tue, Jan 29, 2019 at 12:32 PM Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> >
> > Upgraded from which version? Have you tried downgrading ceph-mds to the old version?
> >
> >
> > On Mon, Jan 28, 2019 at 9:20 PM Ansgar Jazdzewski
> > <a.jazdzewski@xxxxxxxxxxxxxx> wrote:
> > >
> > > hi folks, we need some help with our cephfs: all MDS daemons keep crashing
> > >
> > > starting mds.mds02 at -
> > > terminate called after throwing an instance of
> > > 'ceph::buffer::bad_alloc'
> > >  what():  buffer::bad_alloc
> > > *** Caught signal (Aborted) **
> > > in thread 7f542d825700 thread_name:md_log_replay
> > > ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
> > > 1: /usr/bin/ceph-mds() [0x7cc8a0]
> > > 2: (()+0x11390) [0x7f543cf29390]
> > > 3: (gsignal()+0x38) [0x7f543c676428]
> > > 4: (abort()+0x16a) [0x7f543c67802a]
> > > 5: (__gnu_cxx::__verbose_terminate_handler()+0x135) [0x7f543dae6e65]
> > > 6: (__cxxabiv1::__terminate(void (*)())+0x6) [0x7f543dadae46]
> > > 7: (()+0x734e91) [0x7f543dadae91]
> > > 8: (()+0x7410a4) [0x7f543dae70a4]
> > > 9: (ceph::buffer::create_aligned_in_mempool(unsigned int, unsigned
> > > int, int)+0x258) [0x7f543d63b348]
> > > 10: (ceph::buffer::list::iterator_impl<false>::copy_shallow(unsigned
> > > int, ceph::buffer::ptr&)+0xa2) [0x7f543d640ee2]
> > > 11: (compact_map_base<std::__cxx11::basic_string<char,
> > > std::char_traits<char>,
> > > mempool::pool_allocator<(mempool::pool_index_t)18, char> >,
> > > ceph::buffer::ptr, std::map<std::__cxx11::basic_string<char,
> > > std::char_traits<char>,
> > > mempool::pool_allocator<(mempool::pool_index_t)18, char> >,
> > > ceph::buffer::ptr, std::less<std::__cxx11::basic_string<char,
> > > std::char_traits<char>,
> > > mempool::pool_allocator<(mempool::pool_index_t)18, char> > >,
> > > mempool::pool_allocator<(mempool::pool_index_t)18,
> > > std::pair<std::__cxx11::basic_string<char, std::char_traits<char>,
> > > mempool::pool_allocator<(mempool::pool_index_t)18, char> > const,
> > > ceph::buffer::ptr> > > >::decode(ceph::buffer::list::iterator&)+0x122)
> > > [0x66b202]
> > > 12: (EMetaBlob::fullbit::decode(ceph::buffer::list::iterator&)+0xe3) [0x7aa633]
> > > 13: /usr/bin/ceph-mds() [0x7aeae6]
> > > 14: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x3d36) [0x7b4fa6]
> > > 15: (EImportStart::replay(MDSRank*)+0x5b) [0x7bbb1b]
> > > 16: (MDLog::_replay_thread()+0x864) [0x760024]
> > > 17: (MDLog::ReplayThread::entry()+0xd) [0x4f487d]
> > > 18: (()+0x76ba) [0x7f543cf1f6ba]
> > > 19: (clone()+0x6d) [0x7f543c74841d]
> > > 2019-01-28 13:10:02.202 7f542d825700 -1 *** Caught signal (Aborted) **
> > > in thread 7f542d825700 thread_name:md_log_replay
> > >
> > > ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
> > > 1: /usr/bin/ceph-mds() [0x7cc8a0]
> > > 2: (()+0x11390) [0x7f543cf29390]
> > > 3: (gsignal()+0x38) [0x7f543c676428]
> > > 4: (abort()+0x16a) [0x7f543c67802a]
> > > 5: (__gnu_cxx::__verbose_terminate_handler()+0x135) [0x7f543dae6e65]
> > > 6: (__cxxabiv1::__terminate(void (*)())+0x6) [0x7f543dadae46]
> > > 7: (()+0x734e91) [0x7f543dadae91]
> > > 8: (()+0x7410a4) [0x7f543dae70a4]
> > > 9: (ceph::buffer::create_aligned_in_mempool(unsigned int, unsigned
> > > int, int)+0x258) [0x7f543d63b348]
> > > 10: (ceph::buffer::list::iterator_impl<false>::copy_shallow(unsigned
> > > int, ceph::buffer::ptr&)+0xa2) [0x7f543d640ee2]
> > > 11: (compact_map_base<std::__cxx11::basic_string<char,
> > > std::char_traits<char>,
> > > mempool::pool_allocator<(mempool::pool_index_t)18, char> >,
> > > ceph::buffer::ptr, std::map<std::__cxx11::basic_string<char,
> > > std::char_traits<char>,
> > > mempool::pool_allocator<(mempool::pool_index_t)18, char> >,
> > > ceph::buffer::ptr, std::less<std::__cxx11::basic_string<char,
> > > std::char_traits<char>,
> > > mempool::pool_allocator<(mempool::pool_index_t)18, char> > >,
> > > mempool::pool_allocator<(mempool::pool_index_t)18,
> > > std::pair<std::__cxx11::basic_string<char, std::char_traits<char>,
> > > mempool::pool_allocator<(mempool::pool_index_t)18, char> > const,
> > > ceph::buffer::ptr> > > >::decode(ceph::buffer::list::iterator&)+0x122)
> > > [0x66b202]
> > > 12: (EMetaBlob::fullbit::decode(ceph::buffer::list::iterator&)+0xe3) [0x7aa633]
> > > 13: /usr/bin/ceph-mds() [0x7aeae6]
> > > 14: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x3d36) [0x7b4fa6]
> > > 15: (EImportStart::replay(MDSRank*)+0x5b) [0x7bbb1b]
> > > 16: (MDLog::_replay_thread()+0x864) [0x760024]
> > > 17: (MDLog::ReplayThread::entry()+0xd) [0x4f487d]
> > > 18: (()+0x76ba) [0x7f543cf1f6ba]
> > > 19: (clone()+0x6d) [0x7f543c74841d]
> > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> > > needed to interpret this.
> > >
> > >     0> 2019-01-28 13:10:02.202 7f542d825700 -1 *** Caught signal
> > > (Aborted) **
> > > in thread 7f542d825700 thread_name:md_log_replay
> > >
> > > ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
> > > 1: /usr/bin/ceph-mds() [0x7cc8a0]
> > > 2: (()+0x11390) [0x7f543cf29390]
> > > 3: (gsignal()+0x38) [0x7f543c676428]
> > > 4: (abort()+0x16a) [0x7f543c67802a]
> > > 5: (__gnu_cxx::__verbose_terminate_handler()+0x135) [0x7f543dae6e65]
> > > 6: (__cxxabiv1::__terminate(void (*)())+0x6) [0x7f543dadae46]
> > > 7: (()+0x734e91) [0x7f543dadae91]
> > > 8: (()+0x7410a4) [0x7f543dae70a4]
> > > 9: (ceph::buffer::create_aligned_in_mempool(unsigned int, unsigned
> > > int, int)+0x258) [0x7f543d63b348]
> > > 10: (ceph::buffer::list::iterator_impl<false>::copy_shallow(unsigned
> > > int, ceph::buffer::ptr&)+0xa2) [0x7f543d640ee2]
> > > 11: (compact_map_base<std::__cxx11::basic_string<char,
> > > std::char_traits<char>,
> > > mempool::pool_allocator<(mempool::pool_index_t)18, char> >,
> > > ceph::buffer::ptr, std::map<std::__cxx11::basic_string<char,
> > > std::char_traits<char>,
> > > mempool::pool_allocator<(mempool::pool_index_t)18, char> >,
> > > ceph::buffer::ptr, std::less<std::__cxx11::basic_string<char,
> > > std::char_traits<char>,
> > > mempool::pool_allocator<(mempool::pool_index_t)18, char> > >,
> > > mempool::pool_allocator<(mempool::pool_index_t)18,
> > > std::pair<std::__cxx11::basic_string<char, std::char_traits<char>,
> > > mempool::pool_allocator<(mempool::pool_index_t)18, char> > const,
> > > ceph::buffer::ptr> > > >::decode(ceph::buffer::list::iterator&)+0x122)
> > > [0x66b202]
> > > 12: (EMetaBlob::fullbit::decode(ceph::buffer::list::iterator&)+0xe3) [0x7aa633]
> > > 13: /usr/bin/ceph-mds() [0x7aeae6]
> > > 14: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x3d36) [0x7b4fa6]
> > > 15: (EImportStart::replay(MDSRank*)+0x5b) [0x7bbb1b]
> > > 16: (MDLog::_replay_thread()+0x864) [0x760024]
> > > 17: (MDLog::ReplayThread::entry()+0xd) [0x4f487d]
> > > 18: (()+0x76ba) [0x7f543cf1f6ba]
> > > 19: (clone()+0x6d) [0x7f543c74841d]
> > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> > > needed to interpret this.
> > >
> > > Aborted
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@xxxxxxxxxxxxxx
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com