MDS crash at CSCS

Dear all,

We have started using CephFS more intensively for some WLCG-related workloads.
We run 3 active MDS instances spread across 3 servers with mds_cache_memory_limit=12G; most other settings are at their defaults.
One of them crashed last night, leaving the log below.
Do you have any hints as to what the cause could be and how to avoid it?

Regards,

Giuseppe

[root@naret-monitor03 ~]# journalctl -u ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service
...
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific >
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  1: /lib64/libpthread.so.0(+0x12ce0) [0x7fe291e4fce0]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  2: abort()
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  3: /lib64/libstdc++.so.6(+0x987ba) [0x7fe2912567ba]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  4: /lib64/libstdc++.so.6(+0x9653c) [0x7fe29125453c]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  5: /lib64/libstdc++.so.6(+0x95559) [0x7fe291253559]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  6: __gxx_personality_v0()
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  7: /lib64/libgcc_s.so.1(+0x10b03) [0x7fe290c34b03]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  8: _Unwind_Resume()
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  9: /usr/bin/ceph-mds(+0x18c104) [0x5638351e7104]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  10: /lib64/libpthread.so.0(+0x12ce0) [0x7fe291e4fce0]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  11: gsignal()
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  12: abort()
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  13: /lib64/libstdc++.so.6(+0x9009b) [0x7fe29124e09b]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  14: /lib64/libstdc++.so.6(+0x9653c) [0x7fe29125453c]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  15: /lib64/libstdc++.so.6(+0x96597) [0x7fe291254597]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  16: /lib64/libstdc++.so.6(+0x967f8) [0x7fe2912547f8]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  17: /lib64/libtcmalloc.so.4(+0x19fa4) [0x7fe29bae6fa4]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  18: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, vo>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  19: (std::shared_ptr<inode_t<mempool::mds_co::pool_allocator> > InodeSt>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  20: (CInode::_decode_base(ceph::buffer::v15_2_0::list::iterator_impl<tr>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  21: (CInode::decode_import(ceph::buffer::v15_2_0::list::iterator_impl<t>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  22: (Migrator::decode_import_inode(CDentry*, ceph::buffer::v15_2_0::lis>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  23: (Migrator::decode_import_dir(ceph::buffer::v15_2_0::list::iterator_>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  24: (Migrator::handle_export_dir(boost::intrusive_ptr<MExportDir const>>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  25: (Migrator::dispatch(boost::intrusive_ptr<Message const> const&)+0x1>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  26: (MDSRank::handle_message(boost::intrusive_ptr<Message const> const&>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  27: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, boo>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  28: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const>>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  29: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x10>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  30: (DispatchQueue::entry()+0x126a) [0x7fe2930a5aba]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  31: (DispatchQueue::DispatchThread::entry()+0x11) [0x7fe2931575d1]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  32: /lib64/libpthread.so.0(+0x81cf) [0x7fe291e451cf]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  33: clone()
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is neede>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: --- begin dump of recent events ---
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: terminate called recursively
Jan 19 04:49:43 naret-monitor03 systemd[1]: ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service: Main process exited, code=exited, status=127/n/a
Jan 19 04:49:43 naret-monitor03 systemd[1]: ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service: Failed with result 'exit-code'.
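
The long journald prefix makes the backtrace above hard to scan, and the trailing `>` on several lines is where the journalctl pager cut them off (running journalctl with `--no-pager` should print them in full). As a reading aid only, here is a minimal Python sketch that strips the prefix and keeps the numbered frames. The function names `extract_frames` and `symbol_frames` and both regular expressions are assumptions made for this illustration, fitted to the log format shown here; they are not part of Ceph or systemd.

```python
import re

# Journald short format: "<Mon> <day> <HH:MM:SS> <host> <unit>[<pid>]: <message>"
PREFIX = re.compile(r"^\w{3} +\d+ [\d:]{8} \S+ \S+\[\d+\]: ?(.*)$")
# A numbered backtrace frame inside the message, e.g. " 24: (Migrator::..."
FRAME = re.compile(r"^\s*(\d+): (.*)$")


def extract_frames(lines):
    """Return (frame_number, frame_text) pairs found in journal lines."""
    frames = []
    for line in lines:
        prefix_match = PREFIX.match(line.rstrip("\n"))
        if not prefix_match:
            continue  # not a journald line in the expected format
        frame_match = FRAME.match(prefix_match.group(1))
        if frame_match:
            frames.append((int(frame_match.group(1)), frame_match.group(2)))
    return frames


def symbol_frames(frames):
    """Keep frames printed with a demangled symbol, which start with '('."""
    return [(number, text) for number, text in frames if text.startswith("(")]
```

Fed the output above, `symbol_frames(extract_frames(lines))` would keep frames 18 to 31, that is, the tcmalloc call at the top and the `CInode` / `Migrator` import path beneath it. Truncated frames stay truncated; the sketch does not try to recover the missing text.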


