Dear all,

We have started using CephFS more intensively for some WLCG-related workloads. We have 3 active MDS instances spread across 3 servers with mds_cache_memory_limit=12G; most of the other settings are the defaults.

One of the MDS daemons crashed last night, leaving the log below. Do you have any hints about what the cause could be and how to avoid it?

Regards,
Giuseppe

[root@naret-monitor03 ~]# journalctl -u ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service
...
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific >
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 1: /lib64/libpthread.so.0(+0x12ce0) [0x7fe291e4fce0]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 2: abort()
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 3: /lib64/libstdc++.so.6(+0x987ba) [0x7fe2912567ba]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 4: /lib64/libstdc++.so.6(+0x9653c) [0x7fe29125453c]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 5: /lib64/libstdc++.so.6(+0x95559) [0x7fe291253559]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 6: __gxx_personality_v0()
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 7: /lib64/libgcc_s.so.1(+0x10b03) [0x7fe290c34b03]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 8: _Unwind_Resume()
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 9: /usr/bin/ceph-mds(+0x18c104) [0x5638351e7104]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 10: /lib64/libpthread.so.0(+0x12ce0) [0x7fe291e4fce0]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 11: gsignal()
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 12: abort()
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 13: /lib64/libstdc++.so.6(+0x9009b) [0x7fe29124e09b]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 14: /lib64/libstdc++.so.6(+0x9653c) [0x7fe29125453c]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 15: /lib64/libstdc++.so.6(+0x96597) [0x7fe291254597]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 16: /lib64/libstdc++.so.6(+0x967f8) [0x7fe2912547f8]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 17: /lib64/libtcmalloc.so.4(+0x19fa4) [0x7fe29bae6fa4]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 18: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, vo>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 19: (std::shared_ptr<inode_t<mempool::mds_co::pool_allocator> > InodeSt>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 20: (CInode::_decode_base(ceph::buffer::v15_2_0::list::iterator_impl<tr>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 21: (CInode::decode_import(ceph::buffer::v15_2_0::list::iterator_impl<t>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 22: (Migrator::decode_import_inode(CDentry*, ceph::buffer::v15_2_0::lis>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 23: (Migrator::decode_import_dir(ceph::buffer::v15_2_0::list::iterator_>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 24: (Migrator::handle_export_dir(boost::intrusive_ptr<MExportDir const>>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 25: (Migrator::dispatch(boost::intrusive_ptr<Message const> const&)+0x1>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 26: (MDSRank::handle_message(boost::intrusive_ptr<Message const> const&>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 27: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, boo>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 28: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const>>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 29: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x10>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 30: (DispatchQueue::entry()+0x126a) [0x7fe2930a5aba]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 31: (DispatchQueue::DispatchThread::entry()+0x11) [0x7fe2931575d1]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 32: /lib64/libpthread.so.0(+0x81cf) [0x7fe291e451cf]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 33: clone()
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is neede>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: --- begin dump of recent events ---
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: terminate called recursively
Jan 19 04:49:43 naret-monitor03 systemd[1]: ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service: Main process exited, code=exited, status=127/n/a
Jan 19 04:49:43 naret-monitor03 systemd[1]: ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service: Failed with result 'exit-code'.
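
In case it helps anyone skim the backtrace, here is a minimal Python sketch that pulls the numbered frames out of journald-prefixed lines like the ones in this log and reports the first frame that carries a symbol name. It is only an illustration: the helper names parse_frames and first_symbolized are hypothetical, the sample lines are copied from the log (truncation markers included), and the "starts with an opening parenthesis" test is an assumption based on how the symbolized frames appear here.

```python
import re

# Matches "[pid]: N: <frame text>" as it appears after the journald prefix.
FRAME_RE = re.compile(r"\[\d+\]:\s+(\d+):\s+(.*)$")


def parse_frames(lines):
    """Return (frame_number, frame_text) tuples for lines holding a backtrace frame."""
    frames = []
    for line in lines:
        match = FRAME_RE.search(line.rstrip())
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    return frames


def first_symbolized(frames):
    """Return the first frame whose text starts with '(' (assumed to be a symbol name)."""
    for number, text in frames:
        if text.startswith("("):
            return number, text
    return None


# Sample lines copied from the log; the trailing '>' is journalctl's truncation marker.
PREFIX = ("Jan 19 04:49:40 naret-monitor03 "
          "ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: ")
sample = [
    PREFIX + "17: /lib64/libtcmalloc.so.4(+0x19fa4) [0x7fe29bae6fa4]",
    PREFIX + "18: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, vo>",
    PREFIX + "19: (std::shared_ptr<inode_t<mempool::mds_co::pool_allocator> > InodeSt>",
]

if __name__ == "__main__":
    print(first_symbolized(parse_frames(sample)))
```

Run against the full log, this would list frames 1 to 33 and skip the non-frame lines (the version banner, the NOTE line and the systemd messages), because those do not have a frame number after the PID.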