On Thu, Jan 19, 2023 at 9:07 PM Lo Re Giuseppe <giuseppe.lore@xxxxxxx> wrote:
>
> Dear all,
>
> We have started to use CephFS more intensively for some WLCG-related workloads.
> We have 3 active MDS instances spread across 3 servers, mds_cache_memory_limit=12G; most of the other settings are defaults.
> One of them crashed last night, leaving the log below.
> Do you have any hint on what could be the cause and how to avoid it?

Not at the moment. Telemetry has reported similar crashes:

        https://tracker.ceph.com/issues/54959 (cephfs)
        https://tracker.ceph.com/issues/54685 (mgr)

The backtrace indicates tcmalloc involvement, but I'm not sure what's going on.

>
> Regards,
>
> Giuseppe
>
> [root@naret-monitor03 ~]# journalctl -u ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service
> ...
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific
>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 1: /lib64/libpthread.so.0(+0x12ce0) [0x7fe291e4fce0]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 2: abort()
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 3: /lib64/libstdc++.so.6(+0x987ba) [0x7fe2912567ba]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 4: /lib64/libstdc++.so.6(+0x9653c) [0x7fe29125453c]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 5: /lib64/libstdc++.so.6(+0x95559) [0x7fe291253559]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 6: __gxx_personality_v0()
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 7: /lib64/libgcc_s.so.1(+0x10b03) [0x7fe290c34b03]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 8: _Unwind_Resume()
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 9: /usr/bin/ceph-mds(+0x18c104) [0x5638351e7104]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 10: /lib64/libpthread.so.0(+0x12ce0) [0x7fe291e4fce0]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 11: gsignal()
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 12: abort()
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 13: /lib64/libstdc++.so.6(+0x9009b) [0x7fe29124e09b]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 14: /lib64/libstdc++.so.6(+0x9653c) [0x7fe29125453c]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 15: /lib64/libstdc++.so.6(+0x96597) [0x7fe291254597]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 16: /lib64/libstdc++.so.6(+0x967f8) [0x7fe2912547f8]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 17: /lib64/libtcmalloc.so.4(+0x19fa4) [0x7fe29bae6fa4]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 18: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, vo>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 19: (std::shared_ptr<inode_t<mempool::mds_co::pool_allocator> > InodeSt>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 20: (CInode::_decode_base(ceph::buffer::v15_2_0::list::iterator_impl<tr>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 21: (CInode::decode_import(ceph::buffer::v15_2_0::list::iterator_impl<t>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 22: (Migrator::decode_import_inode(CDentry*, ceph::buffer::v15_2_0::lis>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 23: (Migrator::decode_import_dir(ceph::buffer::v15_2_0::list::iterator_>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 24: (Migrator::handle_export_dir(boost::intrusive_ptr<MExportDir const>>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 25: (Migrator::dispatch(boost::intrusive_ptr<Message const> const&)+0x1>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 26: (MDSRank::handle_message(boost::intrusive_ptr<Message const> const&>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 27: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, boo>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 28: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const>>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 29: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x10>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 30: (DispatchQueue::entry()+0x126a) [0x7fe2930a5aba]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 31: (DispatchQueue::DispatchThread::entry()+0x11) [0x7fe2931575d1]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 32: /lib64/libpthread.so.0(+0x81cf) [0x7fe291e451cf]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 33: clone()
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is neede>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: --- begin dump of recent events ---
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: terminate called recursively
> Jan 19 04:49:43 naret-monitor03 systemd[1]: ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service: Main process exited, code=exited, status=127/n/a
> Jan 19 04:49:43 naret-monitor03 systemd[1]: ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service: Failed with result 'exit-code'.
>

-- 
Cheers,
Venky
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx