Re: MDS crash at CSCS

On Thu, Jan 19, 2023 at 9:07 PM Lo Re Giuseppe <giuseppe.lore@xxxxxxx> wrote:
>
> Dear all,
>
> We have started to use CephFS more intensively for some WLCG-related workloads.
> We have 3 active MDS instances spread across 3 servers, with mds_cache_memory_limit=12G; most other settings are at their defaults.
> One of them crashed last night, leaving the log below.
> Do you have any hints on what could be the cause and how to avoid it?

Not at the moment. Telemetry has reported similar crashes:

        https://tracker.ceph.com/issues/54959 (cephfs)
        https://tracker.ceph.com/issues/54685 (mgr)

The backtrace indicates tcmalloc involvement, but I'm not sure what's going on.
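The tcmalloc observation can be checked mechanically against the journal output. The script below is a minimal sketch. It assumes the log has been saved to a text file and that frames follow the `]:  N: <frame>` layout seen in the quoted log. The helpers `parse_frames` and `tcmalloc_frames_with_callers` are hypothetical names and are not part of Ceph. The script finds every frame that mentions tcmalloc and prints the next-outer frame (number + 1), which is the code that called into the allocator.

```python
import re
import sys
from typing import List, Optional, Tuple

# Matches the "<journal prefix>]:  <N>: <frame text>" layout of the crash dump
# in this thread. Assumption: other journal lines (e.g. "ceph version ...",
# systemd status lines) never have a bare number followed by ":" after "]:".
FRAME_RE = re.compile(r"\]:\s+(\d+):\s+(.*?)\s*$")

Frame = Tuple[int, str]


def parse_frames(text: str) -> List[Frame]:
    """Return (frame number, frame text) pairs in the order they appear."""
    frames: List[Frame] = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    return frames


def tcmalloc_frames_with_callers(
    frames: List[Frame],
) -> List[Tuple[int, str, Optional[str]]]:
    """For every frame mentioning tcmalloc, also return the next-outer frame
    (frame number + 1), i.e. the code that called into the allocator."""
    hits = []
    for i, (num, text) in enumerate(frames):
        if "tcmalloc" in text.lower():
            caller = None
            if i + 1 < len(frames) and frames[i + 1][0] == num + 1:
                caller = frames[i + 1][1]
            hits.append((num, text, caller))
    return hits


def main(argv: List[str]) -> int:
    if len(argv) != 2:
        print(f"usage: {argv[0]} <saved journalctl output>", file=sys.stderr)
        return 2
    with open(argv[1], encoding="utf-8", errors="replace") as fh:
        frames = parse_frames(fh.read())
    for num, text, caller in tcmalloc_frames_with_callers(frames):
        print(f"frame {num}: {text}")
        print(f"  called from: {caller or '<unknown>'}")
    return 0


if __name__ == "__main__":
    # Usage errors are reported on stderr; the return value is not propagated.
    main(sys.argv)
```

Applied to the log in this thread, the script would flag frames 17 and 18. Following their callers leads to frame 19, the `inode_t` allocation on the `CInode::_decode_base` / `Migrator::handle_export_dir` path. The frame names in the quoted output appear cut off at the terminal width (lines ending in `>`), so the caller text is incomplete.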

>
> Regards,
>
> Giuseppe
>
> [root@naret-monitor03 ~]# journalctl -u ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service
> ...
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific >
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  1: /lib64/libpthread.so.0(+0x12ce0) [0x7fe291e4fce0]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  2: abort()
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  3: /lib64/libstdc++.so.6(+0x987ba) [0x7fe2912567ba]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  4: /lib64/libstdc++.so.6(+0x9653c) [0x7fe29125453c]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  5: /lib64/libstdc++.so.6(+0x95559) [0x7fe291253559]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  6: __gxx_personality_v0()
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  7: /lib64/libgcc_s.so.1(+0x10b03) [0x7fe290c34b03]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  8: _Unwind_Resume()
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  9: /usr/bin/ceph-mds(+0x18c104) [0x5638351e7104]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  10: /lib64/libpthread.so.0(+0x12ce0) [0x7fe291e4fce0]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  11: gsignal()
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  12: abort()
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  13: /lib64/libstdc++.so.6(+0x9009b) [0x7fe29124e09b]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  14: /lib64/libstdc++.so.6(+0x9653c) [0x7fe29125453c]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  15: /lib64/libstdc++.so.6(+0x96597) [0x7fe291254597]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  16: /lib64/libstdc++.so.6(+0x967f8) [0x7fe2912547f8]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  17: /lib64/libtcmalloc.so.4(+0x19fa4) [0x7fe29bae6fa4]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  18: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, vo>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  19: (std::shared_ptr<inode_t<mempool::mds_co::pool_allocator> > InodeSt>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  20: (CInode::_decode_base(ceph::buffer::v15_2_0::list::iterator_impl<tr>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  21: (CInode::decode_import(ceph::buffer::v15_2_0::list::iterator_impl<t>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  22: (Migrator::decode_import_inode(CDentry*, ceph::buffer::v15_2_0::lis>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  23: (Migrator::decode_import_dir(ceph::buffer::v15_2_0::list::iterator_>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  24: (Migrator::handle_export_dir(boost::intrusive_ptr<MExportDir const>>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  25: (Migrator::dispatch(boost::intrusive_ptr<Message const> const&)+0x1>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  26: (MDSRank::handle_message(boost::intrusive_ptr<Message const> const&>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  27: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, boo>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  28: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const>>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  29: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x10>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  30: (DispatchQueue::entry()+0x126a) [0x7fe2930a5aba]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  31: (DispatchQueue::DispatchThread::entry()+0x11) [0x7fe2931575d1]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  32: /lib64/libpthread.so.0(+0x81cf) [0x7fe291e451cf]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  33: clone()
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is neede>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: --- begin dump of recent events ---
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: terminate called recursively
> Jan 19 04:49:43 naret-monitor03 systemd[1]: ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service: Main process exited, code=exited, status=127/n/a
> Jan 19 04:49:43 naret-monitor03 systemd[1]: ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service: Failed with result 'exit-code'.
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>


--
Cheers,
Venky