Hi,

The problem seems to come from the clients (reconnect).

Test by disabling metrics on all clients:
echo Y > /sys/module/ceph/parameters/disable_send_metrics

________________________________________________________

Best regards,

*David CASIER*

________________________________________________________


On Fri, 23 Feb 2024 at 10:20, Eugen Block <eblock@xxxxxx> wrote:

> This seems to be the relevant stack trace:
>
> ---snip---
> Feb 23 15:18:39 cephgw02 conmon[2158052]: debug -1> 2024-02-23T08:18:39.609+0000 7fccc03c0700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/include/cephfs/metrics/Types.h: In function 'std::ostream& operator<<(std::ostream&, const ClientMetricType&)' thread 7fccc03c0700 time 2024-02-23T08:18:39.609581+0000
> Feb 23 15:18:39 cephgw02 conmon[2158052]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/include/cephfs/metrics/Types.h: 56: ceph_abort_msg("abort() called")
> Feb 23 15:18:39 cephgw02 conmon[2158052]:
> Feb 23 15:18:39 cephgw02 conmon[2158052]: ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
> Feb 23 15:18:39 cephgw02 conmon[2158052]: 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe5) [0x7fccc9021cdc]
> Feb 23 15:18:39 cephgw02 conmon[2158052]: 2: (operator<<(std::ostream&, ClientMetricType const&)+0x10e) [0x7fccc92a642e]
> Feb 23 15:18:39 cephgw02 conmon[2158052]: 3: (MClientMetrics::print(std::ostream&) const+0x1a1) [0x7fccc92a6601]
> Feb 23 15:18:39 cephgw02 conmon[2158052]: 4: (DispatchQueue::pre_dispatch(boost::intrusive_ptr<Message> const&)+0x710) [0x7fccc9259c30]
> Feb 23 15:18:39 cephgw02 conmon[2158052]: 5: (DispatchQueue::entry()+0xdeb) [0x7fccc925b69b]
> Feb 23 15:18:39 cephgw02 conmon[2158052]: 6: (DispatchQueue::DispatchThread::entry()+0x11) [0x7fccc930bb71]
> Feb 23 15:18:39 cephgw02 conmon[2158052]: 7: /lib64/libpthread.so.0(+0x814a) [0x7fccc7dc314a]
> Feb 23 15:18:39 cephgw02 conmon[2158052]: 8: clone()
> Feb 23 15:18:39 cephgw02 conmon[2158052]:
> Feb 23 15:18:39 cephgw02 conmon[2158052]: debug 0> 2024-02-23T08:18:39.610+0000 7fccc03c0700 -1 *** Caught signal (Aborted) **
> Feb 23 15:18:39 cephgw02 conmon[2158052]: in thread 7fccc03c0700 thread_name:ms_dispatch
> Feb 23 15:18:39 cephgw02 conmon[2158052]:
> Feb 23 15:18:39 cephgw02 conmon[2158052]: ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
> Feb 23 15:18:39 cephgw02 conmon[2158052]: 1: /lib64/libpthread.so.0(+0x12b20) [0x7fccc7dcdb20]
> Feb 23 15:18:39 cephgw02 conmon[2158052]: 2: gsignal()
> Feb 23 15:18:39 cephgw02 conmon[2158052]: 3: abort()
> Feb 23 15:18:39 cephgw02 conmon[2158052]: 4: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1b6) [0x7fccc9021dad]
> Feb 23 15:18:39 cephgw02 conmon[2158052]: 5: (operator<<(std::ostream&, ClientMetricType const&)+0x10e) [0x7fccc92a642e]
> Feb 23 15:18:39 cephgw02 conmon[2158052]: 6: (MClientMetrics::print(std::ostream&) const+0x1a1) [0x7fccc92a6601]
> Feb 23 15:18:39 cephgw02 conmon[2158052]: 7: (DispatchQueue::pre_dispatch(boost::intrusive_ptr<Message> const&)+0x710) [0x7fccc9259c30]
> Feb 23 15:18:39 cephgw02 conmon[2158052]: 8: (DispatchQueue::entry()+0xdeb) [0x7fccc925b69b]
> Feb 23 15:18:39 cephgw02 conmon[2158052]: 9: (DispatchQueue::DispatchThread::entry()+0x11) [0x7fccc930bb71]
> Feb 23 15:18:39 cephgw02 conmon[2158052]: 10: /lib64/libpthread.so.0(+0x814a) [0x7fccc7dc314a]
> Feb 23 15:18:39 cephgw02 conmon[2158052]: 11: clone()
> ---snip---
>
> But I can't really help here; hopefully someone else can chime in and
> interpret it.
>
>
> Quoting nguyenvandiep@xxxxxxxxxxxxxx:
>
> > https://drive.google.com/file/d/1OIN5O2Vj0iWfEMJ2fyHN_xV6fpknBmym/view?usp=sharing
> >
> > Please check my MDS log, which was generated by the command
> >
> > cephadm logs --name mds.cephfs.cephgw02.qqsavr --fsid 258af72a-cff3-11eb-a261-d4f5ef25154c
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
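
The test suggested at the top of this reply can be wrapped in a small helper and run as root on each client. This is only a minimal sketch, assuming kernel CephFS clients whose ceph module exposes the disable_send_metrics parameter; the function name and the overridable PARAM variable are illustrative and not part of Ceph. A value written through /sys is a runtime setting, so it does not survive a module reload or a reboot.

```shell
#!/bin/sh
# Sketch only: apply the suggested test on one kernel CephFS client.
# PARAM defaults to the path quoted in the reply; making it overridable is a
# hypothetical convenience so the helper can be tried on a scratch file first.
PARAM="${PARAM:-/sys/module/ceph/parameters/disable_send_metrics}"

# disable_send_metrics FILE: write Y to FILE and print the value read back.
disable_send_metrics() {
    param_file="$1"
    if [ ! -e "$param_file" ]; then
        # Either the ceph kernel module is not loaded or it lacks this parameter.
        echo "not found: $param_file" >&2
        return 1
    fi
    echo Y > "$param_file" || return 1
    cat "$param_file"
}
```

On a client, call it as: disable_send_metrics "$PARAM". If the parameter file is missing, the helper reports that on stderr and returns non-zero instead of creating a stray file.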