Re: [Urgent] Ceph system Down, Ceph FS volume in recovering


Hi,
The problem seems to come from the clients (during reconnect).
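Before changing anything, it may help to confirm that every MDS crash aborts at the same place as the quoted trace (printing a `ClientMetricType` while dispatching an `MClientMetrics` message). The function below is a minimal sketch, not a Ceph tool: `summarize_mds_aborts` is a hypothetical helper name, and it assumes the unwrapped journal output is fed on stdin (one log entry per line), e.g. from the `cephadm logs` command further down in this thread. A single crash dump can contain more than one "Caught signal" line, so treat that count as a rough indicator only.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph): summarize fatal aborts in an MDS log read on stdin.
# Assumes unwrapped journal output, e.g.:
#   cephadm logs --name mds.cephfs.cephgw02.qqsavr --fsid 258af72a-cff3-11eb-a261-d4f5ef25154c | summarize_mds_aborts
summarize_mds_aborts() {
    local log signals
    log="$(cat)"
    # Count lines reporting a fatal signal; one crash dump may repeat this line.
    signals="$(grep -c 'Caught signal' <<<"$log" || true)"
    echo "signal lines: $signals"
    # Count each failing function reported by ceph_abort/ceph_assert ("In function '...'").
    grep -o "In function '[^']*'" <<<"$log" | sort | uniq -c
}
```

If every reported function is `operator<<(std::ostream&, const ClientMetricType&)`, the crashes all match the metrics-printing path seen in the trace.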

As a test, disable metrics reporting on all (kernel) clients:
echo Y > /sys/module/ceph/parameters/disable_send_metrics
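Because this parameter only exists while the kernel `ceph` module is loaded, and a value written to sysfs does not survive a reboot or module reload, a small wrapper that writes the value and reads it back may help when repeating this on every client. This is a minimal sketch: `disable_ceph_metrics` is a hypothetical helper name, it must run as root on each kernel client, and the optional argument exists only so the path can be overridden for testing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: disable CephFS metric reporting on a kernel client and verify it.
# Assumes root privileges and a loaded kernel "ceph" module; the setting does not
# survive a reboot or module reload.
disable_ceph_metrics() {
    # Optional argument overrides the sysfs path (useful for testing).
    local param="${1:-/sys/module/ceph/parameters/disable_send_metrics}"
    local value
    if [[ ! -e "$param" ]]; then
        echo "missing $param: ceph module not loaded or parameter unsupported" >&2
        return 1
    fi
    echo Y > "$param" || return 1
    # Boolean module parameters read back as Y or N.
    value="$(cat "$param")"
    if [[ "$value" != "Y" ]]; then
        echo "unexpected value '$value' in $param" >&2
        return 1
    fi
    echo "metrics disabled: $param = $value"
}
```

For example, source the function on each client host, run `disable_ceph_metrics`, and re-apply it after any reboot or module reload.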

________________________________________________________

Regards,

*David CASIER*

________________________________________________________



On Fri, 23 Feb 2024 at 10:20, Eugen Block <eblock@xxxxxx> wrote:

> This seems to be the relevant stack trace:
>
> ---snip---
> Feb 23 15:18:39 cephgw02 conmon[2158052]: debug     -1>
> 2024-02-23T08:18:39.609+0000 7fccc03c0700 -1
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/include/cephfs/metrics/Types.h:
> In function 'std::ostream& operator<<(std::ostream&, const
> ClientMetricType&)' thread 7fccc03c0700 time
> 2024-02-23T08:18:39.609581+0000
> Feb 23 15:18:39 cephgw02 conmon[2158052]:
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/include/cephfs/metrics/Types.h:
> 56: ceph_abort_msg("abort() called")
> Feb 23 15:18:39 cephgw02 conmon[2158052]:
> Feb 23 15:18:39 cephgw02 conmon[2158052]:  ceph version 16.2.4
> (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
> Feb 23 15:18:39 cephgw02 conmon[2158052]:  1: (ceph::__ceph_abort(char
> const*, int, char const*, std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > const&)+0xe5)
> [0x7fccc9021cdc]
> Feb 23 15:18:39 cephgw02 conmon[2158052]:  2:
> (operator<<(std::ostream&, ClientMetricType const&)+0x10e)
> [0x7fccc92a642e]
> Feb 23 15:18:39 cephgw02 conmon[2158052]:  3:
> (MClientMetrics::print(std::ostream&) const+0x1a1) [0x7fccc92a6601]
> Feb 23 15:18:39 cephgw02 conmon[2158052]:  4:
> (DispatchQueue::pre_dispatch(boost::intrusive_ptr<Message>
> const&)+0x710) [0x7fccc9259c30]
> Feb 23 15:18:39 cephgw02 conmon[2158052]:  5:
> (DispatchQueue::entry()+0xdeb) [0x7fccc925b69b]
> Feb 23 15:18:39 cephgw02 conmon[2158052]:  6:
> (DispatchQueue::DispatchThread::entry()+0x11) [0x7fccc930bb71]
> Feb 23 15:18:39 cephgw02 conmon[2158052]:  7:
> /lib64/libpthread.so.0(+0x814a) [0x7fccc7dc314a]
> Feb 23 15:18:39 cephgw02 conmon[2158052]:  8: clone()
> Feb 23 15:18:39 cephgw02 conmon[2158052]:
> Feb 23 15:18:39 cephgw02 conmon[2158052]: debug      0>
> 2024-02-23T08:18:39.610+0000 7fccc03c0700 -1 *** Caught signal
> (Aborted) **
> Feb 23 15:18:39 cephgw02 conmon[2158052]:  in thread 7fccc03c0700
> thread_name:ms_dispatch
> Feb 23 15:18:39 cephgw02 conmon[2158052]:
> Feb 23 15:18:39 cephgw02 conmon[2158052]:  ceph version 16.2.4
> (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
> Feb 23 15:18:39 cephgw02 conmon[2158052]:  1:
> /lib64/libpthread.so.0(+0x12b20) [0x7fccc7dcdb20]
> Feb 23 15:18:39 cephgw02 conmon[2158052]:  2: gsignal()
> Feb 23 15:18:39 cephgw02 conmon[2158052]:  3: abort()
> Feb 23 15:18:39 cephgw02 conmon[2158052]:  4: (ceph::__ceph_abort(char
> const*, int, char const*, std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > const&)+0x1b6)
> [0x7fccc9021dad]
> Feb 23 15:18:39 cephgw02 conmon[2158052]:  5:
> (operator<<(std::ostream&, ClientMetricType const&)+0x10e)
> [0x7fccc92a642e]
> Feb 23 15:18:39 cephgw02 conmon[2158052]:  6:
> (MClientMetrics::print(std::ostream&) const+0x1a1) [0x7fccc92a6601]
> Feb 23 15:18:39 cephgw02 conmon[2158052]:  7:
> (DispatchQueue::pre_dispatch(boost::intrusive_ptr<Message>
> const&)+0x710) [0x7fccc9259c30]
> Feb 23 15:18:39 cephgw02 conmon[2158052]:  8:
> (DispatchQueue::entry()+0xdeb) [0x7fccc925b69b]
> Feb 23 15:18:39 cephgw02 conmon[2158052]:  9:
> (DispatchQueue::DispatchThread::entry()+0x11) [0x7fccc930bb71]
> Feb 23 15:18:39 cephgw02 conmon[2158052]:  10:
> /lib64/libpthread.so.0(+0x814a) [0x7fccc7dc314a]
> Feb 23 15:18:39 cephgw02 conmon[2158052]:  11: clone()
> ---snip---
>
> But I can't really help here; hopefully someone else can chime in and
> interpret it.
>
>
> Quoting nguyenvandiep@xxxxxxxxxxxxxx:
>
> >
> > https://drive.google.com/file/d/1OIN5O2Vj0iWfEMJ2fyHN_xV6fpknBmym/view?usp=sharing
> >
> > Please check my MDS log, generated by the command:
> >
> > cephadm logs --name mds.cephfs.cephgw02.qqsavr --fsid
> > 258af72a-cff3-11eb-a261-d4f5ef25154c
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx