Hi Team, last night I caught the following segfault. Nothing else looks suspicious (but I'm quite new to Ceph management, so perhaps I just don't know where to look). I could not find any similar segfault reported by anybody else by googling. Was this a known problem that was fixed in later versions? The cluster has been running for quite a while now (several months) and this is the first time it has happened.

```
debug      0> 2020-12-02T10:31:09.295+0000 7f3e61943700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f3e61943700 thread_name:msgr-worker-1

 ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)
 1: (()+0x12dd0) [0x7f3e65933dd0]
 2: (std::_Rb_tree<boost::intrusive_ptr<AsyncConnection>, boost::intrusive_ptr<AsyncConnection>, std::_Identity<boost::intrusive_ptr<AsyncConnection> >, std::less<boost::intrusive_ptr<AsyncConnection> >, std::allocator<boost::intrusive_ptr<AsyncConnection> > >::find(boost::intrusive_ptr<AsyncConnection> const&) const+0x2c) [0x55cee407ca7c]
 3: (AsyncConnection::_stop()+0xab) [0x55cee40767eb]
 4: (ProtocolV2::stop()+0x8f) [0x55cee40a189f]
 5: (ProtocolV2::_fault()+0x133) [0x55cee40a1b03]
 6: (ProtocolV2::handle_read_frame_preamble_main(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int)+0x551) [0x55cee40a63d1]
 7: (ProtocolV2::run_continuation(Ct<ProtocolV2>&)+0x3c) [0x55cee40a273c]
 8: (AsyncConnection::process()+0x8a9) [0x55cee4079ab9]
 9: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xcb7) [0x55cee3eceb67]
 10: (()+0xdb914c) [0x55cee3ed414c]
 11: (()+0xc2b73) [0x7f3e64f83b73]
 12: (()+0x82de) [0x7f3e659292de]
 13: (clone()+0x43) [0x7f3e64660e83]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 rbd_rwl
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 immutable_obj_cache
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 0 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 rgw_sync
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
   1/ 5 prioritycache
   0/ 5 test
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
--- pthread ID / name mapping for recent threads ---
  7f3e3d88b700 / osd_srv_heartbt
  7f3e41893700 / tp_osd_tp
  7f3e4589b700 / tp_osd_tp
  7f3e4f8af700 / rocksdb:dump_st
  7f3e52a8c700 / safe_timer
  7f3e53a8e700 / ms_dispatch
  7f3e566b7700 / bstore_mempool
  7f3e5d0ca700 / safe_timer
  7f3e61943700 / msgr-worker-1
  7f3e62144700 / msgr-worker-0
  max_recent     10000
  max_new         1000
  log_file /var/lib/ceph/crash/2020-12-02T10:31:09.301492Z_84e8430f-30fd-469f-8e22-c2e1ccc675da/log
--- end dump of recent events ---
reraise_fatal: default handler for signal 11 didn't terminate the process?
```

--
With best regards,
Ivan Kurnosov