Hello everybody,

I'm encountering strange behavior on an infrastructure (it's pre-production, but it's very ugly). After a "drain" on a monitor (and a manager), the MGRs all crash on startup:

Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr ms_dispatch2 standby mgrmap(e 1310) v1
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map received map epoch 1310
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map active in map: 1 active is 99148504
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map Activating!
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map I am now activating
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr ms_dispatch2 mgrmap(e 1310) v1
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgrc handle_mgr_map Got map version 1310
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgrc handle_mgr_map Active mgr is now
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgrc reconnect No active mgr available yet
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr init waiting for OSDMap...
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: monclient: _renew_subs
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: monclient: _send_mon_message to mon.idf-pprod-osd3 at v2:X.X.X.X:3300/0
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: monclient: _reopen_session rank -1
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: *** Caught signal (Aborted) **
 in thread 7f9a07a27640 thread_name:mgr-fin

 ceph version 17.2.6-170.el9cp (59bbeb8815ec3aeb3c8bba1e1866f8f6729eb840) quincy (stable)
 1: /lib64/libc.so.6(+0x54db0) [0x7f9a2364ddb0]
 2: /lib64/libc.so.6(+0xa154c) [0x7f9a2369a54c]
 3: raise()
 4: abort()
 5: /usr/lib64/ceph/libceph-common.so.2(+0x1c1fa8) [0x7f9a23ce2fa8]
 6: /usr/lib64/ceph/libceph-common.so.2(+0x444425) [0x7f9a23f65425]
 7: /usr/lib64/ceph/libceph-common.so.2(+0x4442e0) [0x7f9a23f652e0]
 8: /usr/lib64/ceph/libceph-common.so.2(+0x4442e0) [0x7f9a23f652e0]
 9: (MonClient::_add_conns()+0x242) [0x7f9a23f5fa42]
 10: (MonClient::_reopen_session(int)+0x428) [0x7f9a23f60518]
 11: (Mgr::init()+0x384) [0x5604667a6434]
 12: /usr/bin/ceph-mgr(+0x1af271) [0x5604667ae271]
 13: /usr/bin/ceph-mgr(+0x11364d) [0x56046671264d]
 14: (Finisher::finisher_thread_entry()+0x175) [0x7f9a23d10645]
 15: /lib64/libc.so.6(+0x9f802) [0x7f9a23698802]
 16: /lib64/libc.so.6(+0x3f450) [0x7f9a23638450]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

I have the impression that the MGRs are being ejected by the monitors. However, after enabling debug logging on the monitors, I don't see anything abnormal on the monitor side (unless I have missed something).

All we can see is that we get an exception in the "_add_conn" method (https://github.com/ceph/ceph/blob/v17.2.6/src/mon/MonClient.cc#L775).

Version: 17.2.6-170.el9cp (RHCS6)