I pointed to the wrong line. The correct one is:
https://github.com/ceph/ceph/blob/v17.2.6/src/mon/MonClient.cc#L822

On Thu, Mar 7, 2024 at 18:21, David C. <david.casier@xxxxxxxx> wrote:

>
> Hello everybody,
>
> I'm encountering strange behavior on an infrastructure (it's
> pre-production, but it's very ugly). After a "drain" of a monitor (and a
> manager), the MGRs all crash on startup:
>
> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr ms_dispatch2 standby mgrmap(e 1310) v1
> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map received map epoch 1310
> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map active in map: 1 active is 99148504
> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map Activating!
> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map I am now activating
> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr ms_dispatch2 mgrmap(e 1310) v1
> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgrc handle_mgr_map Got map version 1310
> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgrc handle_mgr_map Active mgr is now
> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgrc reconnect No active mgr available yet
> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr init waiting for OSDMap...
> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: monclient: _renew_subs
> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: monclient: _send_mon_message to mon.idf-pprod-osd3 at v2:X.X.X.X:3300/0
> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: monclient: _reopen_session rank -1
> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: *** Caught signal (Aborted) **
>  in thread 7f9a07a27640 thread_name:mgr-fin
>
>  ceph version 17.2.6-170.el9cp (59bbeb8815ec3aeb3c8bba1e1866f8f6729eb840) quincy (stable)
>  1: /lib64/libc.so.6(+0x54db0) [0x7f9a2364ddb0]
>  2: /lib64/libc.so.6(+0xa154c) [0x7f9a2369a54c]
>  3: raise()
>  4: abort()
>  5: /usr/lib64/ceph/libceph-common.so.2(+0x1c1fa8) [0x7f9a23ce2fa8]
>  6: /usr/lib64/ceph/libceph-common.so.2(+0x444425) [0x7f9a23f65425]
>  7: /usr/lib64/ceph/libceph-common.so.2(+0x4442e0) [0x7f9a23f652e0]
>  8: /usr/lib64/ceph/libceph-common.so.2(+0x4442e0) [0x7f9a23f652e0]
>  9: (MonClient::_add_conns()+0x242) [0x7f9a23f5fa42]
>  10: (MonClient::_reopen_session(int)+0x428) [0x7f9a23f60518]
>  11: (Mgr::init()+0x384) [0x5604667a6434]
>  12: /usr/bin/ceph-mgr(+0x1af271) [0x5604667ae271]
>  13: /usr/bin/ceph-mgr(+0x11364d) [0x56046671264d]
>  14: (Finisher::finisher_thread_entry()+0x175) [0x7f9a23d10645]
>  15: /lib64/libc.so.6(+0x9f802) [0x7f9a23698802]
>  16: /lib64/libc.so.6(+0x3f450) [0x7f9a23638450]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> I have the impression that the MGRs are being ejected by the monitors. However,
> after debugging the monitors, I don't see anything abnormal on the monitor side
> (unless I missed something).
>
> All we can see is that we get an exception in the "_add_conn" method
> (https://github.com/ceph/ceph/blob/v17.2.6/src/mon/MonClient.cc#L775)
>
> Version: 17.2.6-170.el9cp (RHCS6)
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
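
Several frames in the backtrace quoted above are unsymbolized (only a module path and an offset are shown), which is what the NOTE at the end of the trace refers to. A minimal sketch, assuming the matching debuginfo packages are installed and that `addr2line` from binutils is available on the host: it extracts the module/offset pairs from a pasted trace and builds the `addr2line` command lines to run by hand. The helper names (`unresolved_frames`, `addr2line_commands`) and the regular expression are hypothetical illustrations, not part of Ceph, and the offsets will only resolve correctly against the exact same build of the binaries.

```python
import re

# Matches unsymbolized frames such as:
#   5: /usr/lib64/ceph/libceph-common.so.2(+0x1c1fa8) [0x7f9a23ce2fa8]
# Frames that already carry a symbol name, e.g. "9: (MonClient::_add_conns()+0x242)",
# do not start with a path and are skipped.
FRAME_RE = re.compile(
    r"(?P<idx>\d+):\s*(?P<module>/\S+?)\(\+(?P<offset>0x[0-9a-fA-F]+)\)"
)


def unresolved_frames(trace: str):
    """Return (frame number, module path, offset) for each unsymbolized frame."""
    return [
        (int(m["idx"]), m["module"], m["offset"])
        for m in FRAME_RE.finditer(trace)
    ]


def addr2line_commands(trace: str):
    """Build one addr2line command line per unsymbolized frame (nothing is executed)."""
    return [
        f"addr2line -C -f -i -e {module} {offset}"
        for _, module, offset in unresolved_frames(trace)
    ]


if __name__ == "__main__":
    sample = """
     5: /usr/lib64/ceph/libceph-common.so.2(+0x1c1fa8) [0x7f9a23ce2fa8]
     9: (MonClient::_add_conns()+0x242) [0x7f9a23f5fa42]
     12: /usr/bin/ceph-mgr(+0x1af271) [0x5604667ae271]
    """
    for cmd in addr2line_commands(sample):
        print(cmd)
```

The script only prints commands rather than running them, so it can be used on a workstation that does not have the Ceph binaries; the commands themselves need to be run where the same package build and its debug symbols are present.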