Ok, got it:

[root@pprod-admin:/var/lib/ceph/<UUID>]# ceph mon dump -f json-pretty |egrep "name|weigh"
dumped monmap epoch 14
"min_mon_release_name": "quincy",
"name": "pprod-mon2",
"weight": 10,
"name": "pprod-mon3",
"weight": 10,
"name": "pprod-osd2",
"weight": 0,
"name": "pprod-osd1",
"weight": 0,
"name": "pprod-osd3",
"weight": 0,

ceph mon set-weight pprod-mon2 0
ceph mon set-weight pprod-mon3 0

And restart ceph-mgr

On Thu, 7 Mar 2024 at 18:25, David C. <david.casier@xxxxxxxx> wrote:

> I took the wrong line =>
> https://github.com/ceph/ceph/blob/v17.2.6/src/mon/MonClient.cc#L822
>
>
> On Thu, 7 Mar 2024 at 18:21, David C. <david.casier@xxxxxxxx> wrote:
>
>> Hello everybody,
>>
>> I'm encountering strange behavior on an infrastructure (it's
>> pre-production, but it's very ugly). After a "drain" on a monitor (and a
>> manager), the MGRs all crash on startup:
>>
>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr ms_dispatch2 standby mgrmap(e 1310) v1
>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map received map epoch 1310
>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map active in map: 1 active is 99148504
>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map Activating!
>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map I am now activating
>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr ms_dispatch2 mgrmap(e 1310) v1
>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgrc handle_mgr_map Got map version 1310
>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgrc handle_mgr_map Active mgr is now
>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgrc reconnect No active mgr available yet
>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr init waiting for OSDMap...
>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: monclient: _renew_subs
>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: monclient: _send_mon_message to mon.idf-pprod-osd3 at v2:X.X.X.X:3300/0
>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: monclient: _reopen_session rank -1
>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: *** Caught signal (Aborted) **
>>  in thread 7f9a07a27640 thread_name:mgr-fin
>>
>>  ceph version 17.2.6-170.el9cp (59bbeb8815ec3aeb3c8bba1e1866f8f6729eb840) quincy (stable)
>>  1: /lib64/libc.so.6(+0x54db0) [0x7f9a2364ddb0]
>>  2: /lib64/libc.so.6(+0xa154c) [0x7f9a2369a54c]
>>  3: raise()
>>  4: abort()
>>  5: /usr/lib64/ceph/libceph-common.so.2(+0x1c1fa8) [0x7f9a23ce2fa8]
>>  6: /usr/lib64/ceph/libceph-common.so.2(+0x444425) [0x7f9a23f65425]
>>  7: /usr/lib64/ceph/libceph-common.so.2(+0x4442e0) [0x7f9a23f652e0]
>>  8: /usr/lib64/ceph/libceph-common.so.2(+0x4442e0) [0x7f9a23f652e0]
>>  9: (MonClient::_add_conns()+0x242) [0x7f9a23f5fa42]
>>  10: (MonClient::_reopen_session(int)+0x428) [0x7f9a23f60518]
>>  11: (Mgr::init()+0x384) [0x5604667a6434]
>>  12: /usr/bin/ceph-mgr(+0x1af271) [0x5604667ae271]
>>  13: /usr/bin/ceph-mgr(+0x11364d) [0x56046671264d]
>>  14: (Finisher::finisher_thread_entry()+0x175) [0x7f9a23d10645]
>>  15: /lib64/libc.so.6(+0x9f802) [0x7f9a23698802]
>>  16: /lib64/libc.so.6(+0x3f450) [0x7f9a23638450]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> I have the impression that the MGRs are ejected by the monitors; however,
>> after debugging the monitors, I don't see anything abnormal on the monitor
>> side (unless I have missed something).
>>
>> All we can see is that we get an exception in the "_add_conn" method (
>> https://github.com/ceph/ceph/blob/v17.2.6/src/mon/MonClient.cc#L775)
>>
>> Version: 17.2.6-170.el9cp (RHCS6)
>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
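
For anyone landing on this thread with the same crash: the check done by hand at the top (grepping "name" and "weight" out of `ceph mon dump`) could be scripted roughly as below. This is only a minimal sketch. It assumes the JSON output has a top-level "mons" array whose entries carry "name" and "weight" keys, and the function name `mixed_weight_mons` is made up for illustration. That a mix of zero and non-zero weights is what triggers the abort is inferred from the fix that worked in this thread, not a confirmed root cause.

```python
import json


def mixed_weight_mons(monmap_json: str) -> dict:
    """Return {mon_name: weight} when the monmap mixes zero and non-zero
    weights, otherwise an empty dict.

    Assumption: the monmap JSON has a top-level "mons" list whose entries
    have "name" and "weight" keys (as seen in the grep output in this thread).
    """
    monmap = json.loads(monmap_json)
    weights = {m["name"]: m.get("weight", 0) for m in monmap.get("mons", [])}

    has_zero = any(w == 0 for w in weights.values())
    has_nonzero = any(w != 0 for w in weights.values())

    # Only the mixed case is reported; all-zero or all-non-zero is consistent.
    return weights if (has_zero and has_nonzero) else {}


def report(monmap_json: str) -> list:
    """Build human-readable lines listing monitors with a non-zero weight."""
    mixed = mixed_weight_mons(monmap_json)
    return [
        f"{name}: weight {weight}"
        for name, weight in sorted(mixed.items())
        if weight != 0
    ]
```

To use it, save the output of `ceph mon dump -f json` to a file and pass its contents to `report()`. Each monitor it lists is a candidate for `ceph mon set-weight <name> 0`, followed by a ceph-mgr restart, as was done above.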