Some monitors have existed for many years (weight 10), while others were added later (weight 0) =>
https://github.com/ceph/ceph/commit/2d113dedf851995e000d3cce136b69bfa94b6fe0

On Thursday, 7 March 2024, Eugen Block <eblock@xxxxxx> wrote:

> I'm curious how the weights might have been changed. I've never touched a
> mon weight myself, do you know how that happened?
>
> Quoting "David C." <david.casier@xxxxxxxx>:
>
>> Ok, got it:
>>
>> [root@pprod-admin:/var/lib/ceph/<UUID>]# ceph mon dump -f json-pretty | egrep "name|weigh"
>> dumped monmap epoch 14
>> "min_mon_release_name": "quincy",
>> "name": "pprod-mon2",
>> "weight": 10,
>> "name": "pprod-mon3",
>> "weight": 10,
>> "name": "pprod-osd2",
>> "weight": 0,
>> "name": "pprod-osd1",
>> "weight": 0,
>> "name": "pprod-osd3",
>> "weight": 0,
>>
>> ceph mon set-weight pprod-mon2 0
>> ceph mon set-weight pprod-mon3 0
>>
>> And restart ceph-mgr
>>
>> On Thu, 7 Mar 2024 at 18:25, David C. <david.casier@xxxxxxxx> wrote:
>>
>>> I picked the wrong line =>
>>> https://github.com/ceph/ceph/blob/v17.2.6/src/mon/MonClient.cc#L822
>>>
>>> On Thu, 7 Mar 2024 at 18:21, David C. <david.casier@xxxxxxxx> wrote:
>>>
>>>> Hello everybody,
>>>>
>>>> I'm encountering strange behavior on an infrastructure (it's
>>>> pre-production, but it's very ugly). After a "drain" of a monitor (and a
>>>> manager), all MGRs crash on startup:
>>>>
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr ms_dispatch2 standby mgrmap(e 1310) v1
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map received map epoch 1310
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map active in map: 1 active is 99148504
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map Activating!
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map I am now activating
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr ms_dispatch2 mgrmap(e 1310) v1
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgrc handle_mgr_map Got map version 1310
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgrc handle_mgr_map Active mgr is now
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgrc reconnect No active mgr available yet
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr init waiting for OSDMap...
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: monclient: _renew_subs
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: monclient: _send_mon_message to mon.idf-pprod-osd3 at v2:X.X.X.X:3300/0
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: monclient: _reopen_session rank -1
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: *** Caught signal (Aborted) **
>>>>  in thread 7f9a07a27640 thread_name:mgr-fin
>>>>
>>>>  ceph version 17.2.6-170.el9cp (59bbeb8815ec3aeb3c8bba1e1866f8f6729eb840) quincy (stable)
>>>>  1: /lib64/libc.so.6(+0x54db0) [0x7f9a2364ddb0]
>>>>  2: /lib64/libc.so.6(+0xa154c) [0x7f9a2369a54c]
>>>>  3: raise()
>>>>  4: abort()
>>>>  5: /usr/lib64/ceph/libceph-common.so.2(+0x1c1fa8) [0x7f9a23ce2fa8]
>>>>  6: /usr/lib64/ceph/libceph-common.so.2(+0x444425) [0x7f9a23f65425]
>>>>  7: /usr/lib64/ceph/libceph-common.so.2(+0x4442e0) [0x7f9a23f652e0]
>>>>  8: /usr/lib64/ceph/libceph-common.so.2(+0x4442e0) [0x7f9a23f652e0]
>>>>  9: (MonClient::_add_conns()+0x242) [0x7f9a23f5fa42]
>>>>  10: (MonClient::_reopen_session(int)+0x428) [0x7f9a23f60518]
>>>>  11: (Mgr::init()+0x384) [0x5604667a6434]
>>>>  12: /usr/bin/ceph-mgr(+0x1af271) [0x5604667ae271]
>>>>  13: /usr/bin/ceph-mgr(+0x11364d) [0x56046671264d]
>>>>  14: (Finisher::finisher_thread_entry()+0x175) [0x7f9a23d10645]
>>>>  15: /lib64/libc.so.6(+0x9f802) [0x7f9a23698802]
>>>>  16: /lib64/libc.so.6(+0x3f450) [0x7f9a23638450]
>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>
>>>> I have the impression that the MGRs are ejected by the monitors; however,
>>>> after debugging the monitors, I don't see anything abnormal on the monitor
>>>> side (unless I have missed something).
>>>>
>>>> All we can see is that we get an exception in the "_add_conn" method
>>>> (https://github.com/ceph/ceph/blob/v17.2.6/src/mon/MonClient.cc#L775)
>>>>
>>>> Version: 17.2.6-170.el9cp (RHCS6)

-- 
________________________________________________________

Regards,

*David CASIER*
*Direct line: +33(0) 9 72 61 98 29*
________________________________________________________
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
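
For anyone who wants to check their own cluster for the same condition, here is a rough sketch (not an official tool) that reads the JSON produced by `ceph mon dump -f json` and reports whether the monmap mixes zero and non-zero monitor weights, as in the dump quoted in this thread. The helper names (`mon_weights`, `mixed_weights`, `suggested_commands`) are made up for illustration, and the sample JSON is trimmed by hand from the output above; the top-level "mons" key is assumed from the usual monmap JSON layout, since the egrep'd output only shows the "name" and "weight" fields. The script only prints the `ceph mon set-weight` commands used in the thread; it never runs them.

```python
import json

# Hand-trimmed sample based on the dump quoted in this thread.
# Assumption: monitors are listed under a top-level "mons" key.
SAMPLE_MON_DUMP = """
{
  "epoch": 14,
  "min_mon_release_name": "quincy",
  "mons": [
    {"name": "pprod-mon2", "weight": 10},
    {"name": "pprod-mon3", "weight": 10},
    {"name": "pprod-osd2", "weight": 0},
    {"name": "pprod-osd1", "weight": 0},
    {"name": "pprod-osd3", "weight": 0}
  ]
}
"""


def mon_weights(dump_text):
    """Return {monitor name: weight} parsed from `ceph mon dump -f json` text."""
    monmap = json.loads(dump_text)
    return {mon["name"]: mon.get("weight", 0) for mon in monmap.get("mons", [])}


def mixed_weights(weights):
    """True when some monitors have weight 0 and others a non-zero weight."""
    values = set(weights.values())
    return 0 in values and len(values) > 1


def suggested_commands(weights):
    """Build the `ceph mon set-weight <name> 0` commands for non-zero monitors."""
    if not mixed_weights(weights):
        return []
    return [
        f"ceph mon set-weight {name} 0"
        for name, weight in weights.items()
        if weight != 0
    ]


if __name__ == "__main__":
    weights = mon_weights(SAMPLE_MON_DUMP)
    if mixed_weights(weights):
        print("monmap mixes zero and non-zero weights:")
        for cmd in suggested_commands(weights):
            print("  " + cmd)  # printed for review only, never executed
    else:
        print("monitor weights are consistent")
```

To use it against a live cluster, replace SAMPLE_MON_DUMP with the text captured from `ceph mon dump -f json`, review the printed commands, and then restart ceph-mgr as described above.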