I’m curious how the weights might have been changed. I’ve never
touched a mon weight myself, do you know how that happened?
Quoting "David C." <david.casier@xxxxxxxx>:
Ok, got it :
[root@pprod-admin:/var/lib/ceph/<UUID>]# ceph mon dump -f json-pretty | egrep "name|weigh"
dumped monmap epoch 14
"min_mon_release_name": "quincy",
"name": "pprod-mon2",
"weight": 10,
"name": "pprod-mon3",
"weight": 10,
"name": "pprod-osd2",
"weight": 0,
"name": "pprod-osd1",
"weight": 0,
"name": "pprod-osd3",
"weight": 0,
ceph mon set-weight pprod-mon2 0
ceph mon set-weight pprod-mon3 0
And restart ceph-mgr
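The check above can also be scripted. A minimal sketch (the sample data below is hypothetical, mirroring the dump in this thread; on a real cluster you would feed it the actual `ceph mon dump -f json` output):

```python
import json

# Hypothetical monmap fragment shaped like the `ceph mon dump -f json`
# output above -- sample data only, not read from a live cluster.
monmap = json.loads("""
{
  "mons": [
    {"name": "pprod-mon2", "weight": 10},
    {"name": "pprod-mon3", "weight": 10},
    {"name": "pprod-osd2", "weight": 0},
    {"name": "pprod-osd1", "weight": 0},
    {"name": "pprod-osd3", "weight": 0}
  ]
}
""")

# Mixed weights (10/10/0/0/0 here) are what preceded the mgr crashes;
# emit the set-weight commands needed to zero the outliers.
nonzero = [m["name"] for m in monmap["mons"] if m["weight"] != 0]
for name in nonzero:
    print(f"ceph mon set-weight {name} 0")
```

Running this against the dump above prints the two `set-weight` commands used as the fix.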
On Thu, Mar 7, 2024 at 18:25, David C. <david.casier@xxxxxxxx> wrote:
I linked the wrong line =>
https://github.com/ceph/ceph/blob/v17.2.6/src/mon/MonClient.cc#L822
On Thu, Mar 7, 2024 at 18:21, David C. <david.casier@xxxxxxxx> wrote:
Hello everybody,
I'm encountering strange behavior on an infrastructure (it's
pre-production, but it's still very ugly). After a "drain" of a monitor
(and a manager), the MGRs all crash on startup:
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr ms_dispatch2 standby mgrmap(e 1310) v1
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map received map epoch 1310
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map active in map: 1 active is 99148504
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map Activating!
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map I am now activating
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr ms_dispatch2 mgrmap(e 1310) v1
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgrc handle_mgr_map Got map version 1310
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgrc handle_mgr_map Active mgr is now
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgrc reconnect No active mgr available yet
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr init waiting for OSDMap...
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: monclient: _renew_subs
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: monclient: _send_mon_message to mon.idf-pprod-osd3 at v2:X.X.X.X:3300/0
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: monclient: _reopen_session rank -1
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: *** Caught signal (Aborted) **
in thread 7f9a07a27640 thread_name:mgr-fin
ceph version 17.2.6-170.el9cp (59bbeb8815ec3aeb3c8bba1e1866f8f6729eb840) quincy (stable)
1: /lib64/libc.so.6(+0x54db0) [0x7f9a2364ddb0]
2: /lib64/libc.so.6(+0xa154c) [0x7f9a2369a54c]
3: raise()
4: abort()
5: /usr/lib64/ceph/libceph-common.so.2(+0x1c1fa8) [0x7f9a23ce2fa8]
6: /usr/lib64/ceph/libceph-common.so.2(+0x444425) [0x7f9a23f65425]
7: /usr/lib64/ceph/libceph-common.so.2(+0x4442e0) [0x7f9a23f652e0]
8: /usr/lib64/ceph/libceph-common.so.2(+0x4442e0) [0x7f9a23f652e0]
9: (MonClient::_add_conns()+0x242) [0x7f9a23f5fa42]
10: (MonClient::_reopen_session(int)+0x428) [0x7f9a23f60518]
11: (Mgr::init()+0x384) [0x5604667a6434]
12: /usr/bin/ceph-mgr(+0x1af271) [0x5604667ae271]
13: /usr/bin/ceph-mgr(+0x11364d) [0x56046671264d]
14: (Finisher::finisher_thread_entry()+0x175) [0x7f9a23d10645]
15: /lib64/libc.so.6(+0x9f802) [0x7f9a23698802]
16: /lib64/libc.so.6(+0x3f450) [0x7f9a23638450]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
I have the impression that the MGRs are being ejected by the monitors;
however, after debugging the monitor, I don't see anything abnormal on
the monitor side (unless I've missed something).
All we can see is that we get an exception in the `_add_conns` method (
https://github.com/ceph/ceph/blob/v17.2.6/src/mon/MonClient.cc#L775)
Version: 17.2.6-170.el9cp (RHCS 6)
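For context on why mixed weights might matter here: the `_add_conns` frame in the trace is where MonClient chooses which monitors to contact, and that choice is biased by the monmap weights. The sketch below is only a generic illustration of weighted selection with zero-weight entries (not Ceph's actual code): when some entries carry positive weight, the zero-weight ones are never picked, so a 10/10/0/0/0 monmap concentrates everything on two monitors.

```python
import random

# Hypothetical weights mirroring the dump in this thread -- an
# illustration of weight-biased selection, NOT Ceph's implementation.
mons = {"pprod-mon2": 10, "pprod-mon3": 10,
        "pprod-osd1": 0, "pprod-osd2": 0, "pprod-osd3": 0}

def pick(mons):
    # With all-zero weights, fall back to a uniform choice;
    # otherwise sample proportionally to weight.
    if sum(mons.values()) == 0:
        return random.choice(list(mons))
    return random.choices(list(mons), weights=list(mons.values()))[0]

# Over many draws, only the positively weighted monitors ever appear.
picked = {pick(mons) for _ in range(1000)}
print(sorted(picked))
```

Zeroing all the weights (as in the fix above) restores the all-equal case, where every monitor is eligible.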
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx