All MGR loop crash

Some monitors have existed for many years (weight 10); others were added
more recently (weight 0).

=> https://github.com/ceph/ceph/commit/2d113dedf851995e000d3cce136b69bfa94b6fe0
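
A quick way to confirm this kind of mix on another cluster is to group the monitors by weight. The sketch below is only illustrative: it assumes the JSON printed by `ceph mon dump -f json` lists the monitors under a top-level `mons` key (the thread only shows the `name` and `weight` fields), and the helper names are made up for this example. The sample data reuses the names and weights quoted further down.

```python
import json
from collections import defaultdict


def group_mons_by_weight(monmap_json):
    """Group monitor names by weight.

    Assumption: monitors sit under a top-level "mons" list, each entry
    carrying at least "name" and "weight".
    """
    monmap = json.loads(monmap_json)
    groups = defaultdict(list)
    for mon in monmap.get("mons", []):
        groups[mon["weight"]].append(mon["name"])
    return dict(groups)


def has_mixed_weights(groups):
    """True when the monitors do not all share a single weight."""
    return len(groups) > 1


if __name__ == "__main__":
    # Stand-in for real `ceph mon dump -f json` output (hypothetical, trimmed).
    sample = json.dumps({
        "epoch": 14,
        "mons": [
            {"name": "pprod-mon2", "weight": 10},
            {"name": "pprod-mon3", "weight": 10},
            {"name": "pprod-osd2", "weight": 0},
            {"name": "pprod-osd1", "weight": 0},
            {"name": "pprod-osd3", "weight": 0},
        ],
    })
    groups = group_mons_by_weight(sample)
    for weight, names in sorted(groups.items()):
        print(weight, names)
    print("mixed weights:", has_mixed_weights(groups))
```

If more than one weight group shows up, the cluster is in the same mixed state described here.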

On Thursday, 7 March 2024, Eugen Block <eblock@xxxxxx> wrote:

> I’m curious how the weights might have been changed. I’ve never touched a
> mon weight myself, do you know how that happened?
>
> Quoting "David C." <david.casier@xxxxxxxx>:
>
> OK, got it:
>>
>> [root@pprod-admin:/var/lib/ceph/<UUID>]# ceph mon dump -f json-pretty | egrep "name|weigh"
>> dumped monmap epoch 14
>>     "min_mon_release_name": "quincy",
>>             "name": "pprod-mon2",
>>             "weight": 10,
>>             "name": "pprod-mon3",
>>             "weight": 10,
>>             "name": "pprod-osd2",
>>             "weight": 0,
>>             "name": "pprod-osd1",
>>             "weight": 0,
>>             "name": "pprod-osd3",
>>             "weight": 0,
>>
>> ceph mon set-weight pprod-mon2 0
>> ceph mon set-weight pprod-mon3 0
>>
>> And restart ceph-mgr
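
The thread does not spell out why equalizing the weights helps. The toy model below is one plausible reading, not a description of the Ceph code: it assumes the client orders candidate monitors by repeated weighted draws over the remaining candidates, that such a draw cannot be made once every remaining weight is zero, and that an all-zero weight list falls back to a plain shuffle. All function names are hypothetical.

```python
import random


def draw_index(weights, rng):
    """Pick an index with probability proportional to its weight.

    Raises ValueError when every weight is zero, standing in for a
    weighted draw that cannot be built from an all-zero weight list.
    """
    if sum(weights) <= 0:
        raise ValueError("all remaining weights are zero")
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def weighted_shuffle(items, weights, rng):
    """Order items by repeated weighted draws from what is left."""
    items, weights = list(items), list(weights)
    ordered = []
    while items:
        i = draw_index(weights, rng)
        ordered.append(items.pop(i))
        weights.pop(i)
    return ordered


def order_candidates(items, weights, rng):
    """Assumed behaviour: plain shuffle if all weights are zero, else weighted."""
    if sum(weights) == 0:
        items = list(items)
        rng.shuffle(items)
        return items
    return weighted_shuffle(items, weights, rng)


def shuffle_outcome(weights, seed=0):
    """Return 'ok' if ordering completes, 'fails' if a draw hits all-zero weights."""
    rng = random.Random(seed)
    try:
        order_candidates(range(len(weights)), weights, rng)
    except ValueError:
        return "fails"
    return "ok"


if __name__ == "__main__":
    print(shuffle_outcome([10, 10, 0, 0, 0]))  # mixed weights, as in the dump
    print(shuffle_outcome([0, 0, 0, 0, 0]))    # after setting every weight to 0
```

Under these assumptions the mixed list fails as soon as the two non-zero entries have been drawn, whatever the seed, while the all-zero list completes. That matches the observation that setting every weight to 0 stops the crash.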
>>
>> On Thu, 7 Mar 2024 at 18:25, David C. <david.casier@xxxxxxxx> wrote:
>>
>> I pointed to the wrong line =>
>>> https://github.com/ceph/ceph/blob/v17.2.6/src/mon/MonClient.cc#L822
>>>
>>>
>>> On Thu, 7 Mar 2024 at 18:21, David C. <david.casier@xxxxxxxx> wrote:
>>>
>>>
>>>> Hello everybody,
>>>>
>>>> I'm encountering strange behavior on an infrastructure (it's
>>>> pre-production, but it's very ugly). After a "drain" of a monitor (and a
>>>> manager), the MGRs all crash on startup:
>>>>
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr ms_dispatch2 standby mgrmap(e 1310) v1
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map received map epoch 1310
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map active in map: 1 active is 99148504
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map Activating!
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr handle_mgr_map I am now activating
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr ms_dispatch2 mgrmap(e 1310) v1
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgrc handle_mgr_map Got map version 1310
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgrc handle_mgr_map Active mgr is now
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgrc reconnect No active mgr available yet
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr init waiting for OSDMap...
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: monclient: _renew_subs
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: monclient: _send_mon_message to mon.idf-pprod-osd3 at v2:X.X.X.X:3300/0
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: monclient: _reopen_session rank -1
>>>> Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: *** Caught signal (Aborted) **
>>>>  in thread 7f9a07a27640 thread_name:mgr-fin
>>>>
>>>>  ceph version 17.2.6-170.el9cp (59bbeb8815ec3aeb3c8bba1e1866f8f6729eb840) quincy (stable)
>>>>  1: /lib64/libc.so.6(+0x54db0) [0x7f9a2364ddb0]
>>>>  2: /lib64/libc.so.6(+0xa154c) [0x7f9a2369a54c]
>>>>  3: raise()
>>>>  4: abort()
>>>>  5: /usr/lib64/ceph/libceph-common.so.2(+0x1c1fa8) [0x7f9a23ce2fa8]
>>>>  6: /usr/lib64/ceph/libceph-common.so.2(+0x444425) [0x7f9a23f65425]
>>>>  7: /usr/lib64/ceph/libceph-common.so.2(+0x4442e0) [0x7f9a23f652e0]
>>>>  8: /usr/lib64/ceph/libceph-common.so.2(+0x4442e0) [0x7f9a23f652e0]
>>>>  9: (MonClient::_add_conns()+0x242) [0x7f9a23f5fa42]
>>>>  10: (MonClient::_reopen_session(int)+0x428) [0x7f9a23f60518]
>>>>  11: (Mgr::init()+0x384) [0x5604667a6434]
>>>>  12: /usr/bin/ceph-mgr(+0x1af271) [0x5604667ae271]
>>>>  13: /usr/bin/ceph-mgr(+0x11364d) [0x56046671264d]
>>>>  14: (Finisher::finisher_thread_entry()+0x175) [0x7f9a23d10645]
>>>>  15: /lib64/libc.so.6(+0x9f802) [0x7f9a23698802]
>>>>  16: /lib64/libc.so.6(+0x3f450) [0x7f9a23638450]
>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>
>>>> My impression is that the MGRs are being ejected by the monitors.
>>>> However, after debugging the monitors, I don't see anything abnormal
>>>> on the monitor side (unless I have missed something).
>>>>
>>>> All we can see is that we get an exception in the "_add_conn" method
>>>> (https://github.com/ceph/ceph/blob/v17.2.6/src/mon/MonClient.cc#L775)
>>>>
>>>> Version : 17.2.6-170.el9cp (RHCS6)
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>


-- 
________________________________________________________

Regards,

*David CASIER*

*Direct line: +33(0) 9 72 61 98 29*
________________________________________________________