Re: 14.2.22 dashboard periodically dies and didn't failover

Yes, it's enabled. The mgr just died again, and this is in the log now:

2022-01-13 13:15:59.330 7fe7e085e700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2022-01-13 12:15:59.330970)
2022-01-13 13:16:05.706 7fe7e2862700 -1 received  signal: Terminated from /usr/lib/systemd/systemd --switched-root --system --deserialize 22  (PID: 1) UID: 0
2022-01-13 13:16:05.706 7fe7e2862700 -1 mgr handle_signal *** Got signal Terminated ***
2022-01-13 13:16:05.868 7f28ccc7fe40  0 set uid:gid to 167:167 (ceph:ceph)
2022-01-13 13:16:05.868 7f28ccc7fe40  0 ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable), process ceph-mgr, pid 1634471
2022-01-13 13:16:05.868 7f28ccc7fe40  0 pidfile_write: ignore empty --pid-file
2022-01-13 13:16:05.908 7f28ccc7fe40  1 mgr[py] Loading python module 'alerts'
2022-01-13 13:16:05.946 7f28ccc7fe40  1 mgr[py] Loading python module 'ansible'
2022-01-13 13:16:06.023 7f28ccc7fe40  1 mgr[py] Loading python module 'balancer'
2022-01-13 13:16:06.038 7f28ccc7fe40  1 mgr[py] Loading python module 'crash'
2022-01-13 13:16:06.063 7f28ccc7fe40  1 mgr[py] Loading python module 'dashboard'
2022-01-13 13:16:06.243 7f28ccc7fe40  1 mgr[py] Loading python module 'deepsea'
2022-01-13 13:16:06.319 7f28ccc7fe40  1 mgr[py] Loading python module 'devicehealth'
2022-01-13 13:16:06.336 7f28ccc7fe40  1 mgr[py] Loading python module 'influx'
2022-01-13 13:16:06.350 7f28ccc7fe40  1 mgr[py] Loading python module 'insights'
2022-01-13 13:16:06.364 7f28ccc7fe40  1 mgr[py] Loading python module 'iostat'
2022-01-13 13:16:06.378 7f28ccc7fe40  1 mgr[py] Loading python module 'localpool'
2022-01-13 13:16:06.391 7f28ccc7fe40  1 mgr[py] Loading python module 'orchestrator_cli'
2022-01-13 13:16:06.424 7f28ccc7fe40  1 mgr[py] Loading python module 'pg_autoscaler'
2022-01-13 13:16:06.468 7f28ccc7fe40  1 mgr[py] Loading python module 'progress'
2022-01-13 13:16:06.499 7f28ccc7fe40  1 mgr[py] Loading python module 'prometheus'
2022-01-13 13:16:06.598 7f28ccc7fe40  1 mgr[py] Loading python module 'rbd_support'
2022-01-13 13:16:06.634 7f28ccc7fe40  1 mgr[py] Loading python module 'restful'
2022-01-13 13:16:06.780 7f28ccc7fe40  1 mgr[py] Loading python module 'selftest'
2022-01-13 13:16:06.794 7f28ccc7fe40  1 mgr[py] Loading python module 'status'
2022-01-13 13:16:06.820 7f28ccc7fe40  1 mgr[py] Loading python module 'telegraf'
2022-01-13 13:16:06.843 7f28ccc7fe40  1 mgr[py] Loading python module 'telemetry'
2022-01-13 13:16:06.980 7f28ccc7fe40  1 mgr[py] Loading python module 'test_orchestrator'
2022-01-13 13:16:07.022 7f28ccc7fe40  1 mgr[py] Loading python module 'volumes'
2022-01-13 13:16:07.073 7f28ccc7fe40  1 mgr[py] Loading python module 'zabbix'
2022-01-13 13:16:07.091 7f28b9201700  1 mgr load Constructed class from module: dashboard
2022-01-13 13:16:07.091 7f28b9201700  1 mgr load Constructed class from module: prometheus
2022-01-13 13:16:07.092 7f28b8a00700  0 ms_deliver_dispatch: unhandled message 0x55af1ac5e800 mon_map magic: 0 v1 from mon.0 v2:10.121.58.220:3300/0
2022-01-13 13:16:07.093 7f28b8a00700  0 client.0 ms_handle_reset on v2:10.121.58.222:6800/1141825
2022-01-13 13:31:08.099 7f28b8a00700  0 client.0 ms_handle_reset on v2:10.121.58.222:6800/1141825
2022-01-13 13:46:08.104 7f28b8a00700  0 client.0 ms_handle_reset on v2:10.121.58.222:6800/1141825
2022-01-13 14:01:08.113 7f28b8a00700  0 client.0 ms_handle_reset on v2:10.121.58.222:6800/1141825
2022-01-13 14:16:08.119 7f28b8a00700  0 client.0 ms_handle_reset on v2:10.121.58.222:6800/1141825
2022-01-13 14:31:08.125 7f28b8a00700  0 client.0 ms_handle_reset on v2:10.121.58.222:6800/1141825
2022-01-13 14:46:08.132 7f28b8a00700  0 client.0 ms_handle_reset on v2:10.121.58.222:6800/1141825
2022-01-13 15:01:08.136 7f28b8a00700  0 client.0 ms_handle_reset on v2:10.121.58.222:6800/1141825
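
For anyone who wants to check the timing in an excerpt like the one above, here is a minimal Python sketch. It is not part of the original report, and the helper names (parse_line, reset_intervals, key_expiry_gap) are made up for illustration. It assumes the line layout shown above (timestamp, thread id, log level, message). It only measures the spacing between ms_handle_reset lines and the gap between a _check_auth_rotating line and the "before" time it reports; it does not diagnose the cause.

```python
import re
from datetime import datetime

# Timestamp layout as it appears in the excerpt, e.g. "2022-01-13 13:31:08.099".
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# Assumed line layout: <timestamp> <thread id> <log level> <message>.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+\S+\s+-?\d+\s+(.*)$"
)
# The "(before <timestamp>)" suffix of the rotating-keys warning.
BEFORE_RE = re.compile(r"\(before (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\)")


def parse_line(line):
    """Return (timestamp, message), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return datetime.strptime(match.group(1), TS_FORMAT), match.group(2)


def reset_intervals(lines):
    """Seconds between consecutive ms_handle_reset messages."""
    stamps = []
    for line in lines:
        parsed = parse_line(line)
        if parsed and "ms_handle_reset" in parsed[1]:
            stamps.append(parsed[0])
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


def key_expiry_gap(line):
    """Seconds between the log time and the reported "before" time."""
    parsed = parse_line(line)
    if parsed is None or "_check_auth_rotating" not in parsed[1]:
        return None
    match = BEFORE_RE.search(parsed[1])
    if match is None:
        return None
    before = datetime.strptime(match.group(1), TS_FORMAT)
    return (parsed[0] - before).total_seconds()


# Lines copied from the excerpt above.
SAMPLE = [
    "2022-01-13 13:15:59.330 7fe7e085e700 -1 monclient: _check_auth_rotating "
    "possible clock skew, rotating keys expired way too early "
    "(before 2022-01-13 12:15:59.330970)",
    "2022-01-13 13:16:07.093 7f28b8a00700  0 client.0 ms_handle_reset on "
    "v2:10.121.58.222:6800/1141825",
    "2022-01-13 13:31:08.099 7f28b8a00700  0 client.0 ms_handle_reset on "
    "v2:10.121.58.222:6800/1141825",
    "2022-01-13 13:46:08.104 7f28b8a00700  0 client.0 ms_handle_reset on "
    "v2:10.121.58.222:6800/1141825",
]

if __name__ == "__main__":
    print("reset intervals (s):", reset_intervals(SAMPLE))
    print("key expiry gap (s):", key_expiry_gap(SAMPLE[0]))
```

On the lines pasted above, this reports resets about 15 minutes apart and a "before" time about one hour earlier than the log timestamp.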

Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@xxxxxxxxx
---------------------------------------------------

-----Original Message-----
From: Peter Lieven <pl@xxxxxxx> 
Sent: Thursday, January 13, 2022 2:54 PM
To: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>; Ceph Users <ceph-users@xxxxxxx>
Subject: Re:  14.2.22 dashboard periodically dies and didn't failover

On 13.01.22 at 08:37, Szabo, Istvan (Agoda) wrote:
> Hi,
>
> I can see a lot of messages regarding the rotating keys, but I'm not sure this is the root cause.
>
> 2022-01-13 03:21:57.156 7fe7e085e700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2022-01-13 02:21:57.156836)
> 2022-01-13 03:22:01.484 7fe7e2862700 -1 received  signal: Hangup from killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw rbd-mirror  (PID: 1572574) UID: 0
>
> I have 3 mons with 3 mgrs, and the dashboard is installed on all mgrs.
>
> When the mgr dies on the first node, it doesn't fail over to the other 2; only a service restart solves the issue.
>
> Any idea?


We have seen a similar issue starting with 14.2.22, though our situation is slightly different. The mgr gets stuck and the cluster elects another mgr as primary, but the original primary does not recover. The process is stuck. I have a (large) backtrace if anyone is interested.

For us, the prometheus exporter module seems to be the cause. Do you have it enabled?
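
One way to answer that question from a script is to look at the output of `ceph mgr module ls`. The Python sketch below is an illustration only: the helper names are hypothetical, and it assumes the command's JSON output contains an `enabled_modules` list. The example listing is made up and trimmed, not taken from a real cluster.

```python
import json
import subprocess


def fetch_module_listing():
    """Run `ceph mgr module ls` and return its JSON output as text.

    Needs a host with the ceph CLI and a usable keyring.
    """
    result = subprocess.run(
        ["ceph", "mgr", "module", "ls", "--format", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def enabled_modules(listing_json):
    """Return the sorted names found under "enabled_modules" (assumed key)."""
    data = json.loads(listing_json)
    return sorted(data.get("enabled_modules", []))


def is_enabled(listing_json, module):
    """True if the named module is listed as enabled."""
    return module in enabled_modules(listing_json)


# Hypothetical, trimmed listing used only to show the parsing.
EXAMPLE_LISTING = (
    '{"enabled_modules": ["dashboard", "prometheus"], "disabled_modules": []}'
)

if __name__ == "__main__":
    print(is_enabled(EXAMPLE_LISTING, "prometheus"))
```

On a real cluster you would pass `fetch_module_listing()` instead of the example string. If the module does turn out to be involved, `ceph mgr module disable prometheus` turns it off; whether that is acceptable depends on what relies on the exporter.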


Peter






