Re: Octopus OSDs dropping out of cluster: _check_auth_rotating possible clock skew, rotating keys expired way too early

Hi,

Today I found the same error message in the logs:
-1 monclient: _check_auth_rotating possible clock skew, rotating keys
expired way too early

However, it turned out that Ceph was running without an active
manager:
  cluster:
    health: HEALTH_WARN
            no active mgr

This pointed me to https://tracker.ceph.com/issues/39264
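For anyone hitting the same symptom, a quick way to confirm this state is to check the mgrmap in the cluster status output. A minimal sketch, assuming the JSON shape of `ceph status --format json` on Octopus (the helper name `has_active_mgr` is my own):

```python
import json

def has_active_mgr(status_json: str) -> bool:
    """Return True if the cluster status reports an active manager.

    Expects the JSON emitted by `ceph status --format json`, whose
    `mgrmap` section carries an `available` flag.
    """
    status = json.loads(status_json)
    return bool(status.get("mgrmap", {}).get("available", False))

# Example with a trimmed status document resembling ours:
sample = '{"health": {"status": "HEALTH_WARN"}, "mgrmap": {"available": false}}'
print(has_active_mgr(sample))  # → False
```

In practice you would feed this the output of `ceph status --format json`; with no active mgr it returns False and you know you are in the tracker #39264 situation.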

We are running Octopus version 15.2.5. In our case NTP was alright as well
but I did not find OSDs dropping out.

Restarting the containers got the mgr back up and running, but it would be
great to know the root cause of the issue.
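Until the root cause is found, it may help to spot affected daemons early by scanning the logs for this message. A rough sketch, assuming each line carries the daemon name (the log layout and the helper `affected_osds` are assumptions of mine, not anything from Ceph itself):

```python
import re

# Match the daemon id followed anywhere on the line by the clock-skew message.
SKEW_RE = re.compile(
    r"osd\.(\d+).*_check_auth_rotating possible clock skew, "
    r"rotating keys expired way too early"
)

def affected_osds(log_lines):
    """Return the sorted list of OSD ids that logged the clock-skew message."""
    ids = set()
    for line in log_lines:
        m = SKEW_RE.search(line)
        if m:
            ids.add(int(m.group(1)))
    return sorted(ids)

sample = [
    "osd.206 ... monclient: _check_auth_rotating possible clock skew, "
    "rotating keys expired way too early",
    "osd.12 ... heartbeat_check: no reply",
]
print(affected_osds(sample))  # → [206]
```

A cron job or monitoring check built on something like this could restart (or at least alert on) the affected OSDs before they get marked down.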

Best regards,
Sebastian

On Tue, Jun 9, 2020 at 23:47, Wido den Hollander (<wido@xxxxxxxx>)
wrote:

> Hi,
>
> On a recently deployed Octopus (15.2.2) cluster (240 OSDs) we are seeing
> OSDs randomly drop out of the cluster.
>
> Usually it's 2 to 4 OSDs spread out over different nodes. Each node has
> 16 OSDs and not all the failing OSDs are on the same node.
>
> The OSDs are marked as down, and all they keep printing in their logs is:
>
> monclient: _check_auth_rotating possible clock skew, rotating keys
> expired way too early (before 2020-06-04T07:57:17.706529-0400)
>
> Looking at their status through the admin socket:
>
> {
>     "cluster_fsid": "68653193-9b84-478d-bc39-1a811dd50836",
>     "osd_fsid": "87231b5d-ae5f-4901-93c5-18034381e5ec",
>     "whoami": 206,
>     "state": "active",
>     "oldest_map": 73697,
>     "newest_map": 75795,
>     "num_pgs": 19
> }
>
> The message brought me back to a ticket I created two years ago:
> https://tracker.ceph.com/issues/23460
>
> The first thing I checked was NTP/time. Double- and triple-check this. All
> the times are in sync across the cluster. Nothing wrong there.
>
> Again, it's not all the OSDs on a node failing. Just 1 or 2 dropping out.
>
> Restarting them brings them back right away and then within 24h some
> other OSDs will drop out.
>
> Has anybody seen this behavior with Octopus as well?
>
> Wido
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>



