Hi,

I have the same situation with some OSDs on Octopus 15.2.5 (Ubuntu 20.04), but I see no problem with the MGRs. Any clue about this?

Best regards,

> Date: Tue, 9 Jun 2020 23:47:24 +0200
> From: Wido den Hollander <wido@xxxxxxxx>
> Subject: Octopus OSDs dropping out of cluster: _check_auth_rotating
>   possible clock skew, rotating keys expired way too early
> To: "ceph-users@xxxxxxx" <ceph-users@xxxxxxx>
> Message-ID: <be7aadc4-2142-ea31-caa8-28ca6db03d15@xxxxxxxx>
> Content-Type: text/plain; charset=utf-8
>
> Hi,
>
> On a recently deployed Octopus (15.2.2) cluster (240 OSDs) we are seeing
> OSDs randomly drop out of the cluster.
>
> Usually it's 2 to 4 OSDs spread out over different nodes. Each node has
> 16 OSDs, and not all the failing OSDs are on the same node.
>
> The OSDs are marked as down, and all they keep printing in their logs is:
>
> monclient: _check_auth_rotating possible clock skew, rotating keys
> expired way too early (before 2020-06-04T07:57:17.706529-0400)
>
> Looking at their status through the admin socket:
>
> {
>     "cluster_fsid": "68653193-9b84-478d-bc39-1a811dd50836",
>     "osd_fsid": "87231b5d-ae5f-4901-93c5-18034381e5ec",
>     "whoami": 206,
>     "state": "active",
>     "oldest_map": 73697,
>     "newest_map": 75795,
>     "num_pgs": 19
> }
>
> The message brought me back to a ticket I created 2 years ago:
> https://tracker.ceph.com/issues/23460
>
> The first thing I checked was NTP/time, double- and triple-checked: all
> the times are in sync on the cluster. Nothing wrong there.
>
> Again, it's not all the OSDs on a node failing, just 1 or 2 dropping out.
>
> Restarting them brings them back right away, and then within 24h some
> other OSDs will drop out.
>
> Has anybody seen this behavior with Octopus as well?
>
> Wido
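For reference, here is how the checks discussed above look on my nodes. This is only a rough sketch: osd.206 is the example ID taken from the quoted status output, and the restart line assumes a package/systemd deployment (under cephadm/containers it would be "ceph orch daemon restart osd.206" instead):

    # Query the OSD's status over its admin socket (this is what
    # produced the JSON quoted above)
    ceph daemon osd.206 status

    # Verify local time sync; Ubuntu 20.04 ships systemd-timesyncd by
    # default, chronyc only applies where chrony is installed
    timedatectl status
    chronyc tracking

    # Ask the monitors for their own view of clock skew
    ceph time-sync-status

    # Bring a dropped OSD back in (package/systemd deployment)
    systemctl restart ceph-osd@206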