Hi,

On a few clusters I've seen this happen randomly, and I have not been able to reproduce it or trace where it comes from.

Luminous clusters ranging from 12.2.1 to 12.2.4 have issues where the MGRs go down with this message in their logs:

Mar 23 09:18:22 mon01 ceph-mgr[2324150]: 2018-03-23 09:18:22.451311 7fb9e8ac7700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2018-03-23 08:18:22.451287)

The first thing you check is the time, but in all cases where I've seen this happen the time was in sync on all systems.

The clusters report HEALTH_OK and nothing else is going on.

As this happens randomly, I have no idea where to start debugging it, nor do I have any clue how it might happen.

Starting the mgr again afterwards resolves the issue. It keeps functioning fine and might go down again after 24 to 48 hours.

The clusters where I've seen this happen were running CentOS 7 or Ubuntu 16.04, so I can't pinpoint it to a specific distro or version.

Searching, I found a tracker issue and a mailing list thread with the same message, but neither of them is recent:

- http://tracker.ceph.com/issues/17170
- http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021707.html

Any ideas on where to start debugging this?

Wido
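
For anyone who wants to compare this message across several MGR logs, here is a minimal sketch (plain Python, not part of Ceph) that extracts the two timestamps from the log line quoted above and reports the gap between them. The regular expression and the helper name cutoff_gap_seconds are assumptions made for illustration, based only on the single line shown in this mail; other Ceph versions may format the message differently.

```python
import re
from datetime import datetime

# The log line quoted in this mail, including the syslog prefix.
LOG_LINE = (
    "Mar 23 09:18:22 mon01 ceph-mgr[2324150]: "
    "2018-03-23 09:18:22.451311 7fb9e8ac7700 -1 monclient: "
    "_check_auth_rotating possible clock skew, rotating keys expired "
    "way too early (before 2018-03-23 08:18:22.451287)"
)

# Timestamp layout as it appears in the quoted line (assumed stable).
TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}"
PATTERN = re.compile(rf"(?P<logged>{TS}) .*\(before (?P<cutoff>{TS})\)")
FMT = "%Y-%m-%d %H:%M:%S.%f"


def cutoff_gap_seconds(line):
    """Return seconds between the log timestamp and the 'before' timestamp.

    Hypothetical helper: returns None if the line does not look like the
    _check_auth_rotating message quoted above.
    """
    match = PATTERN.search(line)
    if match is None:
        return None
    logged = datetime.strptime(match.group("logged"), FMT)
    cutoff = datetime.strptime(match.group("cutoff"), FMT)
    return (logged - cutoff).total_seconds()


if __name__ == "__main__":
    print(cutoff_gap_seconds(LOG_LINE))
```

For the quoted line the two timestamps are almost exactly one hour apart. Whether that gap is meaningful is not something this mail can confirm; the sketch only makes it easy to check if other occurrences show the same gap.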