On Fri, Apr 26, 2019 at 10:55 AM Jan Pekař - Imatic <jan.pekar@xxxxxxxxx> wrote:
>
> Hi,
>
> yesterday my cluster reported slow requests for minutes, and after restarting
> the OSDs (the ones reporting slow requests) it got stuck with peering PGs. The
> whole cluster was not responding and IO stopped.
>
> I also noticed that the problem was with cephx - all OSDs were reporting the
> same thing (even the same secret_id number)
>
> cephx: verify_authorizer could not get service secret for service osd secret_id=14086
> ...... conn(0x559e15a50000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: got bad authorizer
> auth: could not find secret_id=14086
>
> My questions are:
>
> Why did that happen?
> Can I prevent the cluster from stopping (with cephx enabled)?
> How quickly are keys rotating/expiring, and can I check for problems with that somehow?
>
> I'm running NTP on the nodes (and also on the ceph monitors), so time should
> not be the issue. I noticed that some monitor nodes have no timezone set,
> but I hope MONs are using UTC to distribute keys to clients. Or can a
> different timezone between MON and OSD cause the problem?

Hmm yeah, it's probably not using UTC. (Despite it being good practice,
it's actually not an easy default to adhere to.) cephx requires
synchronized clocks and probably the same timezone (though I can't
swear to that.)

>
> I "fixed" the problem by restarting the monitors.
>
> It happened for the second time in the last 3 months, so I'm reporting it as
> an issue that can happen.
>
> I also noticed in all OSD logs
>
> 2019-04-25 10:06:55.652239 7faf00096700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2019-04-25 09:06:55.652222)
>
> approximately 7 hours before the problem occurred. I can see that it is
> related to the issue. But why 7 hours? Is there some timeout or grace
> period for using old keys before they are invalidated?

7 hours shouldn't be directly related.
IIRC, by default a new rotating key is issued every hour, the current and
next keys are given out on request, and daemons accept keys within a
half-hour offset of what they believe the current time to be. Something
like that.
-Greg

> Thank you
>
> With regards
>
> Jan Pekar
>
> --
> ============
> Ing. Jan Pekař
> jan.pekar@xxxxxxxxx
> ----
> Imatic | Jagellonská 14 | Praha 3 | 130 00
> http://www.imatic.cz
> ============
> --
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
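
For what it's worth, here is a rough Python sketch of the rotation scheme as described from memory in the reply above: a new key every hour, the current and next key handed out on request, and a daemon accepting keys within a half-hour offset of its own clock. This is not Ceph code. The function names, the key-id numbering, and the exact window arithmetic are all made up for illustration, and the two constants are just the "IIRC" figures from the reply, not verified defaults. It only shows why a daemon whose idea of "now" is far enough off would reject an otherwise valid key.

```python
from datetime import datetime, timedelta, timezone

# Figures taken from the reply's "IIRC" description; treat them as assumptions.
ROTATION_PERIOD = timedelta(hours=1)   # a new rotating key every hour
ACCEPT_OFFSET = timedelta(minutes=30)  # accepted offset around the daemon's clock

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def key_id_at(t):
    """Hypothetical key id: whole rotation periods elapsed since the epoch."""
    return (t - EPOCH) // ROTATION_PERIOD


def keys_handed_out(issuer_now):
    """Ids of the current and the next key, as handed out on request."""
    current = key_id_at(issuer_now)
    return [current, current + 1]


def daemon_accepts(key_id, daemon_now):
    """True if daemon_now falls inside the key's period widened by ACCEPT_OFFSET."""
    start = EPOCH + key_id * ROTATION_PERIOD
    end = start + ROTATION_PERIOD
    return start - ACCEPT_OFFSET <= daemon_now < end + ACCEPT_OFFSET


if __name__ == "__main__":
    issuer_now = datetime(2019, 4, 25, 10, 6, 55, tzinfo=timezone.utc)
    current, _next = keys_handed_out(issuer_now)
    for skew in (timedelta(0), timedelta(minutes=20), timedelta(hours=2)):
        daemon_now = issuer_now - skew
        print(f"daemon clock behind by {skew}: accepts={daemon_accepts(current, daemon_now)}")
```

The sketch uses timezone-aware UTC timestamps only to keep the arithmetic simple; it says nothing about how Ceph itself treats timezones, which the reply leaves open.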