Re: PG stuck peering - OSD cephx: verify_authorizer key problem

On Fri, Apr 26, 2019 at 10:55 AM Jan Pekař - Imatic <jan.pekar@xxxxxxxxx> wrote:
>
> Hi,
>
> yesterday my cluster reported slow requests for several minutes, and after I restarted the OSDs that were reporting them, it got stuck with PGs in peering. The whole
> cluster stopped responding and IO stopped.
>
> I also noticed that the problem was related to cephx: all OSDs were reporting the same error (even with the same secret_id)
>
> cephx: verify_authorizer could not get service secret for service osd secret_id=14086
> ...... conn(0x559e15a50000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: got bad authorizer
> auth: could not find secret_id=14086
>
> My questions are:
>
> Why did that happen?
> Can I prevent the cluster from stopping (with cephx enabled)?
> How quickly do keys rotate/expire, and is there a way to check for problems with that?
>
> I'm running NTP on the nodes (including the Ceph monitors), so time should not be the issue. I noticed that some monitor nodes have no timezone set,
> but I hope the MONs use UTC when distributing keys to clients. Or can a different timezone between MON and OSD cause the problem?

Hmm yeah, it's probably not using UTC. (Despite it being good
practice, it's actually not an easy default to adhere to.) cephx
requires synchronized clocks and probably the same timezone (though I
can't swear to that.)
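
One way to check the synchronized-clocks requirement is to compare epoch timestamps collected from each node at roughly the same moment. Epoch seconds (as returned by Python's time.time()) do not depend on the local timezone setting, so this comparison is unaffected by nodes having different or missing timezones. The sketch below is only an illustration: the host names, the sample values, and the 0.05 s threshold are assumptions, and how the timestamps are gathered is left out.

```python
import statistics
from typing import Dict, List, Tuple


def find_skewed_hosts(
    epoch_by_host: Dict[str, float], max_skew_s: float = 0.05
) -> List[Tuple[str, float]]:
    """Return (host, offset_seconds) for hosts far from the median clock.

    max_skew_s is an illustrative threshold, not a value taken from the thread.
    """
    reference = statistics.median(epoch_by_host.values())
    return sorted(
        (host, ts - reference)
        for host, ts in epoch_by_host.items()
        if abs(ts - reference) > max_skew_s
    )


# Hypothetical samples: three hosts agree, one is about an hour ahead.
samples = {
    "mon-a": 1556182800.00,
    "mon-b": 1556182800.01,
    "osd-1": 1556182800.02,
    "osd-2": 1556186400.00,
}

for host, offset in find_skewed_hosts(samples):
    print(f"{host}: clock differs from median by {offset:.3f} s")
```

Using the median as the reference keeps a single badly skewed host from dragging the baseline toward itself.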

>
> I "fixed" the problem by restarting monitors.
>
> This is the second time it has happened in the last 3 months, so I'm reporting it as an issue that can occur.
>
> I also noticed this in all OSD logs:
>
> 2019-04-25 10:06:55.652239 7faf00096700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before
> 2019-04-25 09:06:55.652222)
>
> approximately 7 hours before the problem occurred. I can see that it is related to the issue. But why 7 hours? Is there some timeout or grace
> period during which old keys can still be used before they are invalidated?
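
The two timestamps in the quoted _check_auth_rotating message can be compared directly. The sketch below joins the wrapped log line from the message into one string and computes the gap between the time the line was logged and the "before" cutoff it reports. The regular expression is an assumption based only on this one sample line, not a description of every format the daemon may emit.

```python
import re
from datetime import datetime, timedelta

# The log line quoted above, with the mail wrapping removed.
LOG_LINE = (
    "2019-04-25 10:06:55.652239 7faf00096700 -1 monclient: _check_auth_rotating "
    "possible clock skew, rotating keys expired way too early "
    "(before 2019-04-25 09:06:55.652222)"
)

TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}"
PATTERN = re.compile(
    rf"^(?P<logged>{TS}) .*_check_auth_rotating.*\(before (?P<cutoff>{TS})\)"
)
FMT = "%Y-%m-%d %H:%M:%S.%f"


def rotating_cutoff_gap(line: str):
    """Return logged_time - cutoff_time as a timedelta, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    logged = datetime.strptime(match["logged"], FMT)
    cutoff = datetime.strptime(match["cutoff"], FMT)
    return logged - cutoff


gap = rotating_cutoff_gap(LOG_LINE)
print(gap)  # 1:00:00.000017
```

For this line the cutoff sits almost exactly one hour before the log timestamp, which suggests the check compares key expiry against "now minus one hour" rather than anything related to the 7-hour gap.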

7 hours shouldn't be directly related. IIRC by default a new rotating
key is issued every hour; the monitor gives out the current and next
key on request, and daemons accept keys within a half-hour offset of
what they believe the current time to be. Something like that.
-Greg
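
The rotation scheme described above can be sketched as a toy model. This is not Ceph's implementation: the key numbering, the ToyDaemon class, and its methods are hypothetical, the one-hour interval is taken from the recollection above, and the half-hour tolerance is left out for brevity. The model only shows why a daemon that cannot refresh its rotating keys for a couple of hours would start rejecting authorizers with an unknown secret_id.

```python
ROTATION_S = 3600  # assumed: one new rotating key per hour


def key_id_at(t: float) -> int:
    """Toy secret_id: index of the hour-long period that contains time t."""
    return int(t // ROTATION_S)


class ToyDaemon:
    """Holds only the keys handed out at its last successful refresh."""

    def __init__(self) -> None:
        self.key_ids = set()

    def refresh(self, now: float) -> None:
        # On request, the current and the next key are handed out.
        current = key_id_at(now)
        self.key_ids = {current, current + 1}

    def verify(self, secret_id: int) -> bool:
        # Corresponds loosely to "could not find secret_id=..." when False.
        return secret_id in self.key_ids


osd = ToyDaemon()
osd.refresh(now=0)

print(osd.verify(key_id_at(1800)))  # True: ticket uses the current key
print(osd.verify(key_id_at(4000)))  # True: ticket uses the pre-fetched next key
print(osd.verify(key_id_at(8000)))  # False: keys are stale, no refresh happened

osd.refresh(now=8000)
print(osd.verify(key_id_at(8000)))  # True again after a successful refresh
```

In this model a single missed refresh is harmless because the next key is already held; failures only begin once the daemon has gone more than one full rotation period without new keys.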

> Thank you
>
> With regards
>
> Jan Pekar
>
> --
> ============
> Ing. Jan Pekař
> jan.pekar@xxxxxxxxx
> ----
> Imatic | Jagellonská 14 | Praha 3 | 130 00
> http://www.imatic.cz
> ============
> --
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com