Re: unable to obtain rotating service keys

Hi Raymond,

I'm pinging this old thread because we hit the same issue last week.

Is it possible that when you upgraded to nautilus you ran `ceph osd
require-osd-release nautilus` but did not run `ceph mon enable-msgr2`?

We were in that state (intentionally), and started getting `unable
to obtain rotating service keys` errors after around half the osds had
been restarted with require_osd_release=nautilus.
Those restarted osds bind on the v2 port, and they seemingly get
confused about how to communicate with the mons.

As soon as we did `ceph mon enable-msgr2` to enable v2 on the mons the
osds could boot without issue.
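
To check for this state before restarting osds, here is a minimal sketch.
The helper name `mons_all_have_v2` is hypothetical (it is not a ceph command),
and it assumes that `ceph mon dump` prints one line per mon starting with its
rank, with a `v2:` address present only once msgr2 is enabled on that mon.

```shell
# Hypothetical helper (not part of ceph): reads `ceph mon dump` output on stdin
# and succeeds only if every mon line advertises a v2 address.
# Assumed line format with msgr2 enabled:
#   0: [v2:10.0.0.1:3300/0,v1:10.0.0.1:6789/0] mon.a
# Assumed line format without msgr2:
#   0: v1:10.0.0.1:6789/0 mon.a
mons_all_have_v2() {
  awk '
    BEGIN { total = 0; v2 = 0 }
    # Count mon lines (they start with "<rank>: ") and those carrying a v2 address.
    /^[0-9]+: / { total++; if ($0 ~ /v2:/) v2++ }
    END { if (total > 0 && v2 == total) exit 0; exit 1 }
  '
}

# Typical use against a live cluster (commented out; needs admin access):
#   ceph osd dump | grep require_osd_release
#   ceph mon dump 2>/dev/null | mons_all_have_v2 || echo "some mons lack a v2 address"
```

If the osd dump shows require_osd_release nautilus and the check reports mons
without a v2 address, that matches the state we were in.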

I guess this is a heads up not to skip any step of the nautilus
upgrade, even though the docs make `ceph mon enable-msgr2` look
optional.

Cheers, Dan



On Tue, Jan 28, 2020 at 8:12 PM Raymond Clotfelter <ray@xxxxxxx> wrote:
>
> I have a server with 12 OSDs on it. Five of them are unable to start, and give the following error message in their logs:
>
> 2020-01-28 13:00:41.760 7f61fb490c80  0 monclient: wait_auth_rotating timed out after 30
> 2020-01-28 13:00:41.760 7f61fb490c80 -1 osd.178 411005 unable to obtain rotating service keys; retrying
>
> These OSDs were up and running when they initially just died on me. I tried to restart them and they failed to come up. I rebooted the node and they did not recover. All 5 died within a few hours and were down by the time I started poking them. I previously had this happen with 2 other OSDs, one each on 2 servers, each with 12 OSDs. I ended up just purging and recreating those OSDs. I would really like to find a solution to this problem that does not involve purging the OSDs.
>
> I have tried stopping and starting all monitors and managers, one at a time, and all at the same time. Additionally, all servers in the cluster have been restarted over the past couple of days for various other reasons.
>
> I am on Ceph 14.2.6, Debian buster and am using the Debian packages. All of my servers are kept in time sync via ntp, and I have verified multiple times that everything remains in sync.
>
> I have googled the error message and tried all of the solutions offered from that, but nothing makes any difference.
>
> I would appreciate any constructive advice.
>
> Thanks.
>
> -- ray
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx