Hi -

We keep getting errors like these on specific OSDs with Nautilus (14.2.16):

2021-01-29 06:14:19.174 7fbeaab92c00 -1 osd.8 12568359 unable to obtain rotating service keys; retrying
2021-01-29 06:14:49.173 7fbeaab92c00  0 monclient: wait_auth_rotating timed out after 30
2021-01-29 06:14:49.173 7fbeaab92c00 -1 osd.8 12568359 unable to obtain rotating service keys; retrying
2021-01-29 06:15:19.173 7fbeaab92c00  0 monclient: wait_auth_rotating timed out after 30
2021-01-29 06:15:19.173 7fbeaab92c00 -1 osd.8 12568359 unable to obtain rotating service keys; retrying
2021-01-29 06:15:49.174 7fbeaab92c00  0 monclient: wait_auth_rotating timed out after 30
2021-01-29 06:15:49.174 7fbeaab92c00 -1 osd.8 12568359 unable to obtain rotating service keys; retrying
2021-01-29 06:15:49.174 7fbeaab92c00 -1 osd.8 12568359 init wait_auth_rotating timed out

From googling it seems like it could be a variety of things. We do think time is in sync. It is particularly perplexing because a single OSD will hit this error while all the other OSDs on the same node are fine.

It looks exactly like this tracker issue: https://tracker.ceph.com/issues/17170
Stopping the managers and restarting the mons fixes it temporarily.

Per this old thread, we do have msgr2 enabled: https://www.spinics.net/lists/ceph-users/msg60631.html

This blog post seems to point to storage slowness being the root cause in their environment: http://www.florentflament.com/blog/ceph-monitor-status-switching-due-to-slow-ios.html

Any advice for sorting out what is causing this?

Thanks,
Will
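
In case it helps narrow things down, this is roughly what we plan to gather the next time it happens (osd.8 and mon.a below are just placeholders for whichever daemons are affected, and the chrony/ntp check depends on what the node actually runs):

    # do the mons agree on time (clock skew is reported here)?
    ceph time-sync-status
    # OS clock on the affected node (chrony here; ntpq -p if the node runs ntpd)
    chronyc tracking

    # overall cluster/mon health, and whether anything reports slow ops
    ceph -s
    ceph health detail

    # confirm v2: addresses are actually published for the mons (msgr2)
    ceph mon dump

    # on the node with the stuck OSD: is the daemon far enough along to have
    # an admin socket, and what does it think it is doing?
    ceph daemon osd.8 status

    # turn up monclient/auth logging on the stuck OSD via its admin socket
    # (it may not be registered with the cluster yet, so "ceph tell" won't work)
    ceph daemon osd.8 config set debug_monc 20
    ceph daemon osd.8 config set debug_auth 20

    # check the mons for slow/blocked ops, per the slow-storage theory
    ceph daemon mon.a ops

Our thinking is that if the debug_monc/debug_auth output shows the OSD repeatedly asking for rotating keys and the mon never answering, that would at least tell us whether to dig on the mon side or the OSD side.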