Re: Upgrade from Luminous to Nautilus now one MDS with could not get service secret

Hi Robert,

We get a handful of verify_authorizer warnings on some of our clusters
too but they don't seem to pose any problems.
I've tried without success to debug this in the past -- IIRC I started
to suspect it was coming from old cephfs kernel clients but got
distracted and never got to the bottom of it.

Below in the PS is what it looks like on an osd with debug_auth=20 and
debug_ms=1 in case this sparks any ideas.

-- dan

2021-03-30 17:11:55.015 7f2a178a6700  1 --2-
[v2:128.142.xx:6816/465972,v1:128.142.xx:6817/465972] >>
conn(0x5608ef7c7000 0x5608d6be1c00 unknown :-1 s=BANNER_ACCEPTING
pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0)._handle_peer_banner_payload
supported=0 required=0
2021-03-30 17:11:55.016 7f2a178a6700 20 AuthRegistry(0x7fff71e9bf58)
get_handler peer_type 8 method 2 cluster_methods [2] service_methods
[2] client_methods [2]
2021-03-30 17:11:55.016 7f2a178a6700 10 cephx: verify_authorizer
decrypted service osd secret_id=58900
2021-03-30 17:11:55.016 7f2a178a6700  0 auth: could not find secret_id=58900
2021-03-30 17:11:55.016 7f2a178a6700 10 auth: dump_rotating:
2021-03-30 17:11:55.016 7f2a178a6700 10 auth:  id 61926 AQxxx==
expires 2021-03-30 16:11:57.193945
2021-03-30 17:11:55.016 7f2a178a6700 10 auth:  id 61927 AQyyy==
expires 2021-03-30 17:11:58.331600
2021-03-30 17:11:55.016 7f2a178a6700 10 auth:  id 61928 AQzzz==
expires 2021-03-30 18:11:59.341208
2021-03-30 17:11:55.016 7f2a178a6700  0 cephx: verify_authorizer could
not get service secret for service osd secret_id=58900
2021-03-30 17:11:55.016 7f2a178a6700  1 --2-
[v2:128.142.xx:6816/465972,v1:128.142.xx:6817/465972] >>
conn(0x5608ef7c7000 0x5608d6be1c00 crc :-1 s=AUTH_ACCEPTING pgs=0 cs=0
l=1 rev1=0 rx=0 tx=0)._auth_bad_method auth_method 2 r (13) Permission
denied, allowed_methods [2], allowed_modes [1,2]

On Sun, Mar 28, 2021 at 8:17 PM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
>
> We just upgraded our cluster from Luminous to Nautilus and after a few
> days one of our MDS servers is getting:
>
> 2021-03-28 18:06:32.304 7f57c37ff700  5 mds.beacon.sun-gcs01-mds02
> Sending beacon up:standby seq 16
> 2021-03-28 18:06:32.304 7f57c37ff700 20 mds.beacon.sun-gcs01-mds02
> sender thread waiting interval 4s
> 2021-03-28 18:06:32.308 7f57c8809700  5 mds.beacon.sun-gcs01-mds02
> received beacon reply up:standby seq 16 rtt 0.00400001
> 2021-03-28 18:06:36.308 7f57c37ff700  5 mds.beacon.sun-gcs01-mds02
> Sending beacon up:standby seq 17
> 2021-03-28 18:06:36.308 7f57c37ff700 20 mds.beacon.sun-gcs01-mds02
> sender thread waiting interval 4s
> 2021-03-28 18:06:36.308 7f57c8809700  5 mds.beacon.sun-gcs01-mds02
> received beacon reply up:standby seq 17 rtt 0
> 2021-03-28 18:06:37.788 7f57c900a700  0 auth: could not find secret_id=34586
> 2021-03-28 18:06:37.788 7f57c900a700  0 cephx: verify_authorizer could
> not get service secret for service mds secret_id=34586
> 2021-03-28 18:06:37.788 7f57c6004700  5 mds.sun-gcs01-mds02
> ms_handle_reset on v2:10.65.101.13:46566/0
> 2021-03-28 18:06:40.308 7f57c37ff700  5 mds.beacon.sun-gcs01-mds02
> Sending beacon up:standby seq 18
> 2021-03-28 18:06:40.308 7f57c37ff700 20 mds.beacon.sun-gcs01-mds02
> sender thread waiting interval 4s
> 2021-03-28 18:06:40.308 7f57c8809700  5 mds.beacon.sun-gcs01-mds02
> received beacon reply up:standby seq 18 rtt 0
> 2021-03-28 18:06:44.304 7f57c37ff700  5 mds.beacon.sun-gcs01-mds02
> Sending beacon up:standby seq 19
> 2021-03-28 18:06:44.304 7f57c37ff700 20 mds.beacon.sun-gcs01-mds02
> sender thread waiting interval 4s
>
> I've tried removing the /var/lib/ceph/mds/ directory and getting the
> key again. I've removed the key and generated a new one, I've checked
> the clocks between all the nodes. From what I can tell, everything is
> good.
>
> We did have an issue where the monitor cluster fell over and would not
> boot. We reduced the monitors to a single monitor, disabled cephx,
> pulled it off the network and restarted the service a few times which
> allowed it to come up. We then expanded back to three mons and
> reenabled cephx and everything has been good until this. No other
> services seem to be suffering from this and it even appears that the
> MDS works okay even with these messages. We would like to figure out
> how to resolve this.
>
> Thank you,
> Robert LeBlanc
>
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx


