17.2.6 Dashboard/RGW Signature Mismatch

Chris Palmer <chris.palmer@xxxxxxxxx> · Thu, 13 Apr 2023 17:20:27 +0100

Hi

I have 3 Ceph clusters, all configured similarly, which have been happy 
for some months on 17.2.5:

1. A test cluster
2. A small production cluster
3. A larger production cluster

All run Debian 11 with Ceph installed from packages (no cephadm).

I upgraded (1) to 17.2.6 without any problems at all. In particular the 
Object Gateway sections of the dashboard work as usual.

I then upgraded (2). Nothing seemed amiss, and everything seems to work 
except... when I try to access the Object Gateway sections of the 
dashboard I always get:

     *The Object Gateway Service is not configured*

       Error connecting to Object Gateway: RGW REST API failed request
       with status code 403
       (b'{"Code":"SignatureDoesNotMatch","RequestId":"tx0000022ba920e82ac4a9c-0064381'
       b'934-10e73385-default","HostId":"10e73385-default-default"}')

(Just the RequestId changes each time). Before the upgrade it worked 
just fine.
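
For anyone unfamiliar with what SignatureDoesNotMatch means: the client signs each request with an HMAC over the secret key and selected request fields, and the server recomputes that signature from its own copy of the secret and the request as it received it. Below is a minimal sketch of an AWS signature v2 style calculation. It is an assumption that this matches what the dashboard sends. The function name sign_v2 and the sample values are hypothetical, and the x-amz header part of the string to sign is left out for brevity.

```python
import base64
import hashlib
import hmac


def sign_v2(secret_key, method, date, resource, content_md5="", content_type=""):
    """Return a base64 HMAC-SHA1 signature in the AWS signature v2 style.

    Simplified: assumes the request carries no x-amz-* headers.
    """
    string_to_sign = "\n".join([method, content_md5, content_type, date, resource])
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # All values below are made up for illustration.
    date = "Thu, 13 Apr 2023 15:36:28 GMT"
    client_side = sign_v2("EXAMPLESECRET", "GET", date, "/admin/metadata/user")
    # A stray newline in the stored secret is enough to change the signature.
    server_side = sign_v2("EXAMPLESECRET\n", "GET", date, "/admin/metadata/user")
    print("signatures match:", client_side == server_side)
```

The point is only that a difference in the secret, or in any signed field as seen by the server (method, date, resource), produces a different signature and hence a 403.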

Other info:

 * RGW requests using awscli and rclone all work with normal RGW
   accounts. It just seems to be the dashboard that's died.
 * Just the one zonegroup, no multisite/replication
 * "radosgw-admin user info --uid=rgwadmin" gives the correct output
   with the right access_key & secret_key. The other fields are as in (1).
 * "ceph dashboard get-rgw-api-access-key/get-rgw-api-secret-key" both
   give the right values.
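
The key comparison in the last two points can be done mechanically rather than by eye. Here is a rough sketch, assuming the JSON printed by "radosgw-admin user info" has a "keys" list with "access_key" and "secret_key" fields, and that the dashboard values are passed in as plain strings. The function name find_key_mismatch and the sample values are hypothetical.

```python
import json


def find_key_mismatch(user_info_json, dashboard_access_key, dashboard_secret_key):
    """Return a list of problems found; an empty list means the keys agree."""
    user_info = json.loads(user_info_json)
    problems = []

    # Whitespace picked up when the key was stored is easy to miss by eye.
    for label, value in (
        ("access key", dashboard_access_key),
        ("secret key", dashboard_secret_key),
    ):
        if value != value.strip():
            problems.append(f"dashboard {label} has leading or trailing whitespace")

    # Compare the stripped pair against every key pair the RGW user owns.
    pairs = {(k["access_key"], k["secret_key"]) for k in user_info.get("keys", [])}
    candidate = (dashboard_access_key.strip(), dashboard_secret_key.strip())
    if candidate not in pairs:
        problems.append("dashboard key pair does not match any key of the RGW user")

    return problems


if __name__ == "__main__":
    # Made-up sample data in the assumed shape of the radosgw-admin output.
    sample = json.dumps(
        {
            "user_id": "rgwadmin",
            "keys": [
                {"user": "rgwadmin", "access_key": "AKEXAMPLE", "secret_key": "SKEXAMPLE"}
            ],
        }
    )
    print(find_key_mismatch(sample, "AKEXAMPLE", "SKEXAMPLE\n"))
```

In practice the first argument would be the captured output of the radosgw-admin command, and the other two the output of the two dashboard commands without stripping anything first.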

The RGW logs from (2), which fails, show:

2023-04-13T16:36:28.720+0100 7fcc7966a700  1 ====== starting new request req=0x7fcd88c10720 =====
2023-04-13T16:36:28.720+0100 7fcc80e79700  1 req 8090309398268968541 0.000000000s op->ERRORHANDLER: err_no=-2027 new_err_no=-2027
2023-04-13T16:36:28.724+0100 7fcc80e79700  1 ====== req done req=0x7fcd88c10720 op status=0 http_status=403 latency=0.003999980s ======
2023-04-13T16:36:28.724+0100 7fcc80e79700  1 beast: 0x7fcd88c10720: 192.168.xx.xx - - [13/Apr/2023:16:36:28.720 +0100] "GET /admin/metadata/user?myself HTTP/1.1" 403 134 - "python-requests/2.25.1" - latency=0.003999980s

(Note this does not have rgwadmin as the user, and is always the same URL)

Whereas the RGW logs from (1), which works, show things like:

2023-04-13T15:44:19.396+0000 7f8478da1700  1 ====== starting new request req=0x7f86284f5720 =====
2023-04-13T15:44:19.412+0000 7f8478da1700  1 ====== req done req=0x7f86284f5720 op status=0 http_status=200 latency=0.016000060s ======
2023-04-13T15:44:19.412+0000 7f8478da1700  1 beast: 0x7f86284f5720: 10.xx.xx.xx - rgwadmin [13/Apr/2023:15:44:19.396 +0000] "GET /admin/realm?list HTTP/1.1" 200 31 - "python-requests/2.25.1" - latency=0.016000060s

(Note this has rgwadmin as the user, and various URLs)
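
To compare the two clusters over more than a handful of lines, the beast access lines can be pulled apart with a small script. This is a sketch that assumes the line layout seen in the excerpts above. The regular expression and the name parse_beast_line are my own, and treating "-" in the user field as "no user identified" is an assumption based on the two excerpts.

```python
import re

# Layout assumed from the log excerpts: client, "-", user, [time], "request", status, size.
BEAST_RE = re.compile(
    r'beast: \S+ (?P<client>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+)'
)


def parse_beast_line(line):
    """Return a dict of fields for a beast access line, or None if it is not one."""
    match = BEAST_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["size"] = int(record["size"])
    # Assumption: "-" means the request was not attributed to any user.
    record["has_user"] = record["user"] != "-"
    return record


if __name__ == "__main__":
    line = (
        '2023-04-13T16:36:28.724+0100 7fcc80e79700  1 beast: 0x7fcd88c10720: '
        '192.168.xx.xx - - [13/Apr/2023:16:36:28.720 +0100] '
        '"GET /admin/metadata/user?myself HTTP/1.1" 403 134 - '
        '"python-requests/2.25.1" - latency=0.003999980s'
    )
    print(parse_beast_line(line))
```

Filtering the parsed records on the python-requests user agent or on paths starting with /admin/ would isolate the dashboard's own calls from normal S3 traffic.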

The only thing I can see in the release notes that looks even vaguely 
related is https://github.com/ceph/ceph/pull/47547, but it doesn't seem 
likely.

I am really stumped on this, with no idea what has gone wrong on (2), 
and what the difference is between (1) and (2). I'm not going to touch 
(3) until I have resolved this.

Grateful for any help...

And thanks for all the good work.

Regards, Chris
