Hi
I have 3 Ceph clusters, all configured similarly, which have been happy
for some months on 17.2.5:
1. A test cluster
2. A small production cluster
3. A larger production cluster
All run Debian 11 with Ceph installed from packages - no cephadm.
I upgraded (1) to 17.2.6 without any problems at all. In particular the
Object Gateway sections of the dashboard work as usual.
I then upgraded (2). Nothing seemed amiss, and everything seems to work
except... when I try to access the Object Gateway sections of the
dashboard I always get:
*The Object Gateway Service is not configured*
Error connecting to Object Gateway: RGW REST API failed request
with status code 403
(b'{"Code":"SignatureDoesNotMatch","RequestId":"tx0000022ba920e82ac4a9c-0064381'
b'934-10e73385-default","HostId":"10e73385-default-default"}')
(Just the RequestId changes each time). Before the upgrade it worked
just fine.
Other info:
* RGW requests using awscli and rclone all work with normal RGW
accounts. It just seems to be the dashboard that's died.
* Just the one zonegroup, no multisite/replication
* "radosgw-admin user info --uid=rgwadmin" gives the correct output
with the right access_key & secret_key. The other fields are as in (1).
* "ceph dashboard get-rgw-api-access-key/get-rgw-api-secret-key" both
give the right values.
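For anyone wanting to repeat the credential comparison from the last two bullets without relying on eyeballing, here is a minimal Python sketch. It assumes that "radosgw-admin user info" prints JSON containing a "keys" list, and that the two "ceph dashboard get-rgw-api-*" commands print the bare key followed by a single newline. That second point is an assumption: on some setups the output may instead be a per-daemon mapping, in which case the parsing would need adjusting. The function names are hypothetical.

```python
import json
import subprocess


def find_key_problems(user_info_json, dash_access_key, dash_secret_key):
    """Return a list of human-readable mismatches (empty if none found)."""
    problems = []
    keys = json.loads(user_info_json).get("keys", [])
    # Hidden whitespace in a stored key is easy to miss by eye.
    for name, value in (("access key", dash_access_key),
                        ("secret key", dash_secret_key)):
        if value != value.strip():
            problems.append(f"dashboard {name} has leading/trailing whitespace")
    # The dashboard pair must match one access/secret pair of the RGW user.
    pairs = {(k["access_key"], k["secret_key"]) for k in keys}
    if (dash_access_key.strip(), dash_secret_key.strip()) not in pairs:
        problems.append("dashboard key pair does not match any key of the RGW user")
    return problems


def run(cmd):
    # Return raw stdout so that unexpected whitespace is preserved.
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def drop_one_newline(text):
    # Assumption: the CLI appends exactly one newline to the printed key.
    return text[:-1] if text.endswith("\n") else text


def check_cluster(uid="rgwadmin"):
    """Run the same commands as in the bullets above and compare the results."""
    info = run(["radosgw-admin", "user", "info", f"--uid={uid}"])
    access = drop_one_newline(run(["ceph", "dashboard", "get-rgw-api-access-key"]))
    secret = drop_one_newline(run(["ceph", "dashboard", "get-rgw-api-secret-key"]))
    return find_key_problems(info, access, secret)
```

Calling check_cluster() on a node with an admin keyring should return an empty list if the keys really are identical; anything else is printed as a short description of the mismatch.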
The rgw logs from (2), where it fails, show:
2023-04-13T16:36:28.720+0100 7fcc7966a700 1 ====== starting new request req=0x7fcd88c10720 =====
2023-04-13T16:36:28.720+0100 7fcc80e79700 1 req 8090309398268968541 0.000000000s op->ERRORHANDLER: err_no=-2027 new_err_no=-2027
2023-04-13T16:36:28.724+0100 7fcc80e79700 1 ====== req done req=0x7fcd88c10720 op status=0 http_status=403 latency=0.003999980s ======
2023-04-13T16:36:28.724+0100 7fcc80e79700 1 beast: 0x7fcd88c10720: 192.168.xx.xx - - [13/Apr/2023:16:36:28.720 +0100] "GET /admin/metadata/user?myself HTTP/1.1" 403 134 - "python-requests/2.25.1" - latency=0.003999980s
(Note this does not have rgwadmin as the user, and is always the same URL)
Whereas the rgw logs from (1), where it works, show things like:
2023-04-13T15:44:19.396+0000 7f8478da1700 1 ====== starting new request req=0x7f86284f5720 =====
2023-04-13T15:44:19.412+0000 7f8478da1700 1 ====== req done req=0x7f86284f5720 op status=0 http_status=200 latency=0.016000060s ======
2023-04-13T15:44:19.412+0000 7f8478da1700 1 beast: 0x7f86284f5720: 10.xx.xx.xx - rgwadmin [13/Apr/2023:15:44:19.396 +0000] "GET /admin/realm?list HTTP/1.1" 200 31 - "python-requests/2.25.1" - latency=0.016000060s
(Note this has rgwadmin as the user, and various URLs)
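To compare the two clusters over more than a handful of lines, the beast access-log entries can be summarised by user, URL and status. The following is a rough Python sketch; the regular expression is derived only from the log lines quoted above, so it is an assumption that other entries follow the same layout, and the function names are hypothetical.

```python
import re
from collections import Counter

# Pattern based on the "beast:" lines quoted above:
#   beast: <req ptr>: <client> - <user> [<timestamp>] "<method> <url> <proto>" <status> ...
BEAST_RE = re.compile(
    r'beast: \S+: (?P<client>\S+) - (?P<user>\S+) \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) '
)


def parse_beast_line(line):
    """Return a dict for a beast access-log line, or None for other lines."""
    match = BEAST_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A "-" in the user field means no user was recorded for the request.
    if record["user"] == "-":
        record["user"] = None
    return record


def summarize(lines):
    """Count requests per (user, url, status) combination."""
    counts = Counter()
    for line in lines:
        record = parse_beast_line(line)
        if record is not None:
            counts[(record["user"], record["url"], record["status"])] += 1
    return counts
```

Passing the open RGW log file to summarize() and printing the result of .most_common() for each cluster makes it easy to see whether the failing cluster ever gets past the "/admin/metadata/user?myself" request.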
The only thing I can see in the release notes that looks even vaguely
related is https://github.com/ceph/ceph/pull/47547, but that doesn't seem
likely to be the cause.
I am really stumped on this, with no idea what has gone wrong on (2),
and what the difference is between (1) and (2). I'm not going to touch
(3) until I have resolved this.
Grateful for any help...
And thanks for all the good work.
Regards, Chris
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx