15.2.17: RGW deploy through cephadm exits immediately with exit code 5/NOTINSTALLED

Hi,

We have a cephadm-based Octopus cluster (upgraded to 15.2.17 today, but the problem started with 15.2.16) where we are trying to deploy an RGW in a multisite configuration. We followed the documentation at https://docs.ceph.com/en/octopus/radosgw/multisite/ for the basic realm, zonegroup, zone and pool configuration, then deployed an RGW with "ceph orch apply rgw ..." (the commands we used are summarized, with placeholders, below the log excerpt). The Ceph image is pulled (the first time), the container starts and immediately exits with status code 5/NOTINSTALLED. I was unable to find any error message in the logs that could explain the problem. I reinstalled the machine hosting the RGW (running yesterday's build of CentOS Stream 8, but the problem started with a build from early August), removed the pools including .rgw.root, recreated everything, and the problem remains the same. A start sequence from /var/log/messages is:

-----

Sep 28 18:55:49 valvd-rgw1 systemd[1]: Starting Ceph rgw.eros.eros.valvd-rgw1.eaafgz for cce5ffb0-9124-40e5-a55c-3e5cc8660d47...
Sep 28 18:55:50 valvd-rgw1 systemd[1]: var-lib-containers-storage-overlay.mount: Succeeded.
Sep 28 18:55:50 valvd-rgw1 systemd[1]: var-lib-containers-storage-overlay.mount: Succeeded.
Sep 28 18:55:50 valvd-rgw1 systemd[1]: Started libcontainer container 86707b0c2658f21a09f229a2e049c922dd697475768d6fa3a31bfc223a1eda48.
Sep 28 18:55:50 valvd-rgw1 bash[54606]: 86707b0c2658f21a09f229a2e049c922dd697475768d6fa3a31bfc223a1eda48
Sep 28 18:55:50 valvd-rgw1 systemd[1]: Started Ceph rgw.eros.eros.valvd-rgw1.eaafgz for cce5ffb0-9124-40e5-a55c-3e5cc8660d47.
Sep 28 18:55:51 valvd-rgw1 systemd[1]: libpod-86707b0c2658f21a09f229a2e049c922dd697475768d6fa3a31bfc223a1eda48.scope: Succeeded.
Sep 28 18:55:51 valvd-rgw1 systemd[1]: libpod-86707b0c2658f21a09f229a2e049c922dd697475768d6fa3a31bfc223a1eda48.scope: Consumed 297ms CPU time
Sep 28 18:55:51 valvd-rgw1 systemd[1]: var-lib-containers-storage-overlay-d323c9431c7f488696f7e467397a94192a022b2b13155fb6a34f80236330dff3-merged.mount: Succeeded.
Sep 28 18:55:51 valvd-rgw1 systemd[1]: var-lib-containers-storage-overlay.mount: Succeeded.
Sep 28 18:55:51 valvd-rgw1 systemd[1]: ceph-cce5ffb0-9124-40e5-a55c-3e5cc8660d47@xxxxxxxxxxxxx.valvd-rgw1.eaafgz.service: Main process exited, code=exited, status=5/NOTINSTALLED
Sep 28 18:55:51 valvd-rgw1 systemd[1]: ceph-cce5ffb0-9124-40e5-a55c-3e5cc8660d47@xxxxxxxxxxxxx.valvd-rgw1.eaafgz.service: Failed with result 'exit-code'.
----
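
The realm/zonegroup/zone setup itself was done along the lines of the documented master zone procedure; here is a condensed version with placeholders instead of our real zonegroup name, endpoint, port and keys (the realm and zone are both called eros, as the daemon name above shows):

-----

# following https://docs.ceph.com/en/octopus/radosgw/multisite/ (values in <> are placeholders)
radosgw-admin realm create --rgw-realm=eros --default
radosgw-admin zonegroup create --rgw-zonegroup=<zonegroup> --endpoints=http://valvd-rgw1:<port> --master --default
radosgw-admin zone create --rgw-zonegroup=<zonegroup> --rgw-zone=eros --endpoints=http://valvd-rgw1:<port> --master --default --access-key=<system-access-key> --secret=<system-secret-key>
radosgw-admin user create --uid=<sync-user> --display-name="Synchronization User" --access-key=<system-access-key> --secret=<system-secret-key> --system
radosgw-admin period update --commit

# deployment through cephadm (Octopus "ceph orch apply rgw <realm> <zone>" syntax)
ceph orch apply rgw eros eros --placement="valvd-rgw1"

-----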

I googled and found a couple of similar issues reported to this list, in particular:

- https://www.mail-archive.com/ceph-users@xxxxxxx/msg09680.html, but that thread is about Pacific and not a cephadm-based cluster, so it may be a different problem, and its workaround does not apply since the config option does not exist in Octopus

- https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/UDUDIHZN5NKTUFQQV7OK2FYYNFVTL2XS/, which is about Octopus but was related to a configuration with multiple realms, whereas I have only one, defined as the default (called eros)

Both threads mention the error "Couldn't init storage provider (RADOS)", something I have not seen. I may have missed it, as I don't know exactly in which log file I should look for it.
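
Is the daemon's journald unit (or "cephadm logs") the right place to look for that error? I.e. something like the following, with the daemon name and fsid taken from the log excerpt above:

-----

# on the RGW host, the systemd/journald unit of the daemon
journalctl -u ceph-cce5ffb0-9124-40e5-a55c-3e5cc8660d47@rgw.eros.eros.valvd-rgw1.eaafgz.service

# or the cephadm wrapper around it
cephadm logs --fsid cce5ffb0-9124-40e5-a55c-3e5cc8660d47 --name rgw.eros.eros.valvd-rgw1.eaafgz

-----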

I have certainly made some trivial mistake, but I have been stuck on this problem for quite some time without any clue about where the issue lies. Thanks in advance for your help.

Cheers,

Michel
