Redeploy of iSCSI Gateway fails - 167 returned from docker run

"Paul Giralt (pgiralt)" <pgiralt@xxxxxxxxx> · Wed, 2 Jun 2021 00:05:26 +0000

Ceph 16.2.4. I put a server into maintenance mode, and afterwards the containers for the iSCSI gateway were not running, so I decided to redeploy the service. This left all the servers running iSCSI in a state where ceph orch appeared to be trying to delete the container but was stuck, and my only recourse was to reboot the servers. I then ran ‘ceph orch rm iscsi.iscsi’ to remove the service entirely and tried to redeploy it. When I do this, I see the following in the cephadm logs on the servers where the iSCSI gateway is being deployed:

2021-06-01 19:48:15,110 INFO Deploy daemon iscsi.iscsi.cxcto-c240-j27-02.zeypah ...
2021-06-01 19:48:15,111 DEBUG Running command: /bin/docker run --rm --ipc=host --net=host --entrypoint stat --init -e CONTAINER_IMAGE=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949 -e NODE_NAME=cxcto-c240-j27-02.cisco.com -e CEPH_USE_RANDOM_NONCE=1 docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949 -c %u %g /var/lib/ceph
2021-06-01 19:48:15,529 DEBUG stat: 167 167
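
For reference, 'stat -c FORMAT' prints the requested fields for each file it is given: %u is the owner's numeric user ID and %g is the owner's numeric group ID. So 'stat: 167 167' in the log means /var/lib/ceph inside the image is owned by UID 167 and GID 167 (the IDs conventionally used for the ceph user in Ceph container images). The 167 is an ID rather than an error code, and the stat step cephadm ran appears to have succeeded. A minimal Python sketch of turning that output into a (uid, gid) pair; parse_uid_gid is a hypothetical helper for illustration, not cephadm's actual code.

```python
from typing import Tuple


def parse_uid_gid(stat_output: str) -> Tuple[int, int]:
    """Parse the output of `stat -c '%u %g' PATH` into (uid, gid).

    Hypothetical helper for illustration; not taken from cephadm.
    """
    fields = stat_output.split()
    if len(fields) != 2:
        # With one quoted format string and one path, stat prints exactly
        # two numbers; anything else means the command was mis-invoked.
        raise ValueError(f"unexpected stat output: {stat_output!r}")
    uid, gid = (int(field) for field in fields)
    return uid, gid


print(parse_uid_gid("167 167\n"))  # (167, 167)
```

A lone '167', as in the manual run further down, would be rejected by this check because only the UID was printed.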

Later in the logs I see: 

2021-06-01 19:48:25,933 DEBUG Running command: /bin/docker inspect --format {{.Id}},{{.Config.Image}},{{.Image}},{{.Created}},{{index .Config.Labels "io.ceph.version"}} ceph-a67d529e-ba7f-11eb-940b-5c838f8013a5-iscsi.iscsi.cxcto-c240-j27-02.zeypah
2021-06-01 19:48:25,984 DEBUG /bin/docker:
2021-06-01 19:48:25,984 DEBUG /bin/docker: Error: No such object: ceph-a67d529e-ba7f-11eb-940b-5c838f8013a5-iscsi.iscsi.cxcto-c240-j27-02.zeypah

There is obviously no such object, because the container creation failed.

If I run the command from the logs manually, I get:

[root@cxcto-c240-j27-02 ceph]# /bin/docker run --rm --ipc=host --net=host --entrypoint stat --init -e CONTAINER_IMAGE=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949 -e NODE_NAME=cxcto-c240-j27-02.cisco.com -e CEPH_USE_RANDOM_NONCE=1 docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949 -c %u %g /var/lib/ceph
stat: cannot stat '%g': No such file or directory
167

So the 167 seems to match what shows up in the cephadm log. I’m not clear on what the deal is with the %g: what is supposed to go in that placeholder? Any thoughts on why this is failing?
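
A likely explanation for the %g error: the debug log prints the command's arguments joined with spaces, so any quoting is lost. Assuming cephadm passes '%u %g' to 'stat -c' as a single argument (which the 'stat: 167 167' output in the log suggests), pasting the logged line into a shell splits it in two. The format becomes just %u, and %g is treated as a second file to stat, which gives "cannot stat '%g'" followed by a lone 167 for /var/lib/ceph. The Python sketch below uses the standard-library shlex module to show the difference; the argument list is reconstructed from the log and is an assumption, not copied from cephadm's source.

```python
import shlex

# Argument vector reconstructed from the debug log
# (assumption: "%u %g" is passed to stat as ONE argument).
argv = ["stat", "-c", "%u %g", "/var/lib/ceph"]

# Joining with plain spaces, as the log line does, loses the argument boundary.
logged = " ".join(argv)

# shlex.join (Python 3.8+) quotes each argument so a POSIX shell
# parses the string back into exactly the same argument list.
pasteable = shlex.join(argv)

print(logged)                          # stat -c %u %g /var/lib/ceph
print(pasteable)                       # stat -c '%u %g' /var/lib/ceph
print(shlex.split(logged))             # ['stat', '-c', '%u', '%g', '/var/lib/ceph']
print(shlex.split(pasteable) == argv)  # True
```

Reproducing the check by hand therefore needs the format quoted, i.e. "... -c '%u %g' /var/lib/ceph" at the end of the docker run line, which should print both IDs instead of the error.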

Right now all my iSCSI gateways are down, and as a result basically my whole environment is down 🙁

-Paul

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx