Thanks for the tip. I’ve just been using ‘docker exec -it <container id> /bin/bash’ to get into the containers, but those commands sound useful. I think I’ll install cephadm on all nodes just for this.

Thanks again,
-Paul

> On Sep 8, 2021, at 10:11 AM, Eugen Block <eblock@xxxxxx> wrote:
>
> Okay, I'm glad it worked!
>
>
>> At first I tried cephadm rm-daemon on the bootstrap node that I usually do all management from and it indicated that it could not remove the daemon:
>>
>> [root@cxcto-c240-j27-01 ~]# cephadm rm-daemon --name iscsi.cxcto-c240-j27-04.lgqtxo --fsid 4a29e724-c4a6-11eb-b14a-5c838f8013a5
>> ERROR: Daemon not found: iscsi.cxcto-c240-j27-04.lgqtxo. See `cephadm ls`
>>
>> When I would do ‘cephadm ls’ I only saw services running locally on that server, not the whole cluster. I’m not sure if this is expected or not.
>
> As far as I can tell this is expected, yes. I only have a lab environment with containers (we're still hesitating to upgrade to Octopus), but all virtual nodes have cephadm installed. I thought that was a requirement, though I may be wrong. It definitely helps with debugging: for example, 'cephadm enter --name <daemon>' gives you a shell in that container, and 'cephadm logs --name <daemon>' lets you inspect that daemon's logs.
>
>
> Quoting "Paul Giralt (pgiralt)" <pgiralt@xxxxxxxxx>:
>
>> Thanks Eugen.
>>
>> At first I tried cephadm rm-daemon on the bootstrap node that I usually do all management from and it indicated that it could not remove the daemon:
>>
>> [root@cxcto-c240-j27-01 ~]# cephadm rm-daemon --name iscsi.cxcto-c240-j27-04.lgqtxo --fsid 4a29e724-c4a6-11eb-b14a-5c838f8013a5
>> ERROR: Daemon not found: iscsi.cxcto-c240-j27-04.lgqtxo. See `cephadm ls`
>>
>> When I would do ‘cephadm ls’ I only saw services running locally on that server, not the whole cluster. I’m not sure if this is expected or not. I installed cephadm on the cxcto-c240-j27-04 server and issued the command and it worked.
>> It looks like when I did this, the containers on the other two servers that were not supposed to be running the iscsi gateway were suddenly removed and everything appeared to be back to normal. I then added one server back to the yaml file and applied it on the original bootstrap node, and it got deployed properly, so it appears that everything is working again. Somehow deleting that daemon on the 04 server got everything working again.
>>
>> Still not exactly sure why that fixed it, but at least it’s working again. Thanks for the suggestion.
>>
>> -Paul
>>
>>
>>> On Sep 8, 2021, at 4:12 AM, Eugen Block <eblock@xxxxxx> wrote:
>>>
>>> If you only configured 1 iscsi gw but you see 3 running, have you tried to destroy them with 'cephadm rm-daemon --name ...'? On the active MGR host run 'journalctl -f' and you'll see plenty of information; it should also contain information about the iscsi deployment. Or run 'cephadm logs --name <iscsi-gw>'.
>>>
>>>
>>> Quoting "Paul Giralt (pgiralt)" <pgiralt@xxxxxxxxx>:
>>>
>>>> This was working until recently and now seems to have stopped working. Running Pacific 16.2.5. When I modify the deployment YAML file for my iscsi gateways, the services are not being added or removed as requested. It’s as if the state is “stuck”.
>>>>
>>>> At one point I had 4 iSCSI gateways: 02, 03, 04 and 05. Through some back and forth of deploying and undeploying, I ended up in a state where the services are running on servers 02, 03, and 05 no matter what I tell cephadm to do. For example, right now I have the following configuration:
>>>>
>>>> service_type: iscsi
>>>> service_id: iscsi
>>>> placement:
>>>>   hosts:
>>>>   - cxcto-c240-j27-03.cisco.com
>>>> spec:
>>>>   pool: iscsi-config
>>>> … removed the rest of this file ….
>>>>
>>>> However ceph orch ls shows this:
>>>>
>>>> [root@cxcto-c240-j27-01 ~]# ceph orch ls
>>>> NAME                               PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
>>>> alertmanager                       ?:9093,9094  1/1      9m ago     3M   count:1
>>>> crash                                           15/15    10m ago    3M   *
>>>> grafana                            ?:3000       1/1      9m ago     3M   count:1
>>>> iscsi.iscsi                                     3/1      10m ago    11m  cxcto-c240-j27-03.cisco.com
>>>> mgr                                             2/2      9m ago     3M   count:2
>>>> mon                                             5/5      9m ago     12d  cxcto-c240-j27-01.cisco.com;cxcto-c240-j27-06.cisco.com;cxcto-c240-j27-08.cisco.com;cxcto-c240-j27-10.cisco.com;cxcto-c240-j27-12.cisco.com
>>>> node-exporter                      ?:9100       15/15    10m ago    3M   *
>>>> osd.dashboard-admin-1622750977792               0/15     -          3M   *
>>>> osd.dashboard-admin-1622751032319               326/341  10m ago    3M   *
>>>> prometheus                         ?:9095       1/1      9m ago     3M   count:1
>>>>
>>>> Notice it shows 3/1 because the service is still running on 3 servers even though I’ve told it to only run on one. If I configure all 4 servers and apply (ceph orch apply) then I end up with 3/4 because server 04 never deploys. It’s as if something is “stuck”.
>>>>
>>>> Any ideas where to look / log files that might help figure out what’s happening?
>>>>
>>>> -Paul
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
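
The 3/1 mismatch in the `ceph orch ls` output quoted above can also be spotted with a script. What follows is a minimal Python sketch, not something from the original thread. The function name `find_mismatched_services` and the `SAMPLE` text are hypothetical, and the parsing assumes the plain-text table layout shown in this thread, where RUNNING is the first `N/M` token after the service name.

```python
import re

# Matches the RUNNING column, e.g. "3/1" (running daemons / expected daemons).
RUNNING_RE = re.compile(r"^(\d+)/(\d+)$")


def find_mismatched_services(orch_ls_text):
    """Return (name, running, expected) for every row where the counts differ.

    Assumption: the input is the plain-text table printed by `ceph orch ls`,
    and the first whitespace-separated token of the form N/M after the
    service name is the RUNNING column (PORTS may be empty).
    """
    mismatches = []
    for line in orch_ls_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] == "NAME":
            continue  # skip blank lines and the header row
        for token in tokens[1:]:
            match = RUNNING_RE.match(token)
            if match:
                running, expected = int(match.group(1)), int(match.group(2))
                if running != expected:
                    mismatches.append((tokens[0], running, expected))
                break  # only the first N/M token is the RUNNING column
    return mismatches


# A few rows copied from the output in this thread.
SAMPLE = """\
NAME PORTS RUNNING REFRESHED AGE PLACEMENT
alertmanager ?:9093,9094 1/1 9m ago 3M count:1
crash 15/15 10m ago 3M *
iscsi.iscsi 3/1 10m ago 11m cxcto-c240-j27-03.cisco.com
mgr 2/2 9m ago 3M count:2
osd.dashboard-admin-1622750977792 0/15 - 3M *
"""

if __name__ == "__main__":
    for name, running, expected in find_mismatched_services(SAMPLE):
        state = "too many" if running > expected else "too few"
        print(f"{name}: {running}/{expected} ({state} daemons running)")
```

A row with more daemons running than expected, such as iscsi.iscsi here, is the case where checking 'cephadm ls' on each affected host and removing the stray daemon with 'cephadm rm-daemon' helped in this thread.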