Okay, I'm glad it worked!
At first I tried cephadm rm-daemon on the bootstrap node that I
usually do all management from, and it indicated that it could not
remove the daemon:
[root@cxcto-c240-j27-01 ~]# cephadm rm-daemon --name iscsi.cxcto-c240-j27-04.lgqtxo --fsid 4a29e724-c4a6-11eb-b14a-5c838f8013a5
ERROR: Daemon not found: iscsi.cxcto-c240-j27-04.lgqtxo. See `cephadm ls`
When I ran ‘cephadm ls’ I only saw services running locally on that
server, not the whole cluster. I’m not sure if this is expected or
not.
As far as I can tell this is expected, yes. I only have a lab
environment with containers (we're still hesitating to upgrade to
Octopus), but all virtual nodes have cephadm installed; I thought
that was a requirement, though I may be wrong. It definitely helps
with debugging: with 'cephadm enter --name <daemon>' you get a shell
inside that container, and with 'cephadm logs --name <daemon>' you
can inspect that daemon's logs.
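For example, with the daemon name from your output (add --fsid if
cephadm can't determine the cluster on its own):
cephadm enter --name iscsi.cxcto-c240-j27-04.lgqtxo   # shell inside the container
cephadm logs --name iscsi.cxcto-c240-j27-04.lgqtxo    # journal logs for that daemon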
Quoting "Paul Giralt (pgiralt)" <pgiralt@xxxxxxxxx>:
Thanks Eugen.
At first I tried cephadm rm-daemon on the bootstrap node that I
usually do all management from, and it indicated that it could not
remove the daemon:
[root@cxcto-c240-j27-01 ~]# cephadm rm-daemon --name iscsi.cxcto-c240-j27-04.lgqtxo --fsid 4a29e724-c4a6-11eb-b14a-5c838f8013a5
ERROR: Daemon not found: iscsi.cxcto-c240-j27-04.lgqtxo. See `cephadm ls`
When I ran ‘cephadm ls’ I only saw services running locally on that
server, not the whole cluster. I’m not sure if this is expected or
not. I installed cephadm on the cxcto-c240-j27-04 server, issued the
command there, and it worked. As soon as I did, the containers on the
other two servers that were not supposed to be running the iSCSI
gateway were removed as well, and everything appeared to be back to
normal. I then added one server back to the YAML file, applied it on
the original bootstrap node, and it deployed properly, so it appears
everything is working again. Somehow deleting that daemon on the 04
server got everything working again.
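In case it helps anyone else, the rough sequence that unstuck things
for me was the following (‘iscsi.yaml’ is just what I call my spec
file locally):
cephadm ls | grep iscsi   # on the host with the stray daemon, to get its exact name
cephadm rm-daemon --name iscsi.cxcto-c240-j27-04.lgqtxo --fsid 4a29e724-c4a6-11eb-b14a-5c838f8013a5
ceph orch apply -i iscsi.yaml   # back on the bootstrap node, re-apply the spec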
Still not exactly sure why that fixed it, but at least it’s working
again. Thanks for the suggestion.
-Paul
On Sep 8, 2021, at 4:12 AM, Eugen Block <eblock@xxxxxx> wrote:
If you only configured 1 iSCSI gateway but you see 3 running, have
you tried to destroy them with 'cephadm rm-daemon --name ...'? On the
active MGR host run 'journalctl -f' and you'll see plenty of
information; it should also contain details about the iscsi
deployment. Or run 'cephadm logs --name <iscsi-gw>'.
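Something along these lines (the exact daemon name will differ,
'cephadm ls' on the gateway host shows it):
journalctl -f                        # on the active MGR host
cephadm ls | grep -i iscsi           # on the gateway host, find the daemon name
cephadm logs --name <iscsi-daemon>   # inspect that daemon's logs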
Quoting "Paul Giralt (pgiralt)" <pgiralt@xxxxxxxxx>:
This was working until recently and now seems to have stopped
working. I'm running Pacific 16.2.5. When I modify the deployment
YAML file for my iSCSI gateways, the services are not being added or
removed as requested. It’s as if the state is “stuck”.
At one point I had 4 iSCSI gateways: 02, 03, 04 and 05. Through some
back and forth of deploying and undeploying, I ended up in a state
where the services are running on servers 02, 03, and 05 no matter
what I tell cephadm to do. For example, right now I have the
following configuration:
service_type: iscsi
service_id: iscsi
placement:
  hosts:
  - cxcto-c240-j27-03.cisco.com
spec:
  pool: iscsi-config
… removed the rest of this file ….
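I apply this with something like the following (‘iscsi.yaml’ is just
my local filename for the spec above):
ceph orch apply -i iscsi.yaml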
However ceph orch ls shows this:
[root@cxcto-c240-j27-01 ~]# ceph orch ls
NAME                               PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager                       ?:9093,9094      1/1  9m ago     3M   count:1
crash                                             15/15  10m ago    3M   *
grafana                            ?:3000           1/1  9m ago     3M   count:1
iscsi.iscsi                                         3/1  10m ago    11m  cxcto-c240-j27-03.cisco.com
mgr                                                 2/2  9m ago     3M   count:2
mon                                                 5/5  9m ago     12d  cxcto-c240-j27-01.cisco.com;cxcto-c240-j27-06.cisco.com;cxcto-c240-j27-08.cisco.com;cxcto-c240-j27-10.cisco.com;cxcto-c240-j27-12.cisco.com
node-exporter                      ?:9100         15/15  10m ago    3M   *
osd.dashboard-admin-1622750977792                  0/15  -          3M   *
osd.dashboard-admin-1622751032319               326/341  10m ago    3M   *
prometheus                         ?:9095           1/1  9m ago     3M   count:1
Notice iscsi.iscsi shows 3/1: the service is still running on three
servers even though I’ve told it to run on only one. If I configure
all 4 servers and apply the spec (ceph orch apply), I end up with 3/4
because server 04 never deploys. It’s as if something is “stuck”.
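For reference, I’ve been comparing the intended and actual state with
these commands (I believe both are available in Pacific):
ceph orch ps --daemon-type iscsi   # where the daemons are actually running
ceph orch ls iscsi --export        # the spec the orchestrator has stored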
Any ideas where to look / log files that might help figure out
what’s happening?
-Paul
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx