Re: Cephadm not properly adding / removing iscsi services anymore

Okay, I'm glad it worked!


At first I tried cephadm rm-daemon on the bootstrap node that I usually do all management from and it indicated that it could not remove the daemon:

[root@cxcto-c240-j27-01 ~]# cephadm rm-daemon --name iscsi.cxcto-c240-j27-04.lgqtxo --fsid 4a29e724-c4a6-11eb-b14a-5c838f8013a5
ERROR: Daemon not found: iscsi.cxcto-c240-j27-04.lgqtxo. See `cephadm ls`

When I ran ‘cephadm ls’ I only saw services running locally on that server, not the whole cluster. I’m not sure if this is expected or not.

As far as I can tell this is expected, yes: 'cephadm ls' only lists the daemons on the host where it is run. I only have a lab environment with containers (we're still hesitating to upgrade to Octopus), but all virtual nodes there have cephadm installed. I thought that was a requirement, but I may be wrong. It definitely helps with debugging, though: 'cephadm enter --name <daemon>' gives you a shell in that container, and 'cephadm logs --name <daemon>' lets you inspect that daemon's logs.
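
Since cephadm only acts on daemons local to the host it runs on, the rm-daemon call has to be issued on the host that owns the daemon. Below is a minimal sketch that derives that host from the daemon name and prints the command to run, without executing anything. It assumes the daemon name has the form <type>.<host>.<random-id>, as in the name from this thread, and that the host is reachable over ssh under that name; the helper names daemon_host and rm_daemon_cmd are hypothetical.

```shell
# Extract the host part from a daemon name of the assumed form
# <type>.<host>.<random-id>, e.g. iscsi.cxcto-c240-j27-04.lgqtxo
daemon_host() {
  rest=${1#*.}                 # drop the leading "<type>."
  printf '%s\n' "${rest%.*}"   # drop the trailing ".<random-id>"
}

# Print (do not run) the rm-daemon command for the owning host.
# $1 = daemon name, $2 = cluster fsid
rm_daemon_cmd() {
  printf 'ssh %s cephadm rm-daemon --name %s --fsid %s\n' \
    "$(daemon_host "$1")" "$1" "$2"
}
```

Review the printed command before running it, for example: rm_daemon_cmd iscsi.cxcto-c240-j27-04.lgqtxo 4a29e724-c4a6-11eb-b14a-5c838f8013a5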


Quoting "Paul Giralt (pgiralt)" <pgiralt@xxxxxxxxx>:

Thanks Eugen.

At first I tried cephadm rm-daemon on the bootstrap node that I usually do all management from and it indicated that it could not remove the daemon:

[root@cxcto-c240-j27-01 ~]# cephadm rm-daemon --name iscsi.cxcto-c240-j27-04.lgqtxo --fsid 4a29e724-c4a6-11eb-b14a-5c838f8013a5
ERROR: Daemon not found: iscsi.cxcto-c240-j27-04.lgqtxo. See `cephadm ls`

When I ran ‘cephadm ls’ I only saw services running locally on that server, not the whole cluster. I’m not sure if this is expected or not. I then installed cephadm on the cxcto-c240-j27-04 server and issued the rm-daemon command there, and it worked. As soon as I did this, the containers on the other two servers that were not supposed to be running the iSCSI gateway were removed, and everything appeared to be back to normal. I then added one server back to the YAML file and applied it from the original bootstrap node, and it was deployed properly. So somehow deleting that daemon on the 04 server got everything working again.

Still not exactly sure why that fixed it, but at least it’s working again. Thanks for the suggestion.

-Paul


On Sep 8, 2021, at 4:12 AM, Eugen Block <eblock@xxxxxx> wrote:

If you configured only one iSCSI gateway but see three running, have you tried to destroy the extra ones with 'cephadm rm-daemon --name ...'? On the active MGR host, run 'journalctl -f' and you'll see plenty of information, which should include the iSCSI deployment. Alternatively, run 'cephadm logs --name <iscsi-gw>'.


Quoting "Paul Giralt (pgiralt)" <pgiralt@xxxxxxxxx>:

This was working until recently and now seems to have stopped working. Running Pacific 16.2.5. When I modify the deployment YAML file for my iscsi gateways, the services are not being added or removed as requested. It’s as if the state is “stuck”.

At one point I had 4 iSCSI gateways: 02, 03, 04 and 05. Through some back and forth of deploying and undeploying, I ended up in a state where the services are running on servers 02, 03, and 05 no matter what I tell cephadm to do. For example, right now I have the following configuration:

service_type: iscsi
service_id: iscsi
placement:
 hosts:
   - cxcto-c240-j27-03.cisco.com
spec:
 pool: iscsi-config
… removed the rest of this file ….

However ceph orch ls shows this:

[root@cxcto-c240-j27-01 ~]# ceph orch ls
NAME                               PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager                       ?:9093,9094      1/1  9m ago     3M   count:1
crash                                             15/15  10m ago    3M   *
grafana                            ?:3000           1/1  9m ago     3M   count:1
iscsi.iscsi                                         3/1  10m ago    11m  cxcto-c240-j27-03.cisco.com
mgr                                                 2/2  9m ago     3M   count:2
mon                                                 5/5  9m ago     12d  cxcto-c240-j27-01.cisco.com;cxcto-c240-j27-06.cisco.com;cxcto-c240-j27-08.cisco.com;cxcto-c240-j27-10.cisco.com;cxcto-c240-j27-12.cisco.com
node-exporter                      ?:9100         15/15  10m ago    3M   *
osd.dashboard-admin-1622750977792                  0/15  -          3M   *
osd.dashboard-admin-1622751032319               326/341  10m ago    3M   *
prometheus                         ?:9095           1/1  9m ago     3M   count:1

Notice it shows 3/1 because the service is still running on 3 servers even though I’ve told it to only run on one. If I configure all 4 servers and apply (ceph orch apply) then I end up with 3/4 because server 04 never deploys. It’s as if something is “stuck”.
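
To spot this kind of mismatch quickly, the RUNNING column can be filtered for services whose running count differs from the expected count. This is a rough sketch, assuming the plain-text layout of 'ceph orch ls' shown in this thread (one service per line, RUNNING printed as running/expected); the function name orch_mismatches is hypothetical.

```shell
# Read `ceph orch ls` text output on stdin and print the name and
# RUNNING value of every service where running != expected.
orch_mismatches() {
  awk 'NR > 1 {
    # PORTS may be empty, so locate the first field shaped like N/M
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^[0-9]+\/[0-9]+$/) {
        split($i, c, "/")
        if (c[1] + 0 != c[2] + 0) print $1, $i
        break
      }
    }
  }'
}

# Usage on a node with the ceph CLI available:
#   ceph orch ls | orch_mismatches
```

Against the listing in this message it would report iscsi.iscsi 3/1 along with the two osd.dashboard-admin services, which are also below their expected counts.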

Any ideas where to look / log files that might help figure out what’s happening?

-Paul

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx






