I am trying to debug an issue with ceph orch host add.
Is there a way to see the specific ssh commands being issued, or to add
debugging code to a python script?
There is nothing useful in my syslog or /var/log/ceph/cephadm.log
Is there a way to get the command to log, or can someone point me in the
direction of the source code so I can have a look?
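So far the only logging knob I have found is the cephadm mgr module's cluster-log level (assuming I am reading the docs right), which can be raised and watched like this:

```shell
# raise the cephadm mgr module's cluster log level to debug
ceph config set mgr mgr/cephadm/log_to_cluster_level debug
# then follow the cluster log, including debug messages
ceph -W cephadm --watch-debug
# or dump the recent cephadm log entries afterwards
ceph log last cephadm
```

but so far nothing in there has explained this failure either.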
I've run tcpdump on port 22 to listen for outgoing packets, and also for
traffic going to the target IP, and nothing goes out when I run
ceph orch host add. If I run ssh inside the cephadm shell, I do see the
packets go out, and it works as I document below.
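For reference, the capture I ran was along these lines (interface and IP anonymised):

```shell
# watch for any ssh traffic, or anything at all to the target host,
# while re-running "ceph orch host add" in another terminal
tcpdump -nn -i any 'tcp port 22 or host 103.XXX.YY.ZZ'
```

Nothing matches while the orch command runs, but packets appear as soon as I ssh manually from inside the shell.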
I was going to upgrade from Pacific 16.2.5 to Quincy and decided to
migrate from ceph-deploy to cephadm first.
I initially had problems because I run ssh on a non-standard port.
Opening port 22 has let me run the command below on every node
except one.
ceph orch host add [short hostname] [ip address]
That one host fails inexplicably with the error:
Error EINVAL: Failed to connect to cephstorage-rs01 (103.XXX.YY.ZZ).
If I run cephadm shell (without --no-hosts, as that gives the error
"unknown flag: --no-hosts"), it works as expected.
# cephadm shell
Inferring fsid 525ec8aa-b401-4ddf-aa8f-4493727dac02
Inferring config
/var/lib/ceph/525ec8aa-b401-4ddf-aa8f-4493727dac02/mon.cephstorage-ig03/config
Using recent ceph image
ceph/daemon-base@sha256:a038c6dc35064edff40bb7e824783f1bbd325c888e722ec5e814671406216ad5
root@cephstorage-ig03:/# ceph cephadm get-ssh-config > ssh_config
root@cephstorage-ig03:/# ceph config-key get
mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
root@cephstorage-ig03:/# chmod 0600 ~/cephadm_private_key
root@cephstorage-ig03:/# ssh -F ssh_config -i ~/cephadm_private_key
root@xxxxxxxxxxxxx
Warning: Permanently added '103.XXX.YY.ZZ' (ECDSA) to the list of known
hosts.
Welcome to XXXXX
I was mucking around with custom ssh-config files to get around the port
issue, but that did not seem to work, so I reverted to the vanilla
version with: ceph cephadm clear-ssh-config
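For the record, the round trip I was attempting with the custom config looked roughly like this (the Port value below is only an example standing in for my non-standard port):

```shell
# dump the config cephadm currently uses
ceph cephadm get-ssh-config > ssh_config
# edit ssh_config, e.g. add a line such as:  Port 2222
# then load the modified config back into cephadm
ceph cephadm set-ssh-config -i ssh_config
# and, to undo it all, revert to the built-in default
ceph cephadm clear-ssh-config
```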
So when I am inside the shell it works, but it doesn't work via
ceph orch host add.
There is one thing that is unusual that I think is worth mentioning.
When I was adding the servers with custom ssh config files, I had a bad
entry in the hosts file on that server, resolving cephstorage-rs01 to
127.0.0.1. When I added it, it said it added the IP as 127.0.0.127:
# ceph orch host ls
HOST ADDR LABELS STATUS
...
cephstorage-rs01 127.0.0.127 Offline
...
I then ran:
ceph orch host rm cephstorage-rs01
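(For completeness: I believe the stale address could also have been corrected in place with set-addr instead of removing the host, though I have not tried that myself:

```shell
# fix the recorded address for an existing host entry
ceph orch host set-addr cephstorage-rs01 103.XXX.YY.ZZ
```
)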
I have tried an iptables re-route, in the vain hope that if there was
some kind of host-to-IP cache it would route to localhost and tell me
that the host name didn't match. That did not work.
Right now I am a sad panda, as my ceph cluster is half transitioned. My
next port of call is probably to adopt what I can into cephadm, make
sure the cluster is ok, and then finally drop the problem node and
re-add it.
Any help will be appreciated.
Regards,
David
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx