Hi David,
Normally cephadm connection issues are not that difficult to solve. It is
just a matter of having the appropriate SSH configuration in the root
account, mainly the public key used by cephadm (extracted with the
command you used in a shell) added to the root account's .ssh/authorized_keys.
Normally `ceph -s` should report that there is a cephadm SSH problem and
`ceph health detail` should tell you which hosts have the
problem. To rerun the test (which does not run very frequently), use
`ceph cephadm check-host host_name`.
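As a rough sketch, the three checks above can be wrapped in a small shell function. The names `run`, `cephadm_ssh_check` and `DRY_RUN` are only illustrative helpers, not part of ceph or cephadm. The ceph commands themselves are the ones mentioned above.

```shell
#!/bin/sh
# Sketch only: run, cephadm_ssh_check and DRY_RUN are hypothetical helper
# names chosen for this example, not cephadm options.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: cephadm_ssh_check <host_name>
cephadm_ssh_check() {
    host="$1"
    run ceph -s                          # overall status, should flag a cephadm SSH problem
    run ceph health detail               # lists which hosts have the problem
    run ceph cephadm check-host "$host"  # rerun the host check immediately
}
```

Setting DRY_RUN=1 before calling `cephadm_ssh_check host_name` only prints the commands, which is handy on a node without admin access to the cluster.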
Good luck!
Michel
On 02/06/2023 at 20:01, David Barton wrote:
I am trying to debug an issue with ceph orch host add
Is there a way to debug the specific ssh commands being issued or add
debugging code to a python script?
There is nothing useful in my syslog or /var/log/ceph/cephadm.log
Is there a way to get the command to log, or can someone point me in
the direction of the source code so I can have a look?
I've run tcpdump on port 22 to listen for outgoing packets and also
for traffic going to the target IP, and there is nothing going out
when I run ceph orch host add. If I run ssh inside the cephadm shell,
then I see the packets go out and it works, as I document below.
I was going to upgrade to Quincy from Pacific 16.2.5 and decided to
upgrade from ceph-deploy to cephadm
I initially had problems because I run ssh on a non-standard port.
Allowing port 22 has allowed me to run the command below on every node
except one.
ceph orch host add [short hostname] [ip address]
That one host fails inexplicably with the error:
Error EINVAL: Failed to connect to cephstorage-rs01 (103.XXX.YY.ZZ).
If I run cephadm shell (without --no-hosts as that gives the error:
unknown flag: --no-hosts) it works as expected.
# cephadm shell
Inferring fsid 525ec8aa-b401-4ddf-aa8f-4493727dac02
Inferring config /var/lib/ceph/525ec8aa-b401-4ddf-aa8f-4493727dac02/mon.cephstorage-ig03/config
Using recent ceph image ceph/daemon-base@sha256:a038c6dc35064edff40bb7e824783f1bbd325c888e722ec5e814671406216ad5
root@cephstorage-ig03:/# ceph cephadm get-ssh-config > ssh_config
root@cephstorage-ig03:/# ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
root@cephstorage-ig03:/# chmod 0600 ~/cephadm_private_key
root@cephstorage-ig03:/# ssh -F ssh_config -i ~/cephadm_private_key root@xxxxxxxxxxxxx
Warning: Permanently added '103.XXX.YY.ZZ' (ECDSA) to the list of known hosts.
Welcome to XXXXX
I was mucking around with custom ssh-config files to get around the
port issue, but it did not seem to work, so I reverted back to the
vanilla version with: ceph cephadm clear-ssh-config
So when I am inside the shell it works, but it doesn't work properly
via ceph orch host add
There is one thing that is unusual that I think is worth mentioning.
When I was adding the servers with custom ssh config files, I had a
bad entry in the hosts file for cephstorage-rs01 on that server,
resolving to 127.0.0.1. When I added it, it said it added the IP as
127.0.0.127
# ceph orch host ls
HOST              ADDR         LABELS  STATUS
...
cephstorage-rs01  127.0.0.127          Offline
...
I then ran
ceph orch host rm cephstorage-rs01
I have tried an iptables re-route in the vain idea that if there was
some kind of host to IP cache it would route to localhost and tell me
that the host name didn't match. That did not work.
Right now, I am a sad panda as my ceph cluster is half transitioned.
My next port of call is probably to try and adopt what I can into
cephadm and make sure the cluster is ok, and then finally drop the
problem node and then re-add it.
Any help will be appreciated.
Regards,
David
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx