Re: Converting to cephadm : Error EINVAL: Failed to connect

Thanks, Michel.

ceph -s reports it as a stray host (since I haven't been able to add it).

ceph health detail reiterates that it is a stray host

# ceph cephadm check-host cephstorage-rs01
check-host failed:
Host 'cephstorage-rs01' not found. Use 'ceph orch host ls' to see all managed hosts.

I tried running a more complete tcpdump and I can see that when I run

ceph orch host add cephstorage-rs01 103.XXX.YY.ZZ

There are direct connection attempts to ports 3300 and 6804, which both succeed.

There are no DNS requests when I run the command (as expected since I provided the IP address).
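A capture along these lines is enough to see both the connection attempts and any DNS traffic (the IP here is a placeholder for the real address):

```shell
# Watch for DNS lookups plus any traffic to the target host while
# running 'ceph orch host add' in another terminal (needs root):
tcpdump -ni any 'udp port 53 or host 103.0.0.1'
```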


The rest of my adoption is going smoothly (touch wood), so I think it will be simpler to replace the node.  It's a pity though.

Regards,

David



On 3/6/23 02:19, Michel Jouvin wrote:
Hi David,

Normally cephadm connection issues are not that difficult to solve. It is just a matter of having the appropriate SSH configuration in the root account: mainly, the public key used by cephadm (extracted with the command you used in a shell) must be added to the root account's .ssh/authorized_keys.

Normally `ceph -s` should report that there is a cephadm SSH problem, and 'ceph health detail' should tell you which hosts have a problem. To rerun the test (which does not run very frequently), use 'ceph cephadm check-host host_name'.
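Concretely, distributing the key could look like this (a sketch; it assumes root SSH access to the problem host on the standard port):

```shell
# Export the public key that the cephadm mgr module uses for SSH...
ceph cephadm get-pub-key > ~/ceph.pub
# ...and append it to root's authorized_keys on the problem host
ssh-copy-id -f -i ~/ceph.pub root@cephstorage-rs01
```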

Good luck!

Michel

Le 02/06/2023 à 20:01, David Barton a écrit :
I am trying to debug an issue with ceph orch host add

Is there a way to debug the specific ssh commands being issued or add debugging code to a python script?

There is nothing useful in my syslog or /var/log/ceph/cephadm.log

Is there a way to get the command to log, or can someone point me in the direction of the source code so I can have a look?
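For what it's worth, the orchestrator runs inside the mgr (the code is under src/pybind/mgr/cephadm in the ceph source tree), and its logging can be turned up without touching the code. A sketch using the documented knobs:

```shell
# Raise the cephadm mgr module's cluster-log level to debug...
ceph config set mgr mgr/cephadm/log_to_cluster_level debug
# ...then watch the messages live while reproducing the failure
ceph -W cephadm --watch-debug
# or pull the recent ones afterwards
ceph log last cephadm
```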

I've run tcpdump on port 22 to listen for outgoing packets and also for traffic going to the target IP, and there is nothing going out when I run ceph orch host add. If I run ssh inside the cephadm shell, then I see the packets go out and it works, as I document below.


I was going to upgrade from Pacific 16.2.5 to Quincy and decided to migrate from ceph-deploy to cephadm.

I initially had problems because I run ssh on a non-standard port.  Opening port 22 has allowed me to run the command below on every node except one.

ceph orch host add [short hostname] [ip address]

That one host fails, inexplicably with the error:

Error EINVAL: Failed to connect to cephstorage-rs01 (103.XXX.YY.ZZ).

If I run cephadm shell (without --no-hosts as that gives the error: unknown flag: --no-hosts) it works as expected.

# cephadm shell
Inferring fsid 525ec8aa-b401-4ddf-aa8f-4493727dac02
Inferring config /var/lib/ceph/525ec8aa-b401-4ddf-aa8f-4493727dac02/mon.cephstorage-ig03/config
Using recent ceph image ceph/daemon-base@sha256:a038c6dc35064edff40bb7e824783f1bbd325c888e722ec5e814671406216ad5
root@cephstorage-ig03:/# ceph cephadm get-ssh-config > ssh_config
root@cephstorage-ig03:/# ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
root@cephstorage-ig03:/# chmod 0600 ~/cephadm_private_key
root@cephstorage-ig03:/# ssh -F ssh_config -i ~/cephadm_private_key root@xxxxxxxxxxxxx
Warning: Permanently added '103.XXX.YY.ZZ' (ECDSA) to the list of known hosts.
Welcome to XXXXX

I was mucking around with custom ssh-config files to get around the port issue, but it did not seem to work, so I reverted to the vanilla version with: ceph cephadm clear-ssh-config

So when I am inside the shell it works, but it doesn't work properly via ceph orch host add

There is one thing that is unusual that I think is worth mentioning.  When I was adding the servers with custom ssh config files, I had a bad entry in the hosts file for cephstorage-rs01 on that server, resolving to 127.0.0.1.  When I added it, it said it added the IP as 127.0.0.127:

# ceph orch host ls

HOST              ADDR            LABELS STATUS
...
cephstorage-rs01  127.0.0.127             Offline
...

I then ran

ceph orch host rm cephstorage-rs01

I have tried an iptables re-route in the vain hope that if there was some kind of host-to-IP cache, it would route to localhost and tell me that the host name didn't match.  That did not work.
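The re-route was along these lines (a sketch with a placeholder IP; note that DNAT to 127.0.0.1 from the nat OUTPUT chain only takes effect with route_localnet enabled, which may be why nothing happened):

```shell
# Redirect outgoing SSH traffic for the target host to localhost
# (requires: sysctl -w net.ipv4.conf.all.route_localnet=1)
iptables -t nat -A OUTPUT -p tcp -d 103.0.0.1 --dport 22 \
  -j DNAT --to-destination 127.0.0.1:22
```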


Right now, I am a sad panda as my ceph cluster is half transitioned.  My next port of call is probably to adopt what I can into cephadm, make sure the cluster is ok, and then finally drop the problem node and re-add it.

Any help will be appreciated.

Regards,

David






_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



