Re: Converting to cephadm : Error EINVAL: Failed to connect

Thanks, Michel.

ceph -s reports it as a stray host (since I haven't been able to add it).

ceph health detail reiterates that it is a stray host

# ceph cephadm check-host cephstorage-rs01
check-host failed:
Host 'cephstorage-rs01' not found. Use 'ceph orch host ls' to see all managed hosts.

I tried running a more complete tcpdump and I can see that when I run

ceph orch host add cephstorage-rs01 103.XXX.YY.ZZ

There are direct connection attempts to ports 3300 and 6804, which both succeed.

There are no DNS requests when I run the command (as expected since I provided the IP address).
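A capture along these lines is enough to see both the connection attempts and any DNS traffic (the IP here is a placeholder for the real address):

```shell
# Watch for DNS lookups plus any traffic to the target host while
# running 'ceph orch host add' in another terminal (needs root):
tcpdump -ni any 'udp port 53 or host 103.0.0.1'
```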


The rest of my adoption is going smoothly (touch wood), so I think it will be simpler to replace the node.  It's a pity though.

Regards,

David



On 3/6/23 02:19, Michel Jouvin wrote:
Hi David,

Normally cephadm connection issues are not that difficult to solve. It is just a matter of having the appropriate SSH configuration in the root account: mainly, the public key used by cephadm (extracted with the command you used in a shell) must be added to the root account's .ssh/authorized_keys.

Normally `ceph -s` should report that there is a cephadm SSH problem, and 'ceph health detail' should tell you which hosts have a problem. To rerun the test (which does not run very frequently), use 'ceph cephadm check-host host_name'.
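Concretely, distributing the key could look like this (a sketch; it assumes root SSH access to the problem host on the standard port):

```shell
# Export the public key that the cephadm mgr module uses for SSH...
ceph cephadm get-pub-key > ~/ceph.pub
# ...and append it to root's authorized_keys on the problem host
ssh-copy-id -f -i ~/ceph.pub root@cephstorage-rs01
```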

Good luck!

Michel

Le 02/06/2023 à 20:01, David Barton a écrit :
I am trying to debug an issue with ceph orch host add

Is there a way to debug the specific ssh commands being issued or add debugging code to a python script?

There is nothing useful in my syslog or /var/log/ceph/cephadm.log

Is there a way to get the command to log, or can someone point me in the direction of the source code so I can have a look?
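For what it's worth, the orchestrator runs inside the mgr (the code is under src/pybind/mgr/cephadm in the ceph source tree), and its logging can be turned up without touching the code. A sketch using the documented knobs:

```shell
# Raise the cephadm mgr module's cluster-log level to debug...
ceph config set mgr mgr/cephadm/log_to_cluster_level debug
# ...then watch the messages live while reproducing the failure
ceph -W cephadm --watch-debug
# or pull the recent ones afterwards
ceph log last cephadm
```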

I've run tcpdump on port 22 to listen for outgoing packets and also for traffic going to the target IP, and there is nothing going out when I run ceph orch host add. If I run ssh inside the cephadm shell, then I see the packets go out and it works, as I document below.


I was going to upgrade from Pacific 16.2.5 to Quincy and decided to migrate from ceph-deploy to cephadm.

I initially had problems because I run ssh on a non-standard port.  Opening port 22 has allowed me to run the command below on every node except one.

ceph orch host add [short hostname] [ip address]

That one host fails, inexplicably with the error:

Error EINVAL: Failed to connect to cephstorage-rs01 (103.XXX.YY.ZZ).

If I run cephadm shell (without --no-hosts as that gives the error: unknown flag: --no-hosts) it works as expected.

# cephadm shell
Inferring fsid 525ec8aa-b401-4ddf-aa8f-4493727dac02
Inferring config /var/lib/ceph/525ec8aa-b401-4ddf-aa8f-4493727dac02/mon.cephstorage-ig03/config
Using recent ceph image ceph/daemon-base@sha256:a038c6dc35064edff40bb7e824783f1bbd325c888e722ec5e814671406216ad5
root@cephstorage-ig03:/# ceph cephadm get-ssh-config > ssh_config
root@cephstorage-ig03:/# ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
root@cephstorage-ig03:/# chmod 0600 ~/cephadm_private_key
root@cephstorage-ig03:/# ssh -F ssh_config -i ~/cephadm_private_key root@xxxxxxxxxxxxx
Warning: Permanently added '103.XXX.YY.ZZ' (ECDSA) to the list of known hosts.
Welcome to XXXXX

I was mucking around with custom ssh-config files to get around the port issue, but it did not seem to work, so I reverted to the vanilla version with: ceph cephadm clear-ssh-config

So when I am inside the shell it works, but it doesn't work properly via ceph orch host add

There is one thing that is unusual that I think is worth mentioning.  When I was adding the servers with custom ssh config files, I had a bad entry in the hosts file for cephstorage-rs01 on that server, resolving to 127.0.0.1.  When I added it, it said it added the IP as 127.0.0.127:

# ceph orch host ls

HOST              ADDR            LABELS STATUS
...
cephstorage-rs01  127.0.0.127             Offline
...

I then ran

ceph orch host rm cephstorage-rs01

I have tried an iptables re-route in the vain hope that if there was some kind of host-to-IP cache, it would route to localhost and tell me that the host name didn't match.  That did not work.
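The re-route was along these lines (a sketch with a placeholder IP; note that DNAT to 127.0.0.1 from the nat OUTPUT chain only takes effect with route_localnet enabled, which may be why nothing happened):

```shell
# Redirect outgoing SSH traffic for the target host to localhost
# (requires: sysctl -w net.ipv4.conf.all.route_localnet=1)
iptables -t nat -A OUTPUT -p tcp -d 103.0.0.1 --dport 22 \
  -j DNAT --to-destination 127.0.0.1:22
```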


Right now, I am a sad panda as my ceph cluster is half transitioned.  My next port of call is probably to adopt what I can into cephadm, make sure the cluster is ok, and then finally drop the problem node and re-add it.

Any help will be appreciated.

Regards,

David






_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



