Re: [EXTERNAL] Re: Converting to cephadm : Error EINVAL: Failed to connect

I usually find the most useful errors for troubleshooting orch/cephadm connection issues in:
ceph log last 50 cephadm
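
If the default level is too quiet, the cephadm troubleshooting docs also
describe raising the module's cluster-log verbosity, roughly along these
lines (and turning it back down afterwards):

ceph config set mgr mgr/cephadm/log_to_cluster_level debug
ceph -W cephadm --watch-debug     # stream cephadm events as they happen
ceph log last 50 debug cephadm    # or pull the most recent entries again
ceph config set mgr mgr/cephadm/log_to_cluster_level info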

Thank you,
Josh Beaman

From: Michel Jouvin <michel.jouvin@xxxxxxxxxxxxxxx>
Date: Friday, June 2, 2023 at 1:19 PM
To: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: [EXTERNAL]  Re: Converting to cephadm : Error EINVAL: Failed to connect
Hi David,

Normally, cephadm connection issues are not that difficult to solve. It is
just a matter of having the appropriate SSH configuration in the root
account: mainly, the public key used by cephadm (extracted with the
command you used in a shell) must be added to the root account's
.ssh/authorized_keys.
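
Something like this should do it (the host name below is just your
problem host, adjust as needed):

ceph cephadm get-pub-key > ~/ceph.pub
ssh-copy-id -f -i ~/ceph.pub root@cephstorage-rs01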

Normally `ceph -s` should report that there is a cephadm SSH problem, and
`ceph health detail` should tell you which hosts have a problem. To rerun
the check (it does not run very frequently), use `ceph cephadm check-host
<host_name>`.
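
For example:

ceph health detail                        # lists the hosts cephadm cannot reach
ceph cephadm check-host cephstorage-rs01  # re-run the connectivity check now
ceph orch host ls                         # then verify the host's status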

Good luck!

Michel

On 02/06/2023 at 20:01, David Barton wrote:
> I am trying to debug an issue with ceph orch host add
>
> Is there a way to debug the specific ssh commands being issued, or to
> add debugging code to a Python script?
>
> There is nothing useful in my syslog or /var/log/ceph/cephadm.log
>
> Is there a way to get the command to log, or can someone point me in
> the direction of the source code so I can have a look?
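>
> (My guess is that the relevant code is the cephadm mgr module, which I
> believe lives under src/pybind/mgr/cephadm/ in the Ceph source tree,
> so roughly:
>
> git clone -b pacific https://github.com/ceph/ceph.git
> ls ceph/src/pybind/mgr/cephadm/    # module.py and friends
>
> but I have not dug into it yet.)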
>
> I've run tcpdump on port 22 to listen for outgoing packets, and also
> for traffic going to the target IP, and there is nothing going out
> when I run ceph orch host add. If I run ssh inside the cephadm shell,
> I see the packets go out and it works, as I document below.
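>
> (For reference, the capture was something along the lines of:
>
> tcpdump -nn -i any 'port 22 or host 103.XXX.YY.ZZ'
>
> and nothing showed up for orch host add, while the manual ssh below
> generated traffic as expected.)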
>
>
> I was going to upgrade from Pacific 16.2.5 to Quincy and decided to
> convert from ceph-deploy to cephadm first.
>
> I initially had problems because I run ssh on a non-standard port.
> Opening port 22 allowed me to run the command below on every node
> except one.
>
> ceph orch host add [short hostname] [ip address]
>
> That one host inexplicably fails with the error:
>
> Error EINVAL: Failed to connect to cephstorage-rs01 (103.XXX.YY.ZZ).
>
> If I run cephadm shell (without --no-hosts, as that gives the error
> "unknown flag: --no-hosts"), it works as expected.
>
> # cephadm shell
> Inferring fsid 525ec8aa-b401-4ddf-aa8f-4493727dac02
> Inferring config
> /var/lib/ceph/525ec8aa-b401-4ddf-aa8f-4493727dac02/mon.cephstorage-ig03/config
> Using recent ceph image
> ceph/daemon-base@sha256:a038c6dc35064edff40bb7e824783f1bbd325c888e722ec5e814671406216ad5
> root@cephstorage-ig03:/# ceph cephadm get-ssh-config > ssh_config
> root@cephstorage-ig03:/# ceph config-key get
> mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
> root@cephstorage-ig03:/# chmod 0600 ~/cephadm_private_key
> root@cephstorage-ig03:/# ssh -F ssh_config -i ~/cephadm_private_key
> root@xxxxxxxxxxxxx
> Warning: Permanently added '103.XXX.YY.ZZ' (ECDSA) to the list of
> known hosts.
> Welcome to XXXXX
>
> I was mucking around with custom ssh-config files to get around the
> port issue, but it did not seem to work, so I reverted back to the
> vanilla version with: ceph cephadm clear-ssh-config
>
> So when I am inside the shell it works, but it doesn't work properly
> via ceph orch host add
>
> There is one unusual thing that I think is worth mentioning.
> When I was adding the servers with custom ssh config files, I had a
> bad entry in the hosts file for cephstorage-rs01 on that server,
> resolving to 127.0.0.1. When I added it, it said it added the IP as
> 127.0.0.127.
>
> # ceph orch host ls
>
> HOST              ADDR            LABELS STATUS
> ...
> cephstorage-rs01  127.0.0.127             Offline
> ...
>
> I then ran
>
> ceph orch host rm cephstorage-rs01
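>
> (In hindsight, I suppose I could also have corrected the recorded
> address in place instead of removing the host, something like:
>
> ceph orch host set-addr cephstorage-rs01 103.XXX.YY.ZZ
>
> assuming the name resolution on that server is fixed first.)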
>
> I have tried an iptables re-route in the vain hope that, if there was
> some kind of host-to-IP cache, it would route to localhost and tell me
> that the host name didn't match. That did not work.
>
>
> Right now, I am a sad panda as my ceph cluster is half-transitioned.
> My next port of call is probably to try to adopt what I can into
> cephadm and make sure the cluster is OK, and then finally drop the
> problem node and re-add it.
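>
> (As I understand the conversion docs, the per-daemon adoption step is
> roughly:
>
> cephadm adopt --style legacy --name mon.cephstorage-ig03
> cephadm adopt --style legacy --name mgr.cephstorage-ig03
> cephadm adopt --style legacy --name osd.0
>
> run on whichever host owns each daemon, with names adjusted
> accordingly.)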
>
> Any help will be appreciated.
>
> Regards,
>
> David
>
>
>
>
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx