Replying to myself... I hesitated to send this email to the list, as the
problem turned out not to be related to Ceph itself but rather to a
network configuration problem that Ceph was a victim of. I found the
cause: we are using jumbo frames on all servers, but the VLAN shared by
the servers and the RGWs goes through an intermediate (campus)
network that doesn't seem to support jumbo frames (we were not aware of
this). The problem did not appear when using the intranet address
because the Ceph servers don't use jumbo frames on that
network/interface (it is a 1 Gb management network, so there is no point
in using jumbo frames there). I cannot think of anything that Ceph could
have reported to help diagnose this.
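For anyone hitting something similar, here is a minimal sketch of how the path could be checked for jumbo frame support. It is not what I actually ran, only an illustration: it sends pings with the "don't fragment" flag set and a payload sized to fill the MTU being tested. The 9000-byte jumbo MTU is an assumption (adjust to your own setting), the helper names are made up for the example, and the ping options are those of Linux iputils ping.

```python
import subprocess

IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def icmp_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one IPv4 packet of size `mtu`."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def ping_command(host, mtu):
    """Build an iputils ping command that forbids fragmentation (-M do)."""
    payload = icmp_payload_for_mtu(mtu)
    return ["ping", "-M", "do", "-c", "3", "-W", "2", "-s", str(payload), host]


def path_supports_mtu(host, mtu):
    """Return True if full-size, unfragmented pings to `host` get replies."""
    result = subprocess.run(
        ping_command(host, mtu),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    host = "10.81.22.183"      # address on the Ceph public network
    for mtu in (1500, 9000):   # 9000 is an assumed jumbo MTU
        ok = path_supports_mtu(host, mtu)
        print(f"MTU {mtu}: {'ok' if ok else 'FAILED'}")
```

If the 1500 test succeeds and the jumbo one fails, something on the path is probably dropping large frames, which would be consistent with what we saw here.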
Best regards,
Michel
On 25/10/2023 at 14:42, Michel Jouvin wrote:
Hi,
I'm struggling with a problem adding some hosts to our Quincy cluster
with cephadm. "ceph orch host add host addr" fails with the famous
"missing 2 required positional arguments: 'hostname' and 'addr'" because
of bug https://tracker.ceph.com/issues/59081, but looking at the cephadm
messages with "ceph -W cephadm", I can see:
--------
Log: Opening SSH connection to 10.81.22.183, port 22
[conn=736] Connected to SSH server at 10.81.22.183, port 22
[conn=736] Local address: 10.81.22.151, port 53640
[conn=736] Peer address: 10.81.22.183, port 22
[conn=736] Login timeout expired
[conn=736] Aborting connection
Traceback (most recent call last): (removed)
cephadm.ssh.HostConnectionError: Failed to connect to jc-rgw3
(10.81.22.183). Login timeout expired
Log: Opening SSH connection to 10.81.22.183, port 22
[conn=736] Connected to SSH server at 10.81.22.183, port 22
[conn=736] Local address: 10.81.22.151, port 53640
[conn=736] Peer address: 10.81.22.183, port 22
[conn=736] Login timeout expired
[conn=736] Aborting connection
--------
It is very strange to me because "ssh -i /tmp/cephadm_identity_xxx
10.81.22.183" works fine when executed in the active mgr container.
The host I'm trying to add is an RGW that has 3 active network
connections: the Ceph public network, our intranet network (used for
managing the server) and the network of the application that will use
the RGW. It seems to be somewhat related to this network configuration,
as the main cluster servers (MONs, OSDs), which have only the 2 Ceph
networks and the intranet one, don't suffer from the same problem. In
particular, what is strange is that I can successfully add the host if
I use its intranet address rather than the Ceph public network one
(10.81.22.183) in the cephadm command.
I have 3 hosts sharing the same network configuration and having the
same problem.
Any hint or suggestion to troubleshoot this problem further would be
highly appreciated!
Best regards,
Michel
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx